Pull clk fix from Stephen Boyd:
"One hotfix for a NULL pointer deref in the Renesas usb clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
Pull scheduler fixes from Borislav Petkov:
- Have get_push_task() check whether current has migration disabled and
thus avoid useless invocations of the migration thread
- Rework initialization flow so that all rq->core's are initialized,
even of CPUs which have not been onlined yet, so that iterating over
them all works as expected
* tag 'sched_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix get_push_task() vs migrate_disable()
sched: Fix Core-wide rq->lock for uninitialized CPUs
Pull irq fix from Borislav Petkov:
- Have msix_mask_all() check a global control which says whether MSI-X
masking should be done and thus make it usable on Xen-PV too
* tag 'irq_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Skip masking MSI-X on Xen PV
Pull perf fixes from Borislav Petkov:
- Prevent the amd/power module from being removed while in use
- Mark AMD IBS as not supporting content exclusion
- Add a workaround for AMD erratum #1197 where IBS registers might not
be restored properly after exiting CC6 state
- Fix a potential truncation of a 32-bit variable due to shifting
- Read the correct bits describing the number of configurable address
ranges on Intel PT
* tag 'perf_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/power: Assign pmu.module
perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
perf/x86/amd/ibs: Work around erratum #1197
perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
perf/x86/intel/pt: Fix mask of num_address_ranges
Pull x86 fixes from Borislav Petkov:
- Fix build error on RHEL where -Werror=maybe-uninitialized is set.
- Restore the firmware's IDT when calling EFI boot services and before
ExitBootServices() has been called. This fixes a boot failure on what
appears to be a tablet with 32-bit UEFI running a 64-bit kernel.
* tag 'x86_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
x86/efi: Restore Firmware IDT before calling ExitBootServices()
The probe was manually passing NULL instead of dev to devm_clk_hw_register.
This caused a Unable to handle kernel NULL pointer dereference error.
Fix this by passing 'dev'.
Signed-off-by: Adam Ford <aford173@gmail.com>
Fixes: a20a40a8bb ("clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe()")
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull SCSI fix from James Bottomley:
"A single fix for a race introduced by a fix that went into 5.14-rc5"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix hang of freezing queue between blocking and running device
Pull USB fixes from Greg KH:
"Here are a few tiny USB fixes for reported issues with some USB
drivers.
These fixes include:
- gadget driver fixes for regressions
- tcpm driver fix
- dwc3 driver fixes
- xhci renesas firmware loading fix, again.
- usb serial option driver device id addition
- usb serial ch341 revert for regression
All all of these have been in linux-next with no reported problems"
* tag 'usb-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: u_audio: fix race condition on endpoint stop
usb: gadget: f_uac2: fixup feedback endpoint stop
usb: typec: tcpm: Raise vdm_sm_running flag only when VDM SM is running
usb: renesas-xhci: Prefer firmware loading on unknown ROM state
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Revert "USB: serial: ch341: fix character loss at high transfer rates"
USB: serial: option: add new VID/PID to support Fibocom FG150
Pull powerpc fixes from Michael Ellerman:
- Fix scv implicit soft-mask table for relocated (eg. kdump) kernels
- Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK, which was disabled due to a
typo
Thanks to Lukas Bulwahn, Nicholas Piggin, and Daniel Axtens.
* tag 'powerpc-5.14-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix scv implicit soft-mask table for relocated kernels
powerpc: Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
Pull block fixes from Jens Axboe:
- Revert the mq-deadline priority handling, it's causing serious
performance regressions. While experimental patches exists to fix
this up, it's too late to do so now. Revert it and re-do it properly
for 5.15 instead.
- Fix a NULL vs IS_ERR() regression in this release (Dan)
- Fix a mq-deadline accounting regression in this release (Bart)
- Mark cryptoloop as deprecated. It's broken and dm-crypt fully
supports it, and it's actively intefering with loop. Plan on removal
for 5.16 (Christoph)
* tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block:
cryptoloop: add a deprecation warning
pd: fix a NULL vs IS_ERR() check
Revert "block/mq-deadline: Prioritize high-priority requests"
mq-deadline: Fix request accounting
Pull ARM SoC fixes from Arnd Bergmann:
"Just two trivial fixes from the reset driver tree, nothing else came
up since the last soc fixes"
* tag 'soc-fixes-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
reset: reset-zynqmp: Fixed the argument data type
reset: RESET_MCHP_SPARX5 should depend on ARCH_SPARX5
Pull ACPI fix from Rafael Wysocki:
"Fix a regression introduced during this cycle that has been partially
addressed by an earlier commit (Andy Shevchenko)"
* tag 'acpi-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()
Pull power management fixes from Rafael Wysocki:
"These fix two issues introduced during this cycle, one of which is a
regression and the other one affects new code.
Specifics:
- Prevent the operating performance points (OPP) code from crashing
when some entries in the table of required OPPs are set to error
pointer values (Marijn Suijten)
- Prevent the generic power domains (genpd) framework from
incorrectly overriding the performance state of a device set by its
driver while it is runtime-suspended or when runtime PM of it is
disabled (Dmitry Osipenko)"
* tag 'pm-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: domains: Improve runtime PM performance state handling
opp: core: Check for pending links before reading required_opp pointers
Pull RISC-V fixes from Palmer Dabbelt:
- device tree updates for the Microsemi Polarfire development kit that
fix some mismatches between the u-boot and Linux enternet entries
- ensure that the F register state is correctly reflected in core dumps
* tag 'riscv-for-linus-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: microchip: Add ethernet0 to the aliases node
riscv: dts: microchip: Use 'local-mac-address' for emac1
riscv: Ensure the value of FP registers in the core dump file is up to date
Pull MMC host fix from Ulf Hansson:
- sdhci-iproc: Fix clock error for ACPI rpi's
* tag 'mmc-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"
Support for cryptoloop has been officially marked broken and deprecated
in favor of dm-crypt (which supports the same broken algorithms if
needed) in Linux 2.6.4 (released in March 2004), and support for it has
been entirely removed from losetup in util-linux 2.23 (released in April
2013). Add a warning and a deprecation schedule.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ARM fix from Russell King:
"Resolve a Keystone 2 kernel mapping regression"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9104/2: Fix Keystone 2 kernel mapping regression
This reverts commit 419dd626e3.
It turned out that the change from the reverted commit breaks the ACPI
based rpi's because it causes the 100Mhz max clock to be overridden to the
return from sdhci_iproc_get_max_clock(), which is 0 because there isn't a
OF/DT based clock device.
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Fixes: 419dd626e3 ("mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711")
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
When the uac2 function is stopped, there seems to be an issue reported on
some platforms (Intel Merrifield at least)
BUG: kernel NULL pointer dereference, address: 0000000000000008
...
RIP: 0010:dwc3_gadget_del_and_unmap_request+0x19/0xe0
...
Call Trace:
dwc3_remove_requests.constprop.0+0x12f/0x170
__dwc3_gadget_ep_disable+0x7a/0x160
dwc3_gadget_ep_disable+0x3d/0xd0
usb_ep_disable+0x1c/0x70
u_audio_stop_capture+0x79/0x120 [u_audio]
afunc_set_alt+0x73/0x80 [usb_f_uac2]
composite_setup+0x224/0x1b90 [libcomposite]
The issue happens only when the gadget is using the sync type "async", not
"adaptive". This indicates that problem is coming from the feedback
endpoint, which is only used with async synchronization mode.
The problem is that request is freed regardless of usb_ep_dequeue(), which
ends up badly if the request is not actually dequeued yet.
Update the feedback endpoint free function to release the endpoint the same
way it is done for the data endpoint, which takes care of the problem.
Fixes: 24f779dac8 ("usb: gadget: f_uac2/u_audio: add feedback endpoint support")
Reported-by: Ferry Toth <ftoth@exalondelft.nl>
Tested-by: Ferry Toth <ftoth@exalondelft.nl>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20210827075853.266912-1-jbrunet@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull drm fixes from Dave Airlie:
"Last set of fixes for 5.14, nothing major a couple of i915, couple of
imx and a few amdgpu. All pretty small.
i915:
- Fix syncmap memory leak
- Drop redundant display port debug print
amdgpu:
- Fix for pinning display buffers multiple times
- Fix delayed work handling for GFXOFF
- Fix build when CONFIG_SUSPEND is not set
imx:
- fix planar offset calculations
- fix accidental partial revert"
* tag 'drm-fixes-2021-08-27' of git://anongit.freedesktop.org/drm/drm:
drm/i915/dp: Drop redundant debug print
drm/i915: Fix syncmap memory leak
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
drm/amdgpu: Cancel delayed work when GFXOFF is disabled
drm/amdgpu: use the preferred pin domain after the check
drm/imx: ipuv3-plane: fix accidental partial revert of 8 pixel alignment fix
gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats
When running as Xen PV guest, masking MSI-X is a responsibility of the
hypervisor. The guest has no write access to the relevant BAR at all - when
it tries to, it results in a crash like this:
BUG: unable to handle page fault for address: ffffc9004069100c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
e1000_probe+0x41f/0xdb0 [e1000e]
local_pci_probe+0x42/0x80
(...)
The recently introduced function msix_mask_all() does not check the global
variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking
of MSI[-X] interrupts.
Add the check to make this function XEN PV compatible.
Fixes: 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
Pull nfsd fix from Bruce Fields:
"This is a one-liner fix for a serious bug that can cause the server to
become unresponsive to a client, so I think it's worth the last-minute
inclusion for 5.14"
* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from can and bpf.
Closing three hw-dependent regressions. Any fixes of note are in the
'old code' category. Nothing blocking release from our perspective.
Current release - regressions:
- stmmac: revert "stmmac: align RX buffers"
- usb: asix: ax88772: move embedded PHY detection as early as
possible
- usb: asix: do not call phy_disconnect() for ax88178
- Revert "net: really fix the build...", from Kalle to fix QCA6390
Current release - new code bugs:
- phy: mediatek: add the missing suspend/resume callbacks
Previous releases - regressions:
- qrtr: fix another OOB Read in qrtr_endpoint_post
- stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
Previous releases - always broken:
- inet: use siphash in exception handling
- ip_gre: add validation for csum_start
- bpf: fix ringbuf helper function compatibility
- rtnetlink: return correct error on changing device netns
- e1000e: do not try to recover the NVM checksum on Tiger Lake"
* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
Revert "net: really fix the build..."
net: hns3: fix get wrong pfc_en when query PFC configuration
net: hns3: fix GRO configuration error after reset
net: hns3: change the method of getting cmd index in debugfs
net: hns3: fix duplicate node in VLAN list
net: hns3: fix speed unknown issue in bond 4
net: hns3: add waiting time before cmdq memory is released
net: hns3: clear hardware resource when loading driver
net: fix NULL pointer reference in cipso_v4_doi_free
rtnetlink: Return correct error on changing device netns
net: dsa: hellcreek: Adjust schedule look ahead window
net: dsa: hellcreek: Fix incorrect setting of GCL
cxgb4: dont touch blocked freelist bitmap after free
ipv4: use siphash instead of Jenkins in fnhe_hashfun()
ipv6: use siphash in rt6_exception_hash()
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
net: usb: asix: ax88772: fix boolconv.cocci warnings
net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
qede: Fix memset corruption
net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
...
This reverts commit fb926032b3.
Zhen reports that this commit slows down mq-deadline on a 128 thread
box, going from 258K IOPS to 170-180K. My testing shows that Optane
gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.
Looking in detail at the code, the main culprit here is needing to sum
percpu counters in the dispatch hot path, leading to very high CPU
utilization there. To make matters worse, the code currently needs to
sum 2 percpu counters, and it does so in the most naive way of iterating
possible CPUs _twice_.
Since we're close to release, revert this commit and we can re-do it
with regular per-priority counters instead for the 5.15 kernel.
Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull arm64 fix from Will Deacon:
"We received a report this week that the generic version of
pfn_valid(), which we switched to this merge window in 16c9afc776
("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
dma_map_resource() due to the following check:
/* Don't allow RAM to be mapped */
if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;
Since the ongoing saga to determine the semantics of pfn_valid() is
unlikely to be resolved this week (does it indicate valid memory, or
just the presence of a struct page, or whether that struct page has
been initialised?), just revert back to our old version of pfn_valid()
for 5.14.
Summary:
- Fix dma_map_resource() by reverting back to old pfn_valid() code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
Pull ceph fixes from Ilya Dryomov:
"Two memory management fixes for the filesystem"
* tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
ceph: correctly handle releasing an embedded cap flush
This reverts commit ce78ffa3ef.
Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:
qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
Commit ce78ffa3ef ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.
Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull btrfs fix from David Sterba:
"One more fix that I think qualifies for a late merge. It's a revert of
a one-liner fix that meanwhile got backported to stable kernels and we
got reports from users.
The broken fix prevents creating compressed inline extents, which
could be noticeable on space consumption.
Technically it's a regression as the patch was merged in 5.14-rc1 but
got propagated to several stable kernels and has higher exposure than
a 'typical' development cycle bug"
* tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: compression: don't try to compress if we don't have enough pages"
push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU.
The current task is retrieved via get_push_task() which only checks for
nr_cpus_allowed == 1, but does not check whether the task has migration
disabled and therefore cannot be moved either. The consequence is a
pointless invocation of the migration thread which correctly observes
that the task cannot be moved.
Return NULL if the task has migration disabled and cannot be moved to
another CPU.
Fixes: a7c81556ec ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
The commit 71f6428332 ("ACPI: utils: Fix reference counting in
for_each_acpi_dev_match()") moved adev assignment outside of error
path and hence made acpi_dev_put(sensor->adev) a no-op. We still
need to drop reference count on error path, and to achieve that,
replace sensor->adev by locally assigned adev.
Fixes: 71f6428332 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()")
Depends-on: fc68f42aa7 ("ACPI: fix NULL pointer dereference")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, when query PFC configuration by dcbtool, driver will return
PFC enable status based on TC. As all priorities are mapped to TC0 by
default, if TC0 is enabled, then all priorities mapped to TC0 will be
shown as enabled status when query PFC setting, even though some
priorities have never been set.
for example:
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
$ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on
To fix this problem, just returns user's PFC config parameter saved in
driver.
Fixes: cacde272dd ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GRO configuration is enabled by default after reset. This
is incorrect and should be restored to the user-configured value.
So this restoration is added during reset initialization.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the cmd index is obtained in debugfs by comparing file names.
However, this method may cause errors when processing more complex file
names. So, change this method by saving cmd in private data and comparing
it when getting cmd index in debugfs for optimization.
Fixes: 5e69ea7ee2 ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VLAN list should not be added duplicate VLAN node, otherwise it would
cause "add failed" when restore VLAN from VLAN list, so this patch adds
VLAN ID check before adding node into VLAN list.
Fixes: c6075b1934 ("net: hns3: Record VF vlan tables")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bond 4, when the link goes down and up repeatedly, the bond may get an
unknown speed, and then this port can not work.
The driver notify netif_carrier_on() before update the link state, when the
bond receive carrier on, will query the speed of the port, if the query
operation happens before updating the link state, will get an unknown
speed. So need to notify netif_carrier_on() after update the link state.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the cmdq registers are cleared, the firmware may take time to
clear out possible left over commands in the cmdq. Driver must release
cmdq memory only after firmware has completed processing of left over
commands.
Fixes: 232d0d55fc ("net: hns3: uninitialize command queue while unloading PF driver")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a PF is bonded to a virtual machine and the virtual machine exits
unexpectedly, some hardware resource cannot be cleared. In this case,
loading driver may cause exceptions. Therefore, the hardware resource
needs to be cleared when the driver is loaded.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the port is going to send Discover_Identity Message, vdm_sm_running
flag was intentionally set before entering Ready States in order to
avoid the conflict because the port and the port partner might start
AMS at almost the same time after entering Ready States.
However, the original design has a problem. When the port is doing
DR_SWAP from Device to Host, it raises the flag. Later in the
tcpm_send_discover_work, the flag blocks the procedure of sending the
Discover_Identity and it might never be cleared until disconnection.
Since there exists another flag send_discover representing that the port
is going to send Discover_Identity or not, it is enough to use that flag
to prevent the conflict. Also change the timing of the set/clear of
vdm_sm_running to indicate whether the VDM SM is actually running or
not.
Fixes: c34e85fa69 ("usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work")
Cc: stable <stable@vger.kernel.org>
Cc: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Kyle Tso <kyletso@google.com>
Link: https://lore.kernel.org/r/20210826124201.1562502-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent attempt to handle an unknown ROM state in the commit
d143825baf ("usb: renesas-xhci: Fix handling of unknown ROM state")
resulted in a regression and reverted later by the commit 44cf53602f
("Revert "usb: renesas-xhci: Fix handling of unknown ROM state"").
The problem of the former fix was that it treated the failure of
firmware loading as a fatal error. Since the firmware files aren't
included in the standard linux-firmware tree, most users don't have
them, hence they got the non-working system after that. The revert
fixed the regression, but also it didn't make the firmware loading
triggered even on the devices that do need it. So we need still a fix
for them.
This is another attempt to handle the unknown ROM state. Like the
previous fix, this also tries to load the firmware when ROM shows
unknown state. In this patch, however, the failure of a firmware
loading (such as a missing firmware file) isn't handled as a fatal
error any longer when ROM has been already detected, but it falls back
to the ROM mode like before. The error is returned only when no ROM
is detected and the firmware loading failed.
Along with it, for simplifying the code flow, the detection and the
check of ROM is factored out from renesas_fw_check_running() and done
in the caller side, renesas_xhci_check_request_fw(). It avoids the
redundant ROM checks.
The patch was tested on Lenovo Thinkpad T14 gen (BIOS 1.34). Also it
was confirmed that no regression is seen on another Thinkpad T14
machine that has worked without the patch, too.
Fixes: 44cf53602f ("Revert "usb: renesas-xhci: Fix handling of unknown ROM state"")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1189207
Link: https://lore.kernel.org/r/20210826124127.14789-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During a USB cable disconnect, or soft disconnect scenario, a pending
SETUP transaction may not be completed, leading to the following
error:
dwc3 a600000.dwc3: timed out waiting for SETUP phase
If this occurs, then the entire pullup disable routine is skipped and
proper cleanup and halting of the controller does not complete.
Instead of returning an error (which is ignored from the UDC
perspective), allow the pullup disable routine to continue, which
will also handle disabling of EP0/1. This will end any active
transfers as well. Ensure to clear any delayed_status also, as the
timeout could happen within the STATUS stage.
Fixes: bb01473648 ("usb: dwc3: gadget: don't clear RUN/STOP when it's invalid to do so")
Cc: <stable@vger.kernel.org>
Reviewed-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/20210825042855.7977-1-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reset controller fixes for v5.14
Hide the Sparx5 reset driver unless the ARCH_SPARX5 or COMPILE_TEST
options are enabled, to avoid unnecessarily asking users about this
driver. Fix a return value argument type in the ZynqMP reset driver.
* tag 'reset-fixes-for-v5.14' of git://git.pengutronix.de/pza/linux:
reset: reset-zynqmp: Fixed the argument data type
reset: RESET_MCHP_SPARX5 should depend on ARCH_SPARX5
Link: https://lore.kernel.org/r/e543959c5b5ee7b25686f81049bf187d602daeda.camel@pengutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We can't depend on the TRB's HWO bit to determine if the TRB ring is
"full". A TRB is only available when the driver had processed it, not
when the controller consumed and relinquished the TRB's ownership to the
driver. Otherwise, the driver may overwrite unprocessed TRBs. This can
happen when many transfer events accumulate and the system is slow to
process them and/or when there are too many small requests.
If a request is in the started_list, that means there is one or more
unprocessed TRBs remained. Check this instead of the TRB's HWO bit
whether the TRB ring is full.
Fixes: c4233573f6 ("usb: dwc3: gadget: prepare TRBs on update transfers too")
Cc: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/e91e975affb0d0d02770686afc3a5b9eb84409f6.1629335416.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:
BUG: kernel NULL pointer dereference, address:
...
RIP: 0010:cipso_v4_doi_free+0x3a/0x80
...
Call Trace:
netlbl_cipsov4_add_std+0xf4/0x8c0
netlbl_cipsov4_add+0x13f/0x1b0
genl_family_rcv_msg_doit.isra.15+0x132/0x170
genl_rcv_msg+0x125/0x240
This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when 'doi_def->type' equal 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.
This patch just add the check to prevent panic happen for similar
cases.
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when device is moved between network namespaces using
RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and
target namespace already has device with same name, userspace will get
EINVAL what is confusing and makes debugging harder.
Fix it so that userspace gets more appropriate EEXIST instead what makes
debugging much easier.
Before:
# ./ifname.sh
+ ip netns add ns0
+ ip netns exec ns0 ip link add l0 type dummy
+ ip netns exec ns0 ip link show l0
8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
+ ip link add l0 type dummy
+ ip link show l0
10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
+ ip link set l0 netns ns0
RTNETLINK answers: Invalid argument
After:
# ./ifname.sh
+ ip netns add ns0
+ ip netns exec ns0 ip link add l0 type dummy
+ ip netns exec ns0 ip link show l0
8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
+ ip link add l0 type dummy
+ ip link show l0
10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
+ ip link set l0 netns ns0
RTNETLINK answers: File exists
The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.
For example, __rtnl_newlink() does this:
net/core/rtnetlink.c
3270 char ifname[IFNAMSIZ];
...
3286 if (tb[IFLA_IFNAME])
3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
3288 else
3289 ifname[0] = '\0';
...
3364 if (dev) {
...
3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
3395 }
, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.
But the pat (pattern) in __dev_change_net_namespace() is used as:
net/core/dev.c
11198 err = -EEXIST;
11199 if (__dev_get_by_name(net, dev->name)) {
11200 /* We get here if we can't use the current device name */
11201 if (!pat)
11202 goto out;
11203 err = dev_get_valid_name(net, dev, pat);
11204 if (err < 0)
11205 goto out;
11206 }
As the result the `goto out` path on line 11202 is neven taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.
Fixes: d8a5ec6727 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kurt Kanzenbach says:
====================
net: dsa: hellcreek: 802.1Qbv Fixes
while using TAPRIO offloading on the Hirschmann hellcreek switch, I've noticed
two issues in the current implementation:
1. The gate control list is incorrectly programmed
2. The admin base time is not set properly
Fix it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Traffic schedules can only be started up to eight seconds within the
future. Therefore, the driver periodically checks every two seconds whether the
admin base time provided by the user is inside that window. If so the schedule
is started. Otherwise the check is deferred.
However, according to the programming manual the look ahead window size should
be four - not eight - seconds. By using the proposed value of four seconds
starting a schedule at a specified admin base time actually works as expected.
Fixes: 24dfc6eb39 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the gate control list which is programmed into the hardware is
incorrect resulting in wrong traffic schedules. The problem is the loop
variables are incremented before they are referenced. Therefore, move the
increment to the end of the loop.
Fixes: 24dfc6eb39 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adapter init fails, the blocked freelist bitmap is already freed
up and should not be touched. So, move the bitmap zeroing closer to
where it was successfully allocated. Also handle adapter init failure
unwind path immediately and avoid setting up RDMA memory windows.
Fixes: 5b377d114f ("cxgb4: Add debugfs facility to inject FL starvation")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
inet: use siphash in exception handling
A group of security researchers brought to our attention
the weakness of hash functions used in rt6_exception_hash()
and fnhe_hashfun()
I made two distinct patches to help backports, since IPv6
part was added in 4.15
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A group of security researchers brought to our attention
the weakness of hash function used in fnhe_hashfun().
Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.
Also remove the inline keyword, this really is distracting.
Fixes: d546c62154 ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
A group of security researchers brought to our attention
the weakness of hash function used in rt6_exception_hash()
Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.
Following patch deals with IPv4.
Fixes: 35732d01fe ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine says:
====================
pull-request: can 2021-08-26
this is a pull request of a single patch for net/master.
Stefan Mätje's patch fixes the interchange of RX and TX error counters
inthe esd_usb2 CAN driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan writes:
USB-serial fixes for 5.14-rc8
Here's a fix for a regression in 5.14 (also backported to stable) which
caused reads to stall for ch341 devices.
Included is also a new modem device id.
All but the revert have been in linux-next, and with no reported issues.
* tag 'usb-serial-5.14-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
Revert "USB: serial: ch341: fix character loss at high transfer rates"
USB: serial: option: add new VID/PID to support Fibocom FG150
The u32 variable pci_dword is being masked with 0x1fffffff and then left
shifted 23 places. The shift is a u32 operation,so a value of 0x200 or
more in pci_dword will overflow the u32 and only the bottow 32 bits
are assigned to addr. I don't believe this was the original intent.
Fix this by casting pci_dword to a resource_size_t to ensure no
overflow occurs.
Note that the mask and 12 bit left shift operation does not need this
because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit
value.
Fixes: ee49532b38 ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge")
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com
drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here
Remove unneeded conversion to bool
Semantic patch information:
Relational and logical operators evaluate to bool,
explicit conversion is overly verbose and unneeded.
Generated by: scripts/coccinelle/misc/boolconv.cocci
Fixes: 7a141e64cf ("net: usb: asix: ax88772: move embedded PHY detection as early as possible")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
flag on the socket. Among other things, this make it impossible to close
the socket.
Fixes: 82011c80b3 ("SUNRPC: Move svc_xprt_received() call sites")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Merge fixes from Andrew Morton:
"2 patches.
Subsystems affected by this patch series: mm/memory-hotplug and
MAINTAINERS"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MAINTAINERS: exfat: update my email address
mm/memory_hotplug: fix potential permanent lru cache disable
GENPD core doesn't support handling performance state changes while
consumer device is runtime-suspended or when runtime PM is disabled.
GENPD core may override performance state that was configured by device
driver while RPM of the device was disabled or device was RPM-suspended.
Let's close that gap by allowing drivers to control performance state
while RPM of a consumer device is disabled and to set up performance
state of RPM-suspended device that will be applied by GENPD core on
RPM-resume of the device.
Fixes: 5937c3ce21 ("PM: domains: Drop/restore performance state votes for devices at runtime PM")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.
Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable). User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.
But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).
The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.
This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong. What matters is that it used to work.
So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes. It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.
Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a664 and 1b6b26ae70 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed. Probably because the stress-ng problem case ends up
being timing-dependent too.
It was then unwittingly fixed by commit 3a34b13a88 ("pipe: make pipe
writes always wake up readers") only to be broken again when by commit
3b844826b6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").
And at that point the kernel test robot noticed the performance
refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.
Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case. FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ucount fixes from Eric Biederman:
"This branch fixes a regression that made it impossible to increase
rlimits that had been converted to the ucount infrastructure, and also
fixes a reference counting bug where the reference was not incremented
soon enough.
The fixes are trivial and the bugs have been encountered in the wild,
and the fixes have been tested"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Increase ucounts reference counter before the security hook
ucounts: Fix regression preventing increasing of rlimits in init_user_ns
kcalloc() is called to allocate memory for m->m_info, and if it fails,
ceph_mdsmap_destroy() behind the label out_err will be called:
ceph_mdsmap_destroy(m);
In ceph_mdsmap_destroy(), m->m_info is dereferenced through:
kfree(m->m_info[i].export_targets);
To fix this possible null-pointer dereference, check m->m_info before the
for loop to free m->m_info[i].export_targets.
[ jlayton: fix up whitespace damage
only kfree(m->m_info) if it's non-NULL ]
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.
When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.
Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.
At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads. It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This reverts commit f216562731.
[BUG]
It's no longer possible to create compressed inline extent after commit
f216562731 ("btrfs: compression: don't try to compress if we don't
have enough pages").
[CAUSE]
For compression code, there are several possible reasons we have a range
that needs to be compressed while it's no more than one page.
- Compressed inline write
The data is always smaller than one sector and the test lacks the
condition to properly recognize a non-inline extent.
- Compressed subpage write
For the incoming subpage compressed write support, we require page
alignment of the delalloc range.
And for 64K page size, we can compress just one page into smaller
sectors.
For those reasons, the requirement for the data to be more than one page
is not correct, and is already causing regression for compressed inline
data writeback. The idea of skipping one page to avoid wasting CPU time
could be revisited in the future.
[FIX]
Fix it by reverting the offending commit.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net
Fixes: f216562731 ("btrfs: compression: don't try to compress if we don't have enough pages")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This partially reverts commit 16c9afc776.
Alex Bee reports a regression in 5.14 on their RK3328 SoC when
configuring the PL330 DMA controller:
| ------------[ cut here ]------------
| WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
| Modules linked in: spi_rockchip(+) fuse
| CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
| Hardware name: Pine64 Rock64 (DT)
| pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
| pc : dma_map_resource+0x68/0xc0
| lr : pl330_prep_slave_fifo+0x78/0xd0
This appears to be because dma_map_resource() is being called for a
physical address which does not correspond to a memory address yet does
have a valid 'struct page' due to the way in which the vmemmap is
constructed.
Prior to 16c9afc776 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
implementation of pfn_valid() called memblock_is_memory() to return
'false' for such regions and the DMA mapping request would proceed.
However, now that we are using the generic implementation where only the
presence of the memory map entry is considered, we return 'true' and
erroneously fail with DMA_MAPPING_ERROR because we identify the region
as DRAM.
Although fixing this in the DMA mapping code is arguably the right fix,
it is a risky, cross-architecture change at this stage in the cycle. So
just revert arm64 back to its old pfn_valid() implementation for v5.14.
The change to the generic pfn_valid() code is preserved from the original
patch, so as to avoid impacting other architectures.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
While running kselftests, Hangbin observed that sch_ets.sh often crashes,
and splats like the following one are seen in the output of 'dmesg':
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: 0010:__list_del_entry_valid+0x2d/0x50
Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94
RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217
RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8
RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100
R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002
FS: 00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0
Call Trace:
ets_qdisc_reset+0x6e/0x100 [sch_ets]
qdisc_reset+0x49/0x1d0
tbf_reset+0x15/0x60 [sch_tbf]
qdisc_reset+0x49/0x1d0
dev_reset_queue.constprop.42+0x2f/0x90
dev_deactivate_many+0x1d3/0x3d0
dev_deactivate+0x56/0x90
qdisc_graft+0x47e/0x5a0
tc_get_qdisc+0x1db/0x3e0
rtnetlink_rcv_msg+0x164/0x4c0
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1a5/0x280
netlink_sendmsg+0x242/0x480
sock_sendmsg+0x5b/0x60
____sys_sendmsg+0x1f2/0x260
___sys_sendmsg+0x7c/0xc0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x3a/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f93e44b8338
Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338
RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000
When the change() function decreases the value of 'nstrict', we must take
into account that packets might be already enqueued on a class that flips
from 'strict' to 'quantum': otherwise that class will not be added to the
bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to
do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer
dereference.
For classes flipping from 'strict' to 'quantum', initialize an empty list
and eventually add it to the bandwidth-sharing list, if there are packets
already enqueued. In this way, the kernel will:
a) prevent crashing as described above.
b) avoid retaining the backlog packets (for an arbitrarily long time) in
case no packet is enqueued after a change from 'strict' to 'quantum'.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: dcc68b4d80 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Kees Cook who detected the problem of memset that starting
from not the first member, but sized for the whole struct.
The better change will be to remove the redundant memset and to clear
only the msix_cnt member.
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure a valid XSK buffer before proceed to free the xdp buffer.
The following kernel panic is observed without this patch:
RIP: 0010:xp_free+0x5/0x40
Call Trace:
stmmac_napi_poll_rxtx+0x332/0xb30 [stmmac]
? stmmac_tx_timer+0x3c/0xb0 [stmmac]
net_rx_action+0x13d/0x3d0
__do_softirq+0xfc/0x2fb
? smpboot_register_percpu_thread+0xe0/0xe0
run_ksoftirqd+0x32/0x70
smpboot_thread_fn+0x1d8/0x2c0
kthread+0x169/0x1a0
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
---[ end trace 0000000000000002 ]---
Fixes: bba2556efa ("net: stmmac: Enable RX via AF_XDP zero-copy")
Cc: <stable@vger.kernel.org> # 5.13.x
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After free xsk_pool, there is possibility that napi polling is still
running in the middle, thus causes a kernel crash due to kernel NULL
pointer dereference of rx_q->xsk_pool and tx_q->xsk_pool.
Fix this by changing the XDP pool setup sequence to:
1. disable napi before free xsk_pool
2. enable napi after init xsk_pool
The following kernel panic is observed without this patch:
RIP: 0010:xsk_uses_need_wakeup+0x5/0x10
Call Trace:
stmmac_napi_poll_rxtx+0x3a9/0xae0 [stmmac]
__napi_poll+0x27/0x130
net_rx_action+0x233/0x280
__do_softirq+0xe2/0x2b6
run_ksoftirqd+0x1a/0x20
smpboot_thread_fn+0xac/0x140
? sort_range+0x20/0x20
kthread+0x124/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
---[ end trace a77c8956b79ac107 ]---
Fixes: bba2556efa ("net: stmmac: Enable RX via AF_XDP zero-copy")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
macb_ptp_desc will not return NULL under most circumstances with correct
Kconfig and IP design config register. But for the sake of the extreme
corner case, check for NULL when using the helper. In case of rx_tstamp,
no action is necessary except to return (similar to timestamp disabled)
and warn. In case of TX, return -EINVAL to let the skb be free. Perform
this check before marking skb in progress.
Fixes coverity warning:
(4) Event dereference:
Dereferencing a null pointer "desc_ptp"
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2c896fb02e
"net: stmmac: dwmac-rk: add pd_gmac support for rk3399" and fixes
unbalanced pm_runtime_enable warnings.
In the commit to be reverted, support for power management was
introduced to the Rockchip glue code. Later, power management support
was introduced to the stmmac core code, resulting in multiple
invocations of pm_runtime_{enable,disable,get_sync,put_sync}.
The multiple invocations happen in rk_gmac_powerup and
stmmac_{dvr_probe, resume} as well as in rk_gmac_powerdown and
stmmac_{dvr_remove, suspend}, respectively, which are always called
in conjunction.
Fixes: 5ec5582343 ("net: stmmac: add clocks management for gmac driver")
Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
U-Boot expects this alias to be in place in order to fix up the mac
address of the ethernet node.
Note on the Icicle Kit board, currently only emac1 is enabled so it
becomes the 'ethernet0'.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Per the DT spec, 'local-mac-address' is used to specify MAC address
that was assigned to the network device, while 'mac-address' is used
to specify the MAC address that was last used by the boot program,
and shall be used only if the value differs from 'local-mac-address'
property value.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: conor dooley <conor.dooley@microchip.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The value of FP registers in the core dump file comes from the
thread.fstate. However, kernel saves the FP registers to the thread.fstate
only before scheduling out the process. If no process switch happens
during the exception handling process, kernel will not have a chance to
save the latest value of FP registers to thread.fstate. It will cause the
value of FP registers in the core dump file may be incorrect. To solve this
problem, this patch force lets kernel save the FP register into the
thread.fstate if the target task_struct equals the current.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: b8c8a9590e ("RISC-V: Add FP register ptrace support for gdb.")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
We found a hang, the steps to reproduce are as follows:
1. blocking device via scsi_device_set_state()
2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
3. echo none > /sys/block/sda/queue/scheduler
4. echo "running" >/sys/block/sda/device/state
Step 3 and 4 should complete after step 4, but they hang.
CPU#0 CPU#1 CPU#2
--------------- ---------------- ----------------
Step 1: blocking device
Step 2: dd xxxx
^^^^^^ get request
q_usage_counter++
Step 3: switching scheculer
elv_iosched_store
elevator_switch
blk_mq_freeze_queue
blk_freeze_queue
> blk_freeze_queue_start
^^^^^^ mq_freeze_depth++
> blk_mq_run_hw_queues
^^^^^^ can't run queue when dev blocked
> blk_mq_freeze_queue_wait
^^^^^^ Hang here!!!
wait q_usage_counter==0
Step 4: running device
store_state_field
scsi_rescan_device
scsi_attach_vpd
scsi_vpd_inquiry
__scsi_execute
blk_get_request
blk_mq_alloc_request
blk_queue_enter
^^^^^^ Hang here!!!
wait mq_freeze_depth==0
blk_mq_run_hw_queues
^^^^^^ dispatch IO, q_usage_counter will reduce to zero
blk_mq_unfreeze_queue
^^^^^ mq_freeze_depth--
To fix this, we need to run queue before rescanning device when the device
state changes to SDEV_RUNNING.
Link: https://lore.kernel.org/r/20210824025921.3277629-1-lijinlin3@huawei.com
Fixes: f0f82e2476 ("scsi: core: Fix capacity set to zero after offlinining device")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Signed-off-by: Qiu Laibin <qiulaibin@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull rdma fixes from Jason Gunthorpe:
"Several small fixes, the first three are significant:
- mlx5 crash unloading drivers with a rare HW config
- missing userspace reporting for the new dmabuf objects
- random rxe failure due to missing memory zeroing
- static checker/etc reports: missing spin lock init, null pointer
deref on error, extra unlock on error path, memory allocation under
spinlock, missing IRQ vector cleanup
- kconfig typo in the new irdma driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Zero out index member of struct rxe_queue
RDMA/efa: Free IRQ vectors on error flow
RDMA/rxe: Fix memory allocation while in a spin lock
RDMA/bnxt_re: Remove unpaired rtnl unlock in bnxt_re_dev_init()
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
RDMA/irdma: Use correct kconfig symbol for AUXILIARY_BUS
RDMA/bnxt_re: Add missing spin lock initialization
RDMA/uverbs: Track dmabuf memory regions
RDMA/mlx5: Fix crash when unbind multiport slave
Building a randconfig here triggered:
ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
because the module export of that symbol happens in
kernel/power/suspend.c which is enabled with CONFIG_SUSPEND.
The ifdef guards in amdgpu_acpi_is_s0ix_supported(), however, test for
CONFIG_PM_SLEEP which is defined like this:
config PM_SLEEP
def_bool y
depends on SUSPEND || HIBERNATE_CALLBACKS
and that randconfig has:
# CONFIG_SUSPEND is not set
CONFIG_HIBERNATE_CALLBACKS=y
leading to the module export missing.
Change the ifdeffery to depend directly on CONFIG_SUSPEND.
Fixes: 5706cb3c91 ("drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YSP6Lv53QV0cOAsd@zn.tnic
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In early erratas this issue only covered port 0 when changing from
[x]MII (rev A 3.6). In subsequent errata versions this errata changed to
cover the additional "Hardware reset in CPU managed mode" condition, and
removed the note specifying that it only applied to port 0.
In designs where the device is configured with CPU managed mode
(CPU_MGD), on reset all SERDES ports (p0, p9, p10) have a stuck power
down bit and require this initial power up procedure. As such apply this
errata to all three SERDES ports of the mv88e6393x.
Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For VFs we should return with an error in case we didn't get the exact
number of msix vectors as we requested.
Not doing that will lead to a crash when starting queues for this VF.
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Ma, XinjianX" <xinjianx.ma@intel.com> reported:
> When lkp team run kernel selftests, we found after these series of patches, testcase mqueue: mq_perf_tests
> in kselftest failed with following message.
>
> # selftests: mqueue: mq_perf_tests
> #
> # Initial system state:
> # Using queue path: /mq_perf_tests
> # RLIMIT_MSGQUEUE(soft): 819200
> # RLIMIT_MSGQUEUE(hard): 819200
> # Maximum Message Size: 8192
> # Maximum Queue Size: 10
> # Nice value: 0
> #
> # Adjusted system state for testing:
> # RLIMIT_MSGQUEUE(soft): (unlimited)
> # RLIMIT_MSGQUEUE(hard): (unlimited)
> # Maximum Message Size: 16777216
> # Maximum Queue Size: 65530
> # Nice value: -20
> # Continuous mode: (disabled)
> # CPUs to pin: 3
> # ./mq_perf_tests: mq_open() at 296: Too many open files
> not ok 2 selftests: mqueue: mq_perf_tests # exit=1
> ```
>
> Test env:
> rootfs: debian-10
> gcc version: 9
After investigation the problem turned out to be that ucount_max for
the rlimits in init_user_ns was being set to the initial rlimit value.
The practical problem is that ucount_max provides a limit that
applications inside the user namespace can not exceed. Which means in
practice that rlimits that have been converted to use the ucount
infrastructure were not able to exceend their initial rlimits.
Solve this by setting the relevant values of ucount_max to
RLIM_INIFINITY. A limit in init_user_ns is pointless so the code
should allow the values to grow as large as possible without riscking
an underflow or an overflow.
As the ltp test case was a bit of a pain I have reproduced the rlimit failure
and tested the fix with the following little C program:
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/stat.h>
> #include <mqueue.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <errno.h>
> #include <string.h>
> #include <stdlib.h>
> #include <limits.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
> struct mq_attr mq_attr;
> struct rlimit rlim;
> mqd_t mqd;
> int ret;
>
> ret = getrlimit(RLIMIT_MSGQUEUE, &rlim);
> if (ret != 0) {
> fprintf(stderr, "getrlimit(RLIMIT_MSGQUEUE) failed: %s\n", strerror(errno));
> exit(EXIT_FAILURE);
> }
> printf("RLIMIT_MSGQUEUE %lu %lu\n",
> rlim.rlim_cur, rlim.rlim_max);
> rlim.rlim_cur = RLIM_INFINITY;
> rlim.rlim_max = RLIM_INFINITY;
> ret = setrlimit(RLIMIT_MSGQUEUE, &rlim);
> if (ret != 0) {
> fprintf(stderr, "setrlimit(RLIMIT_MSGQUEUE, RLIM_INFINITY) failed: %s\n", strerror(errno));
> exit(EXIT_FAILURE);
> }
>
> memset(&mq_attr, 0, sizeof(struct mq_attr));
> mq_attr.mq_maxmsg = 65536 - 1;
> mq_attr.mq_msgsize = 16*1024*1024 - 1;
>
> mqd = mq_open("/mq_rlimit_test", O_RDONLY|O_CREAT, 0600, &mq_attr);
> if (mqd == (mqd_t)-1) {
> fprintf(stderr, "mq_open failed: %s\n", strerror(errno));
> exit(EXIT_FAILURE);
> }
> ret = mq_close(mqd);
> if (ret) {
> fprintf(stderr, "mq_close failed; %s\n", strerror(errno));
> exit(EXIT_FAILURE);
> }
>
> return EXIT_SUCCESS;
> }
Fixes: 6e52a9f053 ("Reimplement RLIMIT_MSGQUEUE on top of ucounts")
Fixes: d7c9e99aee ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Fixes: d646969055 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Reported-by: kernel test robot lkp@intel.com
Acked-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/87eeajswfc.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Commit 457f44363a ("bpf: Implement BPF ring buffer and verifier support
for it") extended check_map_func_compatibility() by enforcing map -> helper
function match, but not helper -> map type match.
Due to this all of the bpf_ringbuf_*() helper functions could be used with
a wrong map type such as array or hash map, leading to invalid access due
to type confusion.
Also, both BPF_FUNC_ringbuf_{submit,discard} have ARG_PTR_TO_ALLOC_MEM as
argument and not a BPF map. Therefore, their check_map_func_compatibility()
presence is incorrect since it's only for map type checking.
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
This reverts commit 819fbd3d8e.
It turns out that some user-space applications use these uapi header
files, so even though the only user of the interface is an old driver
that was moved to staging, moving the header files causes unnecessary
pain.
Generally, we really don't want user space to use kernel headers
directly (exactly because it causes pain when we re-organize), and
instead copy them as needed. But these things happen, and the headers
were in the uapi directory, so I guess it's not entirely unreasonable.
Link: https://lore.kernel.org/lkml/4e3e0d40-df4a-94f8-7c2d-85010b0873c4@web.de/
Reported-by: Soeren Moch <smoch@web.de>
Cc: stable@kernel.org # 5.13
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull regression fix for the operating performance points (OPP)
framework for v5.15 from Viresh Kumar:
"This fixes regression in the OPP core for a corner case."
* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: core: Check for pending links before reading required_opp pointers
Oleksij Rempel says:
====================
asix fixes
changes v2:
- rebase against current net
- add one more fix for the ax88178 variant
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix crash on reboot on a system with ASIX AX88178 USB adapter attached
to it:
| asix 1-1.4:1.0 eth0: unregister 'asix' usb-ci_hdrc.0-1.4, ASIX AX88178 USB 2.0 Ethernet
| 8<--- cut here ---
| Unable to handle kernel NULL pointer dereference at virtual address 0000028c
| pgd = 5ec93aee
| [0000028c] *pgd=00000000
| Internal error: Oops: 5 [#1] PREEMPT SMP ARM
| Modules linked in:
| CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.14.0-rc1-20210811-1 #4
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| PC is at phy_disconnect+0x8/0x48
| LR is at ax88772_unbind+0x14/0x20
| [<80650d04>] (phy_disconnect) from [<80741aa4>] (ax88772_unbind+0x14/0x20)
| [<80741aa4>] (ax88772_unbind) from [<8074e250>] (usbnet_disconnect+0x48/0xd8)
| [<8074e250>] (usbnet_disconnect) from [<807655e0>] (usb_unbind_interface+0x78/0x25c)
| [<807655e0>] (usb_unbind_interface) from [<805b03a0>] (__device_release_driver+0x154/0x20c)
| [<805b03a0>] (__device_release_driver) from [<805b0478>] (device_release_driver+0x20/0x2c)
| [<805b0478>] (device_release_driver) from [<805af944>] (bus_remove_device+0xcc/0xf8)
| [<805af944>] (bus_remove_device) from [<805ab26c>] (device_del+0x178/0x4b0)
| [<805ab26c>] (device_del) from [<807634a4>] (usb_disable_device+0xcc/0x178)
| [<807634a4>] (usb_disable_device) from [<8075a060>] (usb_disconnect+0xd8/0x238)
| [<8075a060>] (usb_disconnect) from [<8075a02c>] (usb_disconnect+0xa4/0x238)
| [<8075a02c>] (usb_disconnect) from [<8075a02c>] (usb_disconnect+0xa4/0x238)
| [<8075a02c>] (usb_disconnect) from [<80af3520>] (usb_remove_hcd+0xa0/0x198)
| [<80af3520>] (usb_remove_hcd) from [<807902e0>] (host_stop+0x38/0xa8)
| [<807902e0>] (host_stop) from [<8078d9e4>] (ci_hdrc_remove+0x3c/0x118)
| [<8078d9e4>] (ci_hdrc_remove) from [<805b27ec>] (platform_remove+0x20/0x50)
| [<805b27ec>] (platform_remove) from [<805b03a0>] (__device_release_driver+0x154/0x20c)
| [<805b03a0>] (__device_release_driver) from [<805b0478>] (device_release_driver+0x20/0x2c)
| [<805b0478>] (device_release_driver) from [<805af944>] (bus_remove_device+0xcc/0xf8)
| [<805af944>] (bus_remove_device) from [<805ab26c>] (device_del+0x178/0x4b0)
For this adapter we call ax88178_bind() and ax88772_unbind(), which is
related to different chip version and different counter part *bind()
function.
Since this chip is currently not ported to the PHYLIB, we do not need to
call phy_disconnect() here. So, to fix this crash, we need to add
ax88178_unbind().
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Reported-by: Robin van der Gracht <robin@protonic.nl>
Tested-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some HW revisions need additional MAC configuration before the embedded PHY
can be enabled. If this is not done, we won't be able to get response
from the internal PHY.
This issue was detected on chipcode == AX_AX88772_CHIPCODE variant,
where ax88772_hw_reset() was executed with missing embd_phy flag.
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to Armada XP datasheet bit at 0 position is corresponding for
TxInProg indication.
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of taprio offload is not enabled, the error handling path
causes a kernel crash due to kernel NULL pointer deference.
Fix this by adding check for NULL before attempt to access 'plat->est'
on the mutex_lock() call.
The following kernel panic is observed without this patch:
RIP: 0010:mutex_lock+0x10/0x20
Call Trace:
tc_setup_taprio+0x482/0x560 [stmmac]
kmem_cache_alloc_trace+0x13f/0x490
taprio_disable_offload.isra.0+0x9d/0x180 [sch_taprio]
taprio_destroy+0x6c/0x100 [sch_taprio]
qdisc_create+0x2e5/0x4f0
tc_modify_qdisc+0x126/0x740
rtnetlink_rcv_msg+0x12b/0x380
_raw_spin_lock_irqsave+0x19/0x40
_raw_spin_unlock_irqrestore+0x18/0x30
create_object+0x212/0x340
rtnl_calcit.isra.0+0x110/0x110
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x191/0x230
netlink_sendmsg+0x243/0x470
sock_sendmsg+0x5e/0x60
____sys_sendmsg+0x20b/0x280
copy_msghdr_from_user+0x5c/0x90
__mod_memcg_state+0x87/0xf0
___sys_sendmsg+0x7c/0xc0
lru_cache_add+0x7f/0xa0
_raw_spin_unlock+0x16/0x30
wp_page_copy+0x449/0x890
handle_mm_fault+0x921/0xfc0
__sys_sendmsg+0x59/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace b1f19b24368a96aa ]---
Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-08-20
This series contains updates to igc and e1000e drivers.
Aaron Ma resolves a page fault which occurs when thunderbolt is
unplugged for igc.
Toshiki Nishioka fixes Tx queue looping to use actual number of queues
instead of max value for igc.
Sasha fixes an incorrect latency comparison by decoding the values before
comparing and prevents attempted writes to read-only NVMs for e1000e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A successful 'xge_mdio_config()' call should be balanced by a corresponding
'xge_mdio_remove()' call in the error handling path of the probe, as
already done in the remove function.
Update the error handling path accordingly.
Fixes: ea8ab16ab2 ("drivers: net: xgene-v2: Add MDIO support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4fa82a87ba ("opp: Allow required-opps to be used for non genpd
use cases") dereferences the pointers in required_opp_tables but these
might be set to an ERR_PTR if the list still has lazy links pending,
resulting in segfaults. Prior to this patch IS_ERR was also checked on
required_opp_tables[i] before reading ->is_genpd inside
_opp_table_alloc_required_tables, which is at the same time the
predicate to add this table to the lazy list. This segfault is solved
by reordering the checks to bail on lazy pending tables before reading
->is_genpd.
Fixes: 4fa82a87ba ("opp: Allow required-opps to be used for non genpd use cases")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Validate csum_start in gre_handle_offloads before we call _gre_xmit so
that we do not crash later when the csum_start value is used in the
lco_csum function call.
This patch deals with ipv6 code.
Fixes: Fixes: b05229f442 ("gre6: Cleanup GREv6 transmit path, call common
GRE functions")
Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull powerpc fixes from Michael Ellerman:
- Fix random crashes on some 32-bit CPUs by adding isync() after
locking/unlocking KUEP
- Fix intermittent crashes when loading modules with strict module RWX
- Fix a section mismatch introduce by a previous fix.
Thanks to Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo
Opsfelder Araújo, Nathan Chancellor, and Stan Johnson.
h# -----BEGIN PGP SIGNATURE-----
* tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix set_memory_*() against concurrent accesses
powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
powerpc/xive: Do not mark xive_request_ipi() as __init
The recent commit
064855a690 ("x86/resctrl: Fix default monitoring groups reporting")
caused a RHEL build failure with an uninitialized variable warning
treated as an error because it removed the default case snippet.
The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly
uninitialized variable warnings to be treated as errors. This is also
reported by smatch via the 0day robot.
The error from the RHEL build is:
arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’:
arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
m->chunks += chunks;
^~
The upstream Makefile does not build using '-Werror=maybe-uninitialized'.
So, the problem is not seen there. Fix the problem by putting back the
default case snippet.
[ bp: note that there's nothing wrong with the code and other compilers
do not trigger this warning - this is being done just so the RHEL compiler
is happy. ]
Fixes: 064855a690 ("x86/resctrl: Fix default monitoring groups reporting")
Reported-by: Terry Bowman <Terry.Bowman@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
Pull clk driver fixes from Stephen Boyd:
- Make the regulator state match the GDSC power domain state at boot on
Qualcomm SoCs so that the regulator isn't turned off inadvertently.
- Fix earlycon on i.MX6Q SoCs
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gdsc: Ensure regulator init state matches GDSC state
clk: imx6q: fix uart earlycon unwork
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes for 5.14-rc7.
They consist of:
- revert for an interconnect patch that was found to have problems
- ipack tpci200 driver fixes for reported problems
- slimbus messaging and ngd fixes for reported problems
All are small and have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ipack: tpci200: fix memory leak in the tpci200_register
ipack: tpci200: fix many double free issues in tpci200_pci_probe
slimbus: ngd: reset dma setup during runtime pm
slimbus: ngd: set correct device for pm
slimbus: messaging: check for valid transaction id
slimbus: messaging: start transaction ids from 1 instead of zero
Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate"
Pull USB fix from Greg KH:
"Here is a single USB typec tcpm fix for a reported problem for
5.14-rc7. It showed up in 5.13 and resolves an issue that Hans found.
It has been in linux-next this week with no reported problems"
* tag 'usb-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: Fix VDMs sometimes not being forwarded to alt-mode drivers
Pull RISC-V fixes from Palmer Dabbelt:
- fix the sifive-l2-cache device tree bindings for json-schema
compatibility. This does not change the intended behavior of the
binding.
- avoid improperly freeing necessary resources during early boot.
* tag 'riscv-for-linus-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix a number of free'd resources in init_resources()
dt-bindings: sifive-l2-cache: Fix 'select' matching
Pull s390 fix from Vasily Gorbik:
- fix use after free of zpci_dev in pci code
* tag 's390-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix use after free of zpci_dev
Pull mandatory file locking deprecation warning from Jeff Layton:
"As discussed on the list, this patch just adds a new warning for folks
who still have mandatory locking enabled and actually mount with '-o
mand'. I'd like to get this in for v5.14 so we can push this out into
stable kernels and hopefully reach folks who have mounts with -o mand.
For now, I'm operating under the assumption that we'll fully remove
this support in v5.15, but we can move that out if any legitimate
users of this facility speak up between now and then"
* tag 'locks-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: warn about impending deprecation of mandatory locks
Commit
79419e13e8 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path")
introduced an IDT into the 32-bit boot path of the decompressor stub.
But the IDT is set up before ExitBootServices() is called, and some UEFI
firmwares rely on their own IDT.
Save the firmware IDT on boot and restore it before calling into EFI
functions to fix boot failures introduced by above commit.
Fixes: 79419e13e8 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path")
Reported-by: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org # 5.13+
Link: https://lkml.kernel.org/r/20210820125703.32410-1-joro@8bytes.org
Pull block fixes from Jens Axboe:
"Three fixes from Ming Lei that should go into 5.14:
- Fix for a kernel panic when iterating over tags for some cases
where a flush request is present, a regression in this cycle.
- Request timeout fix
- Fix flush request checking"
* tag 'block-5.14-2021-08-20' of git://git.kernel.dk/linux-block:
blk-mq: fix is_flush_rq
blk-mq: fix kernel panic during iterating over flush request
blk-mq: don't grab rq's refcount in blk_mq_check_expired()
Pull io_uring fixes from Jens Axboe:
"A few small fixes that should go into this release:
- Fix never re-assigning an initial error value for io_uring_enter()
for SQPOLL, if asked to do nothing
- Fix xa_alloc_cycle() return value checking, for cases where we have
wrapped around
- Fix for a ctx pin issue introduced in this cycle (Pavel)"
* tag 'io_uring-5.14-2021-08-20' of git://git.kernel.dk/linux-block:
io_uring: fix xa_alloc_cycle() error return value check
io_uring: pin ctx on fallback execution
io_uring: only assign io_uring_enter() SQPOLL error in actual error case
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
We currently check for ret != 0 to indicate error, but '1' is a valid
return and just indicates that the allocation succeeded with a wrap.
Correct the check to be for < 0, like it was before the xarray
conversion.
Cc: stable@vger.kernel.org
Fixes: 61cf93700f ("io_uring: Convert personality_idr to XArray")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ACPI fixes from Rafael Wysocki:
"These fix two mistakes in new code.
Specifics:
- Prevent confusing messages from being printed if the PRMT table is
not present or there are no PRM modules (Aubrey Li).
- Fix the handling of suspend-to-idle entry and exit in the case when
the Microsoft UUID is used with the Low-Power S0 Idle _DSM
interface (Mario Limonciello)"
* tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
ACPI: PRM: Deal with table not present or no module found
Pull power management fixes from Rafael Wysocki:
"These fix some issues in the ARM cpufreq drivers and in the operating
performance points (OPP) framework.
Specifics:
- Fix useless WARN() in the OPP core and prevent a noisy warning
from being printed by OPP _put functions (Dmitry Osipenko).
- Fix error path when allocation failed in the arm_scmi cpufreq
driver (Lukasz Luba).
- Blacklist Qualcomm sc8180x and Qualcomm sm8150 in
cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath).
- Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq
driver (Marek Behún)"
* tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
opp: Drop empty-table checks from _put functions
cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
cpufreq: arm_scmi: Fix error path when allocation failed
opp: remove WARN when no valid OPPs remain
cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
Pull drm fixes from Dave Airlie:
"Regularly scheduled fixes. The ttm one solves a problem of GPU drivers
failing to load if debugfs is off in Kconfig, otherwise the i915 and
mediatek, and amdgpu fixes all fairly normal.
Nouveau has a couple of display fixes, but it has a fix for a
longstanding race condition in it's memory manager code, and the fix
mostly removes some code that wasn't working properly and has no
userspace users. This fix makes the diffstat kinda larger but in a
good (negative line-count) way.
core:
- fix drm_wait_vblank uapi copying bug
ttm:
- fix debugfs init when debugfs is off
amdgpu:
- vega10 SMU workload fix
- DCN VM fix
- DCN 3.01 watermark fix
amdkfd:
- SVM fix
nouveau:
- ampere display fixes
- remove MM misfeature to fix a longstanding race condition
i915:
- tweaked display workaround for all PCHs
- eDP MSO pipe sanity for ADL-P fix
- remove unused symbol export
mediatek:
- AAL output size setting
- Delete component in remove function"
* tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Use DCN30 watermark calc for DCN301
drm/i915/dp: remove superfluous EXPORT_SYMBOL()
drm/i915/edp: fix eDP MSO pipe sanity checks for ADL-P
drm/i915: Tweaked Wa_14010685332 for all PCHs
drm/nouveau: rip out nvkm_client.super
drm/nouveau: block a bunch of classes from userspace
drm/nouveau/fifo/nv50-: rip out dma channels
drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
drm/nouveau/disp: power down unused DP links during init
drm/nouveau: recognise GA107
drm: Copy drm_wait_vblank to user before returning
drm/amd/display: Ensure DCN save after VM setup
drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure
drm/amd/pm: change the workload type for some cards
Revert "drm/amd/pm: fix workload mismatch on vega10"
drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails
drm/mediatek: Add component_del in OVL and COLOR remove function
drm/mediatek: Add AAL output size configuration
Pull PCI fixes from Bjorn Helgaas:
- Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer (Rahul Tanwar)
- Add Jim Quinlan et al as Broadcom STB PCIe maintainers (Jim Quinlan)
- Increase D3hot-to-D0 delay for AMD Renoir/Cezanne XHCI (Marcin
Bachry)
- Correct iomem_get_mapping() usage for legacy_mem sysfs (Krzysztof
Wilczyński)
* tag 'pci-v5.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/sysfs: Use correct variable for the legacy_mem sysfs object
PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
MAINTAINERS: Add Jim Quinlan et al as Broadcom STB PCIe maintainers
MAINTAINERS: Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer
Pull MMC host fixes from Ulf Hansson:
- dw_mmc: Fix hang on data CRC error
- mmci: Fix voltage switch procedure for the stm32 variant
- sdhci-iproc: Fix some clock issues for BCM2711
- sdhci-msm: Fixup software timeout value
* tag 'mmc-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
mmc: sdhci-iproc: Cap min clock frequency on BCM2711
mmc: sdhci-msm: Update the software timeout value for sdhc
mmc: mmci: stm32: Check when the voltage switch procedure should be done
mmc: dw_mmc: Fix hang on data CRC error
Pull more sound fixes from Takashi Iwai:
"This is a quick follow up for 5.14: a fix for a very recently
introduced regression on ASoC Intel Atom driver, and another trivial
HD-audio quirk for HP laptops"
* tag 'sound-5.14-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: intel: atom: Fix breakage for PCM buffer address setup
ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
Pull arm64 fixes from Will Deacon:
- Fix cleaning of vDSO directories
- Ensure CNTHCTL_EL2 is fully initialised when booting at EL2
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: initialize all of CNTHCTL_EL2
arm64: clean vdso & vdso32 files
Pull iommu fixes from Joerg Roedel:
- Fix for a potential NULL-ptr dereference in IOMMU core code
- Two resource leak fixes
- Cache flush fix in the Intel VT-d driver
* tag 'iommu-fixes-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
iommu/vt-d: Fix PASID reference leak
iommu: Check if group is NULL before remove device
iommu/dma: Fix leak in non-contiguous API
1) New index member of struct rxe_queue was introduced but not zeroed so
the initial value of index may be random.
2) The current index is not masked off to index_mask.
In this case producer_addr() and consumer_addr() will get an invalid
address by the random index and then accessing the invalid address
triggers the following panic:
"BUG: unable to handle page fault for address: ffff9ae2c07a1414"
Fix the issue by using kzalloc() to zero out index member.
Fixes: 5bcf5a59c4 ("RDMA/rxe: Protext kernel index from user space")
Link: https://lore.kernel.org/r/20210820111509.172500-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532 as described in [1].
This BUG triggers if the HPageRestoreReserve flag is set on a page in
the page cache. It should never be set, as the routine
huge_add_to_page_cache explicitly clears the flag after adding a page to
the cache.
The only code other than huge page allocation which sets the flag is
restore_reserve_on_error. It will potentially set the flag in rare out
of memory conditions. syzbot was injecting errors to cause memory
allocation errors which exercised this specific path.
The code in restore_reserve_on_error is doing the right thing. However,
there are instances where pages in the page cache were being passed to
restore_reserve_on_error. This is incorrect, as once a page goes into
the cache reservation information will not be modified for the page
until it is removed from the cache. Error paths do not remove pages
from the cache, so even in the case of error, the page will remain in
the cache and no reservation adjustment is needed.
Modify routines that potentially call restore_reserve_on_error with a
page cache page to no longer do so.
Note on fixes tag: Prior to commit 846be08578 ("mm/hugetlb: expand
restore_reserve_on_error functionality") the routine would not process
page cache pages because the HPageRestoreReserve flag is not set on such
pages. Therefore, this issue could not be trigggered. The code added
by commit 846be08578 ("mm/hugetlb: expand restore_reserve_on_error
functionality") is needed and correct. It exposed incorrect calls to
restore_reserve_on_error which is the root cause addressed by this
commit.
[1] https://lore.kernel.org/linux-mm/00000000000050776d05c9b7c7f0@google.com/
Link: https://lkml.kernel.org/r/20210818213304.37038-1-mike.kravetz@oracle.com
Fixes: 846be08578 ("mm/hugetlb: expand restore_reserve_on_error functionality")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: <syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally the addr != NULL check was meant to take care of the case
where __kfence_pool == NULL (KFENCE is disabled). However, this does
not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE.
This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or
any other faulting access with addr < KFENCE_POOL_SIZE. While the
kernel would likely crash, the stack traces and report might be
confusing due to double faults upon KFENCE's attempt to unprotect such
an address.
Fix it by just checking that __kfence_pool != NULL instead.
Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In a debugging session the other day, Rik noticed that node_reclaim()
was missing memstall annotations. This means we'll miss pressure and
lost productivity resulting from reclaim on an overloaded local NUMA
node when vm.zone_reclaim_mode is enabled.
There haven't been any reports, but that's likely because
vm.zone_reclaim_mode hasn't been a commonly used feature recently, and
the intersection between such setups and psi users is probably nil.
But secondary memory such as CXL-connected DIMMS, persistent memory etc,
and the page demotion patches that handle them
(https://lore.kernel.org/lkml/20210401183216.443C4443@viggo.jf.intel.com/)
could soon make this a more common codepath again.
Link: https://lkml.kernel.org/r/20210818152457.35846-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups. This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.
The reason for this is proportional memory.low reclaim. When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to the
point where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.
To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely. This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.
[akpm@linux-foundation.org: coding-style fixes]
Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Leon Yang <lnyng@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When placing pages on a pcp list, migratetype values over
MIGRATE_PCPTYPES get added to the MIGRATE_MOVABLE pcp list.
However, the actual migratetype is preserved in the page and should
not be changed to MIGRATE_MOVABLE or the page may end up on the wrong
free_list.
The impact is that HIGHATOMIC or CMA pages getting bulk freed from the
PCP lists could potentially end up on the wrong buddy list. There are
various consequences but minimally NR_FREE_CMA_PAGES accounting could
get screwed up.
[mgorman@techsingularity.net: changelog update]
Link: https://lkml.kernel.org/r/20210811182917.2607994-1-opendmb@gmail.com
Fixes: df1acc8569 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.
This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).
To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.
v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
checking for it to be 0 (Evan Quan)
Cc: stable@vger.kernel.org
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3
Acked-by: Christian König <christian.koenig@amd.com> # v3
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Function init_resources() allocates a boot memory block to hold an array of
resources which it adds to iomem_resource. The array is filled in from its
end and the function then attempts to free any unused memory at the
beginning. The problem is that size of the unused memory is incorrectly
calculated and this can result in releasing memory which is in use by
active resources. Their data then gets corrupted later when the memory is
reused by a different part of the system.
Fix the size of the released memory to correctly match the number of unused
resource entries.
Fixes: ffe0e52612 ("RISC-V: Improve init_resources()")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Nick Kossifidis <mick@ics.forth.gr>
Tested-by: Sunil V L <sunilvl@ventanamicro.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
We should decode the latency and the max_latency before directly compare.
The latency should be presented as lat_enc = scale x value:
lat_enc_d = (lat_enc & 0x0x3ff) x (1U << (5*((max_ltr_enc & 0x1c00)
>> 10)))
Fixes: cf8fb73c23 ("e1000e: add support for LTR on I217/I218")
Suggested-by: Yee Li <seven.yi.lee@gmail.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After unplug thunderbolt dock with i225, pciehp interrupt is triggered,
remove call will read/write mmio address which is already disconnected,
then cause page fault and make system hang.
Check PCI state to remove device safely.
Trace:
BUG: unable to handle page fault for address: 000000000000b604
Oops: 0000 [#1] SMP NOPTI
RIP: 0010:igc_rd32+0x1c/0x90 [igc]
Call Trace:
igc_ptp_suspend+0x6c/0xa0 [igc]
igc_ptp_stop+0x12/0x50 [igc]
igc_remove+0x7f/0x1c0 [igc]
pci_device_remove+0x3e/0xb0
__device_release_driver+0x181/0x240
Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Fixes: b03c49cde6 ("igc: Save PTP time before a reset")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
- restore the behavior in enable_net_traffic() to avoid regressions - Jakub
Kicinski;
- hurried up and removed redundant assignment in pegasus_open() before yet
another checker complains;
Fixes: 8a160e2e9a ("net: usb: pegasus: Check the return value of get_geristers() and friends;")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Petko Manolov <petko.manolov@konsulko.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This check was incomplete, did not consider size is 0:
if (len != ALIGN(size, 4) + hdrlen)
goto err;
if size from qrtr_hdr is 0, the result of ALIGN(size, 4)
will be 0, In case of len == hdrlen and size == 0
in header this check won't fail and
if (cb->type == QRTR_TYPE_NEW_SERVER) {
/* Remote node endpoint can bridge other distant nodes */
const struct qrtr_ctrl_pkt *pkt = data + hdrlen;
qrtr_node_assign(node, le32_to_cpu(pkt->server.node));
}
will also read out of bound from data, which is hdrlen allocated block.
Fixes: 194ccc8829 ("net: qrtr: Support decoding incoming v2 packets")
Fixes: ad9d24c942 ("net: qrtr: fix OOB Read in qrtr_endpoint_post")
Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The implict soft-mask table addresses get relocated if they use a
relative symbol like a label. This is right for code that runs relocated
but not for unrelocated. The scv interrupt vectors run unrelocated, so
absolute addresses are required for their soft-mask table entry.
This fixes crashing with relocated kernels, usually an asynchronous
interrupt hitting in the scv handler, then hitting the trap that checks
whether r1 is in userspace.
Fixes: 325678fd05 ("powerpc/64s: add a table of implicit soft-masked addresses")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210820103431.1701240-1-npiggin@gmail.com
Eugene tripped over the case where rq_lock(), as called in a
for_each_possible_cpu() loop came apart because rq->core hadn't been
setup yet.
This is a somewhat unusual, but valid case.
Rework things such that rq->core is initialized to point at itself. IOW
initialize each CPU as a single threaded Core. CPU online will then join
the new CPU (thread) to an existing Core where needed.
For completeness sake, have CPU offline fully undo the state so as to
not presume the topology will match the next time it comes online.
Fixes: 9edeaea1bc ("sched: Core-wide rq->lock")
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Tested-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lkml.kernel.org/r/YR473ZGeKqMs6kw+@hirez.programming.kicks-ass.net
When the schema fixups are applied to 'select' the result is a single
entry is required for a match, but that will never match as there should
be 2 entries. Also, a 'select' schema should have the widest possible
match, so use 'contains' which matches the compatible string(s) in any
position and not just the first position.
Fixes: 993dcfac64 ("dt-bindings: riscv: sifive-l2-cache: convert bindings to json-schema")
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Commit 66f24fa766 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
broke PMD split page table lock for powerpc.
It selects the non-existent config ARCH_ENABLE_PMD_SPLIT_PTLOCK in
arch/powerpc/platforms/Kconfig.cputype, but clearly intended to
select ARCH_ENABLE_SPLIT_PMD_PTLOCK (notice the word swapping!), as
that commit did for all other architectures.
Fix it by selecting the correct symbol ARCH_ENABLE_SPLIT_PMD_PTLOCK.
Fixes: 66f24fa766 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Reword change log to make it clear this is a bug fix]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819113954.17515-3-lukas.bulwahn@gmail.com
The devlink dev info command reports version information about the
device and firmware running on the board. This includes the "board.id"
field which is supposed to represent an identifier of the board design.
The ice driver uses the Product Board Assembly identifier for this.
In some cases, the PBA is not present in the NVM. If this happens,
devlink dev info will fail with an error. Instead, modify the
ice_info_pba function to just exit without filling in the context
buffer. This will cause the board.id field to be skipped. Log a dev_dbg
message in case someone wants to confirm why board.id is not showing up
for them.
Fixes: e961b679fb ("ice: add board identifier info to devlink .info_get")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210819223451.245613-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxe_mcast_add_grp_elem() in rxe_mcast.c calls rxe_alloc() while holding
spinlocks which in turn calls kzalloc(size, GFP_KERNEL) which is
incorrect. This patch replaces rxe_alloc() by rxe_alloc_locked() which
uses GFP_ATOMIC. This bug was caused by the below mentioned commit and
failing to handle the need for the atomic allocate.
Fixes: 4276fd0ddd ("RDMA/rxe: Remove RXE_POOL_ATOMIC")
Link: https://lore.kernel.org/r/20210813210625.4484-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull ARM SoC fixes from Arnd Bergmann:
"Not much to see here. Half the fixes this time are for Qualcomm dts
files, fixing small mistakes on certain machines. The other fixes are:
- A 5.13 regression fix for freescale QE interrupt controller\
- A fix for TI OMAP gpt12 timer error handling
- A randconfig build regression fix for ixp4xx
- Another defconfig fix following the CONFIG_FB dependency rework"
* tag 'soc-fixes-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: fsl: qe: fix static checker warning
ARM: ixp4xx: fix building both pci drivers
ARM: configs: Update the nhk8815_defconfig
bus: ti-sysc: Fix error handling for sysc_check_active_timer()
soc: fsl: qe: convert QE interrupt controller to platform_device
arm64: dts: qcom: sdm845-oneplus: fix reserved-mem
arm64: dts: qcom: msm8994-angler: Disable cont_splash_mem
arm64: dts: qcom: sc7280: Fixup cpufreq domain info for cpu7
arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem mapping
arm64: dts: qcom: msm8992-bullhead: Remove PSCI
arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from bpf, wireless and mac80211
trees.
Current release - regressions:
- tipc: call tipc_wait_for_connect only when dlen is not 0
- mac80211: fix locking in ieee80211_restart_work()
Current release - new code bugs:
- bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()
- ethernet: ice: fix perout start time rounding
- wwan: iosm: prevent underflow in ipc_chnl_cfg_get()
Previous releases - regressions:
- bpf: clear zext_dst of dead insns
- sch_cake: fix srchost/dsthost hashing mode
- vrf: reset skb conntrack connection on VRF rcv
- net/rds: dma_map_sg is entitled to merge entries
Previous releases - always broken:
- ethernet: bnxt: fix Tx path locking and races, add Rx path
barriers"
* tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
net: dpaa2-switch: disable the control interface on error path
Revert "flow_offload: action should not be NULL when it is referenced"
iavf: Fix ping is lost after untrusted VF had tried to change MAC
i40e: Fix ATR queue selection
r8152: fix the maximum number of PLA bp for RTL8153C
r8152: fix writing USB_BP2_EN
mptcp: full fully established support after ADD_ADDR
mptcp: fix memory leak on address flush
net/rds: dma_map_sg is entitled to merge entries
net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
net: asix: fix uninit value bugs
ovs: clear skb->tstamp in forwarding path
net: mdio-mux: Handle -EPROBE_DEFER correctly
net: mdio-mux: Don't ignore memory allocation errors
net: mdio-mux: Delete unnecessary devm_kfree
net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
sch_cake: fix srchost/dsthost hashing mode
ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
mac80211: fix locking in ieee80211_restart_work()
...
Pull x86 platform driver fixes from Hans de Goede:
- Enable SW_TABLET_MODE support for the TP200s
- Enable WMI on two more Gigabyte motherboards
* tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: gigabyte-wmi: add support for B450M S2H V2
platform/x86: gigabyte-wmi: add support for X570 GAMING X
platform/x86: asus-nb-wmi: Add tablet_mode_sw=lid-flip quirk for the TP200s
platform/x86: asus-nb-wmi: Allow configuring SW_TABLET_MODE method with a module option
Currently dpaa2_switch_takedown has a funny name and does not do the
opposite of dpaa2_switch_init, which makes probing fail when we need to
handle an -EPROBE_DEFER.
A sketch of what dpaa2_switch_init does:
dpsw_open
dpaa2_switch_detect_features
dpsw_reset
for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
dpsw_if_disable
dpsw_if_set_stp
dpsw_vlan_remove_if_untagged
dpsw_if_set_tci
dpsw_vlan_remove_if
}
dpsw_vlan_remove
alloc_ordered_workqueue
dpsw_fdb_remove
dpaa2_switch_ctrl_if_setup
When dpaa2_switch_takedown is called from the error path of
dpaa2_switch_probe(), the control interface, enabled by
dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
because dpaa2_switch_takedown does not call
dpaa2_switch_ctrl_if_teardown.
Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
means that a second probe of the driver will happen with the control
interface directly enabled.
This will trigger a second error:
[ 93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
[ 93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
[ 93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13
Which if we investigate the /dev/dpaa2_mc_console log, we find out is
caused by:
[E, ctrl_if_set_pools:2211, DPMNG] ctrl_if must be disabled
So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
reasonable limits, no reason to change STP state, re-add VLANs etc), and
rename it to something more conventional, like dpaa2_switch_teardown.
Fixes: 613c0a5810 ("staging: dpaa2-switch: enable the control interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 9ea3e52c5b.
Cited commit added a check to make sure 'action' is not NULL, but
'action' is already dereferenced before the check, when calling
flow_offload_has_one_action().
Therefore, the check does not make any sense and results in a smatch
warning:
include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn:
variable dereferenced before check 'action' (see line 319)
Fix by reverting this commit.
Cc: gushengxian <gushengxian@yulong.com>
Fixes: 9ea3e52c5b ("flow_offload: action should not be NULL when it is referenced")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-08-18
This series contains updates to i40e and iavf drivers.
Arkadiusz fixes Flow Director not using the correct queue due to calling
the wrong pick Tx function for i40e.
Sylwester resolves traffic loss for iavf when it attempts to change its
MAC address when it does not have permissions to do so.
====================
Link: https://lore.kernel.org/r/20210818174217.4138922-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make changes to MAC address dependent on the response of PF.
Disallow changes to HW MAC address and MAC filter from untrusted
VF, thanks to that ping is not lost if VF tries to change MAC.
Add a new field in iavf_mac_filter, to indicate whether there
was response from PF for given filter. Based on this field pass
or discard the filter.
If untrusted VF tried to change it's address, it's not changed.
Still filter was changed, because of that ping couldn't go through.
Fixes: c5c922b3e0 ("iavf: fix MAC address setting for VFs when filter is rejected")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.
If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.
Reproduction steps:
1. Load i40e driver
2. Map each MSI interrupt of i40e port for each CPU
3. Disable ntuple, enable ATR i.e.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run application that is generating traffic and is bound to a
single CPU, i.e.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
Application's traffic should be restricted to the CPU provided in
taskset.
Fixes: 89ec1f0886 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2021-08-19
We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 3 files changed, 29 insertions(+), 6 deletions(-).
The main changes are:
1) Fix to clear zext_dst for dead instructions which was causing invalid program
rejections on JITs with bpf_jit_needs_zext such as s390x, from Ilya Leoshkevich.
2) Fix RCU splat in bpf_get_current_{ancestor_,}cgroup_id() helpers when they are
invoked from sleepable programs, from Yonghong Song.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests, bpf: Test that dead ldx_w insns are accepted
bpf: Clear zext_dst of dead insns
bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
====================
Link: https://lore.kernel.org/r/20210819144904.20069-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit 2e6b836312 ("ASoC: intel: atom: Fix reference to PCM
buffer address") changed the reference of PCM buffer address to
substream->runtime->dma_addr as the buffer address may change
dynamically. However, I forgot that the dma_addr field is still not
set up for the CONTINUOUS buffer type (that this driver uses) yet in
5.14 and earlier kernels, and it resulted in garbage I/O. The problem
will be fixed in 5.15, but we need to address it quickly for now.
The fix is to deduce the address again from the DMA pointer with
virt_to_phys(), but from the right one, substream->runtime->dma_area.
Fixes: 2e6b836312 ("ASoC: intel: atom: Fix reference to PCM buffer address")
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/2048c6aa-2187-46bd-6772-36a4fb3c5aeb@redhat.com
Link: https://lore.kernel.org/r/20210819152945.8510-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix for omap gpt12 timer error handling
Two of the recent fixes for ti-sysc driver had bad interaction for a
function return value that caused one of the fixes to not work so we
need to change the return value handling. Otherwise early beagleboard
variants still have a boot issue.
* tag 'omap-for-v5.14/gpt12-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
bus: ti-sysc: Fix error handling for sysc_check_active_timer()
Link: https://lore.kernel.org/r/pull-1629354796-830948@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Two legacy PCI sysfs objects "legacy_io" and "legacy_mem" were updated
to use an unified address space in the commit 636b21b501 ("PCI: Revoke
mappings like devmem"). This allows for revocations to be managed from
a single place when drivers want to take over and mmap() a /dev/mem
range.
Following the update, both of the sysfs objects should leverage the
iomem_get_mapping() function to get an appropriate address range, but
only the "legacy_io" has been correctly updated - the second attribute
seems to be using a wrong variable to pass the iomem_get_mapping()
function to.
Thus, correct the variable name used so that the "legacy_mem" sysfs
object would also correctly call the iomem_get_mapping() function.
Fixes: 636b21b501 ("PCI: Revoke mappings like devmem")
Link: https://lore.kernel.org/r/20210812132144.791268-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add Jim Quinlan, Nicolas Saenz Julienne, and Florian Fainelli as
maintainers of the Broadcom STB PCIe controller driver.
This driver is also included in these entries:
BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
BROADCOM BCM7XXX ARM ARCHITECTURE
which cover the Raspberry Pi specifics of the PCIe driver.
Link: https://lore.kernel.org/r/20210818225031.8502-1-jim2101024@gmail.com
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
kmalloc_array() is called to allocate memory for tx->descp. If it fails,
the function __sdma_txclean() is called:
__sdma_txclean(dd, tx);
However, in the function __sdma_txclean(), tx-descp is dereferenced if
tx->num_desc is not zero:
sdma_unmap_desc(dd, &tx->descp[0]);
To fix this possible null-pointer dereference, assign the return value of
kmalloc_array() to a local variable descp, and then assign it to tx->descp
if it is not NULL. Otherwise, go to enomem.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20210806133029.194964-1-islituo@gmail.com
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In Kconfig, references to config symbols do not use the prefix "CONFIG_".
Commit fa0cf568fd ("RDMA/irdma: Add irdma Kconfig/Makefile and remove
i40iw") selects config CONFIG_AUXILIARY_BUS in config INFINIBAND_IRDMA,
but intended to select config AUXILIARY_BUS.
Fixes: fa0cf568fd ("RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw")
Link: https://lore.kernel.org/r/20210817084158.10095-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fix the below crash when deleting a slave from the unaffiliated list
twice. First time when the slave is bound to the master and the second
when the slave is unloaded.
Fix it by checking if slave is unaffiliated (doesn't have ib device)
before removing from the list.
RIP: 0010:mlx5r_mp_remove+0x4e/0xa0 [mlx5_ib]
Call Trace:
auxiliary_bus_remove+0x18/0x30
__device_release_driver+0x177/x220
device_release_driver+0x24/0x30
bus_remove_device+0xd8/0x140
device_del+0x18a/0x3e0
mlx5_rescan_drivers_locked+0xa9/0x210 [mlx5_core]
mlx5_unregister_device+0x34/0x60 [mlx5_core]
mlx5_uninit_one+0x32/0x100 [mlx5_core]
remove_one+0x6e/0xe0 [mlx5_core]
pci_device_remove+0x36/0xa0
__device_release_driver+0x177/0x220
device_driver_detach+0x3c/0xa0
unbind_store+0x113/0x130
kernfs_fop_write_iter+0x110/0x1a0
new_sync_write+0x116/0x1a0
vfs_write+0x1ba/0x260
ksys_write+0x5f/0xe0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 93f8244431 ("RDMA/mlx5: Convert mlx5_ib to use auxiliary bus")
Link: https://lore.kernel.org/r/17ec98989b0ba88f7adfbad68eb20bce8d567b44.1628587493.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Hayes Wang says:
====================
r8152: fix bp settings
Fix the wrong bp settings of the firmware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum PLA bp number of RTL8153C is 16, not 8. That is, the
bp 0 ~ 15 are at 0xfc28 ~ 0xfc46, and the bp_en is at 0xfc48.
Fixes: 195aae321c ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register of USB_BP2_EN is 16 bits, so we should use
ocp_write_word(), not ocp_write_byte().
Fixes: 9370f2d05a ("support request_firmware for RTL8153")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
mptcp: Bug fixes
Here are two bug fixes for the net tree:
Patch 1 fixes a memory leak that could be encountered when clearing the
list of advertised MPTCP addresses.
Patch 2 fixes a protocol issue early in an MPTCP connection, to ensure
both peers correctly understand that the full MPTCP connection handshake
has completed even when the server side quickly sends an ADD_ADDR
option.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR
with HMAC from the server, it is enough to switch to a "fully
established" mode because it has received more MPTCP options.
It was then OK to enable the "fully_established" flag on the MPTCP
socket. Still, best to check if the ADD_ADDR looks valid by looking if
it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received
while we are not in "fully established" mode, it is strange and then
we should not switch to this mode now.
But that is not enough. On one hand, the path-manager has be notified
the state has changed. On the other hand, the "fully_established" flag
on the subflow socket should be turned on as well not to re-send the
MP_CAPABLE 3rd ACK content with the next ACK.
Fixes: 84dfe3677a ("mptcp: send out dedicated ADD_ADDR packet")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __init_el2_timers we initialize CNTHCTL_EL2.{EL1PCEN,EL1PCTEN} with a
RMW sequence, leaving all other bits UNKNOWN.
In general, we should initialize all bits in a register rather than
using an RMW sequence, since most bits are UNKNOWN out of reset, and as
new bits are added to the reigster their reset value might not result in
expected behaviour.
In the case of CNTHCTL_EL2, FEAT_ECV added a number of new control bits
in previously RES0 bits, which reset to UNKNOWN values, and may cause
issues for EL1 and EL0:
* CNTHCTL_EL2.ECV enables the CNTPOFF_EL2 offset (which itself resets to
an UNKNOWN value) at EL0 and EL1. Since the offset could reset to
distinct values across CPUs, when the control bit resets to 1 this
could break timekeeping generally.
* CNTHCTL_EL2.{EL1TVT,EL1TVCT} trap EL0 and EL1 accesses to the EL1
virtual timer/counter registers to EL2. When reset to 1, this could
cause unexpected traps to EL2.
Initializing these bits to zero avoids these problems, and all other
bits in CNTHCTL_EL2 other than EL1PCEN and EL1PCTEN can safely be reset
to zero.
This patch ensures we initialize CNTHCTL_EL2 accordingly, only setting
EL1PCEN and EL1PCTEN, and setting all other bits to zero.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@google.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818161535.52786-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:
kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc008000004073278
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
Workqueue: events control_work_handler [virtio_console]
NIP: c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
REGS: c00000002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+)
MSR: 800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 200400cf
...
NIP fill_queue+0xf0/0x210 [virtio_console]
LR fill_queue+0xf0/0x210 [virtio_console]
Call Trace:
fill_queue+0xb4/0x210 [virtio_console] (unreliable)
add_port+0x1a8/0x470 [virtio_console]
control_work_handler+0xbc/0x1e8 [virtio_console]
process_one_work+0x290/0x590
worker_thread+0x88/0x620
kthread+0x194/0x1a0
ret_from_kernel_thread+0x5c/0x64
Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.
Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page, if another CPU tries to access the
page at that time we see something like the fault above.
We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].
But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table
correctly.
[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r.fsf@linux.ibm.com/
Fixes: 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Murilo Opsfelder Araújo <muriloo@linux.ibm.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818120518.3603172-1-mpe@ellerman.id.au
Commit b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
removed the 'isync' instruction after adding/removing NX bit in user
segments. The reasoning behind this change was that when setting the
NX bit we don't mind it taking effect with delay as the kernel never
executes text from userspace, and when clearing the NX bit this is
to return to userspace and then the 'rfi' should synchronise the
context.
However, it looks like on book3s/32 having a hash page table, at least
on the G3 processor, we get an unexpected fault from userspace, then
this is followed by something wrong in the verification of MSR_PR
at end of another interrupt.
This is fixed by adding back the removed isync() following update
of NX bit in user segment registers. Only do it for cores with an
hash table, as 603 cores don't exhibit that problem and the two isync
increase ./null_syscall selftest by 6 cycles on an MPC 832x.
First problem: unexpected WARN_ON() for mysterious PROTFAULT
WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
Modules linked in:
CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
NIP: c001b5c8 LR: c001b6f8 CTR: 00000000
REGS: e2d09e40 TRAP: 0700 Not tainted (5.13.0-pmac-00028-gb3c15b60339a)
MSR: 00021032 <ME,IR,DR,RI> CR: 42d04f30 XER: 20000000
GPR00: c000424c e2d09f00 c301b680 e2d09f40 0000001e 42000000 00cba028 00000000
GPR08: 08000000 48000010 c301b680 e2d09f30 22d09f30 00c1fff0 00cba000 a7b7ba4c
GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
GPR24: a7b7b64c a7b7d2f0 00000004 00000000 c1efa6c0 00cba02c 00000300 e2d09f40
NIP [c001b5c8] do_page_fault+0x6c/0x5b0
LR [c001b6f8] do_page_fault+0x19c/0x5b0
Call Trace:
[e2d09f00] [e2d09f04] 0xe2d09f04 (unreliable)
[e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4
--- interrupt: 300 at 0xa7a261dc
NIP: a7a261dc LR: a7a253bc CTR: 00000000
REGS: e2d09f40 TRAP: 0300 Not tainted (5.13.0-pmac-00028-gb3c15b60339a)
MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 228428e2 XER: 20000000
DAR: 00cba02c DSISR: 42000000
GPR00: a7a27448 afa6b0e0 a74c35c0 a7b7b614 0000001e a7b7b614 00cba028 00000000
GPR08: 00020fd9 00000031 00cb9ff8 a7a273b0 220028e2 00c1fff0 00cba000 a7b7ba4c
GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
GPR24: a7b7b64c a7b7d2f0 00000004 00000002 0000001e a7b7b614 a7b7aff4 00000030
NIP [a7a261dc] 0xa7a261dc
LR [a7a253bc] 0xa7a253bc
--- interrupt: 300
Instruction dump:
7c4a1378 810300a0 75278410 83820298 83a300a4 553b018c 551e0036 4082038c
2e1b0000 40920228 75280800 41820220 <0fe00000> 3b600000 41920214 81420594
Second problem: MSR PR is seen unset allthough the interrupt frame shows it set
kernel BUG at arch/powerpc/kernel/interrupt.c:458!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 1660 Comm: Xorg Tainted: G W 5.13.0-pmac-00028-gb3c15b60339a #40
NIP: c0011434 LR: c001629c CTR: 00000000
REGS: e2d09e70 TRAP: 0700 Tainted: G W (5.13.0-pmac-00028-gb3c15b60339a)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42d09f30 XER: 00000000
GPR00: 00000000 e2d09f30 c301b680 e2d09f40 83440000 c44d0e68 e2d09e8c 00000000
GPR08: 00000002 00dc228a 00004000 e2d09f30 22d09f30 00c1fff0 afa6ceb4 00c26144
GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
NIP [c0011434] interrupt_exit_kernel_prepare+0x10/0x70
LR [c001629c] interrupt_return+0x9c/0x144
Call Trace:
[e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 (unreliable)
--- interrupt: 300 at 0xa09be008
NIP: a09be008 LR: a09bdfe8 CTR: a09bdfc0
REGS: e2d09f40 TRAP: 0300 Tainted: G W (5.13.0-pmac-00028-gb3c15b60339a)
MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 420028e2 XER: 20000000
DAR: a539a308 DSISR: 0a000000
GPR00: a7b90d50 afa6b2d0 a74c35c0 a0a8b690 a0a8b698 a5365d70 a4fa82a8 00000004
GPR08: 00000000 a09bdfc0 00000000 a5360000 a09bde7c 00c1fff0 afa6ceb4 00c26144
GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
NIP [a09be008] 0xa09be008
LR [a09bdfe8] 0xa09bdfe8
--- interrupt: 300
Instruction dump:
80010024 83e1001c 7c0803a6 4bffff80 3bc00800 4bffffd0 486b42fd 4bffffcc
81430084 71480002 41820038 554a0462 <0f0a0000> 80620060 74630001 40820034
Fixes: b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4856f5574906e2aec0522be17bf3848a22b2cd0b.1629269345.git.christophe.leroy@csgroup.eu
Function "dma_map_sg" is entitled to merge adjacent entries
and return a value smaller than what was passed as "nents".
Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
rather than the original "nents" parameter ("sg_len").
This old RDS bug was exposed and reliably causes kernel panics
(using RDMA operations "rds-stress -D") on x86_64 starting with:
commit c588072bba ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Simply put: Linux 5.11 and later.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently we are unable to ping a bridge on top of a felix switch which
uses the ocelot-8021q tagger. The packets are dropped on the ingress of
the user port and the 'drop_local' counter increments (the counter which
denotes drops due to no valid destinations).
Dumping the PGID tables, it becomes clear that the PGID_SRC of the user
port is zero, so it has no valid destinations.
But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q
ports) is clearly missing from the forwarding mask of ports that are
under a bridge. So this has always been broken.
Looking at the version history of the patch, in v7
https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/
the code looked like this:
/* Standalone ports forward only to DSA tag_8021q CPU ports */
unsigned long mask = cpu_fwd_mask;
(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask |= ocelot->bridge_fwd_mask & ~BIT(port);
while in v8 (the merged version)
https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/
it looked like this:
unsigned long mask;
(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask = ocelot->bridge_fwd_mask & ~BIT(port);
So the breakage was introduced between v7 and v8 of the patch.
Fixes: e21268efbe ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull btrfs fix from David Sterba:
"One more fix for cross-rename, adding a missing check for directory
and subvolume, this could lead to a crash"
* tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
Pull sound fixes from Takashi Iwai:
"Only a few regression fixes and trivial device quirks"
* tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
ALSA: hda: Fix hang during shutdown due to link reset
ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
ALSA: oxfw: fix functioal regression for silence in Apogee Duet FireWire
ALSA: hda - fix the 'Capture Switch' value change notifications
Pull clang cfi fix from Kees Cook:
- Use rcu_read_{un}lock_sched_notrace to avoid recursion (Elliot Berman)
* tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
cfi: Use rcu_read_{un}lock_sched_notrace
I had forgotten just how sensitive hackbench is to extra pipe wakeups,
and commit 3a34b13a88 ("pipe: make pipe writes always wake up
readers") ended up causing a quite noticeable regression on larger
machines.
Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
not clear that this matters in real life all that much, but as Mel
points out, it's used often enough when comparing kernels and so the
performance regression shows up like a sore thumb.
It's easy enough to fix at least for the common cases where pipes are
used purely for data transfer, and you never have any exciting poll
usage at all. So set a special 'poll_usage' flag when there is polling
activity, and make the ugly "EPOLLET has crazy legacy expectations"
semantics explicit to only that case.
I would love to limit it to just the broken EPOLLET case, but the pipe
code can't see the difference between epoll and regular select/poll, so
any non-read/write waiting will trigger the extra wakeup behavior. That
is sufficient for at least the hackbench case.
Apart from making the odd extra wakeup cases more explicitly about
EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
write, not at the first write chunk. That is actually much saner
semantics (as much as you can call any of the legacy edge-triggered
expectations for EPOLLET "sane") since it means that you know the wakeup
will happen once the write is done, rather than possibly in the middle
of one.
[ For stable people: I'm putting a "Fixes" tag on this, but I leave it
up to you to decide whether you actually want to backport it or not.
It likely has no impact outside of synthetic benchmarks - Linus ]
Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
Fixes: 3a34b13a88 ("pipe: make pipe writes always wake up readers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Sandeep Patil <sspatil@android.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a20dcf53ea ("usb: typec: tcpm: Respond Not_Supported if no
snk_vdo"), stops tcpm_pd_data_request() calling tcpm_handle_vdm_request()
when port->nr_snk_vdo is not set. But the VDM might be intended for an
altmode-driver, in which case nr_snk_vdo does not matter.
This change breaks the forwarding of connector hotplug (HPD) events
for displayport altmode on devices which don't set nr_snk_vdo.
tcpm_pd_data_request() is the only caller of tcpm_handle_vdm_request(),
so we can move the nr_snk_vdo check to inside it, at which point we
have already looked up the altmode device so we can check for this too.
Doing this check here also ensures that vdm_state gets set to
VDM_STATE_DONE if it was VDM_STATE_BUSY, even if we end up with
responding with PD_MSG_CTRL_NOT_SUPP later.
Note that tcpm_handle_vdm_request() was already sending
PD_MSG_CTRL_NOT_SUPP in some circumstances, after moving the nr_snk_vdo
check the same error-path is now taken when that check fails. So that
we have only one error-path for this and not two. Replace the
tcpm_queue_message(PD_MSG_CTRL_NOT_SUPP) used by the existing error-path
with the more robust tcpm_pd_handle_msg() from the (now removed) second
error-path.
Fixes: a20dcf53ea ("usb: typec: tcpm: Respond Not_Supported if no snk_vdo")
Cc: stable <stable@vger.kernel.org>
Cc: Kyle Tso <kyletso@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Kyle Tso <kyletso@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210816154632.381968-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compiling ppc64le_defconfig with clang-14 shows a modpost warning:
WARNING: modpost: vmlinux.o(.text+0xa74e0): Section mismatch in
reference from the function xive_setup_cpu_ipi() to the function
.init.text:xive_request_ipi()
The function xive_setup_cpu_ipi() references
the function __init xive_request_ipi().
This is often because xive_setup_cpu_ipi lacks a __init
annotation or the annotation of xive_request_ipi is wrong.
xive_request_ipi() is called from xive_setup_cpu_ipi(), which is not
__init, so xive_request_ipi() should not be marked __init. Remove the
attribute so there is no more warning.
Fixes: cbc06f051c ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210816185711.21563-1-nathan@kernel.org
This fixes improper iotlb invalidation in intel_pasid_tear_down_entry().
When a PASID was used as nested mode, released and reused, the following
error message will appear:
[ 180.187556] Unexpected page request in Privilege Mode
[ 180.187565] Unexpected page request in Privilege Mode
[ 180.279933] Unexpected page request in Privilege Mode
[ 180.279937] Unexpected page request in Privilege Mode
Per chapter 6.5.3.3 of VT-d spec 3.3, when tear down a pasid entry, the
software should use Domain selective IOTLB flush if the PGTT of the pasid
entry is SL only or Nested, while for the pasid entries whose PGTT is FL
only or PT using PASID-based IOTLB flush is enough.
Fixes: 2cd1311a26 ("iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr")
Signed-off-by: Kumar Sanjay K <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Yi Sun <yi.y.sun@intel.com>
Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A PASID reference is increased whenever a device is bound to an mm (and
its PASID) successfully (i.e. the device's sdev user count is increased).
But the reference is not dropped every time the device is unbound
successfully from the mm (i.e. the device's sdev user count is decreased).
The reference is dropped only once by calling intel_svm_free_pasid() when
there isn't any device bound to the mm. intel_svm_free_pasid() drops the
reference and only frees the PASID on zero reference.
Fix the issue by dropping the PASID reference and freeing the PASID when
no reference on successful unbinding the device by calling
intel_svm_free_pasid() .
Fixes: 4048377414 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210813181345.1870742-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Syzbot reported uninit-value in asix_mdio_read(). The problem was in
missing error handling. asix_read_cmd() should initialize passed stack
variable smsr, but it can fail in some cases. Then while condidition
checks possibly uninit smsr variable.
Since smsr is uninitialized stack variable, driver can misbehave,
because smsr will be random in case of asix_read_cmd() failure.
Fix it by adding error handling and just continue the loop instead of
checking uninit value.
Added helper function for checking Host_En bit, since wrong loop was used
in 4 functions and there is no need in copy-pasting code parts.
Cc: Robert Foss <robert.foss@collabora.com>
Fixes: d9fe64e511 ("net: asix: Add in_pm parameter")
Reported-by: syzbot+a631ec9e717fb0423053@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.
Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saravana Kannan says:
====================
Clean up and fix error handling in mdio_mux_init()
This patch series was started due to -EPROBE_DEFER not being handled
correctly in mdio_mux_init() and causing issues [1]. While at it, I also
did some more error handling fixes and clean ups. The -EPROBE_DEFER fix is
the last patch.
Ideally, in the last patch we'd treat any error similar to -EPROBE_DEFER
but I'm not sure if it'll break any board/platforms where some child
mdiobus never successfully registers. If we treated all errors similar to
-EPROBE_DEFER, then none of the child mdiobus will work and that might be a
regression. If people are sure this is not a real case, then I can fix up
the last patch to always fail the entire mdio-mux init if any of the child
mdiobus registration fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When registering mdiobus children, if we get an -EPROBE_DEFER, we shouldn't
ignore it and continue registering the rest of the mdiobus children. This
would permanently prevent the deferring child mdiobus from working instead
of reattempting it in the future. So, if a child mdiobus needs to be
reattempted in the future, defer the entire mdio-mux initialization.
This fixes the issue where PHYs sitting under the mdio-mux aren't
initialized correctly if the PHY's interrupt controller is not yet ready
when the mdio-mux is being probed. Additional context in the link below.
Fixes: 0ca2997d14 ("netdev/of/phy: Add MDIO bus multiplexer support.")
Link: https://lore.kernel.org/lkml/CAGETcx95kHrv8wA-O+-JtfH7H9biJEGJtijuPVN0V5dUKUAB3A@mail.gmail.com/#t
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are seeing memory allocation errors, don't try to continue
registering child mdiobus devices. It's unlikely they'll succeed.
Fixes: 342fa19644 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole point of devm_* APIs is that you don't have to undo them if you
are returning an error that's going to get propagated out of a probe()
function. So delete unnecessary devm_kfree() call in the error return path.
Fixes: b601616681 ("mdio: mux: Correct mdio_mux_init error path issues")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that of_find_compatible_node has a weird calling convention in
which it calls of_node_put() on the "from" node argument, instead of
leaving that up to the caller. This comes from the fact that
of_find_compatible_node with a non-NULL "from" argument it only supposed
to be used as the iterator function of for_each_compatible_node(). OF
iterator functions call of_node_get on the next OF node and of_node_put()
on the previous one.
When of_find_compatible_node calls of_node_put, it actually never
expects the refcount to drop to zero, because the call is done under the
atomic devtree_lock context, and when the refcount drops to zero it
triggers a kobject and a sysfs file deletion, which assume blocking
context.
So any driver call to of_find_compatible_node is probably buggy because
an unexpected of_node_put() takes place.
What should be done is to use the of_get_compatible_child() function.
Fixes: 5a8f09748e ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/
Suggested-by: Frank Rowand <frowand.list@gmail.com>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding support for using the skb->hash value as the flow hash in CAKE,
I accidentally introduced a logic error that broke the host-only isolation
modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash
variable should stay initialised to 0 in cake_hash() in pure host-based
hashing mode. Add a check for this before using the skb->hash value as
flow_hash.
Fixes: b0c19ed608 ("sch_cake: Take advantage of skb->hash where appropriate")
Reported-by: Pete Heist <pete@heistp.net>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer required now that userspace can't touch anything that might
need it, and should fix DRM MM operations racing with each other, and
the random hangs/crashes that come with that.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Long ago, there had been plans for making use of a bunch of these APIs
from userspace and there's various checks in place to stop misbehaving.
Countless other projects have occurred in the meantime, and the pieces
didn't finish falling into place for that to happen.
They will (hopefully) in the not-too-distant future, but it won't look
quite as insane. The super checks are causing problems right now, and
are going to be removed.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
I honestly don't even know why... These have never been used.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Should fix some initial modeset failures on (at least) Ampere boards.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
When booted with multiple displays attached, the EFI GOP driver on (at
least) Ampere, can leave DP links powered up that aren't being used to
display anything. This confuses our tracking of SOR routing, with the
likely result being a failed modeset and display engine hang.
Fix this by (ab?)using the DisableLT IED script to power-down the link,
restoring HW to a state the driver expects.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
The struct pci_dev uses reference counting but zPCI assumed erroneously
that the last reference would always be the local reference after
calling pci_stop_and_remove_bus_device(). This is usually the case but
not how reference counting works and thus inherently fragile.
In fact one case where this causes a NULL pointer dereference when on an
SRIOV device the function 0 was hot unplugged before another function of
the same multi-function device. In this case the second function's
pdev->sriov->dev reference keeps the struct pci_dev of function 0 alive
even after the unplug. This bug was previously hidden by the fact that
we were leaking the struct pci_dev which in turn means that it always
outlived the struct zpci_dev. This was fixed in commit 0b13525c20
("s390/pci: fix leak of PCI device structure") exposing the broken
behavior.
Fix this by accounting for the long living reference a struct pci_dev
has to its underlying struct zpci_dev via the zbus->function[] array and
only release that in pcibios_release_device() ensuring that the struct
pci_dev is not left with a dangling reference. This is a minimal fix in
the future it would probably better to use fine grained reference
counting for struct zpci_dev.
Fixes: 05bc1be6db ("s390/pci: create zPCI bus")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
following check:
hctx->fq->flush_rq == req
but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:
1) memory re-order in blk_mq_rq_ctx_init():
rq->mq_hctx = data->hctx;
...
refcount_set(&rq->ref, 1);
OR
2) tag re-use and ->rqs[] isn't updated with new request.
Fix the issue by re-writing is_flush_rq() as:
return rq->end_io == flush_end_io;
which turns out simpler to follow and immune to data race since we have
ordered WRITE rq->end_io and refcount_set(&rq->ref, 1).
Fixes: 2e315dc07d ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Cc: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
Cc: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210818010925.607383-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kalle Valo says:
====================
wireless-drivers fixes for v5.14
First set of fixes for v5.14 and nothing major this time. New devices
for iwlwifi and one fix for a compiler warning.
iwlwifi
* support for new devices
mt76
* fix compiler warning about MT_CIPHER_NONE
* tag 'wireless-drivers-2021-08-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
mt76: fix enum type mismatch
iwlwifi: add new so-jf devices
iwlwifi: add new SoF with JF devices
iwlwifi: pnvm: accept multiple HW-type TLVs
====================
Link: https://lore.kernel.org/r/20210817171027.EC1E6C43460@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull tracing fix from Steven Rostedt:
"Limit the shooting in the foot of tp_printk
The "tp_printk" option redirects the trace event output to printk at
boot up. This is useful when a machine crashes before boot where the
trace events can not be retrieved by the in kernel ring buffer. But it
can be "dangerous" because trace events can be located in high
frequency locations such as interrupts and the scheduler, where a
printk can slow it down that it live locks the machine (because by the
time the printk finishes, the next event is triggered). Thus tp_printk
must be used with care.
It was discovered that the filter logic to trace events does not apply
to the tp_printk events. This can cause a surprise and live lock when
the user expects it to be filtered to limit the amount of events
printed to the console when in fact it still prints everything"
* tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Apply trace filters on all output channels
Pull ARM cpufreq fixes for v5.14 from Viresh Kumar:
"This contains:
- Addition of SoCs to blocklist for cpufreq-dt driver (Bjorn Andersson
and Thara Gopinath).
- Fix error path for scmi driver (Lukasz Luba).
- Temporarily disable highest frequency for armada, its unsafe and
breaks stuff."
* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
cpufreq: arm_scmi: Fix error path when allocation failed
cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
[Why]
Userspace should get back a copy of drm_wait_vblank that's been modified
even when drm_wait_vblank_ioctl returns a failure.
Rationale:
drm_wait_vblank_ioctl modifies the request and expects the user to read
it back. When the type is RELATIVE, it modifies it to ABSOLUTE and updates
the sequence to become current_vblank_count + sequence (which was
RELATIVE), but now it became ABSOLUTE.
drmWaitVBlank (in libdrm) expects this to be the case as it modifies
the request to be Absolute so it expects the sequence to would have been
updated.
The change is in compat_drm_wait_vblank, which is called by
drm_compat_ioctl. This change of copying the data back regardless of the
return number makes it en par with drm_ioctl, which always copies the
data before returning.
[How]
Return from the function after everything has been copied to user.
Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible
Tested on ChromeOS Trogdor(msm)
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210812194917.1703356-1-markyacoub@chromium.org
qlcnic_83xx_unlock_flash() is called on all paths after we call
qlcnic_83xx_lock_flash(), except for one error path on failure
of QLCRD32(), which may cause a deadlock. This bug is suggested
by a static analysis tool, please advise.
Fixes: 81d0aeb0a4 ("qlcnic: flash template based firmware reset recovery")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For fixing use-after-free during iterating over requests, we grabbed
request's refcount before calling ->fn in commit 2e315dc07d ("blk-mq:
grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter").
Turns out this way may cause kernel panic when iterating over one flush
request:
1) old flush request's tag is just released, and this tag is reused by
one new request, but ->rqs[] isn't updated yet
2) the flush request can be re-used for submitting one new flush command,
so blk_rq_init() is called at the same time
3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request
is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called,
flush_rq->end_io may not be updated yet, so NULL pointer dereference
is triggered in blk_mq_put_rq_ref().
Fix the issue by calling refcount_set(&flush_rq->ref, 1) after
flush_rq->end_io is set. So far the only other caller of blk_rq_init() is
scsi_ioctl_reset() in which the request doesn't enter block IO stack and
the request reference count isn't used, so the change is safe.
Fixes: 2e315dc07d ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Reported-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
Tested-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20210811142624.618598-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit a02e8964ea ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.
This leads to several issues:
- For a device that doesn't support control guest offloads, the "LRO"
can't be disabled triggering WARN in dev_disable_lro() when turning
off LRO or when enabling forwarding bridging etc.
- For a device that supports control guest offloads, the guest
offloads are disabled in cases of bridging, forwarding etc slowing
down the traffic.
Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.
Further, we never advertised LRO historically before a02e8964ea
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.
Fixes: a02e8964ea ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Ivan <ivan@prestigetransportation.com>
Tested-by: Ivan <ivan@prestigetransportation.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull crypto fix from Herbert Xu:
"This contains a fix for a potential boot failure due to a missing
Kconfig dependency for people upgrading with the DRBG enabled"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: drbg - select SHA512
To fix the "reverse-NAT" for replies.
When a packet is sent over a VRF, the POST_ROUTING hooks are called
twice: Once from the VRF interface, and once from the "actual"
interface the packet will be sent from:
1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
This causes the POST_ROUTING hooks to run.
2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
second vrf_l3_rcv() calls them again.
As an example, consider the following SNAT rule:
> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
In this case sending over a VRF will create 2 conntrack entries.
The first is from the VRF interface, which performs the IP SNAT.
The second will run the SNAT, but since the "expected reply" will remain
the same, conntrack randomizes the source port of the packet:
e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
rules are:
udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
udp 17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
SNAT-ed from 10000 --> 61033.
But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
conntrack entry is matched:
udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
udp 17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
And a "port 61033 unreachable" ICMP packet is sent back.
The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
the skb already has a conntrack flow attached to it, which means
nf_conntrack_in() will not resolve the flow again.
This means only the dest port is "reverse-NATed" (61033 -> 10000) but
the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
not received.
This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
The fix is then to reset the flow when skb is received on a VRF, to let
conntrack resolve the flow again (which now will hit the earlier flow).
To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
running 'bash ./repro'):
$ cat run_in_A1.py
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
from scapy.all import *
import argparse
def get_packet_to_send(udp_dst_port, msg_name):
return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
IP(src='3.3.3.3', dst='2.2.2.2')/ \
UDP(sport=53, dport=udp_dst_port)/ \
Raw(f'{msg_name}\x0012345678901234567890')
parser = argparse.ArgumentParser()
parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
help="From run_in_A3.py")
parser.add_argument('-socket_port', dest="socket_port", type=str,
required=True, help="From run_in_A3.py")
parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
help="From script")
args, _ = parser.parse_known_args()
iface_mac = args.iface_mac
socket_port = int(args.socket_port)
v1_mac = args.v1_mac
print(f'Source port before NAT: {socket_port}')
while True:
pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
if 0 == len(pkts):
print('Something failed, rerun the script :(', flush=True)
break
pkt = pkts[0]
if not pkt.haslayer('UDP'):
continue
pkt_sport = pkt.getlayer('UDP').sport
print(f'Source port after NAT: {pkt_sport}', flush=True)
pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
sendp(pkt_to_send, '_v0', verbose=False) # Will not be received
pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
sendp(pkt_to_send, '_v0', verbose=False)
break
$ cat run_in_A2.py
import socket
import netifaces
print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
flush=True)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
str('vrf_1' + '\0').encode('utf-8'))
s.connect(('3.3.3.3', 53))
print(f'{s. getsockname()[1]}', flush=True)
s.settimeout(5)
while True:
try:
# Periodically send in order to keep the conntrack entry alive.
s.send(b'a'*40)
resp = s.recvfrom(1024)
msg_name = resp[0].decode('utf-8').split('\0')[0]
print(f"Got {msg_name}", flush=True)
except Exception as e:
pass
$ cat repro.sh
ip netns del A1 2> /dev/null
ip netns del A2 2> /dev/null
ip netns add A1
ip netns add A2
ip -n A1 link add _v0 type veth peer name _v1 netns A2
ip -n A1 link set _v0 up
ip -n A2 link add e00000 type bond
ip -n A2 link add lo0 type dummy
ip -n A2 link add vrf_1 type vrf table 10001
ip -n A2 link set vrf_1 up
ip -n A2 link set e00000 master vrf_1
ip -n A2 addr add 1.1.1.1/24 dev e00000
ip -n A2 link set e00000 up
ip -n A2 link set _v1 master e00000
ip -n A2 link set _v1 up
ip -n A2 link set lo0 up
ip -n A2 addr add 2.2.2.2/32 dev lo0
ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
SNAT --to-source 2.2.2.2 -o vrf_1
sleep 5
ip netns exec A2 python3 run_in_A2.py > x &
XPID=$!
sleep 5
IFACE_MAC=`sed -n 1p x`
SOCKET_PORT=`sed -n 2p x`
V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
sleep 5
kill -9 $XPID
wait $XPID 2> /dev/null
ip netns del A1
ip netns del A2
tail x -n 2
rm x
set +x
Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qualcomm ARM64 fixes for v5.14
This fixes three regressions across Angler and Bullhead, introduced by
advancements in the platform definition. It then corrects the powerdown
GPIOs for the speaker amps on C630 and lastly fixes a typo that assigned
CPU7 in SC7280 to the wrong CPUfreq domain.
* tag 'qcom-arm64-fixes-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sdm845-oneplus: fix reserved-mem
arm64: dts: qcom: msm8994-angler: Disable cont_splash_mem
arm64: dts: qcom: sc7280: Fixup cpufreq domain info for cpu7
arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem mapping
arm64: dts: qcom: msm8992-bullhead: Remove PSCI
arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
Link: https://lore.kernel.org/r/20210816205030.576348-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
NXP/FSL SoC driver fixes for v5.14
QE interrupt controller driver
- Convert it to platform_device driver to make it work with fw_devlink
- Fix static analysis issue
* tag 'soc-fsl-fix-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux:
soc: fsl: qe: fix static checker warning
soc: fsl: qe: convert QE interrupt controller to platform_device
Link: https://lore.kernel.org/r/20210813222305.13663-1-leoyang.li@nxp.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Why]
DM initializes VM context after DMCUB initialization.
This results in loss of DCN_VM_CONTEXT registers after z10.
[How]
Notify DMCUB when VM setup is complete, and have DMCUB
save init registers.
v2: squash in CONFIG_DRM_AMD_DC_DCN3_1 fix
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Jake Wang <haonan.wang2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test.
Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN ] KFDSVMRangeTest.BasicSystemMemTest
[ OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN ] KFDSVMRangeTest.SetGetAttributesTest
[ ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[ ] Setting/Getting atrributes
[ FAILED ]
the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus
some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before
iterate svm interval tree.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull tracing fixes from Steven Rostedt:
"Fixes and clean ups to tracing:
- Fix header alignment when PREEMPT_RT is enabled for osnoise tracer
- Inject "stop" event to see where osnoise stopped the trace
- Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it
- Fix erroneous message for bootconfig cmdline parameter
- Fix crash caused by not found variable in histograms"
* tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
init: Suppress wrong warning for bootconfig cmdline parameter
tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
trace/osnoise: Print a stop tracing message
trace/timerlat: Add a header with PREEMPT_RT additional fields
trace/osnoise: Add a header with PREEMPT_RT additional fields
It was reported by a user with a Dell m15 R5 (5800H) that
the keyboard backlight was turning on when entering suspend
and turning off when exiting (the opposite of how it should be).
The user bisected it back to commit 5dbf509975 ("ACPI: PM:
s2idle: Add support for new Microsoft UUID"). Previous to that
commit the LEDs didn't turn off at all. Confirming in the spec,
these were reversed when introduced.
Fix them to match the spec.
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1230#note_1021836
Fixes: 5dbf509975 ("ACPI: PM: s2idle: Add support for new Microsoft UUID")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull KVM fixes from Paolo Bonzini:
"Two nested virtualization fixes for AMD processors"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)
KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
Pull virtio fixes from Michael Tsirkin:
"Fixes in virtio, vhost, and vdpa drivers"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix queue type selection logic
vdpa/mlx5: Avoid destroying MR on empty iotlb
tools/virtio: fix build
virtio_ring: pull in spinlock header
vringh: pull in spinlock header
virtio-blk: Add validation for block size in config space
vringh: Use wiov->used to check for read/write desc order
virtio_vdpa: reject invalid vq indices
vdpa: Add documentation for vdpa_alloc_device() macro
vDPA/ifcvf: Fix return value check for vdpa_alloc_device()
vp_vdpa: Fix return value check for vdpa_alloc_device()
vdpa_sim: Fix return value check for vdpa_alloc_device()
vhost: Fix the calculation in vhost_overflow()
vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
virtio_pci: Support surprise removal of virtio pci device
virtio: Protect vqs list access
virtio: Keep vring_del_virtqueue() mirror of VQ create
virtio: Improve vq->broken access to avoid any compiler optimization
On the system PRMT table is not present, dmesg output:
$ dmesg | grep PRM
[ 1.532237] ACPI: PRMT not present
[ 1.532237] PRM: found 4294967277 modules
The result of acpi_table_parse_entries need to be checked and return
immediately if PRMT table is not present or no PRM module found.
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull operating performance points (OPP) framework fixes for v5.14
from iresh Kumar:
"This removes few WARN() statements from the OPP core."
* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Drop empty-table checks from _put functions
opp: remove WARN when no valid OPPs remain
If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable
Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor),
then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only
possible by making L0 intercept these instructions.
Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted,
and thus read/write portions of the host physical memory.
Fixes: 89c8a4984f ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature")
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Invert the mask of bits that we pick from L2 in
nested_vmcb02_prepare_control
* Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr
This fixes a security issue that allowed a malicious L1 to run L2 with
AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
AVIC to read/write the host physical memory at some offsets.
Fixes: 3d6368ef58 ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Video captured in 1400x1050 resolution (bytesperline aka stride = 1408
bytes) is invalid. Fix it.
Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
Link: https://lore.kernel.org/r/m3y2bmq7a4.fsf@t19.piap.pl
[p.zabel@pengutronix.de: added "gpu: ipu-v3:" prefix to commit description]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The bounds check on "index" doesn't catch negative values. Using
ARRAY_SIZE() directly is more readable and more robust because it prevents
negative values for "index". Fortunately we only pass valid values to
ipc_chnl_cfg_get() so this patch does not affect runtime.
Reported-by: Solomon Ucko <solly.ucko@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 69de4421bb ("drm/ttm: Initialize debugfs from
ttm_global_init()"), ttm_global_init was changed so that if creation
of the debugfs global root directory fails, ttm_global_init will bail
out early and return an error, leading to initialization failure of
DRM drivers. However, not every system will be using debugfs. On such
a system, debugfs directory creation can be expected to fail, but DRM
drivers must still be usable. This changes it so that if creation of
TTM's debugfs root directory fails, then no biggie: keep calm and
carry on.
Fixes: 69de4421bb ("drm/ttm: Initialize debugfs from ttm_global_init()")
Signed-off-by: Dan Moulding <dmoulding@me.com>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210810195906.22220-2-dmoulding@me.com
Signed-off-by: Christian König <christian.koenig@amd.com>
Cross-rename lacks a check when that would prevent exchanging a
directory and subvolume from different parent subvolume. This causes
data inconsistencies and is caught before commit by tree-checker,
turning the filesystem to read-only.
Calling the renameat2 with RENAME_EXCHANGE flags like
renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))
on two paths:
namesrc = dir1/subvol1/dir2
namedest = subvol2/subvol3
will cause key order problem with following write time tree-checker
report:
[1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
[1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
[1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
[1194842.338772] item 0 key (256 1 0) itemoff 16123 itemsize 160
[1194842.338793] inode generation 3 size 16 mode 40755
[1194842.338801] item 1 key (256 12 256) itemoff 16111 itemsize 12
[1194842.338809] item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
[1194842.338817] dir oid 258 type 2
[1194842.338823] item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
[1194842.338830] dir oid 257 type 2
[1194842.338836] item 4 key (256 96 2) itemoff 16009 itemsize 34
[1194842.338843] item 5 key (256 96 3) itemoff 15975 itemsize 34
[1194842.338852] item 6 key (257 1 0) itemoff 15815 itemsize 160
[1194842.338863] inode generation 6 size 8 mode 40755
[1194842.338869] item 7 key (257 12 256) itemoff 15801 itemsize 14
[1194842.338876] item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
[1194842.338883] dir oid 256 type 2
[1194842.338888] item 9 key (257 96 2) itemoff 15733 itemsize 34
[1194842.338895] item 10 key (258 12 256) itemoff 15719 itemsize 14
[1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
[1194842.339245] ------------[ cut here ]------------
[1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
[1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
[1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
[1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
[1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
[1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
[1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
[1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
[1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
[1194842.512092] FS: 0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
[1194842.512095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
[1194842.512103] Call Trace:
[1194842.512128] ? run_one_async_free+0x10/0x10 [btrfs]
[1194842.631729] btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
[1194842.631837] run_one_async_start+0x18/0x30 [btrfs]
[1194842.631938] btrfs_work_helper+0xd5/0x1d0 [btrfs]
[1194842.647482] process_one_work+0x262/0x5e0
[1194842.647520] worker_thread+0x4c/0x320
[1194842.655935] ? process_one_work+0x5e0/0x5e0
[1194842.655946] kthread+0x135/0x160
[1194842.655953] ? set_kthread_struct+0x40/0x40
[1194842.655965] ret_from_fork+0x1f/0x30
[1194842.672465] irq event stamp: 1729
[1194842.672469] hardirqs last enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
[1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
[1194842.672482] softirqs last enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
[1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0
The corrupted data will not be written, and filesystem can be unmounted
and mounted again (all changes since the last commit will be lost).
Add the missing check for new_ino so that all non-subvolumes must reside
under the same parent subvolume. There's an exception allowing to
exchange two subvolumes from any parents as the directory representing a
subvolume is only a logical link and does not have any other structures
related to the parent subvolume, unlike files, directories etc, that
are always in the inode namespace of the parent subvolume.
Fixes: cdd1fedf82 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.7+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Michael Chan says:
====================
bnxt_en: 2 bug fixes
The first one disables aRFS/NTUPLE on an older broken firmware version.
The second one adds missing memory barriers related to completion ring
handling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Each completion ring entry has a valid bit to indicate that the entry
contains a valid completion event. The driver's main poll loop
__bnxt_poll_work() has the proper dma_rmb() to make sure the valid
bit of the next entry has been checked before proceeding further.
But when we call bnxt_rx_pkt() to process the RX event, the RX
completion event consists of two completion entries and only the
first entry has been checked to be valid. We need the same barrier
after checking the next completion entry. Add missing dma_rmb()
barriers in bnxt_rx_pkt() and other similar locations.
Fixes: 67a95e2022 ("bnxt_en: Need memory barrier when processing the completion ring.")
Reported-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
212 firmware broke aRFS, so disable it. Traffic may stop after ntuple
filters are inserted and deleted by the 212 firmware.
Fixes: ae10ae740a ("bnxt_en: Add new hardware RFS mode.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a possible null-pointer dereference in qed_rdma_create_qp().
Changes from V2:
- Revert checkpatch fixes.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoiding qed ll2 race condition and NULL pointer dereference as part
of the remove and recovery flows.
Changes form V1:
- Change (!p_rx->set_prod_addr).
- qed_ll2.c checkpatch fixes.
Change from V2:
- Revert "qed_ll2.c checkpatch fixes".
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg()
or tipc_connect(). The difference is in tipc_connect(), it will call
tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting
is done. So there's no need to wait in __tipc_sendmsg() for this case.
This patch is to fix it by calling tipc_wait_for_connect() only when dlen
is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect().
Note this also fixes the failure in tipcutils/test/ptts/:
# ./tipcTS &
# ./tipcTC 9
(hang)
Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the swap dependency on PCH_GBE to selection PTP_1588_CLOCK_PCH
incidentally dropped the implicit dependency on the PCI. Restore it.
Fixes: 18d359ceb0 ("pch_gbe, ptp_pch: Fix the dependency direction between these drivers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot reported slab-out-of bounds write in decode_data().
The problem was in missing validation checks.
Syzbot's reproducer generated malicious input, which caused
decode_data() to be called a lot in sixpack_decode(). Since
rx_count_cooked is only 400 bytes and noone reported before,
that 400 bytes is not enough, let's just check if input is malicious
and complain about buffer overrun.
Fail log:
==================================================================
BUG: KASAN: slab-out-of-bounds in drivers/net/hamradio/6pack.c:843
Write of size 1 at addr ffff888087c5544e by task kworker/u4:0/7
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.6.0-rc3-syzkaller #0
...
Workqueue: events_unbound flush_to_ldisc
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:641
__asan_report_store1_noabort+0x17/0x20 mm/kasan/generic_report.c:137
decode_data.part.0+0x23b/0x270 drivers/net/hamradio/6pack.c:843
decode_data drivers/net/hamradio/6pack.c:965 [inline]
sixpack_decode drivers/net/hamradio/6pack.c:968 [inline]
Reported-and-tested-by: syzbot+fc8cd9a673d4577fb2e4@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current_opp is released only when whole OPP table is released,
otherwise it's only marked as removed by dev_pm_opp_remove_table().
Functions like dev_pm_opp_put_clkname() and dev_pm_opp_put_supported_hw()
are checking whether OPP table is empty and it's not if current_opp is
set since it holds the refcount of OPP, this produces a noisy warning
from these functions about busy OPP table. Remove the checks to fix it.
Cc: stable@vger.kernel.org
Fixes: 81c4d8a3c4 ("opp: Keep track of currently programmed OPP")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull powerpc fixes from Michael Ellerman:
- Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).
- Fix critical and debug interrupts on BookE, seen as crashes when
using ptrace.
- Fix an oops when running an SMP kernel on a UP system.
- Update pseries LPAR security flavor after partition migration.
- Fix an oops when using kprobes on BookE.
- Fix oops on 32-bit pmac by not calling do_IRQ() from
timer_interrupt().
- Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).
Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika
Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui,
Radu Rendec, Srikar Dronamraju, and Stan Johnson.
* tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
powerpc/interrupt: Do not call single_step_exception() from other exceptions
powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
powerpc/kprobes: Fix kprobe Oops happens in booke
powerpc/pseries: Fix update of LPAR security flavor after LPM
powerpc/smp: Fix OOPS in topology_init()
powerpc/32: Fix critical and debug interrupts on BOOKE
powerpc/32s: Fix napping restore in data storage interrupt (DSI)
Pull irq fixes from Thomas Gleixner:
"A set of fixes for PCI/MSI and x86 interrupt startup:
- Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
entries stay around e.g. when a crashkernel is booted.
- Enforce masking of a MSI-X table entry when updating it, which
mandatory according to speification
- Ensure that writes to MSI[-X} tables are flushed.
- Prevent invalid bits being set in the MSI mask register
- Properly serialize modifications to the mask cache and the mask
register for multi-MSI.
- Cure the violation of the affinity setting rules on X86 during
interrupt startup which can cause lost and stale interrupts. Move
the initial affinity setting ahead of actualy enabling the
interrupt.
- Ensure that MSI interrupts are completely torn down before freeing
them in the error handling case.
- Prevent an array out of bounds access in the irq timings code"
* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
driver core: Add missing kernel doc for device::msi_lock
genirq/msi: Ensure deactivation on teardown
genirq/timings: Prevent potential array overflow in __irq_timings_store()
x86/msi: Force affinity setup before startup
x86/ioapic: Force affinity setup before startup
genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
PCI/MSI: Protect msi_desc::masked for multi-MSI
PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
PCI/MSI: Correct misleading comments
PCI/MSI: Do not set invalid bits in MSI mask
PCI/MSI: Enforce MSI[X] entry updates to be visible
PCI/MSI: Enforce that MSI-X table entry is masked for update
PCI/MSI: Mask all unused MSI-X entries
PCI/MSI: Enable and mask MSI-X early
Pull locking fix from Borislav Petkov:
- Fix a CONFIG symbol's spelling
* tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Use the correct rtmutex debugging config option
Pull EFI fixes from Borislav Petkov:
"A batch of fixes for the arm64 stub image loader:
- fix a logic bug that can make the random page allocator fail
spuriously
- force reallocation of the Image when it overlaps with firmware
reserved memory regions
- fix an oversight that defeated on optimization introduced earlier
where images loaded at a suitable offset are never moved if booting
without randomization
- complain about images that were not loaded at the right offset by
the firmware image loader"
* tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: arm64: Double check image alignment at entry
efi/libstub: arm64: Warn when efi_random_alloc() fails
efi/libstub: arm64: Relax 2M alignment again for relocatable kernels
efi/libstub: arm64: Force Image reallocation if BSS was not reserved
arm64: efi: kaslr: Fix occasional random alloc (and boot) failure
Pull x86 fixes from Borislav Petkov:
"Two fixes:
- An objdump checker fix to ignore parenthesized strings in the
objdump version
- Fix resctrl default monitoring groups reporting when new subgroups
get created"
* tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix default monitoring groups reporting
x86/tools: Fix objdump version check again
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM
x86:
- Fixes for the new MMU, especially a memory leak on hosts with <39
physical address bits
- Remove bogus EFER.NX checks on 32-bit non-PAE hosts
- WAITPKG fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault
KVM: x86: remove dead initialization
KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels
KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE
KVM: arm64: Fix off-by-one in range_is_memory
Georgi writes:
interconnect fix for v5.14
This contains a revert for a patch that has been causing issues:
- Revert: qcom: rpmh: Add BCMs to commit list in pre_aggregate
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate"
Pull SCSI fixes from James Bottomley:
"Three minor fixes, all in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Fix incorrectly assigned error return and check
scsi: storvsc: Log TEST_UNIT_READY errors as warnings
scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash
Pull libnvdimm fixes from Dan Williams:
"A couple of fixes for long standing bugs, a warning fixup, and some
miscellaneous dax cleanups.
The bugs were recently found due to new platforms looking to use the
ACPI NFIT "virtual" device definition, and new error injection
capabilities to trigger error responses to label area requests. Ira's
cleanups have been long pending, I neglected to send them earlier, and
see no harm in including them now. This has all appeared in -next with
no reported issues.
Summary:
- Fix support for NFIT "virtual" ranges (BIOS-defined memory disks)
- Fix recovery from failed label storage areas on NVDIMM devices
- Miscellaneous cleanups from Ira's investigation of
dax_direct_access paths preparing for stray-write protection"
* tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Fix missing 'fallthrough' warning
libnvdimm/region: Fix label activation vs errors
ACPI: NFIT: Fix support for virtual SPA ranges
dax: Ensure errno is returned from dax_direct_access
fs/dax: Clarify nr_pages to dax_direct_access()
fs/fuse: Remove unneeded kaddr parameter
Pull USB fix from Greg KH:
"A single revert of a commit that caused problems in 5.14-rc5 for
5.14-rc6. It has been in linux-next almost all week, and has resolved
the issues that were reported on lots of different systems that were
not the platform that the change was originally tested on (gotta love
SoC cores used in multiple devices from multiple vendors...)"
* tag 'usb-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: dwc3: gadget: Use list_replace_init() before traversing lists"
Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for reported problems for
5.14-rc6 (no staging driver fixes at the moment).
All of them resolve reported issues and have been in linux-next all
week with no reported problems. Full details are in the shortlog"
* tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: Fix incorrect exit of for-loop
iio: humidity: hdc100x: Add margin to the conversion time
dt-bindings: iio: st: Remove wrong items length check
iio: accel: fxls8962af: fix i2c dependency
iio: adis: set GPIO reset pin direction
iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels
iio: accel: fxls8962af: fix potential use of uninitialized symbol
Pull i2c fixes from Wolfram Sang:
"One driver bugfix, a documentation bugfix, and an "uninitialized data"
leak fix for the core"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Documentation: i2c: add i2c-sysfs into index
i2c: dev: zero out array used for i2c reads from userspace
i2c: iproc: fix race between client unreg and tasklet
If an SQPOLL based ring is newly created and an application issues an
io_uring_enter(2) system call on it, then we can return a spurious
-EOWNERDEAD error. This happens because there's nothing to submit, and
if the caller doesn't specify any other action, the initial error
assignment of -EOWNERDEAD never gets overwritten. This causes us to
return it directly, even if it isn't valid.
Move the error assignment into the actual failure case instead.
Cc: stable@vger.kernel.org
Fixes: d9d05217cb ("io_uring: stop SQPOLL submit on creator's death")
Reported-by: Sherlock Holo sherlockya@gmail.com
Link: https://github.com/axboe/liburing/issues/413
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull xen fixes from Juergen Gross:
"A small cleanup patch and a fix of a rare race in the Xen evtchn
driver"
* tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: Fix race in set_evtchn_to_irq
xen/events: remove redundant initialization of variable irq
Pull RISC-V fixes from Palmer Dabbelt:
- avoid passing -mno-relax to compilers that don't support it
- a comment fix
* tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE
riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
Pull configfs fix from Christoph Hellwig:
- fix to revert to the historic write behavior (Bart Van Assche)
* tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs:
configfs: restore the kernel v5.13 text attribute write behavior
Merge misc fixes from Andrew Morton:
"7 patches.
Subsystems affected by this patch series: mm (kasan, mm/slub,
mm/madvise, and memcg), and lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
lib: use PFN_PHYS() in devmem_is_allowed()
mm/memcg: fix incorrect flushing of lruvec data in obj_stock
mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
mm: slub: fix slub_debug disabling for list of slabs
slub: fix kmalloc_pagealloc_invalid_free unit test
kasan, slub: reset tag when printing address
kasan, kmemleak: reset tags when scanning block
Pull cifs fixes from Steve French:
"Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
and one for the 'modefromsid' mount option (when 'idsfromsid' not
specified)"
* tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Call close synchronously during unlink/rename/lease break.
cifs: Handle race conditions during rename
cifs: use the correct max-length for dentry_path_raw()
cifs: create sd context must be a multiple of 8
Pull Kselftest fix from Shuah Khan:
"A single patch to sgx test to fix Q1 and Q2 calculation"
* tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
Vijayanand Jitta reports:
Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would
want to disable slub_debug for few slabs. Using boot parameter with
slub_debug=-,slab_name syntax doesn't work as expected i.e; only
disabling debugging for the specified list of slabs. Instead it
disables debugging for all slabs, which is wrong.
This patch fixes it by delaying the moment when the global slub_debug
flags variable is updated. In case a "slub_debug=-,slab_name" has been
passed, the global flags remain as initialized (depending on
CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.
Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The unit test kmalloc_pagealloc_invalid_free makes sure that for the
higher order slub allocation which goes to page allocator, the free is
called with the correct address i.e. the virtual address of the head
page.
Commit f227f0faf6 ("slub: fix unreclaimable slab stat for bulk free")
unified the free code paths for page allocator based slub allocations
but instead of using the address passed by the caller, it extracted the
address from the page. Thus making the unit test
kmalloc_pagealloc_invalid_free moot. So, fix this by using the address
passed by the caller.
Should we fix this? I think yes because dev expect kasan to catch these
type of programming bugs.
Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com
Fixes: f227f0faf6 ("slub: fix unreclaimable slab stat for bulk free")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kasan, slub: reset tag when printing address", v3.
With hardware tag-based kasan enabled, we reset the tag when we access
metadata to avoid from false alarm.
This patch (of 2):
Kmemleak needs to scan kernel memory to check memory leak. With hardware
tag-based kasan enabled, when it scans on the invalid slab and
dereference, the issue will occur as below.
Hardware tag-based KASAN doesn't use compiler instrumentation, we can not
use kasan_disable_current() to ignore tag check.
Based on the below report, there are 11 0xf7 granules, which amounts to
176 bytes, and the object is allocated from the kmalloc-256 cache. So
when kmemleak accesses the last 256-176 bytes, it causes faults, as those
are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe.
Thus, we reset tags before accessing metadata to avoid from false positives.
BUG: KASAN: out-of-bounds in scan_block+0x58/0x170
Read at addr f7ff0000c0074eb0 by task kmemleak/138
Pointer tag: [f7], memory tag: [fe]
CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x1b0
show_stack+0x1c/0x30
dump_stack_lvl+0x68/0x84
print_address_description+0x7c/0x2b4
kasan_report+0x138/0x38c
__do_kernel_fault+0x190/0x1c4
do_tag_check_fault+0x78/0x90
do_mem_abort+0x44/0xb4
el1_abort+0x40/0x60
el1h_64_sync_handler+0xb4/0xd0
el1h_64_sync+0x78/0x7c
scan_block+0x58/0x170
scan_gray_list+0xdc/0x1a0
kmemleak_scan+0x2ac/0x560
kmemleak_scan_thread+0xb0/0xe0
kthread+0x154/0x160
ret_from_fork+0x10/0x18
Allocated by task 0:
kasan_save_stack+0x2c/0x60
__kasan_kmalloc+0xec/0x104
__kmalloc+0x224/0x3c4
__register_sysctl_paths+0x200/0x290
register_sysctl_table+0x2c/0x40
sysctl_init+0x20/0x34
proc_sys_init+0x3c/0x48
proc_root_init+0x80/0x9c
start_kernel+0x648/0x6a4
__primary_switched+0xc0/0xc8
Freed by task 0:
kasan_save_stack+0x2c/0x60
kasan_set_track+0x2c/0x40
kasan_set_free_info+0x44/0x54
____kasan_slab_free.constprop.0+0x150/0x1b0
__kasan_slab_free+0x14/0x20
slab_free_freelist_hook+0xa4/0x1fc
kfree+0x1e8/0x30c
put_fs_context+0x124/0x220
vfs_kern_mount.part.0+0x60/0xd4
kern_mount+0x24/0x4c
bdev_cache_init+0x70/0x9c
vfs_caches_init+0xdc/0xf4
start_kernel+0x638/0x6a4
__primary_switched+0xc0/0xc8
The buggy address belongs to the object at ffff0000c0074e00
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 176 bytes inside of
256-byte region [ffff0000c0074e00, ffff0000c0074f00)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074
head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0
flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0)
raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe
ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
>ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe
^
ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com
Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
"A few fixes for block that should go into 5.14:
- Revert the mq-deadline cgroup addition. More work is needed on this
front, let's revert it for now and get it right before having it in
a released kernel (Tejun)
- blk-iocost lockdep fix (Ming)
- nbd double completion fix (Xie)
- Fix for non-idling when clearing the shared tag flag (Yu)"
* tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
nbd: Aovid double completion of a request
blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
Revert "block/mq-deadline: Add cgroup support"
blk-iocost: fix lockdep warning on blkcg->lock
Pull io_uring fixes from Jens Axboe:
"A bit bigger than the previous weeks, but mostly just a few stable
bound fixes. In detail:
- Followup fixes to patches from last week for io-wq, turns out they
weren't complete (Hao)
- Two lockdep reported fixes out of the RT camp (me)
- Sync the io_uring-cp example with liburing, as a few bug fixes
never made it to the kernel carried version (me)
- SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)
- Use WRITE_ONCE() when writing sq flags (Nadav)
- io_rsrc_put_work() deadlock fix (Pavel)"
* tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
tools/io_uring/io_uring-cp: sync with liburing example
io_uring: fix ctx-exit io_rsrc_put_work() deadlock
io_uring: drop ctx->uring_lock before flushing work item
io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
io-wq: fix bug of creating io-wokers unconditionally
io_uring: rsrc ref lock needs to be IRQ safe
io_uring: Use WRITE_ONCE() when writing to sq_flags
io_uring: clear TIF_NOTIFY_SIGNAL when running task work
Pull pin control fixes from Linus Walleij:
"An assortment of pin control fixes of varying importance, the most
important ones affecting Intel and AMD laptops turned up the recent
few days so it's time to push this to your tree.
- Fix the Kconfig dependency for Qualcomm SM8350 pin controller
- Fix pin biasing fallback behaviour on the Mediatek pin controller
- Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond
to the products that are now actually out on the market
- Fix a pin control function itemization in the Sunxi driver
out-of-bounds access bug
- Fix disable clocking for the RISC-V K210 pin controller on the
errorpath
- Fix a system shutdown bug affecting AMD Ryzen-based laptops, the
system would not suspend but just bounce back up"
* tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Fix an issue with shutdown when system set to s0ix
pinctrl: k210: Fix k210_fpioa_probe()
pinctrl: sunxi: Don't underestimate number of functions
pinctrl: tigerlake: Fix GPIO mapping for newer version of software
pinctrl: mediatek: Fix fallback behavior for bias_set_combo
pinctrl: qcom: fix GPIOLIB dependencies
The patch be7ecbd240: "soc: fsl: qe: convert QE interrupt
controller to platform_device" from Aug 3, 2021, leads to the
following static checker warning:
drivers/soc/fsl/qe/qe_ic.c:438 qe_ic_init()
warn: unsigned 'qe_ic->virq_low' is never less than zero.
In old variant irq_of_parse_and_map() returns zero if failed so
unsigned int for virq_high/virq_low was ok.
In new variant platform_get_irq() returns negative error codes
if failed so we need to use int for virq_high/virq_low.
Also simplify high_handler checking and remove the curly braces
to make checkpatch happy.
Fixes: be7ecbd240 ("soc: fsl: qe: convert QE interrupt controller to platform_device")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Jakub Kicinski says:
====================
bnxt: Tx NAPI disabling resiliency improvements
A lockdep warning was triggered by netpoll because napi poll
was taking the xmit lock. Fix that and a couple more issues
noticed while reading the code.
====================
Link: https://lore.kernel.org/r/20210812214242.578039-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
skbs are freed on error and not put on the ring. We may, however,
be in a situation where we're freeing the last skb of a batch,
and there is a doorbell ring pending because of xmit_more() being
true earlier. Make sure we ring the door bell in such situations.
Since errors are rare don't pay attention to xmit_more() and just
always flush the pending frames.
The busy case should be safe to be left alone because it can
only happen if start_xmit races with completions and they
both enable the queue. In that case the kick can't be pending.
Noticed while reading the code.
Fixes: 4d172f21ce ("bnxt_en: Implement xmit_more.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi schedules DIM, napi has to be disabled first,
then DIM canceled.
Noticed while reading the code.
Fixes: 0bc0b97fca ("bnxt_en: cleanup DIM work on device shutdown")
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can't take the tx lock from the napi poll routine, because
netpoll can poll napi at any moment, including with the tx lock
already held.
The tx lock is protecting against two paths - the disable
path, and (as Michael points out) the NETDEV_TX_BUSY case
which may occur if NAPI completions race with start_xmit
and both decide to re-enable the queue.
For the disable/ifdown path use synchronize_net() to make sure
closing the device does not race we restarting the queues.
Annotate accesses to dev_state against data races.
For the NAPI cleanup vs start_xmit path - appropriate barriers
are already in place in the main spot where Tx queue is stopped
but we need to do the same careful dance in the TX_BUSY case.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...
0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0]
8: 35 00 00 01 00 00 00 00 jge %r0,0,1
10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8]
... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).
sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.
Fix by clearning dead insns' zext_dst.
The commits that contributed to this problem are:
1. 5aa5bd14c5 ("bpf: add initial suite for selftests"), which
introduced the test with the dead code.
2. 5327ed3d44 ("bpf: verifier: mark verified-insn with
sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903 ("bpf: Account for BPF_FETCH in
insn_has_def32()"), which introduced the sanity check.
4. 9183671af6 ("bpf: Fix leakage under speculation on
mispredicted branches"), which bisect points to.
It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.
Fixes: 5327ed3d44 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
This example is missing a few fixes that are in the liburing version,
synchronize with the upstream version.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We run a test that delete and recover devcies frequently(two devices on
the same host), and we found that 'active_queues' is super big after a
period of time.
If device a and device b share a tag set, and a is deleted, then
blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there
is only one queue that are using the tag set. However, if b is still
active, the active_queues of b might never be cleared even if b is
deleted.
Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function tpci200_register called by tpci200_install and
tpci200_unregister called by tpci200_uninstall are in pair. However,
tpci200_unregister has some cleanup operations not in the
tpci200_register. So the error handling code of tpci200_pci_probe has
many different double free issues.
Fix this problem by moving those cleanup operations out of
tpci200_unregister, into tpci200_pci_remove and reverting
the previous commit 9272e5d002 ("ipack/carriers/tpci200:
Fix a double free in tpci200_pci_probe").
Fixes: 9272e5d002 ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe")
Cc: stable@vger.kernel.org
Reported-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Link: https://lore.kernel.org/r/20210810100323.3938492-1-mudongliangabcd@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In some usecases transaction ids are dynamically allocated inside
the controller driver after sending the messages which have generic
acknowledge responses. So check for this before refcounting pm_runtime.
Without this we would end up imbalancing runtime pm count by
doing pm_runtime_put() in both slim_do_transfer() and slim_msg_response()
for a single pm_runtime_get() in slim_do_transfer()
Fixes: d3062a2109 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20210809082428.11236-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.
Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations. But underflow is the best case scenario. The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.
Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok. Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.
For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write. If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.
Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled. Holding mmu_lock for read also
prevents the indirect shadow page from being freed. But as above, keep
it simple and always take the lock.
Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP. Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).
Alternative #2, the unsync logic could be made thread safe. In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice. However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).
Fixes: a2855afc7e ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181815.3378104-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set the min_level for the TDP iterator at the root level when zapping all
SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf
SPTE will recursively zap all its children, thus there is no need for the
iterator to attempt to step down. This avoids rereading the top-level
SPTEs after they are zapped by causing try_step_down() to short-circuit.
In most cases, optimizing try_step_down() will be in the noise as the cost
of zapping SPTEs completely dominates the overall time. The optimization
is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
is zapping in response to multiple memslot updates when userspace is adding
and removing read-only memslots for option ROMs. In that case, the task
doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
for read and thus can be a noisy neighbor of sorts.
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181414.3376143-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf
SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is
not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
paging, the "zap all" flows will skip top-level SPTEs whose range extends
beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
Use the current upper bound (based on host.MAXPHYADDR) to detect that the
caller wants to zap all SPTEs, e.g. instead of using the max theoretical
gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator
to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
help future debuggers should KVM decide to leak SPTEs again.
The bug is most easily reproduced by running (and unloading!) KVM in a
VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
=============================================================================
BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack_lvl+0x45/0x59
slab_err+0x95/0xc9
__kmem_cache_shutdown.cold+0x3c/0x158
kmem_cache_destroy+0x3d/0xf0
kvm_mmu_module_exit+0xa/0x30 [kvm]
kvm_arch_exit+0x5d/0x90 [kvm]
kvm_exit+0x78/0x90 [kvm]
vmx_exit+0x1a/0x50 [kvm_intel]
__x64_sys_delete_module+0x13f/0x220
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181414.3376143-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 5.14, take #2
- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
in L2 or if the VM-Exit should be forwarded to L1. The current logic fails
to account for the case where #PF is intercepted to handle
guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
L1. At best, L1 will complain and inject the #PF back into L2. At
worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
page faults.
Note, while the bug was technically introduced by the commit that added
support for the MAXPHYADDR madness, the shame is all on commit
a0c134347b ("KVM: VMX: introduce vmx_need_pf_intercept").
Fixes: 1dbf5d68af ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: stable@vger.kernel.org
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812045615.3167686-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a nested EPT violation/misconfig is injected into the guest,
the shadow EPT PTEs associated with that address need to be synced.
This is done by kvm_inject_emulated_page_fault() before it calls
nested_ept_inject_page_fault(). However, that will only sync the
shadow EPT PTE associated with the current L1 EPTP. Since the ASID
is based on EP4TA rather than the full EPTP, so syncing the current
EPTP is not enough. The SPTEs associated with any other L1 EPTPs
in the prev_roots cache with the same EP4TA also need to be synced.
Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20210806222229.1645356-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hv_vcpu is initialized again a dozen lines below, and at this
point vcpu->arch.hyperv is not valid. Remove the initializer.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove an ancient restriction that disallowed exposing EFER.NX to the
guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU.
The motivation of the check, added by commit 2cc51560ae ("KVM: VMX:
Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule
out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run
the guest with the host's EFER.NX and thus avoid context switching EFER
if the only divergence was the NX bit.
Fast forward to today, and KVM has long since stopped running the guest
with the host's EFER.NX. Not only does KVM context switch EFER if
host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 &&
guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore,
the entire motivation for the restriction was made obsolete over a decade
ago when Intel added dedicated host and guest EFER fields in the VMCS
(Nehalem timeframe), which reduced the overhead of context switching EFER
from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles.
In practice, the removed restriction only affects non-PAE 32-bit kernels,
as EFER.NX is set during boot if NX is supported and the kernel will use
PAE paging (32-bit or 64-bit), regardless of whether or not the kernel
will actually use NX itself (mark PTEs non-executable).
Alternatively and/or complementarily, startup_32_smp() in head_32.S could
be modified to set EFER.NX=1 regardless of paging mode, thus eliminating
the scenario where NX is supported but not enabled. However, that runs
the risk of breaking non-KVM non-PAE kernels (though the risk is very,
very low as there are no known EFER.NX errata), and also eliminates an
easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER
across nested virtualization transitions.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210805183804.1221554-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, bpf, can and
ieee802154.
The size of this is pretty normal, but we got more fixes for 5.14
changes this week than last week. Nothing major but the trend is the
opposite of what we like. We'll see how the next week goes..
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid dma
mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across
suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via
netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans on idle systems to
prevent frequent wake ups"
* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
vsock/virtio: avoid potential deadlock when vsock device remove
wwan: core: Avoid returning NULL from wwan_create_dev()
net: dsa: sja1105: unregister the MDIO buses during teardown
Revert "tipc: Return the correct errno code"
net: mscc: Fix non-GPL export of regmap APIs
net: igmp: increase size of mr_ifc_count
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
net: pcs: xpcs: fix error handling on failed to allocate memory
net: linkwatch: fix failure to restore device state across suspend/resume
net: bridge: fix memleak in br_add_if()
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
net: bridge: fix flags interpretation for extern learn fdb entries
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
bpf, core: Fix kernel-doc notation
net: igmp: fix data-race in igmp_ifc_timer_expire()
net: Fix memory leak in ieee802154_raw_deliver
...
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
and a reference handling fix from Jeff that should address some memory
corruption reports in the snaprealm area.
Both marked for stable"
* tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
ceph: take snap_empty_lock atomically with snaprealm refcount change
ceph: reduce contention in ceph_check_delayed_caps()
Pull drm fixes from Dave Airlie:
"Another week, another set of pretty regular fixes, nothing really
stands out too much.
amdgpu:
- Yellow carp update
- RAS EEPROM fixes
- BACO/BOCO fixes
- Fix a memory leak in an error path
- Freesync fix
- VCN harvesting fix
- Display fixes
i915:
- GVT fix for Windows VM hang.
- Display fix of 12 BPC bits for display 12 and newer.
- Don't try to access some media register for fused off domains.
- Fix kerneldoc build warnings.
mediatek:
- Fix dpi bridge bug.
- Fix cursor plane no update.
meson:
- Fix colors when booting with HDR"
* tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm:
drm/doc/rfc: drop lmem uapi section
drm/i915: Only access SFC_DONE when media domain is not fused off
drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg
drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work
drm/amd/display: Remove invalid assert for ODM + MPC case
drm/amd/pm: bug fix for the runtime pm BACO
drm/amdgpu: handle VCN instances when harvesting (v2)
drm/meson: fix colour distortion from HDR set during vendor u-boot
drm/i915/gvt: Fix cached atomics setting for Windows VM
drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled.
drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()'
drm/amdgpu: don't enable baco on boco platforms in runpm
drm/amdgpu: set RAS EEPROM address from VBIOS
drm/amd/pm: update smu v13.0.1 firmware header
drm/mediatek: Fix cursor plane no update
drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge
drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check
When both the old and the new PCI drivers are enabled
in the same kernel, there are a couple of namespace
conflicts that cause a build failure:
drivers/pci/controller/pci-ixp4xx.c:38: error: "IXP4XX_PCI_CSR" redefined [-Werror]
38 | #define IXP4XX_PCI_CSR 0x1c
|
In file included from arch/arm/mach-ixp4xx/include/mach/hardware.h:23,
from arch/arm/mach-ixp4xx/include/mach/io.h:15,
from arch/arm/include/asm/io.h:198,
from include/linux/io.h:13,
from drivers/pci/controller/pci-ixp4xx.c:20:
arch/arm/mach-ixp4xx/include/mach/ixp4xx-regs.h:221: note: this is the location of the previous definition
221 | #define IXP4XX_PCI_CSR(x) ((volatile u32 *)(IXP4XX_PCI_CFG_BASE_VIRT+(x)))
|
drivers/pci/controller/pci-ixp4xx.c:148:12: error: 'ixp4xx_pci_read' redeclared as different kind of symbol
148 | static int ixp4xx_pci_read(struct ixp4xx_pci *p, u32 addr, u32 cmd, u32 *data)
| ^~~~~~~~~~~~~~~
Rename both the ixp4xx_pci_read/ixp4xx_pci_write functions and the
IXP4XX_PCI_CSR macro. In each case, I went with the version that
has fewer callers to keep the change small.
Fixes: f7821b4934 ("PCI: ixp4xx: Add a new driver for IXP4xx")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: soc@kernel.org
Link: https://lore.kernel.org/r/20210721151546.2325937-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Stefan Schmidt says:
====================
ieee802154 for net 2021-08-12
Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller
reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw.
* tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
net: Fix memory leak in ieee802154_raw_deliver
ieee802154: hwsim: fix GPF in hwsim_new_edge_nl
ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi
====================
Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
on the entire filesystem.
Steps to reproduce:
1. mount -t resctrl resctrl /sys/fs/resctrl/
2. cd /sys/fs/resctrl/
3. cat mon_data/mon_L3_00/mbm_total_bytes
23189832
4. Create sub monitor group:
mkdir mon_groups/test1
5. cat mon_data/mon_L3_00/mbm_total_bytes
Unavailable
When a new monitoring group is created, a new RMID is assigned to the
new group. But the RMID is not active yet. When the events are read on
the new RMID, it is expected to report the status as "Unavailable".
When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. Currently, if any of the RMID reads report as "Unavailable",
then everything will be reported as "Unavailable".
Fix the issue by discarding the "Unavailable" reads and reporting all
the successful RMID reads. This is not a problem on Intel systems as
Intel reports 0 on Inactive RMIDs.
Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Reported-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Babu Moger <Babu.Moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
There's a potential deadlock case when remove the vsock device or
process the RESET event:
vsock_for_each_connected_socket:
spin_lock_bh(&vsock_table_lock) ----------- (1)
...
virtio_vsock_reset_sock:
lock_sock(sk) --------------------- (2)
...
spin_unlock_bh(&vsock_table_lock)
lock_sock() may do initiative schedule when the 'sk' is owned by
other thread at the same time, we would receivce a warning message
that "scheduling while atomic".
Even worse, if the next task (selected by the scheduler) try to
release a 'sk', it need to request vsock_table_lock and the deadlock
occur, cause the system into softlockup state.
Call trace:
queued_spin_lock_slowpath
vsock_remove_bound
vsock_remove_sock
virtio_transport_release
__vsock_release
vsock_release
__sock_release
sock_close
__fput
____fput
So we should not require sk_lock in this case, just like the behavior
in vhost_vsock or vmci.
Fixes: 0ea9e1d3a9 ("VSOCK: Introduce virtio_transport.ko")
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the 'bootconfig' command line parameter is handled before
parsing the command line, it doesn't use early_param(). But in
this case, kernel shows a wrong warning message about it.
[ 0.013714] Kernel command line: ro console=ttyS0 bootconfig console=tty0
[ 0.013741] Unknown command line parameters: bootconfig
To suppress this message, add a dummy handler for 'bootconfig'.
Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2
Fixes: 86d1919a4f ("init: print out unknown kernel parameters")
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit 2860cd8a23 ("livepatch: Use the default ftrace_ops instead of
REGS when ARGS is available") intends to enable config LIVEPATCH when
ftrace with ARGS is available. However, the chain of configs to enable
LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available,
but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE
and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing
in the commit.
Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns:
DYNAMIC_FTRACE_WITH_ARGS
Referencing files: kernel/livepatch/Kconfig
So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already
existing similar configs, DYNAMIC_FTRACE_WITH_REGS and
DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the
chain of configs.
Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: stable@vger.kernel.org
Fixes: 2860cd8a23 ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some extra flags are printed to the trace header when using the
PREEMPT_RT config. The extra flags are: need-resched-lazy,
preempt-lazy-depth, and migrate-disable.
Without printing these fields, the timerlat specific fields are
shifted by three positions, for example:
# tracer: timerlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# || /
# |||| ACTIVATION
# TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
# | | | |||| | | | |
<idle>-0 [000] d..h... 3279.798871: #1 context irq timer_latency 830 ns
<...>-807 [000] ....... 3279.798881: #1 context thread timer_latency 11301 ns
Add a new header for timerlat with the missing fields, to be used
when the PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/babb83529a3211bd0805be0b8c21608230202c55.1626598844.git.bristot@kernel.org
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some extra flags are printed to the trace header when using the
PREEMPT_RT config. The extra flags are: need-resched-lazy,
preempt-lazy-depth, and migrate-disable.
Without printing these fields, the osnoise specific fields are
shifted by three positions, for example:
# tracer: osnoise
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth MAX
# || / SINGLE Interference counters:
# |||| RUNTIME NOISE %% OF CPU NOISE +-----------------------------+
# TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD
# | | | |||| | | | | | | | | | |
<...>-741 [000] ....... 1105.690909: 1000000 234 99.97660 36 21 0 1001 22 3
<...>-742 [001] ....... 1105.691923: 1000000 281 99.97190 197 7 0 1012 35 14
<...>-743 [002] ....... 1105.691958: 1000000 1324 99.86760 118 11 0 1016 155 143
<...>-744 [003] ....... 1105.691998: 1000000 109 99.98910 21 4 0 1004 33 7
<...>-745 [004] ....... 1105.692015: 1000000 2023 99.79770 97 37 0 1023 52 18
Add a new header for osnoise with the missing fields, to be used
when the PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/1f03289d2a51fde5a58c2e7def063dc630820ad1.1626598844.git.bristot@kernel.org
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull ucounts fix from Eric Biederman:
"This fixes the ucount sysctls on big endian architectures.
The counts were expanded to be longs instead of ints, and the sysctl
code was overlooked, so only the low 32bit were being processed. On
litte endian just processing the low 32bits is fine, but on 64bit big
endian processing just the low 32bits results in the high order bits
instead of the low order bits being processed and nothing works
proper.
This change took a little bit to mature as we have the SYSCTL_ZERO,
and SYSCTL_INT_MAX macros that are only usable for sysctls operating
on ints, but unfortunately are not obviously broken. Which resulted in
the versions of this change working on big endian and not on little
endian, because the int SYSCTL_ZERO when extended 64bit wound up being
0x100000000. So we only allowed values greater than 0x100000000 and
less than 0faff. Which unfortunately broken everything that tried to
set the sysctls. (First reported with the windows subsystem for
linux).
I have tested this on x86_64 64bit after first reproducing the
problems with the earlier version of this change, and then verifying
the problems do not exist when we use appropriate long min and max
values for extra1 and extra2"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: add missing data type changes
During unlink/rename/lease break, deferred work for close is
scheduled immediately but in an asynchronous manner which might
lead to race with actual(unlink/rename) commands.
This change will schedule close synchronously which will avoid
the race conditions with other commands.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org # 5.13
Signed-off-by: Steve French <stfrench@microsoft.com>
When rename is executed on directory which has files for which
close is deferred, then rename will fail with EACCES.
This patch will try to close all deferred files when EACCES is received
and retry rename on a directory.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Cc: stable@vger.kernel.org # 5.13
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization is not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to the their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:
INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
INFO: nvme nvme77: 1/0/0 default/read/poll queues
CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
WARN: invalid opcode: 0000 [#1] SMP NOPTI
WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
WARN: Call Trace:
WARN: set_affinity_irq+0x121/0x150
WARN: irq_do_set_affinity+0x37/0xe0
WARN: irq_setup_affinity+0xf6/0x170
WARN: irq_startup+0x64/0xe0
WARN: __setup_irq+0x69e/0x740
WARN: ? request_threaded_irq+0xad/0x160
WARN: request_threaded_irq+0xf5/0x160
WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme]
WARN: pci_request_irq+0xa9/0xf0
WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130
WARN: queue_request_irq+0x4c/0x70 [nvme]
WARN: nvme_reset_work+0x82d/0x1550 [nvme]
WARN: ? check_preempt_wakeup+0x14f/0x230
WARN: ? check_preempt_curr+0x29/0x80
WARN: ? nvme_irq_check+0x30/0x30 [nvme]
WARN: process_one_work+0x18e/0x3c0
WARN: worker_thread+0x30/0x3a0
WARN: ? process_one_work+0x3c0/0x3c0
WARN: kthread+0x113/0x130
WARN: ? kthread_park+0x90/0x90
WARN: ret_from_fork+0x3a/0x50
This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. The row is now cleared before writing it to
evtchn_to_irq in order to not create a race once the row is visible for
other threads.
While at it, do not require the page to be zeroed, because it will be
overwritten with -1's in clear_evtchn_to_irq_row anyway.
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Fixes: d0b075ffee ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Skip (omit) any version string info that is parenthesized.
Warning: objdump version 15) is older than 2.19
Warning: Skipping posttest.
where 'objdump -v' says:
GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18
Fixes: 8bee738bb1 ("x86: Fix objdump version check in chkobjdump.awk for different formats.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org
The current comment states that we check if the 64-bit kernel mapping
overlaps with the last 4K of the address space that is reserved to
error values in create_kernel_page_table, which is not the case since it
is done in setup_vm. But anyway, remove the reference to any function
and simply note that in 64-bit kernel, the check should be done as soon
as the kernel mapping base address is known.
Fixes: db6b84a368 ("riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE")
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The RISC-V special option '-mno-relax' which to disable linker relaxations
is supported by GCC8+. For GCC7 and lower versions do not support this
option.
Fixes: fba8a8674f ("RISC-V: Add kexec support")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
runtime. Today, the IPI is not created for such nodes, and hot-plugged
CPUs use a bogus IPI, which leads to soft lockups.
We can not directly allocate and request the IPI on demand because
bringup_up() is called under the IRQ sparse lock. The alternative is
to allocate the IPIs for all possible nodes at startup and to request
the mapping on demand when the first CPU of a node is brought up.
Fixes: 7dcc37b3ef ("powerpc/xive: Map one IPI interrupt per node")
Cc: stable@vger.kernel.org # v5.13
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807072057.184698-1-clg@kaod.org
OXFW 971 has no function to use the value in syt field of received
isochronous packet for playback timing generation. In kernel prepatch for
v5.14, ALSA OXFW driver got change to send NO_INFO value in the field
instead of actual timing value. The change brings Apogee Duet FireWire to
generate no playback sound, while output meter moves.
As long as I investigate, _any_ value in the syt field takes the device to
generate sound. It's reasonable to think that the device just ignores data
blocks in packet with NO_INFO value in its syt field for audio data
processing.
This commit adds a new flag for the quirk to fix regression.
Fixes: 029ffc4294 ("ALSA: oxfw: perform sequence replay for media clock recovery")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210812022839.42043-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6
forcewake domain and is not accessible if the vdbox in that domain is
fused off and the forcewake is not initialized.
This mistake went unnoticed because until recently we were using the
wrong register offset for the SFC_DONE register; once the register
offset was corrected, we started hitting errors like
<4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000
on parts with fused-off vdbox engines.
Fixes: e50dbdbfd9 ("drm/i915/tgl: Add SFC instdone to error state")
Fixes: 9c9c6d0ab0 ("drm/i915: Correct SFC_DONE register offset")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit c5589bb5dc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Changed Fixes tag to match the cherry-picked 82929a2140]
The call to sja1105_mdiobus_unregister is present in the error path but
absent from the main driver unbind path.
Fixes: 5a8f09748e ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 0efea3c649 because of:
- The returning -ENOBUF error is fine on socket buffer allocation.
- There is side effect in the calling path
tipc_node_xmit()->tipc_link_xmit() when checking error code returning.
Fixes: 0efea3c649 ("tipc: Return the correct errno code")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot driver makes use of regmap, wrapping it with driver specific
operations that are thin wrappers around the core regmap APIs. These are
exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap
exports which is frowned upon. Add _GPL suffixes to at least the APIs that
are doing register I/O.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit f84f5b6f72, which is
causing regressions on some platforms, preventing them to boot or do a
clean reboot. This is because the above commit is sending also all the
zero bandwidth requests to turn off any resources that might be enabled
unnecessarily, but currently this may turn off interconnects that are
enabled by default, but with no consumer to keep them on.
Let's revert this for now as some platforms are not ready for such
change yet. In the future we can introduce some _ignore_unused option
that could keep also the unused resources on platforms that have only
partial interconnect support and also add .shutdown callbacks to deal
with disabling the resources in the right order.
Reported-by: Stephen Boyd <swboyd@chromium.org>
Reported-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/CAE-0n52iVgX0JjjnYi=NDg49xP961p=+W5R2bmO+2xwRceFhfA@mail.gmail.com
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Pull seccomp fixes from Kees Cook:
- Fix typo in user notification documentation (Rodrigo Campos)
- Fix userspace counter report when using TSYNC (Hsuan-Chi Kuo, Wiktor
Garbacz)
* tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Fix setting loaded filter count during TSYNC
Documentation: seccomp: Fix typo in user notification
Add component_del in OVL and COLOR remove function.
Fixes: ff1395609e ("drm/mediatek: Move mtk_ddp_comp_init() from sub driver to DRM driver")
Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
To avoid the output width and height is incorrect,
AAL_OUTPUT_SIZE configuration should be set.
Fixes: 0664d1392c ("drm/mediatek: Add AAL engine basic function")
Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Currently if BBR congestion control is initialized after more than 2B
packets have been delivered, depending on the phase of the
tp->delivered counter the tracking of BBR round trips can get stuck.
The bug arises because if tp->delivered is between 2^31 and 2^32 at
the time the BBR congestion control module is initialized, then the
initialization of bbr->next_rtt_delivered to 0 will cause the logic to
believe that the end of the round trip is still billions of packets in
the future. More specifically, the following check will fail
repeatedly:
!before(rs->prior_delivered, bbr->next_rtt_delivered)
and thus the connection will take up to 2B packets delivered before
that check will pass and the connection will set:
bbr->round_start = 1;
This could cause many mechanisms in BBR to fail to trigger, for
example bbr_check_full_bw_reached() would likely never exit STARTUP.
This bug is 5 years old and has not been observed, and as a practical
matter this would likely rarely trigger, since it would require
transferring at least 2B packets, or likely more than 3 terabytes of
data, before switching congestion control algorithms to BBR.
This patch is a stable candidate for kernels as far back as v4.9,
when tcp_bbr.c was added.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kevin Yang <yyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed
that my Ethernet port to which a bond and a VLAN interface are attached
appeared to remain up after resuming from suspend with the cable unplugged
(and that problem still persists with 5.10-LTS).
It happens that the following happens:
- the network driver (e1000e here) prepares to suspend, calls e1000e_down()
which calls netif_carrier_off() to signal that the link is going down.
- netif_carrier_off() adds a link_watch event to the list of events for
this device
- the device is completely stopped.
- the machine suspends
- the cable is unplugged and the machine brought to another location
- the machine is resumed
- the queued linkwatch events are processed for the device
- the device doesn't yet have the __LINK_STATE_PRESENT bit and its events
are silently dropped
- the device is resumed with its link down
- the upper VLAN and bond interfaces are never notified that the link had
been turned down and remain up
- the only way to provoke a change is to physically connect the machine
to a port and possibly unplug it.
The state after resume looks like this:
$ ip -br li | egrep 'bond|eth'
bond0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
eth0 DOWN e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
eth0.2@eth0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
Placing an explicit call to netdev_state_change() either in the suspend
or the resume code in the NIC driver worked around this but the solution
is not satisfying.
The issue in fact really is in link_watch that loses events while it
ought not to. It happens that the test for the device being present was
added by commit 124eee3f69 ("net: linkwatch: add check for netdevice
being present to linkwatch_do_dev") in 4.20 to avoid an access to
devices that are not present.
Instead of dropping events, this patch proceeds slightly differently by
postponing their handling so that they happen after the device is fully
resumed.
Fixes: 124eee3f69 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev")
Link: https://lists.openwall.net/netdev/2018/03/15/62
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If rcu_read_lock_sched tracing is enabled, the tracing subsystem can
perform a jump which needs to be checked by CFI. For example, stm_ftrace
source is enabled as a module and hooks into enabled ftrace events. This
can cause an recursive loop where find_shadow_check_fn ->
rcu_read_lock_sched -> (call to stm_ftrace generates cfi slowpath) ->
find_shadow_check_fn -> rcu_read_lock_sched -> ...
To avoid the recursion, either the ftrace codes needs to be marked with
__no_cfi or CFI should not trace. Use the "_notrace" in CFI to avoid
tracing so that CFI can guard ftrace.
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: stable@vger.kernel.org
Fixes: cf68fffb66 ("add support for Clang CFI")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210811155914.19550-1-quic_eberman@quicinc.com
This reverts commit 08a9ad8bf6 ("block/mq-deadline: Add cgroup support")
and a follow-up commit c06bc5a3fb ("block/mq-deadline: Remove a
WARN_ON_ONCE() call"). The added cgroup support has the following issues:
* It breaks cgroup interface file format rule by adding custom elements to a
nested key-value file.
* It registers mq-deadline as a cgroup-aware policy even though all it's
doing is collecting per-cgroup stats. Even if we need these stats, this
isn't the right way to add them.
* It hasn't been reviewed from cgroup side.
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A recent change in LLVM causes module_{c,d}tor sections to appear when
CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings
because these are not handled anywhere:
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN
flag, so it is in a separate section even with -fno-function-sections
(default)".
Place them in the TEXT_TEXT section so that these technologies continue
to work with the newer compiler versions. All of the KASAN and KCSAN
KUnit tests continue to pass after this change.
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1432
Link: 7b78956224
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org
Fix the NFIT parsing code to treat a 0 index in a SPA Range Structure as
a special case and not match Region Mapping Structures that use 0 to
indicate that they are not mapped. Without this fix some platform BIOS
descriptions of "virtual disk" ranges do not result in the pmem driver
attaching to the range.
Details:
In addition to typical persistent memory ranges, the ACPI NFIT may also
convey "virtual" ranges. These ranges are indicated by a UUID in the SPA
Range Structure of UUID_VOLATILE_VIRTUAL_DISK, UUID_VOLATILE_VIRTUAL_CD,
UUID_PERSISTENT_VIRTUAL_DISK, or UUID_PERSISTENT_VIRTUAL_CD. The
critical difference between virtual ranges and UUID_PERSISTENT_MEMORY,
is that virtual do not support associations with Region Mapping
Structures. For this reason the "index" value of virtual SPA Range
Structures is allowed to be 0. If a platform BIOS decides to represent
NVDIMMs with disconnected "Region Mapping Structures" (range-index ==
0), the kernel may falsely associate them with standalone ranges where
the "SPA Range Structure Index" is also zero. When this happens the
driver may falsely require labels where "virtual disks" are expected to
be label-less. I.e. "label-less" is where the namespace-range ==
region-range and the pmem driver attaches with no user action to create
a namespace.
Cc: Jacek Zloch <jacek.zloch@intel.com>
Cc: Lukasz Sobieraj <lukasz.sobieraj@intel.com>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Cc: <stable@vger.kernel.org>
Fixes: c2f32acdf8 ("acpi, nfit: treat virtual ramdisk SPA as pmem region")
Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com>
Reported-by: Damian Bassa <damian.bassa@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/162870796589.2521182.1240403310175570220.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently, if bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called with sleepable programs e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, a rcu warning
may appear. For example, if I added the following
hack to test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:
--- a/tools/testing/selftests/bpf/progs/lsm.c
+++ b/tools/testing/selftests/bpf/progs/lsm.c
@@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
int buf = 0;
long ret;
+ __u64 cg_id = bpf_get_current_cgroup_id();
+ if (cg_id == 1000)
+ copy_test++;
+
ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
if (len == -2 && ret == 0 && buf == 1234)
copy_test++;
I will hit the following rcu warning:
include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by test_progs/260:
#0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
stack backtrace:
CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x56/0x7b
bpf_get_current_cgroup_id+0x9c/0xb1
bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
bpf_trampoline_6442469132_0+0x2d/0x1000
__x64_sys_setdomainname+0x5/0x110
do_syscall_64+0x3a/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for syscall program. Helper
bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
has the following callchain:
task_dfl_cgroup
task_css_set
task_css_set_check
and we have
#define task_css_set_check(task, __c) \
rcu_dereference_check((task)->cgroups, \
lockdep_is_held(&cgroup_mutex) || \
lockdep_is_held(&css_set_lock) || \
((task)->flags & PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held and the task
is not existing and rcu read_lock is not held, a warning
will be issued. Note that bpf sleepable program is protected by
rcu_read_lock_trace().
The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch fixing 95b861a793
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting 5.14 is probably good enough as sleepable
progrems are not widely used.
This patch should fix [1] as well since syscall program is a sleepable
program protected with migrate_disable().
[1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
Fixes: 95b861a793 ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810230537.2864668-1-yhs@fb.com
intel-pinctrl for v5.14-2
* Fix the software mapping of GPIOs on Intel Tiger Lake-H
The following is an automated git shortlog grouped by driver:
tigerlake:
- Fix GPIO mapping for newer version of software
get_queue_type() comments that splict virtqueue is preferred, however,
the actual logic preferred packed virtqueues. Since firmware has not
supported packed virtqueues we ended up using split virtqueues as was
desired.
Since we do not advertise support for packed virtqueues, we add a check
to verify split virtqueues are indeed supported.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210811053759.66752-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The current code treats an empty iotlb provdied in set_map() as a
special case and destroy the memory region object. This must not be done
since the virtqueue objects reference this MR. Doing so will cause the
driver unload to emit errors and log timeouts caused by the firmware
complaining on busy resources.
This patch treats an empty iotlb as any other change of mapping. In this
case, mlx5_vdpa_create_mr() will fail and the entire set_map() call to
fail.
This issue has not been encountered before but was seen to occur in a
non-official version of qemu. Since qemu is a userspace program, the
driver must protect against such case.
Fixes: 94abbccdf2 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210811053713.66658-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
An untrusted device might presents an invalid block size
in configuration space. This tries to add validation for it
in the validate callback and clear the VIRTIO_BLK_F_BLK_SIZE
feature bit if the value is out of the supported range.
And we also double check the value in virtblk_probe() in
case that it's changed after the validation.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
As __vringh_iov() traverses a descriptor chain, it populates
each descriptor entry into either read or write vring iov
and increments that iov's ->used member. So, as we iterate
over a descriptor chain, at any point, (riov/wriov)->used
value gives the number of descriptor enteries available,
which are to be read or written by the device. As all read
iovs must precede the write iovs, wiov->used should be zero
when we are traversing a read descriptor. Current code checks
for wiov->i, to figure out whether any previous entry in the
current descriptor chain was a write descriptor. However,
iov->i is only incremented, when these vring iovs are consumed,
at a later point, and remain 0 in __vringh_iov(). So, correct
the check for read and write descriptor order, to use
wiov->used.
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeaurora.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
commit a5b8ca97fb ("arm64: do not descend to vdso directories twice")
changes the cleaning behavior of arm64's vdso files, in that vdso.lds,
vdso.so, and vdso.so.dbg are not removed upon a 'make clean/mrproper':
$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
Would remove arch/arm64/kernel/vdso/vdso.lds
Would remove arch/arm64/kernel/vdso/vdso.so
Would remove arch/arm64/kernel/vdso/vdso.so.dbg
To remedy this, manually descend into arch/arm64/kernel/vdso upon
cleaning.
After this commit:
$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
<empty>
Similar results are obtained for the vdso32 equivalent.
Signed-off-by: Andrew Delgadillo <adelg@google.com>
Cc: stable@vger.kernel.org
Fixes: a5b8ca97fb ("arm64: do not descend to vdso directories twice")
Link: https://lore.kernel.org/r/20210810231755.1743524-1-adelg@google.com
Signed-off-by: Will Deacon <will@kernel.org>
We have changed the return type for sysc_check_active_timer() from -EBUSY
to -ENXIO, but the gpt12 system timer fix still checks for -EBUSY. We are
also not returning on other errors like we did earlier as noted by
Pavel Machek <pavel@denx.de>.
Commit 3ff340e24c ("bus: ti-sysc: Fix gpt12 system timer issue with
reserved status") should have been updated for commit 65fb736761
("bus: ti-sysc: suppress err msg for timers used as clockevent/source").
Let's fix the issue by checking for -ENXIO and returning on any other
errors as suggested by Pavel Machek <pavel@denx.de>.
Fixes: 3ff340e24c ("bus: ti-sysc: Fix gpt12 system timer issue with reserved status")
Depends-on: 65fb736761 ("bus: ti-sysc: suppress err msg for timers used as clockevent/source")
Reported-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Pavel Machek (CIP) <pavel@denx.de>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Jarkko Nikula <jarkko.nikula@bitmer.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull ARC fixes from Vineet Gupta:
- Fix FPU_STATUS update
- Update my email address
- Other spellos and fixes
* tag 'arc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
MAINTAINERS: update Vineet's email address
ARC: fp: set FPU_STATUS.FWE to enable FPU_STATUS update on context switch
ARC: Fix CONFIG_STACKDEPOT
arc: Fix spelling mistake and grammar in Kconfig
arc: Prefer unsigned int to bare use of unsigned
Append i2c-sysfs to toctree in order to get rid of building warnings.
Fixes: 31df7195b1 ("Documentation: i2c: Add doc for I2C sysfs")
Signed-off-by: Hu Haowen <src.res@email.cn>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If an i2c driver happens to not provide the full amount of data that a
user asks for, it is possible that some uninitialized data could be sent
to userspace. While all in-kernel drivers look to be safe, just be sure
by initializing the buffer to zero before it is passed to the i2c driver
so that any future drivers will not have this issue.
Also properly copy the amount of data recvieved to the userspace buffer,
as pointed out by Dan Carpenter.
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.
Fixes: 2c4eca3ef7 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignore fdb flags when adding port extern learn entries and always set
BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
closest to the behaviour we had before and avoids breaking any use cases
which were allowed.
This patch fixes iproute2 calls which assume NUD_PERMANENT and were
allowed before, example:
$ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
Extern learn entries are allowed to roam, but do not expire, so static
or dynamic flags make no sense for them.
Also add a comment for future reference.
Fixes: eb100e0e24 ("net: bridge: allow to add externally learned entries from user-space")
Fixes: 0541a62932 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to
effectively get the controls for the current VMCS, as opposed to using
vmx->secondary_exec_controls, which is the cached value of KVM's desired
controls for vmcs01 and truly not reflective of any particular VMCS.
While the waitpkg control is not dynamic, i.e. vmcs01 will always hold
the same waitpkg configuration as vmx->secondary_exec_controls, the same
does not hold true for vmcs02 if the L1 VMM hides the feature from L2.
If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL,
L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP.
Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull x86 platform driver fixes from Hans de Goede:
"Small set of pdx86 fixes for 5.14"
* tag 'platform-drivers-x86-v5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: pcengines-apuv2: Add missing terminating entries to gpio-lookup tables
platform/x86: Make dual_accel_detect() KIOX010A + KIOX020A detect more robust
platform/x86: Add and use a dual_accel_detect() helper
Pull overlayfs fixes from Miklos Szeredi:
"Fix several bugs in overlayfs"
* tag 'ovl-fixes-5.14-rc6-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: prevent private clone if bind mount is not allowed
ovl: fix uninitialized pointer read in ovl_lookup_real_one()
ovl: fix deadlock in splice write
ovl: skip stale entries in merge dir cache iteration
When a virtio pci device undergo surprise removal (aka async removal in
PCIe spec), mark the device as broken so that any upper layer drivers can
abort any outstanding operation.
When a virtio net pci device undergo surprise removal which is used by a
NetworkManager, a below call trace was observed.
kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8
Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
Workqueue: events linkwatch_event
RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
Call Trace:
virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
? __hw_addr_create_ex+0x85/0xc0
__dev_mc_add+0x72/0x80
igmp6_group_added+0xa7/0xd0
ipv6_mc_up+0x3c/0x60
ipv6_find_idev+0x36/0x80
addrconf_add_dev+0x1e/0xa0
addrconf_dev_config+0x71/0x130
addrconf_notify+0x1f5/0xb40
? rtnl_is_locked+0x11/0x20
? __switch_to_asm+0x42/0x70
? finish_task_switch+0xaf/0x2c0
? raw_notifier_call_chain+0x3e/0x50
raw_notifier_call_chain+0x3e/0x50
netdev_state_change+0x67/0x90
linkwatch_do_dev+0x3c/0x50
__linkwatch_run_queue+0xd2/0x220
linkwatch_event+0x21/0x30
process_one_work+0x1c8/0x370
worker_thread+0x30/0x380
? process_one_work+0x370/0x370
kthread+0x118/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
Hence, add the ability to abort the command on surprise removal
which prevents infinite loop and system lockup.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently vq->broken field is read by virtqueue_is_broken() in busy
loop in one context by virtnet_send_command().
vq->broken is set to true in other process context by
virtio_break_device(). Reader and writer are accessing it without any
synchronization. This may lead to a compiler optimization which may
result to optimize reading vq->broken only once.
Hence, force reading vq->broken on each invocation of
virtqueue_is_broken() and also force writing it so that such
update is visible to the readers.
It is a theoretical fix that isn't yet encountered in the field.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210721142648.1525924-2-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Daniel Borkmann says:
====================
bpf 2021-08-10
We've added 5 non-merge commits during the last 2 day(s) which contain
a total of 7 files changed, 27 insertions(+), 15 deletions(-).
1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song.
2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song.
3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann.
4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, core: Fix kernel-doc notation
bpf: Fix potentially incorrect results with bpf_get_local_storage()
bpf: Add missing bpf_read_[un]lock_trace() for syscall program
bpf: Add lockdown check for probe_write_user helper
bpf: Add _kernel suffix to internal lockdown_bpf_read
====================
Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but
msi_domain_free_irqs() does not enforce deactivation before tearing down
the interrupts.
This happens when PCI/MSI interrupts are set up and never used before being
torn down again, e.g. in error handling pathes. The only place which cleans
that up is the error handling path in msi_domain_alloc_irqs().
Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs()
to cure that.
Fixes: f3b0946d62 ("genirq/msi: Make sure PCI MSIs are activated early")
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com
When we are building all the various pinctrl structures for the
Allwinner pinctrl devices, we do some estimation about the maximum
number of distinct function (names) that we will need.
So far we take the number of pins as an upper bound, even though we
can actually have up to four special functions per pin. This wasn't a
problem until now, since we indeed have typically far more pins than
functions, and most pins share common functions.
However the H616 "-r" pin controller has only two pins, but four
functions, so we run over the end of the array when we are looking for
a matching function name in sunxi_pinctrl_add_function - there is no
NULL sentinel left that would terminate the loop:
[ 8.200648] Unable to handle kernel paging request at virtual address fffdff7efbefaff5
[ 8.209179] Mem abort info:
....
[ 8.368456] Call trace:
[ 8.370925] __pi_strcmp+0x90/0xf0
[ 8.374559] sun50i_h616_r_pinctrl_probe+0x1c/0x28
[ 8.379557] platform_probe+0x68/0xd8
Do an actual worst case allocation (4 functions per pin, three common
functions and the sentinel) for the initial array allocation. This is
now heavily overestimating the number of functions in the common case,
but we will reallocate this array later with the actual number of
functions, so it's only temporarily.
Fixes: 561c1cf17c ("pinctrl: sunxi: Add support for the Allwinner H616-R pin controller")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210722132548.22121-1-andre.przywara@arm.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Vladimir Oltean says:
====================
Fix broken backpressure during FDB dump in DSA drivers
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
DSA is one of the few switchdev drivers that have an .ndo_fdb_dump
implementation, because of the assumption that the hardware and software
FDBs cannot be efficiently kept in sync via SWITCHDEV_FDB_ADD_TO_BRIDGE.
Other drivers with a home-cooked .ndo_fdb_dump implementation are
ocelot and dpaa2-switch. These appear to do the correct thing, as do the
other DSA drivers, so nothing else appears to need fixing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 291d1e72b7 ("net: dsa: sja1105: Add support for FDB and MDB management")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 58c59ef9e9 ("net: dsa: lantiq: Add Forwarding Database access")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: ab335349b8 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: e4b27ebc78 ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a Keystone 2 regression discovered as a side effect of
defining an passing the physical start/end sections of the kernel
to the MMU remapping code.
As the Keystone applies an offset to all physical addresses,
including those identified and patches by phys2virt, we fail to
account for this offset in the kernel_sec_start and kernel_sec_end
variables.
Further these offsets can extend into the 64bit range on LPAE
systems such as the Keystone 2.
Fix it like this:
- Extend kernel_sec_start and kernel_sec_end to be 64bit
- Add the offset also to kernel_sec_start and kernel_sec_end
As passing kernel_sec_start and kernel_sec_end as 64bit invariably
incurs BE8 endianness issues I have attempted to dry-code around
these.
Tested on the Vexpress QEMU model both with and without LPAE
enabled.
Fixes: 6e121df14c ("ARM: 9090/1: Map the lowmem and kernel separately")
Reported-by: Nishanth Menon <nmenon@kernel.org>
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Nishanth Menon <nmenon@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc
and W=1 builds). That is, correct a function name in a comment and add
return descriptions for 2 functions.
Fixes these kernel-doc warnings:
kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead
kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run'
kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org
The X86 MSI mechanism cannot handle interrupt affinity changes safely after
startup other than from an interrupt handler, unless interrupt remapping is
enabled. The startup sequence in the generic interrupt code violates that
assumption.
Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.
While the interrupt remapping MSI chip does not require this, there is no
point in treating it differently as this might spare an interrupt to a CPU
which is not in the default affinity mask.
For the non-remapping case go to the direct write path when the interrupt
is not yet started similar to the not yet activated case.
Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de
The IO/APIC cannot handle interrupt affinity changes safely after startup
other than from an interrupt handler. The startup sequence in the generic
interrupt code violates that assumption.
Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.
Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de
X86 IO/APIC and MSI interrupts (when used without interrupts remapping)
require that the affinity setup on startup is done before the interrupt is
enabled for the first time as the non-remapped operation mode cannot safely
migrate enabled interrupts from arbitrary contexts. Provide a new irq chip
flag which allows affected hardware to request this.
This has to be opt-in because there have been reports in the past that some
interrupt chips cannot handle affinity setting before startup.
Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de
Multi-MSI uses a single MSI descriptor and there is a single mask register
when the device supports per vector masking. To avoid reading back the mask
register the value is cached in the MSI descriptor and updates are done by
clearing and setting bits in the cache and writing it to the device.
But nothing protects msi_desc::masked and the mask register from being
modified concurrently on two different CPUs for two different Linux
interrupts which belong to the same multi-MSI descriptor.
Add a lock to struct device and protect any operation on the mask and the
mask register with it.
This makes the update of msi_desc::masked unconditional, but there is no
place which requires a modification of the hardware register without
updating the masked cache.
msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
up changes.
The problem goes way back to the initial support of multi-MSI, but picking
the commit which introduced the mask cache is a valid cut off point
(2.6.30).
Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
msi_mask_irq() takes a mask and a flags argument. The mask argument is used
to mask out bits from the cached mask and the flags argument to set bits.
Some places invoke it with a flags argument which sets bits which are not
used by the device, i.e. when the device supports up to 8 vectors a full
unmask in some places sets the mask to 0xFFFFFF00. While devices probably
do not care, it's still bad practice.
Fixes: 7ba1930db0 ("PCI MSI: Unmask MSI if setup failed")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de
Nothing enforces the posted writes to be visible when the function
returns. Flush them even if the flush might be redundant when the entry is
masked already as the unmask will flush as well. This is either setup or a
rare affinity change event so the extra flush is not the end of the world.
While this is more a theoretical issue especially the logic in the X86
specific msi_set_affinity() function relies on the assumption that the
update has reached the hardware when the function returns.
Again, as this never has been enforced the Fixes tag refers to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de
The specification (PCIe r5.0, sec 6.1.4.5) states:
For MSI-X, a function is permitted to cache Address and Data values
from unmasked MSI-X Table entries. However, anytime software unmasks a
currently masked MSI-X Table entry either by clearing its Mask bit or
by clearing the Function Mask bit, the function must update any Address
or Data values that it cached from that entry. If software changes the
Address or Data value of an entry while the entry is unmasked, the
result is undefined.
The Linux kernel's MSI-X support never enforced that the entry is masked
before the entry is modified hence the Fixes tag refers to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Enforce the entry to be masked across the update.
There is no point in enforcing this to be handled at all possible call
sites as this is just pointless code duplication and the common update
function is the obvious place to enforce this.
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Reported-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de
When MSI-X is enabled the ordering of calls is:
msix_map_region();
msix_setup_entries();
pci_msi_setup_msi_irqs();
msix_program_entries();
This has a few interesting issues:
1) msix_setup_entries() allocates the MSI descriptors and initializes them
except for the msi_desc:masked member which is left zero initialized.
2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets
up the MSI interrupts which ends up in pci_write_msi_msg() unless the
interrupt chip provides its own irq_write_msi_msg() function.
3) msix_program_entries() does not do what the name suggests. It solely
updates the entries array (if not NULL) and initializes the masked
member for each MSI descriptor by reading the hardware state and then
masks the entry.
Obviously this has some issues:
1) The uninitialized masked member of msi_desc prevents the enforcement
of masking the entry in pci_write_msi_msg() depending on the cached
masked bit. Aside of that half initialized data is a NONO in general
2) msix_program_entries() only ensures that the actually allocated entries
are masked. This is wrong as experimentation with crash testing and
crash kernel kexec has shown.
This limited testing unearthed that when the production kernel had more
entries in use and unmasked when it crashed and the crash kernel
allocated a smaller amount of entries, then a full scan of all entries
found unmasked entries which were in use in the production kernel.
This is obviously a device or emulation issue as the device reset
should mask all MSI-X table entries, but obviously that's just part
of the paper specification.
Cure this by:
1) Masking all table entries in hardware
2) Initializing msi_desc::masked in msix_setup_entries()
3) Removing the mask dance in msix_program_entries()
4) Renaming msix_program_entries() to msix_update_entries() to
reflect the purpose of that function.
As the masking of unused entries has never been done the Fixes tag refers
to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de
The ordering of MSI-X enable in hardware is dysfunctional:
1) MSI-X is disabled in the control register
2) Various setup functions
3) pci_msi_setup_msi_irqs() is invoked which ends up accessing
the MSI-X table entries
4) MSI-X is enabled and masked in the control register with the
comment that enabling is required for some hardware to access
the MSI-X table
Step #4 obviously contradicts #3. The history of this is an issue with the
NIU hardware. When #4 was introduced the table access actually happened in
msix_program_entries() which was invoked after enabling and masking MSI-X.
This was changed in commit d71d6432e1 ("PCI/MSI: Kill redundant call of
irq_set_msi_desc() for MSI-X interrupts") which removed the table write
from msix_program_entries().
Interestingly enough nobody noticed and either NIU still works or it did
not get any testing with a kernel 3.19 or later.
Nevertheless this is inconsistent and there is no reason why MSI-X can't be
enabled and masked in the control register early on, i.e. move step #4
above to step #1. This preserves the NIU workaround and has no side effects
on other hardware.
Fixes: d71d6432e1 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de
Ben Hutchings says:
====================
ksz8795 VLAN fixes
This series fixes a number of bugs in the ksz8795 driver that affect
VLAN filtering, tag insertion, and tag removal.
I've tested these on the KSZ8795CLXD evaluation board, and checked the
register usage against the datasheets for the other supported chips.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The magic number 4 in VLAN table lookup was the number of entries we
can read and write at once. Using phy_port_cnt here doesn't make
sense and presumably broke VLAN filtering for 3-port switches. Change
it back to 4.
Fixes: 4ce2a984ab ("net: dsa: microchip: ksz8795: use phy_port_cnt ...")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable
hardware flag. That controls discarding of packets with a VID that
has not been enabled for any port on the switch.
Since it is a global flag, set the dsa_switch::vlan_filtering_is_global
flag so that the DSA core understands this can't be controlled per
port.
When VLAN filtering is enabled, the switch should also discard packets
with a VID that's not enabled on the ingress port. Set or clear each
external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering()
to make that happen.
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the CPU port, we can support both tagged and untagged VLANs at the
same time by doing any necessary untagging in software rather than
hardware. To enable that, keep the CPU port's Remove Tag flag cleared
and set the dsa_switch::untag_bridge_pvid flag.
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VLAN is deleted from a port, the flags in struct
switchdev_obj_port_vlan are always 0. ksz8_port_vlan_del() copies the
BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and
therefore always clears it.
In case there are multiple VLANs configured as untagged on this port -
which seems useless, but is allowed - deleting one of them changes the
remaining VLANs to be tagged.
It's only ever necessary to change this flag when a VLAN is added to
the port, so leave it unchanged in ksz8_port_vlan_del().
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switches supported by ksz8795 only have a per-port flag for Tag
Removal. This means it is not possible to support both tagged and
untagged VLANs on the same port. Reject attempts to add a VLAN that
requires the flag to be changed, unless there are no VLANs currently
configured.
VID 0 is excluded from this check since it is untagged regardless of
the state of the flag.
On the CPU port we could support tagged and untagged VLANs at the same
time. This will be enabled by a later patch.
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
ksz8795 has never actually enabled PVID tag insertion, and it also
programmed the PVID incorrectly. To fix this:
* Allow tag insertion to be controlled per ingress port. On most
chips, set bit 2 in Global Control 19. On KSZ88x3 this control
flag doesn't exist.
* When adding a PVID:
- Set the appropriate register bits to enable tag insertion on
egress at every other port if this was the packet's ingress port.
- Mask *out* the VID from the default tag, before or-ing in the new
PVID.
* When removing a PVID:
- Clear the same control bits to disable tag insertion.
- Don't update the default tag. This wasn't doing anything useful.
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
ksz_read64() currently does some dubious byte-swapping on the two
halves of a 64-bit register, and then only returns the high bits.
Replace this with a straightforward expression.
Fixes: e66f840c08 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
linux-can-fixes-for-5.14-20210810
Marc Kleine-Budde says:
====================
pull-request: can 2021-08-10
this is a pull request of 2 patches for net/master.
Baruch Siach's patch fixes a typo for the Microchip CAN BUS Analyzer
Tool entry in the MAINTAINERS file.
Hussein Alasadi fixes the setting of the M_CAN_DBTP register in the
m_can driver. The regression git mainline in v5.14-rc1, so no backport
to stable is needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5 fixes 2021-08-09
This series introduces fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-08-09
This series contains updates to ice and iavf drivers.
Ani prevents the ice driver from accidentally being probed to a virtual
function and stops processing of VF messages when VFs are being torn
down.
Brett prevents the ice driver from deleting is own MAC address.
Fahad ensures the RSS LUT and key are always set following reset for
iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b910eaaaa4 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed a bug for bpf_get_local_storage() helper so different tasks
won't mess up with each other's percpu local storage.
The percpu data contains 8 slots so it can hold up to 8 contexts (same or
different tasks), for 8 different program runs, at the same time. This in
general is sufficient. But our internal testing showed the following warning
multiple times:
[...]
warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
<IRQ>
tcp_call_bpf.constprop.99+0x93/0xc0
tcp_conn_request+0x41e/0xa50
? tcp_rcv_state_process+0x203/0xe00
tcp_rcv_state_process+0x203/0xe00
? sk_filter_trim_cap+0xbc/0x210
? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
tcp_v6_do_rcv+0x181/0x3e0
tcp_v6_rcv+0xc65/0xcb0
ip6_protocol_deliver_rcu+0xbd/0x450
ip6_input_finish+0x11/0x20
ip6_input+0xb5/0xc0
ip6_sublist_rcv_finish+0x37/0x50
ip6_sublist_rcv+0x1dc/0x270
ipv6_list_rcv+0x113/0x140
__netif_receive_skb_list_core+0x1a0/0x210
netif_receive_skb_list_internal+0x186/0x2a0
gro_normal_list.part.170+0x19/0x40
napi_complete_done+0x65/0x150
mlx5e_napi_poll+0x1ae/0x680
__napi_poll+0x25/0x120
net_rx_action+0x11e/0x280
__do_softirq+0xbb/0x271
irq_exit_rcu+0x97/0xa0
common_interrupt+0x7f/0xa0
</IRQ>
asm_common_interrupt+0x1e/0x40
RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
? __cgroup_bpf_run_filter_skb+0x378/0x4e0
? do_softirq+0x34/0x70
? ip6_finish_output2+0x266/0x590
? ip6_finish_output+0x66/0xa0
? ip6_output+0x6c/0x130
? ip6_xmit+0x279/0x550
? ip6_dst_check+0x61/0xd0
[...]
Using drgn [0] to dump the percpu buffer contents showed that on this CPU
slot 0 is still available, but slots 1-7 are occupied and those tasks in
slots 1-7 mostly don't exist any more. So we might have issues in
bpf_cgroup_storage_unset().
Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
Currently, it tries to unset "current" slot with searching from the start.
So the following sequence is possible:
1. A task is running and claims slot 0
2. Running BPF program is done, and it checked slot 0 has the "task"
and ready to reset it to NULL (not yet).
3. An interrupt happens, another BPF program runs and it claims slot 1
with the *same* task.
4. The unset() in interrupt context releases slot 0 since it matches "task".
5. Interrupt is done, the task in process context reset slot 0.
At the end, slot 1 is not reset and the same process can continue to occupy
slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF
program won't be able to claim an empty slot and a warning will be issued.
To fix the issue, for unset() function, we should traverse from the last slot
to the first. This way, the above issue can be avoided.
The same reverse traversal should also be done in bpf_get_local_storage() helper
itself. Otherwise, incorrect local storage may be returned to BPF program.
[0] https://github.com/osandov/drgn
Fixes: b910eaaaa4 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
Pull EFI fixes from Ard Biesheuvel:
A batch of fixes for the arm64 stub image loader:
- fix a logic bug that can make the random page allocator fail
spuriously
- force reallocation of the Image when it overlaps with firmware
reserved memory regions
- fix an oversight that defeated on optimization introduced earlier
where images loaded at a suitable offset are never moved if booting
without randomization
- complain about images that were not loaded at the right offset by the
firmware image loader.
Link: https://lore.kernel.org/r/20210803091215.2566-1-ardb@kernel.org
Add the following checks from __do_loopback() to clone_private_mount() as
well:
- verify that the mount is in the current namespace
- verify that there are no locked children
Reported-by: Alois Wohlschlager <alois1@gmx-topmail.de>
Fixes: c771d683a6 ("vfs: introduce clone_private_mount()")
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
One error path can result in release_dentry_name_snapshot() being called
before "name" was initialized by take_dentry_name_snapshot().
Fix by moving the release_dentry_name_snapshot() to immediately after the
only use.
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.
The call chain for splice to an overlay file:
-> do_splice [takes sb_writers on overlay file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
-> vfs_iter_write
...
-> ovl_write_iter [takes sb_writers on real file]
And the call chain for splice to a real file:
-> do_splice [takes sb_writers on real file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
Syzbot successfully bisected this to commit 82a763e61e ("ovl: simplify
file splice").
Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().
Fixes: 82a763e61e ("ovl: simplify file splice")
Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On the first getdents call, ovl_iterate() populates the readdir cache
with a list of entries, but for upper entries with origin lower inode,
p->ino remains zero.
Following getdents calls traverse the readdir cache list and call
ovl_cache_update_ino() for entries with zero p->ino to lookup the entry
in the overlay and return d_ino that is consistent with st_ino.
If the upper file was unlinked between the first getdents call and the
getdents call that lists the file entry, ovl_cache_update_ino() will not
find the entry and fall back to setting d_ino to the upper real st_ino,
which is inconsistent with how this object was presented to users.
Instead of listing a stale entry with inconsistent d_ino, simply skip
the stale entry, which is better for users.
xfstest overlay/077 is failing without this patch.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/fstests/CAOQ4uxgR_cLnC_vdU5=seP3fwqVkuZM_-WfD6maFTMbMYq=a9w@mail.gmail.com/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Commit 79a7f8bdb1 ("bpf: Introduce bpf_sys_bpf() helper and program type.")
added support for syscall program, which is a sleepable program.
But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(),
which is needed to ensure proper rcu callback invocations. This patch adds
bpf_read_[un]lock_trace() properly.
Fixes: 79a7f8bdb1 ("bpf: Introduce bpf_sys_bpf() helper and program type.")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com
Back then, commit 96ae522795 ("bpf: Add bpf_probe_write_user BPF helper
to be called in tracers") added the bpf_probe_write_user() helper in order
to allow to override user space memory. Its original goal was to have a
facility to "debug, divert, and manipulate execution of semi-cooperative
processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed
since it would otherwise tamper with its integrity.
One use case was shown in cf9b1199de ("samples/bpf: Add test/example of
using bpf_probe_write_user bpf helper") where the program DNATs traffic
at the time of connect(2) syscall, meaning, it rewrites the arguments to
a syscall while they're still in userspace, and before the syscall has a
chance to copy the argument into kernel space. These days we have better
mechanisms in BPF for achieving the same (e.g. for load-balancers), but
without having to write to userspace memory.
Of course the bpf_probe_write_user() helper can also be used to abuse
many other things for both good or bad purpose. Outside of BPF, there is
a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and
PTRACE_POKE{TEXT,DATA}, but would likely require some more effort.
Commit 96ae522795 explicitly dedicated the helper for experimentation
purpose only. Thus, move the helper's availability behind a newly added
LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under
the "integrity" mode. More fine-grained control can be implemented also
from LSM side with this change.
Fixes: 96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Jonathan writes:
First set of fixes for IIO in the 5.14 cycle
adi,adis:
- Ensure GPIO pin direction set explicitly in driver.
fxls8952af:
- Fix use of ret when not initialized.
- Fix issue with use of module symbol from built in.
hdc100x:
- Add a margin to conversion time as some parts run to slowly.
palmas-adc:
- Fix a wrong exit condition that leads to adc period always being set
to maximum value.
st,sensors:
- Drop a wrong restriction on number of interrupts in dt binding.
ti-ads7950:
- Ensure CS deasserted after channel read.
* tag 'iio-fixes-5.14a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: Fix incorrect exit of for-loop
iio: humidity: hdc100x: Add margin to the conversion time
dt-bindings: iio: st: Remove wrong items length check
iio: accel: fxls8962af: fix i2c dependency
iio: adis: set GPIO reset pin direction
iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels
iio: accel: fxls8962af: fix potential use of uninitialized symbol
The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.
This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.
As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade3 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Free the offload sample action on error.
Fixes: f94d6389f6 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently irq->pool is set after the irq is insert to the xarray.
Set irq->pool before the irq is inserted to the xarray.
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Change order of functions in mlx5_irq_detach_nb() so it will be
a mirror of mlx5_irq_attach_nb.
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since switchdev mode can't support devlink traps, verify there are
no active devlink traps before moving eswitch to switchdev mode. If
there are active traps, prevent the switchdev mode configuration.
Fixes: eb3862a052 ("net/mlx5e: Enable traps according to link state")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
free the outstanding descriptors, which relies on
mlx5e_page_release_dynamic and page_pool_release_page. However,
page_pool_destroy is already called by this point, because
mlx5e_close_rq runs before mlx5e_close_xdpsq.
This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
mlx5e_close_rq.
The commit cited below started calling page_pool_destroy directly from
the driver. Previously, the page pool was destroyed under a call_rcu
from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
until after the XDPSQ is cleaned up.
Fixes: 1da4bbeffe ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ageing time is not converted from clock_t to jiffies which results
incorrect ageing timeout calculation in workqueue update task. Fix it by
applying clock_t_to_jiffies() to provided value.
Fixes: c636a0f0f3 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
It could be local and remote are on the same machine and the route
result will be a local route which will result in creating encap id
with src/dst mac address of 0.
Fixes: a54e20b4fc ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
While processing encapsulated packet on RX, one of the fields that is
checked is the inner packet length. If the length as specified in the header
doesn't match the actual inner packet length, the packet is invalid
and should be dropped. However, such packet caused the NIC to hang.
This patch turns on a 'fail_on_error' HW bit which allows HW to drop
such an invalid packet while processing RX packet and trying to decap it.
Fixes: ad17dc8cf9 ("net/mlx5: DR, Move STEv0 action apply logic")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the call to _base_static_config_pages() is assigning the error
return to variable 'rc' but checking the error return in error 'r'. Fix
this by assigning the error return to variable 'r' instead of 'rc'.
Link: https://lore.kernel.org/r/20210804134940.114011-1-colin.king@canonical.com
Fixes: 19a622c39a ("scsi: mpt3sas: Handle firmware faults during first half of IOC init")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unused value")
Commit 08f76547f0 ("scsi: storvsc: Update error logging") added more
robust logging of errors, particularly those reported as Hyper-V
errors. But this change produces extra logging noise in that
TEST_UNIT_READY may report errors during the normal course of detecting
device adds and removes.
Fix this by logging TEST_UNIT_READY errors as warnings, so that log lines
are produced only if the storvsc log level is changed to WARN level on the
kernel boot line.
Link: https://lore.kernel.org/r/1628269970-87876-1-git-send-email-mikelley@microsoft.com
Fixes: 08f76547f0 ("scsi: storvsc: Update error logging")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags
from the regression suite:
======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE
------------------------------------------------------
kworker/2:4/2684 is trying to acquire lock:
ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
but task is already holding lock:
ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}:
__flush_work+0x31b/0x490
io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
__do_sys_io_uring_register+0x45b/0x1060
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (&ctx->uring_lock){+.+.}-{3:3}:
__lock_acquire+0x119a/0x1e10
lock_acquire+0xc8/0x2f0
__mutex_lock+0x86/0x740
io_rsrc_put_work+0x13d/0x1a0
process_one_work+0x236/0x530
worker_thread+0x52/0x3b0
kthread+0x135/0x160
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&(&ctx->rsrc_put_work)->work));
lock(&ctx->uring_lock);
lock((work_completion)(&(&ctx->rsrc_put_work)->work));
lock(&ctx->uring_lock);
*** DEADLOCK ***
2 locks held by kworker/2:4/2684:
#0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
#1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
stack backtrace:
CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5
Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
Workqueue: events io_rsrc_put_work
Call Trace:
dump_stack_lvl+0x6a/0x9a
check_noncircular+0xfe/0x110
__lock_acquire+0x119a/0x1e10
lock_acquire+0xc8/0x2f0
? io_rsrc_put_work+0x13d/0x1a0
__mutex_lock+0x86/0x740
? io_rsrc_put_work+0x13d/0x1a0
? io_rsrc_put_work+0x13d/0x1a0
? io_rsrc_put_work+0x13d/0x1a0
? process_one_work+0x1ce/0x530
io_rsrc_put_work+0x13d/0x1a0
process_one_work+0x236/0x530
worker_thread+0x52/0x3b0
? process_one_work+0x530/0x530
kthread+0x135/0x160
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
which is due to holding the ctx->uring_lock when flushing existing
pending work, while the pending work flushing may need to grab the uring
lock if we're using IOPOLL.
Fix this by dropping the uring_lock a bit earlier as part of the flush.
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/404
Tested-by: Ammar Faizi <ammarfaizi2@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There may be cases like:
A B
spin_lock(wqe->lock)
nr_workers is 0
nr_workers++
spin_unlock(wqe->lock)
spin_lock(wqe->lock)
nr_wokers is 1
nr_workers++
spin_unlock(wqe->lock)
create_io_worker()
acct->worker is 1
create_io_worker()
acct->worker is 1
There should be one worker marked IO_WORKER_F_FIXED, but no one is.
Fix this by introduce a new agrument for create_io_worker() to indicate
if it is the first worker.
Fixes: 3d4e4face9 ("io-wq: fix no lock protection of acct->nr_worker")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210808135434.68667-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The former patch to add check between nr_workers and max_workers has a
bug, which will cause unconditionally creating io-workers. That's
because the result of the check doesn't affect the call of
create_io_worker(), fix it by bringing in a boolean value for it.
Fixes: 21698274da ("io-wq: fix lack of acct->nr_workers < acct->max_workers judgement")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210808135434.68667-2-haoxu@linux.alibaba.com
[axboe: drop hunk that isn't strictly needed]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull cgroup fix from Tejun Heo:
"One commit to fix a possible A-A deadlock around u64_stats_sync on
32bit machines caused by updating it without disabling IRQ when it may
be read from IRQ context"
* 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync
Repair kernel-doc notation in a few places to make it conform to
the expected format.
Fixes the following kernel-doc warnings:
flow.c:296: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Parse vlan tag from vlan header.
flow.c:296: warning: missing initial short description on line:
* Parse vlan tag from vlan header.
flow.c:537: warning: No description found for return value of 'key_extract_l3l4'
flow.c:769: warning: No description found for return value of 'key_extract'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: dev@openvswitch.org
Link: https://lore.kernel.org/r/20210808190834.23362-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming
more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
iavf driver should set RSS LUT and key unconditionally in reset
path. Currently, the driver does not do that. This patch fixes
this issue.
Fixes: 2c86ac3c70 ("i40evf: create a generic config RSS function")
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In some circumstances, such as with bridging, it's possible that the
stack will add the device's own MAC address to its unicast address list.
If, later, the stack deletes this address, the driver will receive a
request to remove this address.
The driver stores its current MAC address as part of the VSI MAC filter
list instead of separately. So, this causes a problem when the device's
MAC address is deleted unexpectedly, which results in traffic failure in
some cases.
The following configuration steps will reproduce the previously
mentioned problem:
> ip link set eth0 up
> ip link add dev br0 type bridge
> ip link set br0 up
> ip addr flush dev eth0
> ip link set eth0 master br0
> echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> modprobe -r veth
> modprobe -r bridge
> ip addr add 192.168.1.100/24 dev eth0
The following ping command fails due to the netdev->dev_addr being
deleted when removing the bridge module.
> ping <link partner>
Fix this by making sure to not delete the netdev->dev_addr during MAC
address sync. After fixing this issue it was noticed that the
netdev_warn() in .set_mac was overly verbose, so make it at
netdev_dbg().
Also, there is a possibility of a race condition between .set_mac and
.set_rx_mode. Fix this by calling netif_addr_lock_bh() and
netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr
is going to be updated in .set_mac.
Fixes: e94d447866 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Liang Li <liali@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When VFs are setup and torn down in quick succession, it is possible
that a VF is torn down by the PF while the VF's virtchnl requests are
still in the PF's mailbox ring. Processing the VF's virtchnl request
when the VF itself doesn't exist results in undefined behavior. Fix
this by adding a check to stop processing virtchnl requests when VF
teardown is in progress.
Fixes: ddf30f7ff8 ("ice: Add handler to configure SR-IOV")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The userspace utility "driverctl" can be used to change/override the
system's default driver choices. This is useful in some situations
(buggy driver, old driver missing a device ID, trying a workaround,
etc.) where the user needs to load a different driver.
However, this is also prone to user error, where a driver is mapped
to a device it's not designed to drive. For example, if the ice driver
is mapped to driver iavf devices, the ice driver crashes.
Add a check to return an error if the ice driver is being used to
probe a virtual function.
Fixes: 837f08fdec ("ice: Add basic driver framework for Intel(R) E800 Series")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Instead of appending new text attribute data at the offset specified by the
write() system call, only pass the newly written data to the .store()
callback.
Reported-by: Bodo Stroesser <bostroesser@gmail.com>
Tested-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When mirror/redirect a skb to a different port, the ct info should be reset
for reclassification. Or the pkts will match unexpected rules. For example,
with following topology and commands:
-----------
|
veth0 -+-------
|
veth1 -+-------
|
------------
tc qdisc add dev veth0 clsact
# The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
ping <remove ip via veth0> &
tc -s filter show dev veth1 ingress
With command 'tc -s filter show', we can find the pkts were dropped on
veth1.
Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guvenc Gulce says:
====================
net/smc: fixes 2021-08-09
please apply the following patch series for smc to netdev's net tree.
One patch fixes invalid connection counting for links and the other
one fixes an access to an already cleared link.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SMC clients may be assigned to a different link after the initial
connection between two peers was established. In such a case,
the connection counter was not correctly set.
Update the connection counter correctly when a smc client connection
is assigned to a different smc link.
Fixes: 07d51580ff ("net/smc: Add connection counters for links")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There can be a race between the waiters for a tx work request buffer
and the link down processing that finally clears the link. Although
all waiters are woken up before the link is cleared there might be
waiters which did not yet get back control and are still waiting.
This results in an access to a cleared wait queue head.
Fix this by introducing atomic reference counting around the wait calls,
and wait with the link clear processing until all waiters have finished.
Move the work request layer related calls into smc_wr.c and set the
link state to INACTIVE before calling smcr_link_clear() in
smc_llc_srv_add_link().
Fixes: 15e1b99aad ("net/smc: no WR buffer wait for terminating link group")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CPSW switchdev driver inherited fix from commit 9421c90150 ("net:
ethernet: ti: cpsw: fix min eth packet size") which changes min TX packet
size to 64bytes (VLAN_ETH_ZLEN, excluding ETH_FCS). It was done to fix HW
packed drop issue when packets are sent from Host to the port with PVID and
un-tagging enabled. Unfortunately this breaks some other non-switch
specific use-cases, like:
- [1] CPSW port as DSA CPU port with DSA-tag applied at the end of the
packet
- [2] Some industrial protocols, which expects min TX packet size 60Bytes
(excluding FCS).
Fix it by configuring min TX packet size depending on driver mode
- 60Bytes (ETH_ZLEN) for multi mac (dual-mac) mode
- 64Bytes (VLAN_ETH_ZLEN) for switch mode
and update it during driver mode change and annotate with
READ_ONCE()/WRITE_ONCE() as it can be read by napi while writing.
[1] https://lore.kernel.org/netdev/20210531124051.GA15218@cephalopod/
[2] https://e2e.ti.com/support/arm/sitara_arm/f/791/t/701669
Cc: stable@vger.kernel.org
Fixes: ed3525eda4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Reported-by: Ben Hutchings <ben.hutchings@essensium.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As mentioned in commit c07aea3ef4 ("mm: add a signature in
struct page"):
"The page->signature field is aliased to page->lru.next and
page->compound_head."
And as the comment in page_is_pfmemalloc():
"lru.next has bit 1 set if the page is allocated from the
pfmemalloc reserves. Callers may simply overwrite it if they
do not need to preserve that information."
The page->signature is OR’ed with PP_SIGNATURE when a page is
allocated in page pool, see __page_pool_alloc_pages_slow(),
and page->signature is checked directly with PP_SIGNATURE in
page_pool_return_skb_page(), which might cause resoure leaking
problem for a page from page pool if bit 1 of lru.next is set
for a pfmemalloc page. What happens here is that the original
pp->signature is OR'ed with PP_SIGNATURE after the allocation
in order to preserve any existing bits(such as the bit 1, used
to indicate a pfmemalloc page), so when those bits are present,
those page is not considered to be from page pool and the DMA
mapping of those pages will be left stale.
As bit 0 is for page->compound_head, So mask both bit 0/1 before
the checking in page_pool_return_skb_page(). And we will return
those pfmemalloc pages back to the page allocator after cleaning
up the DMA mapping.
Fixes: 6a5bcd84e8 ("page_pool: Allow drivers to hint on SKB recycling")
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC complains about empty macros in an 'if' statement, so convert
them to 'do {} while (0)' macros.
Fixes these build warnings:
net/dccp/output.c: In function 'dccp_xmit_packet':
../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
283 | dccp_pr_debug("transmit_skb() returned err=%d\n", err);
net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
163 | (unsigned long long)seqno, state);
Fixes: dc841e30ea ("dccp: Extend CCID packet dequeueing interface")
Fixes: 3802408644 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: dccp@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent fix c4824ae7db ("ALSA: pcm: Fix mmap capability check")
restricts the mmap capability only to the drivers that properly set up
the buffers, but it caused a regression for a few drivers that manage
the buffer on its own way.
For those with UNKNOWN buffer type (i.e. the uninitialized / unused
substream->dma_buffer), just assume that the driver handles the mmap
properly and blindly trust the hardware info bit.
Fixes: c4824ae7db ("ALSA: pcm: Fix mmap capability check")
Reported-and-tested-by: Jeff Woods <jwoods@fnordco.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/s5him0gpghv.wl-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
the SOC boots, the WTMI firmware sets clocks and AVS values that work
correctly with 1.2 GHz CPU frequency, but random crashes occur once
cpufreq driver starts scaling.
We do not know currently what is the reason:
- it may be that the voltage value for L0 for 1.2 GHz variant provided
by the vendor in the OTP is simply incorrect when scaling is used,
- it may be that some delay is needed somewhere,
- it may be something else.
The most sane solution now seems to be to simply forbid the cpufreq
driver on 1.2 GHz variant.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 92ce45fb87 ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The compiler should be forbidden from any strange optimization for async
writes to user visible data-structures. Without proper protection, the
compiler can cause write-tearing or invent writes that would confuse the
userspace.
However, there are writes to sq_flags which are not protected by
WRITE_ONCE(). Use WRITE_ONCE() for these writes.
This is purely a theoretical issue. Presumably, any compiler is very
unlikely to do such optimizations.
Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210808001342.964634-3-namit@vmware.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When using SQPOLL, the submission queue polling thread calls
task_work_run() to run queued work. However, when work is added with
TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
set afterwards and is never cleared.
Consequently, when the submission queue polling thread checks whether
signal_pending(), it may always find a pending signal, if
task_work_add() was ever called before.
The impact of this bug might be different on different kernel versions.
It appears that on 5.14 it would only cause unnecessary calculation and
prevent the polling thread from sleeping. On 5.13, where the bug was
found, it stops the polling thread from finding newly submitted work.
Instead of task_work_run(), use tracehook_notify_signal() that clears
TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
current->task_works to avoid a race in which task_works is cleared but
the TIF_NOTIFY_SIGNAL is set.
Fixes: 685fe7feed ("io-wq: eliminate the need for a manager thread")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has
to choose interface name as this ioctl API does not support specifying it.
Kernel in this case register new interface with name "ppp<id>" where <id>
is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This
applies also in the case when registering new ppp interface via rtnl
without supplying IFLA_IFNAME.
PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel
assign to ppp interface, in case this ppp id is not already used by other
ppp interface.
In case user does not specify ppp unit id then kernel choose the first free
ppp unit id. This applies also for case when creating ppp interface via
rtnl method as it does not provide a way for specifying own ppp unit id.
If some network interface (does not have to be ppp) has name "ppp<id>"
with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails.
And registering new ppp interface is not possible anymore, until interface
which holds conflicting name is renamed. Or when using rtnl method with
custom interface name in IFLA_IFNAME.
As list of allocated / used ppp unit ids is not possible to retrieve from
kernel to userspace, userspace has no idea what happens nor which interface
is doing this conflict.
So change the algorithm how ppp unit id is generated. And choose the first
number which is not neither used as ppp unit id nor in some network
interface with pattern "ppp<id>".
This issue can be simply reproduced by following pppd call when there is no
ppp interface registered and also no interface with name pattern "ppp<id>":
pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty"
Or by creating the one ppp interface (which gets assigned ppp unit id 0),
renaming it to "ppp1" and then trying to create a new ppp interface (which
will always fails as next free ppp unit id is 1, but network interface with
name "ppp1" exists).
This patch fixes above described issue by generating new and new ppp unit
id until some non-conflicting id with network interfaces is generated.
Signed-off-by: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_IFNAME is nul-term string which means that IFLA_IFNAME buffer can be
larger than length of string which contains.
Function __rtnl_newlink() generates new own ifname if either IFLA_IFNAME
was not specified at all or userspace passed empty nul-term string.
It is expected that if userspace does not specify ifname for new ppp netdev
then kernel generates one in format "ppp<id>" where id matches to the ppp
unit id which can be later obtained by PPPIOCGUNIT ioctl.
And it works in this way if IFLA_IFNAME is not specified at all. But it
does not work when IFLA_IFNAME is specified with empty string.
So fix this logic also for empty IFLA_IFNAME in ppp_nl_newlink() function
and correctly generates ifname based on ppp unit identifier if userspace
did not provided preferred ifname.
Without this patch when IFLA_IFNAME was specified with empty string then
kernel created a new ppp interface in format "ppp<id>" but id did not
match ppp unit id returned by PPPIOCGUNIT ioctl. In this case id was some
number generated by __rtnl_newlink() function.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: bb8082f691 ("ppp: build ifname using unit identifier for rtnl based devices")
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: PTP fixes
This series includes 2 fixes for the PTP feature. Update to the new
firmware interface so that the driver can pass the PTP sequence number
header offset of TX packets to the firmware. This is needed for all
PTP packet types (v1, v2, with or without VLAN) to work. The 2nd
fix is to use a different register window to read the PHC to avoid
conflict with an older Broadcom tool.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some older Broadcom debug tools use window 5 and may conflict, so switch
to use window 6 instead.
Fixes: 118612d519 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New firmware interface requires the PTP sequence ID header offset to
be passed to the firmware to properly find the matching timestamp
for all protocols.
Fixes: 83bb623c96 ("bnxt_en: Transmit and retrieve packet timestamps")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The key change is the firmware call to retrieve the PTP TX timestamp.
The header offset for the PTP sequence number field is now added.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes possible leak of PTP virtual clocks.
The number of PTP virtual clocks to be unregistered is passed as
'u32', but the function that unregister the devices handles that as
'u8'.
Fixes: 73f37068d5 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a5e63c7d38 "net: phy: micrel: Fix detection of ksz87xx
switch" broke link detection on the external ports of the KSZ8795.
The previously unused phy_driver structure for these devices specifies
config_aneg and read_status functions that appear to be designed for a
fixed link and do not work with the embedded PHYs in the KSZ8795.
Delete the use of these functions in favour of the generic PHY
implementations which were used previously.
Fixes: a5e63c7d38 ("net: phy: micrel: Fix detection of ksz87xx switch")
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lockdep detected possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mhiwwan->rx_lock);
local_irq_disable();
lock(&mhi_cntrl->pm_lock);
lock(&mhiwwan->rx_lock);
<Interrupt>
lock(&mhi_cntrl->pm_lock);
*** DEADLOCK ***
To prevent this we need to disable the soft-interrupts when taking
the rx_lock.
Cc: stable@vger.kernel.org
Fixes: fa588eba63 ("net: Add Qcom WWAN control driver")
Reported-by: Thomas Perrot <thomas.perrot@bootlin.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that all external port are actually isolated from each other,
so no packets are leaked.
Fixes: ec6698c272 ("net: dsa: add support for Atheros AR9331 built-in switch")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8169: adjust the setting for RTL8106e
These patches are uesed to avoid the delay of link-up interrupt, when
enabling ASPM for RTL8106e. The patch #1 is used to enable ASPM if
it is possible. And the patch #2 is used to modify the entrance latencies
of L0 and L1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The original L0 and L1 entrance latencies of RTL8106e are 4us. And
they cause the delay of link-up interrupt when enabling ASPM. Change
the L0 entrance latency to 7us and L1 entrance latency to 32us. Then,
they could avoid the issue.
Tested-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2021-08-07
The following pull-request contains BPF updates for your *net* tree.
We've added 4 non-merge commits during the last 9 day(s) which contain
a total of 4 files changed, 8 insertions(+), 7 deletions(-).
The main changes are:
1) Fix integer overflow in htab's lookup + delete batch op, from Tatsuhiko Yasumatsu.
2) Fix invalid fd 0 close in libbpf if BTF parsing failed, from Daniel Xu.
3) Fix libbpf feature probe for BPF_PROG_TYPE_CGROUP_SOCKOPT, from Robin Gögge.
4) Fix minor libbpf doc warning regarding code-block language, from Randy Dunlap.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 5.13 QE's ucc nodes can't get interrupts from devicetree:
ucc@2000 {
cell-index = <1>;
reg = <0x2000 0x200>;
interrupts = <32>;
interrupt-parent = <&qeic>;
};
Now fw_devlink expects driver to create and probe a struct device
for interrupt controller.
So lets convert this driver to simple platform_device with probe().
Also use platform_get_ and devm_ family function to get/allocate
resources and drop unused .compatible = "qeic".
[1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com
Fixes: e590474768 ("driver core: Set fw_devlink=on by default")
Fixes: ea718c6990 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated
over to count the number of elements in each bucket (bucket_size).
If bucket_size is large enough, the multiplication to calculate
kvmalloc() size could overflow, resulting in out-of-bounds write
as reported by KASAN:
[...]
[ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112
[ 104.986889]
[ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13
[ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 104.988104] Call Trace:
[ 104.988410] dump_stack_lvl+0x34/0x44
[ 104.988706] print_address_description.constprop.0+0x21/0x140
[ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.989622] kasan_report.cold+0x7f/0x11b
[ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.990239] kasan_check_range+0x17c/0x1e0
[ 104.990467] memcpy+0x39/0x60
[ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.990982] ? __wake_up_common+0x4d/0x230
[ 104.991256] ? htab_of_map_free+0x130/0x130
[ 104.991541] bpf_map_do_batch+0x1fb/0x220
[...]
In hashtable, if the elements' keys have the same jhash() value, the
elements will be put into the same bucket. By putting a lot of elements
into a single bucket, the value of bucket_size can be increased to
trigger the integer overflow.
Triggering the overflow is possible for both callers with CAP_SYS_ADMIN
and callers without CAP_SYS_ADMIN.
It will be trivial for a caller with CAP_SYS_ADMIN to intentionally
reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set
the random seed passed to jhash() to 0, it will be easy for the caller
to prepare keys which will be hashed into the same value, and thus put
all the elements into the same bucket.
If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be
used. However, it will be still technically possible to trigger the
overflow, by guessing the random seed value passed to jhash() (32bit)
and repeating the attempt to trigger the overflow. In this case,
the probability to trigger the overflow will be low and will take
a very long time.
Fix the integer overflow by calling kvmalloc_array() instead of
kvmalloc() to allocate memory.
Fixes: 057996380a ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210806150419.109658-1-th.yasumatsu@gmail.com
Use "code-block: none" instead of "c" for non-C-language code blocks.
Removes these warnings:
lnx-514-rc4/Documentation/bpf/libbpf/libbpf_naming_convention.rst:111: WARNING: Could not lex literal_block as "c". Highlighting skipped.
lnx-514-rc4/Documentation/bpf/libbpf/libbpf_naming_convention.rst:124: WARNING: Could not lex literal_block as "c". Highlighting skipped.
Fixes: f42cfb469f ("bpf: Add documentation for libbpf including API autogen")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210802015037.787-1-rdunlap@infradead.org
Before this patch, btf_new() was liable to close an arbitrary FD 0 if
BTF parsing failed. This was because:
* btf->fd was initialized to 0 through the calloc()
* btf__free() (in the `done` label) closed any FDs >= 0
* btf->fd is left at 0 if parsing fails
This issue was discovered on a system using libbpf v0.3 (without
BTF_KIND_FLOAT support) but with a kernel that had BTF_KIND_FLOAT types
in BTF. Thus, parsing fails.
While this patch technically doesn't fix any issues b/c upstream libbpf
has BTF_KIND_FLOAT support, it'll help prevent issues in the future if
more BTF types are added. It also allow the fix to be backported to
older libbpf's.
Fixes: 3289959b97 ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/5969bb991adedb03c6ae93e051fd2a00d293cf25.1627513670.git.dxu@dxuuu.xyz
After LPM, when migrating from a system with security mitigation enabled
to a system with mitigation disabled, the security flavor exposed in
/proc is not correctly set back to 0.
Do not assume the value of the security flavor is set to 0 when entering
init_cpu_char_feature_flags(), so when called after a LPM, the value is
set correctly even if the mitigation are not turned off.
Fixes: 6ce56e1ac3 ("powerpc/pseries: export LPAR security flavor in lparcfg")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805152308.33988-1-ldufour@linux.ibm.com
32 bits BOOKE have special interrupts for debug and other
critical events.
When handling those interrupts, dedicated registers are saved
in the stack frame in addition to the standard registers, leading
to a shift of the pt_regs struct.
Since commit db297c3b07 ("powerpc/32: Don't save thread.regs on
interrupt entry"), the pt_regs struct is expected to be at the
same place all the time.
Instead of handling a special struct in addition to pt_regs, just
add those special registers to struct pt_regs.
Fixes: db297c3b07 ("powerpc/32: Don't save thread.regs on interrupt entry")
Cc: stable@vger.kernel.org
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu
[Why]
With kernel module parameter "freesync_video" is enabled, if the mode
is changed to preferred mode(the mode with highest rate), then Freesync
fails because the preferred mode is treated as one of freesync video
mode, and then be configurated as freesync video mode(fixed refresh
rate).
[How]
Skip freesync fixed rate configurating when modeset to preferred mode.
Signed-off-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Due to 14f97f0b8e, the rawnand platforms without "secure-regions"
property defined in DT fails to probe. The issue is,
of_get_nand_secure_regions() errors out if
of_property_count_elems_of_size() returns a negative error code.
If the "secure-regions" property is not present in DT, then also we'll
get -EINVAL from of_property_count_elems_of_size() but it should not
be treated as an error for platforms not declaring "secure-regions"
in DT.
So fix this behaviour by checking for the existence of that property in
DT and return 0 if it is not present.
Fixes: 14f97f0b8e ("mtd: rawnand: Add a check in of_get_nand_secure_regions()")
Reported-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Martin Kaiser <martin@kaiser.cx>
Tested-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210727062813.32619-1-manivannan.sadhasivam@linaro.org
There is a lock hierarchy of major_names_lock --> mtd_table_mutex. One
existing chain is as follows:
1. major_names_lock --> loop_ctl_mutex (when blk_request_module calls
loop_probe)
2. loop_ctl_mutex --> bdev->bd_mutex (when loop_control_ioctl calls
loop_remove, which then calls del_gendisk)
3. bdev->bd_mutex --> mtd_table_mutex (when blkdev_get_by_dev calls
__blkdev_get, which then calls blktrans_open)
Since unregister_blkdev grabs the major_names_lock, we need to call it
outside the critical section for mtd_table_mutex, otherwise we invert
the lock hierarchy.
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210717100719.728829-1-desmondcheongzx@gmail.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Restrict range element expansion in ipset to avoid soft lockup,
from Jozsef Kadlecsik.
2) Memleak in error path for nf_conntrack_bridge for IPv4 packets,
from Yajun Deng.
3) Simplify conntrack garbage collection strategy to avoid frequent
wake-ups, from Florian Westphal.
4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name.
5) Missing chain family netlink attribute in chain description
in nfnetlink_hook.
6) Incorrect sequence number on nfnetlink_hook dumps.
7) Use netlink request family in reply message for consistency.
8) Remove offload_pickup sysctl, use conntrack for established state
instead, from Florian Westphal.
9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since
NFPROTO_INET is not exposed through nfnetlink_hook.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nfnetlink_hook: translate inet ingress to netdev
netfilter: conntrack: remove offload_pickup sysctl again
netfilter: nfnetlink_hook: Use same family as request message
netfilter: nfnetlink_hook: use the sequence number of the request message
netfilter: nfnetlink_hook: missing chain family
netfilter: nfnetlink_hook: strip off module name from hookfn
netfilter: conntrack: collect all entries in one cycle
netfilter: nf_conntrack_bridge: Fix memory leak when error
netfilter: ipset: Limit the maximal range of consecutive elements to add/delete
====================
Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'watermarks_table' must be freed instead 'clocks_table', because
'clocks_table' is known to be NULL at this point and 'watermarks_table' is
never freed if the last kzalloc fails.
Fixes: c98ee89736 ("drm/amd/pm: add the fine grain tuning function for vangogh")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The NFPROTO_INET pseudofamily is not exposed through this new netlink
interface. The netlink dump either shows NFPROTO_IPV4 or NFPROTO_IPV6
for NFPROTO_INET prerouting/input/forward/output/postrouting hooks.
The NFNLA_CHAIN_FAMILY attribute provides the family chain, which
specifies if this hook applies to inet traffic only (either IPv4 or
IPv6).
Translate the inet/ingress hook to netdev/ingress to fully hide the
NFPROTO_INET implementation details.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These two sysctls were added because the hardcoded defaults (2 minutes,
tcp, 30 seconds, udp) turned out to be too low for some setups.
They appeared in 5.14-rc1 so it should be fine to remove it again.
Marcelo convinced me that there should be no difference between a flow
that was offloaded vs. a flow that was not wrt. timeout handling.
Thus the default is changed to those for TCP established and UDP stream,
5 days and 120 seconds, respectively.
Marcelo also suggested to account for the timeout value used for the
offloading, this avoids increase beyond the value in the conntrack-sysctl
and will also instantly expire the conntrack entry with altered sysctls.
Example:
nf_conntrack_udp_timeout_stream=60
nf_flowtable_udp_timeout=60
This will remove offloaded udp flows after one minute, rather than two.
An earlier version of this patch also cleared the ASSURED bit to
allow nf_conntrack to evict the entry via early_drop (i.e., table full).
However, it looks like we can safely assume that connection timed out
via HW is still in established state, so this isn't needed.
Quoting Oz:
[..] the hardware sends all packets with a set FIN flags to sw.
[..] Connections that are aged in hardware are expected to be in the
established state.
In case it turns out that back-to-sw-path transition can occur for
'dodgy' connections too (e.g., one side disappeared while software-path
would have been in RETRANS timeout), we can adjust this later.
Cc: Oz Shlomo <ozsh@nvidia.com>
Cc: Paul Blakey <paulb@nvidia.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the same family as the request message, for consistency. The
netlink payload provides sufficient information to describe the hook
object, including the family.
This makes it easier to userspace to correlate the hooks are that
visited by the packets for a certain family.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The sequence number allows to correlate the netlink reply message (as
part of the dump) with the original request message.
The cb->seq field is internally used to detect an interference (update)
of the hook list during the netlink dump, do not use it as sequence
number in the netlink dump header.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The family is relevant for pseudo-families like NFPROTO_INET
otherwise the user needs to rely on the hook function name to
differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names.
Add nfnl_hook_chain_desc_attributes instead of using the existing
NFTA_CHAIN_* attributes, since these do not provide a family number.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NFNLA_HOOK_FUNCTION_NAME should include the hook function name only,
the module name is already provided by NFNLA_HOOK_MODULE_NAME.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Michal Kubecek reports that conntrack gc is responsible for frequent
wakeups (every 125ms) on idle systems.
On busy systems, timed out entries are evicted during lookup.
The gc worker is only needed to remove entries after system becomes idle
after a busy period.
To resolve this, always scan the entire table.
If the scan is taking too long, reschedule so other work_structs can run
and resume from next bucket.
After a completed scan, wait for 2 minutes before the next cycle.
Heuristics for faster re-schedule are removed.
GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow
tuning this as-needed or even turn the gc worker off.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ASoC: Fixes for v5.14
Quite a lot of fixes here, the biggest set being for the cs42l42 driver
which is reasonably old but has seen a sudden uptick in activity.
There's also some fixes for correctly referencing PCM buffer addresses
and the removal of some driver-local bodges that had been done for the
lack of prefix handling in DAPM which were broken by the core handling
that as expected.
The gpiod_lookup_table.table passed to gpiod_add_lookup_table() must
be terminated with an empty entry, add this.
Note we have likely been getting away with this not being present because
the GPIO lookup code first matches on the dev_id, causing most lookups to
skip checking the table and the lookups which do check the table will
find a matching entry before reaching the end. With that said, terminating
these tables properly still is obviously the correct thing to do.
Fixes: f8eb0235f6 ("x86: pcengines apuv2 gpio/leds/keys platform driver")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210806115515.12184-1-hdegoede@redhat.com
360 degree hinges devices with dual KIOX010A + KIOX020A accelerometers
always have both a KIOX010A and a KIOX020A ACPI device (one for each
accel).
Theoretical some vendor may re-use some DSDT for a non-convertible
stripping out just the KIOX020A ACPI device from the DSDT. Check that
both ACPI devices are present to make the check more robust.
Fixes: 153cca9caa ("platform/x86: Add and use a dual_accel_detect() helper")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210802141000.978035-1-hdegoede@redhat.com
On s390, the following build warning occurs:
drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in
conversion from 'long unsigned int' to 'int' changes value from
'18446744073709551584' to '-32' [-Woverflow]
844 | ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (which is
already 64-byte aligned) on some architectures, actually gets ALIGN'd up
to 512 bytes in the s390 case.
So then, when this is invoked:
MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
...that turns into:
704 - 224 - 512 == -32
...which is not a good frame size to end up with! The warning above is a
bit lucky: it notices a signed/unsigned bad behavior here, which leads
to the real problem of a frame that is too short for its contents.
Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is
just exactly big enough. (The other values can't readily be changed
without causing a lot of other problems.)
Fixes: 07dd0a7aae ("mvpp2: add basic XDP support")
Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Cc: Matteo Croce <mcroce@microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the missing RxUnicast counter.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no 'NONE' version of 'enum mcu_cipher_type', and returning
'MT_CIPHER_NONE' causes a warning:
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c: In function 'mt7921_mcu_get_cipher':
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c:114:24: error: implicit conversion from 'enum mt76_cipher_type' to 'enum mcu_cipher_type' [-Werror=enum-conversion]
114 | return MT_CIPHER_NONE;
| ^~~~~~~~~~~~~~
Add the missing MCU_CIPHER_NONE defintion that fits in here with
the same value.
Fixes: c368362c36 ("mt76: fix iv and CCMP header insertion")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210721150745.1914829-1-arnd@kernel.org
As GDSCs are registered and found to be already enabled gdsc_init()
ensures that 1) the kernel state matches the hardware state, and 2)
votable GDSCs are properly enabled from this master as well.
But as the (optional) supply regulator is enabled deep into
gdsc_toggle_logic(), which is only executed for votable GDSCs, the
kernel's state of the regulator might not match the hardware. The
regulator might be automatically turned off if no other users are
present or the next call to gdsc_disable() would cause an unbalanced
regulator_disable().
Given that the votable case deals with an already enabled GDSC, most of
gdsc_enable() and gdsc_toggle_logic() can be skipped. Reduce it to just
clearing the SW_COLLAPSE_MASK and enabling hardware control to simply
call regulator_enable() in both cases.
The enablement of hardware control seems to be an independent property
from the GDSC being enabled, so this is moved outside that conditional
segment.
Lastly, as the propagation of ALWAYS_ON to GENPD_FLAG_ALWAYS_ON needs to
happen regardless of the initial state this is grouped together with the
other sc->pd updates at the end of the function.
Cc: stable@vger.kernel.org
Fixes: 37416e5549 ("clk: qcom: gdsc: Handle GDSC regulator supplies")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210721224056.3035016-1-bjorn.andersson@linaro.org
[sboyd@kernel.org: Rephrase commit text]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
I2S always has two LRCLK phases and both CH1 and CH2 of the RX
must be enabled (corresponding to the low and high phases of LRCLK.)
The selection of the valid data channels is done by setting the DAC
CHA_SEL and CHB_SEL. CHA_SEL is always the first (left) channel,
CHB_SEL depends on the number of active channels.
Previously for mono ASP CH2 was not enabled, the result was playing
mono data would not produce any audio output.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 621d65f3b8 ("ASoC: cs42l42: Provide finer control on playback path")
Link: https://lore.kernel.org/r/20210805161111.10410-4-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The lowest valid SCLK corresponds to 44.1 kHz at 16-bit. Sample
rates less than this would produce SCLK below the minimum when using
a normal I2S frame. A constraint must be applied to prevent this.
The constraint is not applied if the machine driver sets SCLK, to
allow setups where the host generates additional bits per LRCLK
phase to increase the SCLK frequency. In these cases the machine
driver would always have to inform this driver of the actual SCLK,
and it must select a legal SCLK.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210805161111.10410-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Both SCLK and PLL clocks must be running to drive the glitch-free mux
behind MCLK_SRC_SEL and complete the switchover.
This patch moves the writing of MCLK_SRC_SEL to when the PLL is started
and stopped, so that it only transitions while the PLL is running.
The unconditional write MCLK_SRC_SEL=0 in cs42l42_mute_stream() is safe
because if the PLL is not running MCLK_SRC_SEL is already 0.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 43fc357199 ("ASoC: cs42l42: Set clock source for both ways of stream")
Link: https://lore.kernel.org/r/20210805161111.10410-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We used to follow the rule earlier that the create SD context
always be a multiple of 8. However, with the change:
cifs: refactor create_sd_buf() and and avoid corrupting the buffer
...we recompute the length, and we failed that rule.
Fixing that with this change.
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix the upper guard and the "removed_region", this fixes the random
crashes which used to occur in memory intensive loads. I'm not sure WHY
the upper guard being 0x2000 instead of 0x1000 doesn't fix this, but it
HAS to be 0x1000.
Fixes: e60fd5ac1f ("arm64: dts: qcom: sdm845-oneplus-common: guard rmtfs-mem")
Signed-off-by: Caleb Connolly <caleb@connolly.tech>
Link: https://lore.kernel.org/r/20210720153125.43389-2-caleb@connolly.tech
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
As the default definition breaks booting angler:
[ 1.862561] printk: console [ttyMSM0] enabled
[ 1.872260] msm_serial: driver initialized
D - 15524 - pm_driver_init, Delta
cont_splash_mem was introduced in 74d6d0a145, but the problem
manifested after commit '86588296acbf ("fdt: Properly handle "no-map"
field in the memory region")'.
Disabling it because Angler's firmware does not report where the memory
is allocated (dmesg from downstream kernel):
[ 0.000000] cma: Found cont_splash_mem@0, memory base 0x0000000000000000, size 16 MiB, limit 0x0000000000000000
[ 0.000000] cma: CMA: reserved 16 MiB at 0x0000000000000000 for cont_splash_mem
Similar issue might be on Google Nexus 5X (lg-bullhead). Other MSM8992/4
are known to report correct address.
Fixes: 74d6d0a145 ("arm64: dts: qcom: msm8994/8994-kitakami: Fix up the memory map")
Suggested-by: Konrad Dybcio <konradybcio@gmail.com>
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Link: https://lore.kernel.org/r/20210622191019.23771-1-petr.vorel@gmail.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.
Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.
Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.
With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson <mnelson@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can be kept looping for quite some time if caps keep
being added back to the mdsc->cap_delay_list. This may result in the
watchdog tainting the kernel with the softlockup flag.
This patch breaks this loop if the caps have been recently (i.e. during
the loop execution). Any new caps added to the list will be handled in
the next run.
Also, allow schedule_delayed() callers to explicitly set the delay value
instead of defaulting to 5s, so we can ensure that it runs soon
afterward if it looks like there is more work.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46284
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When a Data CRC interrupt is received, the driver disables the DMA, then
sends the stop/abort command and then waits for Data Transfer Over.
However, sometimes, when a data CRC error is received in the middle of a
multi-block write transfer, the Data Transfer Over interrupt is never
received, and the driver hangs and never completes the request.
The driver sets the BMOD.SWR bit (SDMMC_IDMAC_SWRESET) when stopping the
DMA, but according to the manual CMD.STOP_ABORT_CMD should be programmed
"before assertion of SWR". Do these operations in the recommended
order. With this change the Data Transfer Over is always received
correctly in my tests.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210630102232.16011-1-vincent.whitchurch@axis.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
It should be added kfree_skb_list() when err is not equal to zero
in nf_br_ip_fragment().
v2: keep this aligned with IPv6.
v3: modify iter.frag_list to iter.frag.
Fixes: 3c171f496e ("netfilter: bridge: add connection tracking system")
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The range size of consecutive elements were not limited. Thus one could
define a huge range which may result soft lockup errors due to the long
execution time. Now the range size is limited to 2^20 entries.
Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stop the initialization when cpumask allocation failed and return an
error.
Fixes: 80a064dbd5 ("scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This WARN can be triggered per-core and the stack trace is not useful.
Replace it with plain dev_err(). Fix a comment while at it.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
FPU_STATUS register contains FP exception flags bits which are updated
by core as side-effect of FP instructions but can also be manually
wiggled such as by glibc C99 functions fe{raise,clear,test}except() etc.
To effect the update, the programming model requires OR'ing FWE
bit (31). This bit is write-only and RAZ, meaning it is effectively
auto-cleared after write and thus needs to be set everytime: which
is how glibc implements this.
However there's another usecase of FPU_STATUS update, at the time of
Linux task switch when incoming task value needs to be programmed into
the register. This was added as part of f45ba2bd6d ("ARCv2:
fpu: preserve userspace fpu state") which missed OR'ing FWE bit,
meaning the new value is effectively not being written at all.
This patch remedies that.
Interestingly, this snafu was not caught in interm glibc testing as the
race window which relies on a specific exception bit to be set/clear is
really small specially when it nvolves context switch.
Fortunately this was caught by glibc's math/test-fenv-tls test which
repeatedly set/clear exception flags in a big loop, concurrently in main
program and also in a thread.
Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/54
Fixes: f45ba2bd6d ("ARCv2: fpu: preserve userspace fpu state")
Cc: stable@vger.kernel.org #5.6+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Enabling CONFIG_STACKDEPOT results in the following build error.
arc-elf-ld: lib/stackdepot.o: in function `filter_irq_stacks':
stackdepot.c:(.text+0x456): undefined reference to `__irqentry_text_start'
arc-elf-ld: stackdepot.c:(.text+0x456): undefined reference to `__irqentry_text_start'
arc-elf-ld: stackdepot.c:(.text+0x476): undefined reference to `__irqentry_text_end'
arc-elf-ld: stackdepot.c:(.text+0x476): undefined reference to `__irqentry_text_end'
arc-elf-ld: stackdepot.c:(.text+0x484): undefined reference to `__softirqentry_text_start'
arc-elf-ld: stackdepot.c:(.text+0x484): undefined reference to `__softirqentry_text_start'
arc-elf-ld: stackdepot.c:(.text+0x48c): undefined reference to `__softirqentry_text_end'
arc-elf-ld: stackdepot.c:(.text+0x48c): undefined reference to `__softirqentry_text_end'
Other architectures address this problem by adding IRQENTRY_TEXT and
SOFTIRQENTRY_TEXT to the text segment, so do the same here.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
There is a spelling mistake and incorrect grammar in the Kconfig
text. Fix them.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Fix checkpatch warnings:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
Signed-off-by: Jinchao Wang <wjc@cdjrlc.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The driver was defining two ALSA controls that both change the same
register field for the wind noise filter corner frequency. The filter
response has two corners, at different frequencies, and the duplicate
controls most likely were an attempt to be able to set the value using
either of the frequencies.
However, having two controls changing the same field can be problematic
and it is unnecessary. Both frequencies are related to each other so
setting one implies exactly what the other would be.
Removing a control affects user-side code, but there is currently no
known use of the removed control so it would be best to remove it now
before it becomes a problem.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 2c394ca796 ("ASoC: Add support for CS42L42 codec")
Link: https://lore.kernel.org/r/20210803160834.9005-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
On arm64, the stub only moves the kernel image around in memory if
needed, which is typically only for KASLR, given that relocatable
kernels (which is the default) can run from any 64k aligned address,
which is also the minimum alignment communicated to EFI via the PE/COFF
header.
Unfortunately, some loaders appear to ignore this header, and load the
kernel at some arbitrary offset in memory. We can deal with this, but
let's check for this condition anyway, so non-compliant code can be
spotted and fixed.
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomization of the physical load address of the kernel image relies on
efi_random_alloc() returning successfully, and currently, we ignore any
failures and just carry on, using the ordinary, non-randomized page
allocator routine. This means we never find out if a failure occurs,
which could harm security, so let's at least warn about this condition.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 82046702e2 ("efi/libstub/arm64: Replace 'preferred' offset with
alignment check") simplified the way the stub moves the kernel image
around in memory before booting it, given that a relocatable image does
not need to be copied to a 2M aligned offset if it was loaded on a 64k
boundary by EFI.
Commit d32de9130f ("efi/arm64: libstub: Deal gracefully with
EFI_RNG_PROTOCOL failure") inadvertently defeated this logic by
overriding the value of efi_nokaslr if EFI_RNG_PROTOCOL is not
available, which was mistaken by the loader logic as an explicit request
on the part of the user to disable KASLR and any associated relocation
of an Image not loaded on a 2M boundary.
So let's reinstate this functionality, by capturing the value of
efi_nokaslr at function entry to choose the minimum alignment.
Fixes: d32de9130f ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Distro versions of GRUB replace the usual LoadImage/StartImage calls
used to load the kernel image with some local code that fails to honor
the allocation requirements described in the PE/COFF header, as it
does not account for the image's BSS section at all: it fails to
allocate space for it, and fails to zero initialize it.
Since the EFI stub itself is allocated in the .init segment, which is
in the middle of the image, its BSS section is not impacted by this,
and the main consequence of this omission is that the BSS section may
overlap with memory regions that are already used by the firmware.
So let's warn about this condition, and force image reallocation to
occur in this case, which works around the problem.
Fixes: 82046702e2 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If probe_device is failing, iommu_group is not initialized because
iommu_group_add_device is not reached, so freeing it will result
in NULL pointer access.
iommu_bus_init
->bus_iommu_probe
->probe_iommu_group in for each:/* return -22 in fail case */
->iommu_probe_device
->__iommu_probe_device /* return -22 here.*/
-> ops->probe_device /* return -22 here.*/
-> iommu_group_get_for_dev
-> ops->device_group
-> iommu_group_add_device //good case
->remove_iommu_group //in fail case, it will remove group
->iommu_release_device
->iommu_group_remove_device // here we don't have group
In my case ops->probe_device (mtk_iommu_probe_device from
mtk_iommu_v1.c) is due to failing fwspec->ops mismatch.
Fixes: d72e31c937 ("iommu: IOMMU Groups")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The driver needs to refer to substream->runtime->dma_addr
instead for the buffer address.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210731084331.32225-1-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently the for-loop that scans for the optimial adc_period iterates
through all the possible adc_period levels because the exit logic in
the loop is inverted. I believe the comparison should be swapped and
the continue replaced with a break to exit the loop at the correct
point.
Addresses-Coverity: ("Continue has no effect")
Fixes: e08e19c331 ("iio:adc: add iio driver for Palmas (twl6035/7) gpadc")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210730071651.17394-1-colin.king@canonical.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Q1 and Q2 are numbers with *maximum* length of 384 bytes. If the
calculated length of Q1 and Q2 is less than 384 bytes, things will
go wrong.
E.g. if Q2 is 383 bytes, then
1. The bytes of q2 are copied to sigstruct->q2 in calc_q1q2().
2. The entire sigstruct->q2 is reversed, which results it being
256 * Q2, given that the last byte of sigstruct->q2 is added
to before the bytes given by calc_q1q2().
Either change in key or measurement can trigger the bug. E.g. an
unmeasured heap could cause a devastating change in Q1 or Q2.
Reverse exactly the bytes of Q1 and Q2 in calc_q1q2() before returning
to the caller.
Fixes: 2adcba79e6 ("selftests/x86: Add a selftest for SGX")
Link: https://lore.kernel.org/linux-sgx/20210301051836.30738-1-tianjia.zhang@linux.alibaba.com/
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The transition to the managed PCM buffers allowed the dynamically
buffer allocation, while the driver code still assumes the fixed
preallocation buffer and sets up the DMA stuff at the open call.
This needs to be moved to hw_params after the buffer allocation and
setup. Also, the reference to the buffer address has to be corrected
to runtime->dma_addr.
Fixes: b3c0ae75f5 ("ASoC: kirkwood: Use managed DMA buffer allocation")
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-6-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Along with the transition to the managed PCM buffers, the driver now
accepts the dynamically allocated buffer, while it still kept the
reference to the old preallocated buffer address. This patch corrects
to the right reference via runtime->dma_addr.
(Although this might have been already buggy before the cleanup with
the managed buffer, let's put Fixes tag to point that; it's a corner
case, after all.)
Fixes: d55894bc27 ("ASoC: uniphier: Use managed buffer allocation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-5-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The driver needs to refer to substream->runtime->dma_addr
instead for the buffer address.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-4-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The address should be retrieved from runtime->dma_addr,
instead of substream->dma_buffer (and shouldn't use virt_to_phys).
Also, remove the line overriding runtime->dma_area superfluously,
which was already set up at the PCM buffer allocation.
Cc: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-3-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
When enabling KVM_CAP_ARM_MTE the ioctl checks that there are no VCPUs
created to ensure that the capability is enabled before the VM is
running. However no locks are held at that point so it is
(theoretically) possible for another thread in the VMM to create VCPUs
between the check and actually setting mte_enabled. Close the race by
taking kvm->lock.
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Fixes: 673638f434 ("KVM: arm64: Expose KVM_ARM_CAP_MTE")
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210729160036.20433-1-steven.price@arm.com
Hyp checks whether an address range only covers RAM by checking the
start/endpoints against a list of memblock_region structs. However,
the endpoint here is exclusive but internally is treated as inclusive.
Fix the off-by-one error that caused valid address ranges to be
rejected.
Cc: Quentin Perret <qperret@google.com>
Fixes: 90134ac9ca ("KVM: arm64: Protect the .hyp sections from the host")
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728153232.1018911-2-dbrazdil@google.com
Various 360 degree hinges (yoga) style 2-in-1 devices use 2 accelerometers
to allow the OS to determine the angle between the display and the base of
the device.
On Windows these are read by a special HingeAngleService process which
calls undocumented ACPI methods, to let the firmware know if the 2-in-1 is
in tablet- or laptop-mode. The firmware may use this to disable the kbd and
touchpad to avoid spurious input in tablet-mode as well as to report
SW_TABLET_MODE info to the OS.
Since Linux does not call these undocumented methods, the SW_TABLET_MODE
info reported by various pdx86 drivers is incorrect on these devices.
Before this commit the intel-hid and thinkpad_acpi code already had 2
hardcoded checks for ACPI hardware-ids of dual-accel sensors to avoid
reporting broken info.
And now we also have a bug-report about the same problem in the intel-vbtn
code. Since there are at least 3 different ACPI hardware-ids in play, add
a new dual_accel_detect() helper which checks for all 3, rather then
adding different hardware-ids to the drivers as bug-reports trickle in.
Having shared code which checks all known hardware-ids is esp. important
for the intel-hid and intel-vbtn drivers as these are generic drivers
which are used on a lot of devices.
The BOSC0200 hardware-id requires special handling, because often it is
used for a single-accelerometer setup. Only in a few cases it refers to
a dual-accel setup, in which case there will be 2 I2cSerialBus resources
in the device's resource-list, so the helper checks for this.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209011
Reported-and-tested-by: Julius Lehmann <julius@devpi.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20210729082134.6683-1-hdegoede@redhat.com
When the component level pin control functions were added they for some
no longer obvious reason handled adding prefixing of widget names. This
meant that when the lack of prefix handling in the DAPM level pin
operations was fixed by ae4fc53224 (ASoC: dapm: use component
prefix when checking widget names) the one device using the component
level API ended up with the prefix being applied twice, causing all
lookups to fail.
Fix this by removing the redundant prefixing from the component code,
which has the nice side effect of also making that code much simpler.
Reported-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Lucas Tanure <tanureal@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210726194123.54585-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
0fa294fb19 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added
cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq
context. However, rstat paths use u64_stats_sync to synchronize access to
64bit stat counters on 32bit machines. u64_stats_sync is implemented using
seq_lock and trying to read from an irq context can lead to A-A deadlock if
the irq happens to interrupt the stat update.
Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and
u64_stats_update_end_irqrestore() - in the update paths. Note that none of
this matters on 64bit machines. All these are just for 32bit SMP setups.
Note that the interface was introduced way back, its first and currently
only use was recently added by 2d146aa3aa ("mm: memcontrol: switch to
rstat"). Stable tagging targets this commit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rik van Riel <riel@surriel.com>
Fixes: 2d146aa3aa ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
On some platforms with an external HDaudio codec, the DSDT reports the
presence of SoundWire devices. Pin-mux restrictions and board reworks
usually prevent coexistence between the two types of links, let's
prevent unnecessary operations from starting.
In the case of a single iDISP codec being detected, we still start the
links even if no SoundWire machine configuration was detected, so that
we can double-check what the hardware is and add the missing
configuration if applicable.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Link: https://lore.kernel.org/r/20210726182855.179943-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The indexes of the devices are described within the topology file, it is a
possibility that the topology encodes invalid indexes when DYNAMIC_MINORS
is not enabled in kernel:
#define SNDRV_MINOR_COMPRESS 2 /* 2 - 3 */
#define SNDRV_MINOR_HWDEP 4 /* 4 - 7 */
#define SNDRV_MINOR_RAWMIDI 8 /* 8 - 15 */
#define SNDRV_MINOR_PCM_PLAYBACK 16 /* 16 - 23 */
#define SNDRV_MINOR_PCM_CAPTURE 24 /* 24 - 31 */
If the topology assigns an index greater than 7 for PLAYBACK/CAPTURE PCM
then there will be minor number collision.
As an example:
card0 creates a capture PCM with index 10 -> minor = 34
card1 creates compress device with index 0 -> minor = 34
Card1 will fail to instantiate because the minor for the compress stream is
already taken.
To avoid seemingly mysterious issues with card creation, select the
DYNAMIC_MINORS when the topology is enabled.
The other option would be to try to do out of bound index checks in case of
DYNAMIC_MINOR is not enabled and do not even attempt to create the device
with failing the topology load.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210726182142.179604-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently, iommu_dma_alloc_noncontiguous() allocates a
struct dma_sgt_handle object to hold some state needed for
iommu_dma_free_noncontiguous().
However, the handle is neither freed nor returned explicitly by
the ->alloc_noncontiguous method, and therefore seems leaked.
This was found by code inspection, so please review carefully and test.
As a side note, it appears the struct dma_sgt_handle type is exposed
to users of the DMA-API by linux/dma-map-ops.h, but is has no users
or functions returning the type explicitly.
This may indicate it's a good idea to move the struct dma_sgt_handle type
to drivers/iommu/dma-iommu.c. The decision is left to maintainers :-)
Cc: stable@vger.kernel.org
Fixes: e817ee5f2f ("dma-iommu: implement ->alloc_noncontiguous")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210723010552.50969-1-ezequiel@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The tlv320aic31xx driver relies on regcache_sync() to restore the register
contents after going to _BIAS_OFF, for example during system suspend. This
does not work for the jack detection configuration since that is configured
via the same register that status is read back from so the register is
volatile and not cached. This can also cause issues during init if the jack
detection ends up getting set up before the CODEC is initially brought out
of _BIAS_OFF, we will reset the CODEC and resync the cache as part of that
process.
Fix this by explicitly reapplying the jack detection configuration after
resyncing the register cache during power on.
This issue was found by an engineer working off-list on a product
kernel, I just wrote up the upstream fix.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210723180200.25105-1-broonie@kernel.org
Cc: stable@vger.kernel.org
The Qualcomm SC8180x platform uses the qcom-cpufreq-hw driver, so
it in the cpufreq-dt-platdev driver's blocklist.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The datasheets have the following note for the conversion time
specification: "This parameter is specified by design and/or
characterization and it is not tested in production."
Parts have been seen that require more time to do 14-bit conversions for
the relative humidity channel. The result is ENXIO due to the address
phase of a transfer not getting an ACK.
Delay an additional 1 ms per conversion to allow for additional margin.
Fixes: 4839367d99 ("iio: humidity: add HDC100x support")
Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
Acked-by: Matt Ranostay <matt.ranostay@konsulko.com>
Link: https://lore.kernel.org/r/20210614141820.2034827-1-chris.lesiak@licor.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The original bindings was listing the length of the interrupts as either
1 or 2, depending on the setup. This is also what is enforced by the top
level schema.
However, that is further constrained with an if clause that require
exactly two interrupts, even though it might not make sense on those
devices or in some setups.
Let's remove the clause entirely.
Cc: Denis Ciocca <denis.ciocca@st.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Fixes: 0cd7114580 ("iio: st-sensors: Update ST Sensor bindings")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210721140424.725744-16-maxime@cerno.tech
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
With CONFIG_SPI=y and CONFIG_I2C=m, building fxls8962af into vmlinux
causes a link error against the I2C module:
aarch64-linux-ld: drivers/iio/accel/fxls8962af-core.o: in function `fxls8962af_fifo_flush':
fxls8962af-core.c:(.text+0x3a0): undefined reference to `i2c_verify_client'
Work around it by adding a Kconfig dependency that forces the SPI driver
to be a loadable module whenever I2C is a module.
Fixes: af959b7b96 ("iio: accel: fxls8962af: fix errata bug E3 - I2C burst reads")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210721151330.2176653-1-arnd@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Some pin doesn't support PUPD register, if it fails and fallbacks with
bias_set_combo case, it will call mtk_pinconf_bias_set_pupd_r1_r0() to
modify the PUPD pin again.
Since the general bias set are either PU/PD or PULLSEL/PULLEN, try
bias_set or bias_set_rev1 for the other fallback case. If the pin
doesn't support neither PU/PD nor PULLSEL/PULLEN, it will return
-ENOTSUPP.
Fixes: 81bd1579b4 ("pinctrl: mediatek: Fix fallback call path")
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Zhiyong Tao <zhiyong.tao@mediatek.com>
Link: https://lore.kernel.org/r/20210701080955.2660294-1-hsinyi@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Enabling the PINCTRL_SM8350 symbol without GPIOLIB or SCM causes a build
failure:
WARNING: unmet direct dependencies detected for PINCTRL_MSM
Depends on [m]: PINCTRL [=y] && (ARCH_QCOM [=y] || COMPILE_TEST [=y]) && GPIOLIB [=y] && (QCOM_SCM [=m] || !QCOM_SCM [=m])
Selected by [y]:
- PINCTRL_SM8350 [=y] && PINCTRL [=y] && (ARCH_QCOM [=y] || COMPILE_TEST [=y]) && GPIOLIB [=y] && OF [=y]
aarch64-linux-ld: drivers/pinctrl/qcom/pinctrl-msm.o: in function `msm_gpio_irq_set_type':
pinctrl-msm.c:(.text.msm_gpio_irq_set_type+0x1c8): undefined reference to `qcom_scm_io_readl'
The main problem here is the 'select PINCTRL_MSM', which needs to be a
'depends on' as it is for all the other front-ends. As the GPIOLIB
dependency is now implied by that, symbol, remove the duplicate
dependencies in the process.
Fixes: d5d348a327 ("pinctrl: qcom: Add SM8350 pinctrl driver")
Fixes: 376f9e34c1 ("drivers: pinctrl: qcom: fix Kconfig dependency on GPIOLIB")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210723091400.1669716-1-arnd@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The cursor plane should use the current plane state in atomic_async_update
because it would not be the new plane state in the global atomic state
since _swap_state happened when those hook are run.
Fix cursor plane issue by below modification:
1. Remove plane_helper_funcs->atomic_update(plane, state) in
mtk_drm_crtc_async_update.
2. Add mtk_drm_update_new_state in to mtk_plane_atomic_async_update to
update the cursor plane by current plane state hook and update
others plane by the new_state.
Fixes: 37418bf14c ("drm: Use state helper instead of the plane state pointer")
Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
atomic_get_output_bus_fmts() is only called when the bridge is the last
element in the bridge chain.
If mtk-dpi is not the last bridge, the format of output_bus_cfg is
MEDIA_BUS_FMT_FIXED, and mtk_dpi_dual_edge() will fail to write correct
value to regs.
Fixes: ec8747c524 ("drm/mediatek: dpi: Add bus format negotiation")
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
The TAS2505/TAS2521 does support only three processing block options, unlike
TLV320AIC32x4 which supports 25. This is documented in TI slau472 2.5.1.2
Processing Blocks and Page 0 / Register 60: DAC Instruction Set - 0x00 / 0x3C.
Limit the Processing Blocks maximum value to 3 on TAS2505/TAS2521 and select
processing block PRB_P1 always, because for the configuration of teh codec
implemented in this driver, this is the best quality option.
Fixes: b4525b6196 ("ASoC: tlv320aic32x4: add support for TAS2505")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Claudius Heine <ch@denx.de>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210720200348.182139-1-marex@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
With SND_SOC_ALL_CODECS=y and SND_SOC_WCD938X_SDW=m, there is a link
error from a reverse dependency, since the built-in codec driver calls
into the modular soundwire back-end:
x86_64-linux-ld: sound/soc/codecs/wcd938x.o: in function `wcd938x_codec_free':
wcd938x.c:(.text+0x2c0): undefined reference to `wcd938x_sdw_free'
x86_64-linux-ld: sound/soc/codecs/wcd938x.o: in function `wcd938x_codec_hw_params':
wcd938x.c:(.text+0x2f6): undefined reference to `wcd938x_sdw_hw_params'
x86_64-linux-ld: sound/soc/codecs/wcd938x.o: in function `wcd938x_codec_set_sdw_stream':
wcd938x.c:(.text+0x332): undefined reference to `wcd938x_sdw_set_sdw_stream'
x86_64-linux-ld: sound/soc/codecs/wcd938x.o: in function `wcd938x_tx_swr_ctrl':
wcd938x.c:(.text+0x23de): undefined reference to `wcd938x_swr_get_current_bank'
x86_64-linux-ld: sound/soc/codecs/wcd938x.o: in function `wcd938x_bind':
wcd938x.c:(.text+0x2579): undefined reference to `wcd938x_sdw_device_get'
x86_64-linux-ld: wcd938x.c:(.text+0x25a1): undefined reference to `wcd938x_sdw_device_get'
x86_64-linux-ld: wcd938x.c:(.text+0x262a): undefined reference to `__devm_regmap_init_sdw'
Work around this using two small hacks: An added Kconfig dependency
prevents the main driver from being built-in when soundwire support
itself is a loadable module to allow calling devm_regmap_init_sdw(),
and a Makefile trick links the wcd938x-sdw backend as built-in
if needed to solve the dependency between the two modules.
Fixes: 0454422288 ("ASoC: codecs: wcd938x: add audio routing and Kconfig")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210721150510.1837221-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
bridge->driver_private is not set (NULL) so use bridge_to_dpi(bridge)
like it's done in bridge_atomic_get_output_bus_fmts
Fixes: ec8747c524 ("drm/mediatek: dpi: Add bus format negotiation")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
The EFI stub random allocator used for kaslr on arm64 has a subtle
bug. In function get_entry_num_slots() which counts the number of
possible allocation "slots" for the image in a given chunk of free
EFI memory, "last_slot" can become negative if the chunk is smaller
than the requested allocation size.
The test "if (first_slot > last_slot)" doesn't catch it because
both first_slot and last_slot are unsigned.
I chose not to make them signed to avoid problems if this is ever
used on architectures where there are meaningful addresses with the
top bit set. Instead, fix it with an additional test against the
allocation size.
This can cause a boot failure in addition to a loss of randomisation
due to another bug in the arm64 stub fixed separately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixes: 2ddbfc81ea ("efi: stub: add implementation of efi_random_alloc()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
cont_splash_mem has different memory mapping than generic from msm8994.dtsi:
[ 0.000000] cma: Found cont_splash_mem@0, memory base 0x0000000003400000, size 12 MiB, limit 0xffffffffffffffff
[ 0.000000] cma: CMA: reserved 12 MiB at 0x0000000003400000 for cont_splash_mem
This fixes boot.
Fixes: 976d321f32 ("arm64: dts: qcom: msm8992: Make the DT an overlay on top of 8994")
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Link: https://lore.kernel.org/r/20210713185734.380-3-petr.vorel@gmail.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
WSA881x powerdown pin is connected to GPIO1, GPIO2 not GPIO2 and GPIO3,
so correct this. This was working so far due to a shift bug in gpio driver,
however once that is fixed this will stop working, so fix this!
For some reason we forgot to add this dts change in last merge cycle so
currently audio is broken in 5.13 as the gpio driver fix already landed
in 5.13.
Reported-by: Shawn Guo <shawnguo@kernel.org>
Fixes: 45021d35fc ("arm64: dts: qcom: c630: Enable audio support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Shawn Guo <shawnguo@kernel.org>
Link: https://lore.kernel.org/r/20210706083523.10601-1-srinivas.kandagatla@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The ADS7950 requires that CS is deasserted after each SPI word. Before
commit e2540da86e ("iio: adc: ti-ads7950: use SPI_CS_WORD to reduce
CPU usage") the driver used a message with one spi transfer per channel
where each but the last one had .cs_change set to enforce a CS toggle.
This was wrongly translated into a message with a single transfer and
.cs_change set which results in a CS toggle after each word but the
last which corrupts the first adc conversion of all readouts after the
first readout.
Fixes: e2540da86e ("iio: adc: ti-ads7950: use SPI_CS_WORD to reduce CPU usage")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: David Lechner <david@lechnology.com>
Tested-by: David Lechner <david@lechnology.com>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210709101110.1814294-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
There are flash drivers which registers the OTP callbacks although the
flash doesn't support OTP regions and return -ENODATA for these
callbacks if there is no OTP. If this happens, the probe of the whole
flash will fail. Fix it by handling the ENODATA return code and skip
the OTP region nvmem setup.
Fixes: 4b361cfa86 ("mtd: core: add OTP nvmem provider support")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Walle <michael@walle.cc>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210707135359.32398-1-michael@walle.cc
Syzbot reported a circular locking dependency:
https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb
This happens because of the following lock dependencies:
1. loop_ctl_mutex -> bdev->bd_mutex (when loop_control_ioctl calls
loop_remove, which then calls del_gendisk; this also happens in
loop_exit which eventually calls loop_remove)
2. bdev->bd_mutex -> mtd_table_mutex (when blkdev_get_by_dev calls
__blkdev_get, which then calls blktrans_open)
3. mtd_table_mutex -> major_names_lock (when register_mtd_blktrans
calls __register_blkdev)
4. major_names_lock -> loop_ctl_mutex (when blk_request_module calls
loop_probe)
Hence there's an overall dependency of:
loop_ctl_mutex ----------> bdev->bd_mutex
^ |
| |
| v
major_names_lock <--------- mtd_table_mutex
We can break this circular dependency by holding mtd_table_mutex only
for the required critical section in register_mtd_blktrans. This
avoids the mtd_table_mutex -> major_names_lock dependency.
Reported-and-tested-by: syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com
Co-developed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210617160904.570111-1-desmondcheongzx@gmail.com
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.