Pull USB fixes from Greg KH:
"The USB sub-maintainers woke up this past week and sent a bunch of
tiny fixes. Here are a lot of small patches that that resolve a bunch
of reported issues in the USB core, drivers, serial drivers, gadget
drivers, and of course, xhci :)
All of these have been in linux-next with no reported issues"
* tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
usb: cdns3: gadget: Fix g_audio use case when connected to Super-Speed host
usb: cdns3: gadget: reset EP_CLAIMED flag while unloading
USB: serial: whiteheat: fix line-speed endianness
USB: serial: whiteheat: fix potential slab corruption
USB: gadget: Reject endpoints with 0 maxpacket value
UAS: Revert commit 3ae62a4209 ("UAS: fix alignment of scatter/gather segments")
usb-storage: Revert commit 747668dbc0 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
usbip: Fix free of unallocated memory in vhci tx
usbip: tools: Fix read_usb_vudc_device() error path handling
usb: xhci: fix __le32/__le64 accessors in debugfs code
usb: xhci: fix Immediate Data Transfer endianness
xhci: Fix use-after-free regression in xhci clear hub TT implementation
USB: ldusb: fix control-message timeout
USB: ldusb: use unsigned size format specifiers
USB: ldusb: fix ring-buffer locking
USB: Skip endpoints with 0 maxpacket length
usb: cdns3: gadget: Don't manage pullups
usb: dwc3: remove the call trace of USBx_GFLADJ
usb: gadget: configfs: fix concurrent issue between composite APIs
...
Pull cifs fix from Steve French:
"A small smb3 memleak fix"
* tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
fix memory leak in large read decrypt offload
Pull hwmon fixes from Guenter Roeck:
- Fix read timeout problem in ina3221 driver
- Fix wrong bitmask in nct7904 driver
* tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ina3221) Fix read timeout issue
hwmon: (nct7904) Fix the incorrect value of vsen_mask & tcpu_mask & temp_mode in nct7904_data struct.
Pull pwm fixes from Thierry Reding:
"It turned out that relying solely on drivers storing all the PWM state
in hardware was a little premature and causes a number of subtle (and
some not so subtle) regressions. Revert the offending patch for now"
* tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
Revert "pwm: Let pwm_get_state() return the last implemented state"
Pull SCSI fixes from James Bottomley:
"Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
and one core change in sd that fixes an I/O failure on DIF type 3
devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: stop timer in shutdown path
scsi: sd: define variable dif as unsigned int instead of bool
scsi: target: cxgbit: Fix cxgbit_fw4_ack()
scsi: qla2xxx: Fix partial flash write of MBI
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
scsi: lpfc: Honor module parameter lpfc_use_adisc
scsi: ufs-bsg: Wake the device before sending raw upiu commands
scsi: lpfc: Check queue pointer before use
scsi: qla2xxx: fixup incorrect usage of host_byte
Pull powerpc fixes from Michael Ellerman:
"Our recent cleanup of EEH led to an oops on bare metal machines when
the cxl (CAPI) driver creates virtual devices for an attached FPGA
accelerator.
The "secure virtual machine" support we added in v5.4 had a bug if the
kernel was relocated (moved during boot), in those cases the signature
of the kernel text wouldn't verify and the Ultravisor would refuse to
run the VM.
A recent change to disable interrupts before calling
arch_cpu_idle_dead() caused a WARN_ON() in our bare metal CPU offline
code to always trigger.
The KUAP (SMAP) support we added for 32-bit Book3S had a bug if the
address range crossed a segment (256MB) boundary which could lead to
spurious faults.
Thanks to: Christophe Leroy, Frederic Barrat, Michael Anderson,
Nicholas Piggin, Sam Bobroff, Thiago Jung Bauermann"
* tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Fix CPU idle to be called with IRQs disabled
powerpc/prom_init: Undo relocation before entering secure mode
powerpc/powernv/eeh: Fix oops when probing cxl devices
powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.
Pull s390 fixes from Vasily Gorbik:
- Fix cpu idle time accounting
- Fix stack unwinder case when both pt_regs and sp are specified
- Fix information leak via cmm timeout proc handler
* tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/idle: fix cpu idle time calculation
s390/unwind: fix mixing regs and sp
s390/cmm: fix information leak in cmm_timeout_handler()
Pull networking fixes from David Miller:
1) Fix free/alloc races in batmanadv, from Sven Eckelmann.
2) Several leaks and other fixes in kTLS support of mlx5 driver, from
Tariq Toukan.
3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
Høiland-Jørgensen.
4) Add an r8152 device ID, from Kazutoshi Noguchi.
5) Missing include in ipv6's addrconf.c, from Ben Dooks.
6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
easily infer the 32-bit secret otherwise etc.
7) Several netdevice nesting depth fixes from Taehee Yoo.
8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
when doing lockless skb_queue_empty() checks, and accessing
sk_napi_id/sk_incoming_cpu lockless as well.
9) Fix jumbo packet handling in RXRPC, from David Howells.
10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.
11) Fix DMA synchronization in gve driver, from Yangchun Fu.
12) Several bpf offload fixes, from Jakub Kicinski.
13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.
14) Fix ping latency during high traffic rates in hisilicon driver, from
Jiangfent Xiao.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
net: fix installing orphaned programs
net: cls_bpf: fix NULL deref on offload filter removal
selftests: bpf: Skip write only files in debugfs
selftests: net: reuseport_dualstack: fix uninitalized parameter
r8169: fix wrong PHY ID issue with RTL8168dp
net: dsa: bcm_sf2: Fix IMP setup for port different than 8
net: phylink: Fix phylink_dbg() macro
gve: Fixes DMA synchronization.
inet: stop leaking jiffies on the wire
ixgbe: Remove duplicate clear_bit() call
Documentation: networking: device drivers: Remove stray asterisks
e1000: fix memory leaks
i40e: Fix receive buffer starvation for AF_XDP
igb: Fix constant media auto sense switching when no cable is connected
net: ethernet: arc: add the missed clk_disable_unprepare
igb: Enable media autosense for the i350.
igb/igc: Don't warn on fatal read failures when the device is removed
tcp: increase tcp_max_syn_backlog max value
net: increase SOMAXCONN to 4096
netdevsim: Fix use-after-free during device dismantle
...
Pull NFS client bugfixes from Anna Schumaker:
"This contains two delegation fixes (with the RCU lock leak fix marked
for stable), and three patches to fix destroying the the sunrpc back
channel.
Stable bugfixes:
- Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
Other fixes:
- The TCP back channel mustn't disappear while requests are
outstanding
- The RDMA back channel mustn't disappear while requests are
outstanding
- Destroy the back channel when we destroy the host transport
- Don't allow a cached open with a revoked delegation"
* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
NFSv4: Don't allow a cached open with a revoked delegation
SUNRPC: Destroy the back channel when we destroy the host transport
SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
Pull block fixes from Jens Axboe:
- Two small nvme fixes, one is a fabrics connection fix, the other one
a cleanup made possible by that fix (Anton, via Keith)
- Fix requeue handling in umb ubd (Anton)
- Fix spin_lock_irq() nesting in blk-iocost (Dan)
- Three small io_uring fixes:
- Install io_uring fd after done with ctx (me)
- Clear ->result before every poll issue (me)
- Fix leak of shadow request on error (Pavel)
* tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
iocost: don't nest spin_lock_irq in ioc_weight_write()
io_uring: ensure we clear io_kiocb->result before each issue
um-ubd: Entrust re-queue to the upper layers
nvme-multipath: remove unused groups_only mode in ana log
nvme-multipath: fix possible io hang after ctrl reconnect
io_uring: don't touch ctx in setup after ring fd install
io_uring: Fix leaked shadow_req
Pull RISC-V fixes from Paul Walmsley:
"One fix for PCIe users:
- Fix legacy PCI I/O port access emulation
One set of cleanups:
- Resolve most of the warnings generated by sparse across arch/riscv.
No functional changes
And one MAINTAINERS update:
- Update Palmer's E-mail address"
* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Change to my personal email address
RISC-V: Add PCIe I/O BAR memory mapping
riscv: for C functions called only from assembly, mark with __visible
riscv: fp: add missing __user pointer annotations
riscv: add missing header file includes
riscv: mark some code and data as file-static
riscv: init: merge split string literals in preprocessor directive
riscv: add prototypes for assembly language functions from head.S
Pull parisc fix from Helge Deller:
"Fix a parisc kernel crash with ftrace functions when compiled without
frame pointers"
* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix frame pointer in ftrace_regs_caller()
Jakub Kicinski says:
====================
fix BPF offload related bugs
test_offload.py catches some recently added bugs.
First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.
Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.
This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.
dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.
Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.
Fixes: c14a9f633d ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4011921137 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.
Fixes: 4011921137 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.
Note that we can't use os.access() because the script requires
root.
Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.
Fixes: d6a61f80b8 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.
[0] https://bbs.archlinux.org/viewtopic.php?id=246508
Fixes: 242cd9b586 ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.
Fixes: 9f91484f6f ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.
Fixes: 17091180b1 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.
Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.
RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.
Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.
Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.
Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-11-01
This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.
Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine. So change the read error into a
warn while the device is still present.
Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it. So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.
I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.
Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.
Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.
Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.
v2: Fixed alignment issue in patch 3 of the series based on community
feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These asterisks were once references to a line that said:
"* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.
Fixes: de3edab427 ("e1000: update README for e1000")
Fixes: a3fb65680f ("e100.txt: Cleanup license info in kernel doc")
Fixes: da8c01c450 ("e1000e.txt: Add e1000e documentation")
Fixes: f12a84a9f6 ("Documentation: fm10k: Add kernel documentation")
Fixes: b55c52b193 ("igb.txt: Add igb documentation")
Fixes: c4e9b56e24 ("igbvf.txt: Add igbvf Documentation")
Fixes: d7064f4c19 ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes: c4b8c01112 ("ixgbevf.txt: Update ixgbevf documentation")
Fixes: 1e06edcc2f ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 105bf2fe6b ("i40evf: add driver to kernel build system")
Fixes: 1fae869bcf ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: df69ba4321 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In e1000_set_ringparam(), 'tx_old' and 'rx_old' are not deallocated if
e1000_up() fails, leading to memory leaks. Refactor the code to fix this
issue.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Magnus's fix to resolve a potential receive buffer starvation for AF_XDP
got applied to both the i40e_xsk_umem_enable/disable() functions, when it
should have only been applied to the "enable". So clean up the undesired
code in the disable function.
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: 1f459bdc20 ("i40e: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
At least on the i350 there is an annoying behavior that is maybe also
present on 82580 devices, but was probably not noticed yet as MAS is not
widely used.
If no cable is connected on both fiber/copper ports the media auto sense
code will constantly swap between them as part of the watchdog task and
produce many unnecessary kernel log messages.
The swap code responsible for this behavior (switching to fiber) should
not be executed if the current media type is copper and there is no signal
detected on the fiber port. In this case we can safely wait until the
AUTOSENSE_EN bit is cleared.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull scheduler fixes from Ingo Molnar:
"Fix two scheduler topology bugs/oversights on Juno r0 2+4 big.LITTLE
systems"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/topology: Allow sched_asym_cpucapacity to be disabled
sched/topology: Don't try to build empty sched domains
Pull perf fixes from Ingo Molnar:
"Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
uncore PMU driver fix and a header typo fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
perf/x86/uncore: Fix event group support
perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
perf/core: Start rejecting the syscall with attr.__reserved_2 set
Pull EFI fixes from Ingo Molnar:
"Various fixes all over the map: prevent boot crashes on HyperV,
classify UEFI randomness as bootloader randomness, fix EFI boot for
the Raspberry Pi2, fix efi_test permissions, etc"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
x86, efi: Never relocate kernel below lowest acceptable address
efi: libstub/arm: Account for firmware reserved memory at the base of RAM
efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
efi/tpm: Return -EINVAL when determining tpm final events log size fails
efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
Kalle Valo says:
====================
wireless-drivers fixes for 5.4
Third set of fixes for 5.4. Most of them are for iwlwifi but important
fixes also for rtlwifi and mt76, the overflow fix for rtlwifi being
most important.
iwlwifi
* fix merge damage on earlier patch
* various fixes to device id handling
* fix scan config command handling which caused firmware asserts
rtlwifi
* fix overflow on P2P IE handling
* don't deliver too small frames to mac80211
mt76
* disable PCIE_ASPM
* fix buffer DMA unmap on certain cases
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The remove misses to disable and unprepare priv->macclk like what is done
when probe fails.
Add the missed call in remove.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 fixes from Will Deacon:
"These are almost exclusively related to CPU errata in CPUs from
Broadcom and Qualcomm where the workarounds were either not being
enabled when they should have been or enabled when they shouldn't have
been.
The only "interesting" fix is ensuring that writeable, shared mappings
are initially mapped as clean since we inadvertently broke the logic
back in v4.14 and then noticed the problem via code inspection the
other day.
The only critical issue we have outstanding is a sporadic NULL
dereference in the scheduler, which doesn't appear to be
arm64-specific and PeterZ is tearing his hair out over it at the
moment.
Summary:
- Enable CPU errata workarounds for Broadcom Brahma-B53
- Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs
- Fix initial dirty status of writeable, shared mappings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
arm64: Brahma-B53 is SSB and spectre v2 safe
arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default
Pull kvm fixes from Paolo Bonzini:
"generic:
- fix memory leak on failure to create VM
x86:
- fix MMU corner case with AMD nested paging disabled"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
kvm: call kvm_arch_destroy_vm if vm creation fails
kvm: Allocate memslots and buses before calling kvm_arch_init_vm
Pull drm fixes from Dave Airlie:
"This is the regular drm fixes pull request for 5.4-rc6. It's a bit
larger than I'd like but then last week was quieter than usual.
The main fixes are amdgpu, and the two bigger area are navi fixes
which are the newest GPU range so still getting actively fixed up, but
also a bunch of clang stack alignment fixes (as amdgpu uses double in
some places).
Otherwise it's all fairly run of the mill fixes, i915, panfrost,
etnaviv, v3d and radeon, along with a core scheduler fix.
Summary:
amdgpu:
- clang alignment fixes
- Updated golden settings
- navi: gpuvm, sdma and display fixes
- Freesync fix
- Gamma fix for DCN
- DP dongle detection fix
- vega10: Fix for undervolting
radeon:
- reenable kexec fix for ppc
scheduler:
- set an error if hw job failed
i915:
- fix PCH reference clock for HSW/BDW
- TGL display PLL doc fix
panfrost:
- warning fix
- runtime pm fix
- bad pointer dereference fix
v3d:
- memleak fix
etnaviv:
- memory corruption fix
- deadlock fix
- reintroduce lost debug message"
* tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/amdgpu: enable -msse2 for GCC 7.1+ users
drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+
drm/amdgpu: fix stack alignment ABI mismatch for Clang
drm/radeon: Fix EEH during kexec
drm/amdgpu/gmc10: properly set BANK_SELECT and FRAGMENT_SIZE
drm/amdgpu/powerplay/vega10: allow undervolting in p7
dc.c:use kzalloc without test
drm/amd/display: setting the DIG_MODE to the correct value.
drm/amd/display: Passive DP->HDMI dongle detection fix
drm/amd/display: add 50us buffer as WA for pstate switch in active
drm/amd/display: Allow inverted gamma
drm/amd/display: do not synchronize "drr" displays
drm/amdgpu: If amdgpu_ib_schedule fails return back the error.
drm/sched: Set error to s_fence if HW job submission failed.
drm/amdgpu/gfx10: update gfx golden settings for navi12
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/amd/display: Change Navi14's DWB flag to 1
drm/amdgpu/sdma5: do not execute 0-sized IBs (v2)
drm/amdgpu: Fix SDMA hang when performing VKexample test
...
Pull power management fix from Rafael Wysocki:
"Fix a recently introduced (mostly theoretical) issue that the requests
to confine the maximum CPU frequency coming from the platform firmware
may not be taken into account if multiple CPUs are covered by one
cpufreq policy on a system with ACPI"
* tag 'pm-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Add QoS requests for all CPUs
Pull rdma fixes from Jason Gunthorpe:
"A number of bug fixes and a regression fix:
- Various issues from static analysis in hfi1, uverbs, hns, and cxgb4
- Fix for deadlock in a case when the new auto RDMA module loading is
used
- Missing _irq notation in a prior -rc patch found by lockdep
- Fix a locking and lifetime issue in siw
- Minor functional bug fixes in cxgb4, mlx5, qedr
- Fix a regression where vlan interfaces no longer worked with RDMA
CM in some cases"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Prevent memory leaks of eq->buf_list
RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
RDMA/mlx5: Use irq xarray locking for mkey_table
IB/core: Avoid deadlock during netlink message handling
RDMA/nldev: Skip counter if port doesn't match
RDMA/uverbs: Prevent potential underflow
IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields
RDMA/qedr: Fix reported firmware version
RDMA/siw: free siw_base_qp in kref release routine
RDMA/iwcm: move iw_rem_ref() calls out of spinlock
iw_cxgb4: fix ECN check on the passive accept
IB/hfi1: Use a common pad buffer for 9B and 16B packets
IB/hfi1: Avoid excessive retry for TID RDMA READ request
RDMA/mlx5: Clear old rate limit when closing QP
Pull sound fixes from Takashi Iwai:
"A couple of regression fixes and a fix for mutex deadlock at
hog-unplug, as well as other device-specific fixes:
- A commit to avoid the spurious unsolicited interrupt on HD-audio
bus caused a stall at shutdown, so it's reverted now.
- The recent support of AMD/Nvidia audio component binding caused a
mutex deadlock; fixed by splitting to another mutex
- The device hot-unplug and the ALSA timer close combo may lead to
another mutex deadlock; fixed by moving put_device() calls
- Usual device-specific small quirks for HD- and USB-audio drivers
- An old error check fix in FireWire driver"
* tag 'sound-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: timer: Fix mutex deadlock at releasing card
ALSA: hda - Fix mutex deadlock in HDMI codec driver
Revert "ALSA: hda: Flush interrupts on disabling"
ALSA: bebob: Fix prototype of helper function to return negative value
ALSA: hda/realtek - Fix 2 front mics of codec 0x623
ALSA: hda/realtek - Add support for ALC623
ALSA: usb-audio: Add DSD support for Gustard U16/X26 USB Interface
A typo in nfs4_refresh_delegation_stateid() means we're leaking an
RCU lock, and always returning a value of 'false'. As the function
description states, we were always supposed to return 'true' if a
matching delegation was found.
Fixes: 12f275cdd1 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the delegation is marked as being revoked, we must not use it
for cached opens.
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_843419 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_843419 into an erratum list and use
cpucap_multi_entry_cap_matches to match our entries.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Add the Brahma-B53 CPU (all versions) to the whitelists of CPUs for the
SSB and spectre v2 mitigations.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_845719 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_845719 into an erratum list.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
This patch enables the hardware feature "Media Auto Sense" also on the
i350. It works in the same way as on the 82850 devices. Hardware designs
using dual PHYs (fiber/copper) can enable this feature by setting the MAS
enable bits in the NVM_COMPAT register (0x03) in the EEPROM.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
tcp_max_syn_backlog default value depends on memory size
and TCP ehash size. Before this patch, the max value
was 2048 [1], which is considered too small nowadays.
Increase it to 4096 to match the recent SOMAXCONN change.
[1] This is with TCP ehash size being capped to 524288 buckets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOMAXCONN is /proc/sys/net/core/somaxconn default value.
It has been defined as 128 more than 20 years ago.
Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.
Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.
Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.
Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
all the data for the last packet, it checks the last-packet flag on the
whole packet - but this is wrong, since the last-packet flag is only set on
the final subpacket of the last jumbo packet. This means that a call that
receives its last packet in a jumbo packet won't complete properly.
Fix this by having rxrpc_locate_data() determine the last-packet state of
the subpacket it's looking at and passing that back to the caller rather
than having the caller look in the packet header. The caller then needs to
cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
called again for this packet.
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Fixes: e2de6c4048 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Just two fixes:
* HT operation is not allowed on channel 14 (Japan only)
* netlink policy for nexthop attribute was wrong
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This code causes a static analysis warning:
block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'
We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish(). IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().
The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.
Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).
Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.
A similar bug also exists for show_idle_time(). Fix this is as well.
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.
The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().
Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.
Fixes: 78c98f9074 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The problem is that we were putting the NUL terminator too far:
buf[sizeof(buf) - 1] = '\0';
If the user input isn't NUL terminated and they haven't initialized the
whole buffer then it leads to an info leak. The NUL terminator should
be:
buf[len - 1] = '\0';
Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The Kryo cores share errata 1009 with Falkor, so add their model
definitions and enable it for them as well.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Update entry in silicon-errata.rst]
Signed-off-by: Will Deacon <will@kernel.org>
VMX already does so if the host has SMEP, in order to support the combination of
CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in
fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
"load EFER" controls, because it may help avoiding a slow MSR write. Removing
all the conditionals simplifies the code.
SVM does not have similar code, but it should since recent AMD processors do
support SMEP. So this patch also makes the code for the two vendors more similar
while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
then fail later in the function, we need to call kvm_arch_destroy_vm()
so that it can do any necessary cleanup (like freeing memory).
Fixes: 44a95dae1d ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
[Remove dependency on "kvm: Don't clear reference count on
kvm_create_vm() error path" which was not committed. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.
It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:
1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
by the loader.
2. There isn't enough room to contain the kernel, starting from the
default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
kernel.
EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.
It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.
The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.
To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.
[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI stubloader for ARM starts out by allocating a 32 MB window
at the base of RAM, in order to ensure that the decompressor (which
blindly copies the uncompressed kernel into that window) does not
overwrite other allocations that are made while running in the context
of the EFI firmware.
In some cases, (e.g., U-Boot running on the Raspberry Pi 2), this is
causing boot failures because this initial allocation conflicts with
a page of reserved memory at the base of RAM that contains the SMP spin
tables and other pieces of firmware data and which was put there by
the bootloader under the assumption that the TEXT_OFFSET window right
below the kernel is only used partially during early boot, and will be
left alone once the memory reservations are processed and taken into
account.
So let's permit reserved memory regions to exist in the region starting
at the base of RAM, and ending at TEXT_OFFSET - 5 * PAGE_SIZE, which is
the window below the kernel that is not touched by the early boot code.
Tested-by: Guillaume Gardet <Guillaume.Gardet@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Chester Lin <clin@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-5-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 428826f535 ("fdt: add support for rng-seed") introduced
add_bootloader_randomness(), permitting randomness provided by the
bootloader or firmware to be credited as entropy. However, the fact
that the UEFI support code was already wired into the RNG subsystem
via a call to add_device_randomness() was overlooked, and so it was
not converted at the same time.
Note that this UEFI (v2.4 or newer) feature is currently only
implemented for EFI stub booting on ARM, and further note that
CONFIG_RANDOM_TRUST_BOOTLOADER must be enabled, and this should be
done only if there indeed is sufficient trust in the bootloader
_and_ its source of randomness.
[ ardb: update commit log ]
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-4-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull dmaengine fixes from Vinod Koul:
"A few fixes to the dmaengine drivers:
- fix in sprd driver for link list and potential memory leak
- tegra transfer failure fix
- imx size check fix for script_number
- xilinx fix for 64bit AXIDMA and control reg update
- qcom bam dma resource leak fix
- cppi slave transfer fix when idle"
* tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
dmaengine: qcom: bam_dma: Fix resource leak
dmaengine: sprd: Fix the possible memory leak issue
dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer
dmaengine: imx-sdma: fix size check for sdma script_number
dmaengine: tegra210-adma: fix transfer failure
dmaengine: sprd: Fix the link-list pointer register configuration issue
Haiyang Zhang says:
====================
hv_netvsc: fix error handling in netvsc_attach/set_features
The error handling code path in these functions are not correct.
This patch set fixes them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If rndis_filter_open() fails, we need to remove the rndis device created
in earlier steps, before returning an error code. Otherwise, the retry of
netvsc_attach() from its callers will fail and hang.
Fixes: 7b2ee50c0c ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an error is returned by rndis_filter_set_offload_params(), we should
still assign the unaffected features to ndev->features. Otherwise, these
features will be missing.
Fixes: d6792a5a07 ("hv_netvsc: Add handler for LRO setting change")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Release resources when attaching to ULD fail. Otherwise, data
mismatch is seen between LLD and ULD later on, which lead to
kernel panic when accessing resources that should not even
exist in the first place.
Fixes: 94cdb8bb99 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a card is disconnected while in use, the system waits until all
opened files are closed then releases the card. This is done via
put_device() of the card device in each device release code.
The recently reported mutex deadlock bug happens in this code path;
snd_timer_close() for the timer device deals with the global
register_mutex and it calls put_device() there. When this timer
device is the last one, the card gets freed and it eventually calls
snd_timer_free(), which has again the protection with the global
register_mutex -- boom.
Basically put_device() call itself is race-free, so a relative simple
workaround is to move this put_device() call out of the mutex. For
achieving that, in this patch, snd_timer_close_locked() got a new
argument to store the card device pointer in return, and each caller
invokes put_device() with the returned object after the mutex unlock.
Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We use io_kiocb->result == -EAGAIN as a way to know if we need to
re-submit a polled request, as -EAGAIN reporting happens out-of-line
for IO submission failures. This field is cleared when we originally
allocate the request, but it isn't reset when we retry the submission
from async context. This can cause issues where we think something
needs a re-issue, but we're really just reading stale data.
Reset ->result whenever we re-prep a request for polled submission.
Cc: stable@vger.kernel.org
Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current code in ftrace_regs_caller() doesn't assign
%r3 to contain the address of the current frame. This
is hidden if the kernel is compiled with FRAME_POINTER,
but without it just crashes because it tries to dereference
an arbitrary address. Fix this by always setting %r3 to the
current stack frame.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
The devlink parameter "acl_region_rehash_interval" is a runtime
parameter whose value is stored in a dynamically allocated memory. While
reloading the driver, this memory is freed and then allocated again. A
use-after-free might happen if during this time frame someone tries to
retrieve its value.
Since commit 070c63f20f ("net: devlink: allow to change namespaces
during reload") the use-after-free can be reliably triggered when
reloading the driver into a namespace, as after freeing the memory (via
reload_down() callback) all the parameters are notified.
Fix this by unpublishing and then re-publishing the parameters during
reload.
Fixes: 98bbf70c1c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Fixes: 7c62cfb8c5 ("devlink: publish params only after driver init is done")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation for nvm_attr configuration instructs the management
FW to load/unload the nvm-cfg image for each user-provided attribute in
the input file. This consumes lot of cycles even for few tens of
attributes.
This patch updates the implementation to perform load/commit of the config
for every 50 attributes. After loading the nvm-image, MFW expects that
config should be committed in a predefined timer value (5 sec), hence it's
not possible to write large number of attributes in a single load/commit
window. Hence performing the commits in chunks.
Fixes: 0dabbe1bb3 ("qed: Add driver API for flashing the config attributes.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 0ce1822c2a ("vxlan: add adjacent link to limit depth
level"), vxlan_changelink() could fail because of
netdev_adjacent_change_prepare().
netdev_adjacent_change_prepare() returns -EEXIST when old lower device
and new lower device are same.
(old lower device is "dst->remote_dev" and new lower device is "lowerdev")
So, before calling it, lowerdev should be NULL if these devices are same.
Test command1:
ip link add dummy0 type dummy
ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1
ip link set vxlan0 type vxlan ttl 5
RTNETLINK answers: File exists
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0ce1822c2a ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we're destroying the host transport mechanism, we should ensure
that we do not leak memory by failing to release any back channel
slots that might still exist.
Reported-by: Neil Brown <neilb@suse.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If there are RDMA back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.
Reported-by: Neil Brown <neilb@suse.de>
Fixes: 63cae47005 ("xprtrdma: Handle incoming backward direction RPC calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If there are TCP back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.
Reported-by: Neil Brown <neilb@suse.de>
Fixes: 2ea24497a1 ("SUNRPC: RPC callbacks may be split across several..")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A final attempt at enabling sse2 for GCC users.
Orininally attempted in:
commit 1011745073 ("drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")
Reverted due to "reported instability" in:
commit 193392ed9f ("Revert "drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"")
Re-added just for Clang in:
commit 0f0727d971 ("drm/amd/display: readd -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")
The original report didn't have enough information to know if the GPF
was due to misalignment, but I suspect that it was. (The missing
information was the disassembly of the function at the bottom of the
trace, to see if the instruction pointer pointed to an instruction with
16B alignment memory operand requirements. The stack trace does show
the stack was only 8B but not 16B aligned though, which makes this a
strong possibility).
Now that the stack misalignment issue has been fixed for users of GCC
7.1+, reattempt adding -msse2. This matches Clang.
It will likely never be safe to enable this for pre-GCC 7.1 AND use a
16B aligned stack in these translation units.
This is only a functional change for GCC 7.1+ users, and should be boot
tested.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=109487
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GCC earlier than 7.1 errors when compiling code that makes use of
`double`s and sets a stack alignment outside of the range of [2^4-2^12]:
$ cat foo.c
double foo(double x, double y) {
return x + y;
}
$ gcc-4.9 -mpreferred-stack-boundary=3 foo.c
error: -mpreferred-stack-boundary=3 is not between 4 and 12
This is likely why the AMDGPU driver was ever compiled with a different
stack alignment (and thus different ABI) than the rest of the x86
kernel. The kernel uses 8B stack alignment, while the driver was using
16B stack alignment in a few places.
Since GCC 7.1+ doesn't error, fix the ABI mismatch for users of newer
versions of GCC.
There was discussion about whether to mark the driver broken or not for
users of GCC earlier than 7.1, but since the driver currently is
working, don't explicitly break the driver for them here.
Relying on differing stack alignment is unspecified behavior, and
brittle, and may break in the future.
This patch is no functional change for GCC users earlier than 7.1. It's
been compile tested on GCC 4.9 and 8.3 to check the correct flags. It
should be boot tested when built with GCC 7.1+.
-mincoming-stack-boundary= or -mstackrealign may help keep this code
building for pre-GCC 7.1 users.
The version check for GCC is broken into two conditionals, both because
cc-ifversion is currently GCC specific, and it simplifies a subsequent
patch.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The x86 kernel is compiled with an 8B stack alignment via
`-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via
commit d9b0cde91c ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported")
or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are
compiled with 16B stack alignment.
Generally, the stack alignment is part of the ABI. Linking together two
different translation units with differing stack alignment is dangerous,
particularly when the translation unit with the smaller stack alignment
makes calls into the translation unit with the larger stack alignment.
While 8B aligned stacks are sometimes also 16B aligned, they are not
always.
Multiple users have reported General Protection Faults (GPF) when using
the AMDGPU driver compiled with Clang. Clang is placing objects in stack
slots assuming the stack is 16B aligned, and selecting instructions that
require 16B aligned memory operands.
At runtime, syscall handlers with 8B aligned stack call into code that
assumes 16B stack alignment. When the stack is a multiple of 8B but not
16B, these instructions result in a GPF.
Remove the code that added compatibility between the differing compiler
flags, as it will result in runtime GPFs when built with Clang. Cleanups
for GCC will be sent in later patches in the series.
Link: https://github.com/ClangBuiltLinux/linux/issues/735
Debugged-by: Yuxuan Shui <yshuiv7@gmail.com>
Reported-by: Shirish S <shirish.s@amd.com>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During kexec some adapters hit an EEH since they are not properly
shut down in the radeon_pci_shutdown() function. Adding
radeon_suspend_kms() fixes this issue.
Enabled only on PPC because this patch causes issues on some other
boards.
Signed-off-by: Kyle Mahlkuch <kmahlkuc@linux.vnet.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
This patch is for fixing Navi14 HDMI display pink screen issue.
[How]
Call stream->link->link_enc->funcs->setup twice. This is setting
the DIG_MODE to the correct value after having been overridden by
the call to transmitter control.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
i2c_read is called to differentiate passive DP->HDMI and DP->DVI-D dongles
The call is expected to fail in DVI-D case but pass in HDMI case
Some HDMI dongles have a chance to fail as well, causing misdetection as DVI-D
[HOW]
Retry i2c_read to ensure failed result is valid
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
A display that supports DRR can never really be considered
"synchronized" with any other display because we can dynamically
enable DRR (i.e. without modeset). this will cause their
relative CRTC positions to drift and lose sync. this will disrupt
features such as MCLK switching that assume and depend on
their permanent alignment (that can only change with modeset)
[how]
check for ignore_msa in stream when considered synchronizability
this ignore_msa is basically actually implemented as "supports drr"
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
When run_job fails and HW fence returned is NULL we still signal
the s_fence to avoid hangs but the user has no way of knowing if
the actual HW job was ran and finished.
Fix:
Allow .run_job implementations to return ERR_PTR in the fence pointer
returned and then set this error for s_fence->finished fence so whoever
wait on this fence can inspect the signaled fence for an error.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DWB (Display Writeback) flag needs to be enabled as 1, or system
will throw out a few warnings when creating dcn20 resource pool.
Also, Navi14's dwb setting needs to match Navi10's,
which has already been set to 1.
[How]
Change value of num_dwb from 0 to 1.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The API was reduced to include only knowledge currently needed by the
FW scan logic, the rest is legacy. Support the new, reduced version.
Using the old API with newer firmwares (starting from
iwlwifi-*-50.ucode, which implements and requires the new API version)
causes an assertion failure similar to this one:
[ 2.854505] iwlwifi 0000:00:14.3: 0x20000038 | BAD_COMMAND
Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
mt76 dma layer is supposed to unmap skb data buffers while keep txwi
mapped on hw dma ring. At the moment mt76 wrongly unmap txwi or does
not unmap data fragments in even positions for non-linear skbs. This
issue may result in hw hangs with A-MSDU if the system relies on IOMMU
or SWIOTLB. Fix this behaviour properly unmapping data fragments on
non-linear skbs.
Fixes: 17f1de56df ("mt76: add common code shared between multiple chipsets")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
On same device (e.g. U7612E-H1) PCIE_ASPM causes continuous mcu hangs and
instability. Since mt76x2 series does not manage PCIE PS states, first we
try to disable ASPM using pci_disable_link_state. If it fails, we will
disable PCIE PS configuring PCI registers.
This patch has been successfully tested on U7612E-H1 mini-pice card
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The commit ade49db337 ("ALSA: hda/hdmi - Allow audio component for
AMD/ATI and Nvidia HDMI") introduced the spec->pcm_lock mutex lock to
the whole generic_hdmi_init() function for avoiding the race with the
audio component registration. However, this caused a dead lock when
the unsolicited event is handled without the audio component, as the
codec gets runtime-resumed in hdmi_present_sense() which is already
inside the spec->pcm_lock in its caller.
For avoiding this deadlock, add a new mutex only for the audio
component binding that is used in both generic_hdmi_init() and the
audio notifier registration where the jack callbacks are handled /
re-registered.
Fixes: ade49db337 ("ALSA: hda/hdmi - Allow audio component for AMD/ATI and Nvidia HDMI")
Reported-and-tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://lore.kernel.org/r/s5himo7i89i.wl-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull iommu fixes from Joerg Roedel:
- Follow-on fix for Renesas IPMMU to get rid of a redundant error
message.
- Quirk for AMD IOMMU to make it work on another Acer Laptop model with
a broken IVRS ACPI table.
- Fix for a panic at kdump in the Intel IOMMU driver.
* tag 'iommu-fixes-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix panic after kexec -p for kdump
iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41
iommu/ipmmu-vmsa: Remove dev_err() on platform_get_irq() failure
Pull gfs2 fix from Andreas Gruenbacher:
"Fix remounting (broken in -rc1)."
* tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix initialisation of args for remount
When gfs2 was converted to use fs_context, the initialisation of the
mount args structure to the currently active args was lost with the
removal of gfs2_remount_fs(), so the checks of the new args on remount
became checks against the default values instead of the current ones.
This caused unexpected remount behaviour and test failures (xfstests
generic/294, generic/306 and generic/452).
Reinstate the args initialisation, this time in gfs2_init_fs_context()
and conditional upon fc->purpose, as that's the only time we get control
before the mount args are parsed in the remount process.
Fixes: 1f52aa08d1 ("gfs2: Convert gfs2 to fs_context")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
platform_get_irq() will call dev_err() itself on failure,
so there is no need for the driver to also do this.
This is detected by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch disables setting of HT20 and more for channel 14 because
the channel is only for IEEE 802.11b.
The patch for net/wireless/util.c was unit-tested.
The patch for net/wireless/chan.c was tested with iw command.
Before this patch.
$ sudo iw dev <ifname> set channel 14 HT20
$
After this patch.
$ sudo iw dev <ifname> set channel 14 HT20
kernel reports: invalid channel definition
command failed: Invalid argument (-22)
$
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://lore.kernel.org/r/20191021075045.2719-1-masashi.honma@gmail.com
[clean up the code, use != instead of equivalent >]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
I'm leaving SiFive in a bit less than two weeks, which means I'll be
losing my @sifive email address. I don't have my new email address yet,
so I'm switching over to my personal address instead.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-10-24
This series introduces misc fixes to mlx5 driver.
v1->v2:
- Dropped the kTLS counter documentation patch, Tariq will fix it and
send it later.
- Added a new fix for link speed mode reporting.
('net/mlx5e: Initialize link modes bitmap on stack')
For -stable v4.14
('net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budget')
For -stable v4.19
('net/mlx5e: Fix ethtool self test: link speed')
For -stable v5.2
('net/mlx5: Fix flow counter list auto bits struct')
('net/mlx5: Fix rtable reference leak')
For -stable v5.3
('net/mlx5e: Remove incorrect match criteria assignment line')
('net/mlx5e: Determine source port properly for vlan push action')
('net/mlx5e: Initialize link modes bitmap on stack')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a nonblocking socket is immediately closed after connect(),
the connect worker may not have started. This results in a refcount
problem, since sock_hold() is called from the connect worker.
This patch moves the sock_hold in front of the connect worker
scheduling.
Reported-by: syzbot+4c063e6dea39e4b79f29@syzkaller.appspotmail.com
Fixes: 50717a37db ("net/smc: nonblocking connect rework")
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use platform_get_irq_byname_optional() and platform_get_irq_optional()
instead of platform_get_irq_byname() and platform_get_irq() for optional
IRQs to avoid below error message during probe:
[ 0.795803] fec 30be0000.ethernet: IRQ pps not found
[ 0.800787] fec 30be0000.ethernet: IRQ index 3 not found
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Failed to get irq using name is NOT fatal as driver will use index
to get irq instead, use platform_get_irq_byname_optional() instead
of platform_get_irq_byname() to avoid below error message during
probe:
[ 0.819312] fec 30be0000.ethernet: IRQ int0 not found
[ 0.824433] fec 30be0000.ethernet: IRQ int1 not found
[ 0.829539] fec 30be0000.ethernet: IRQ int2 not found
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave's Facebook email address is not working, and my attempts
to contact him are failing. Let's remove it to trim down the
list of TLS maintainers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to improve the tun_info options_len by dropping
the skb when TUNNEL_VXLAN_OPT is set but options_len is less
than vxlan_metadata. This can void a potential out-of-bounds
access on ip_tun_info.
Fixes: ee122c79d4 ("vxlan: Flow based tunneling")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check for !md doens't really work for ip_tunnel_info_opts(info) which
only does info + 1. Also to avoid out-of-bounds access on info, it should
ensure options_len is not less than erspan_metadata in both erspan_xmit()
and ip6erspan_tunnel_xmit().
Fixes: 1a66a836da ("gre: add collect_md mode to ERSPAN tunnel")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is due to error in over budget processing.
When dealing with high throughput, the used buffers
that exceeds the budget is not cleaned up. In addition,
it takes a lot of cycles to clean up the used buffer,
and then the buffer where the valid data is located can take effect.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, the amount of counters guaranteed per VF in the
resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was
set regardless if the VF was single or dual port.
This caused several VFs to have no guaranteed counters although the
system could satisfy their request.
The fix is to dynamically guarantee counters, based on each VF
specification.
Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize link modes bitmap on stack before using it, otherwise the
outcome of ethtool set link ksettings might have unexpected values.
Fixes: 4b95840a6c ("net/mlx5e: Fix matching of speed to PRM link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ethtool self test contains a test for link speed. This test reads the
PTYS register and determines whether the current speed is valid or not.
Change current implementation to use the function mlx5e_port_linkspeed()
that does the same check and fails when speed is invalid. This code
redundancy lead to a bug when mlx5e_port_linkspeed() was updated with
expended speeds and the self test was not.
Fixes: 2c81bfd5ae ("net/mlx5e: Move port speed code from en_ethtool.c to en/port.c")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When CQE compression is enabled, compressed CQEs use the following
structure: a title is followed by one or many blocks, each containing 8
mini CQEs (except the last, which may contain fewer mini CQEs).
Due to NAPI budget restriction, a complete structure is not always
parsed in one NAPI run, and some blocks with mini CQEs may be deferred
to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call
in the beginning of mlx5e_poll_rx_cq. However, if the budget is
extremely low, some blocks may be left even after that, but the code
that follows the mlx5e_decompress_cqes_cont call doesn't check it and
assumes that a new CQE begins, which may not be the case. In such cases,
random memory corruptions occur.
An extremely low NAPI budget of 8 is used when busy_poll or busy_read is
active.
This commit adds a check to make sure that the previous compressed CQE
has been completely parsed after mlx5e_decompress_cqes_cont, otherwise
it prevents a new CQE from being fetched in the middle of a compressed
CQE.
This commit fixes random crashes in __build_skb, __page_pool_put_page
and other not-related-directly places, that used to happen when both CQE
compression and busy_poll/busy_read were enabled.
Fixes: 7219ab34f1 ("net/mlx5e: CQE compression")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The cited commit refactored the encap id into a struct pointed from the
destination.
Bug fix for the case there is no encap for one of the destinations.
Fixes: 2b688ea5ef ("net/mlx5: Add flow steering actions to fs_cmd shim layer")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If the rt entry gateway family is not AF_INET for multipath device,
rtable reference is leaked.
Hence, fix it by releasing the reference.
Fixes: 5fb091e813 ("net/mlx5e: Use hint to resolve route when in HW multipath mode")
Fixes: e32ee6c78e ("net/mlx5e: Support tunnel encap over tagged Ethernet")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When encap entry initialization completes successfully e->compl_result is
set to positive value and not zero, like mlx5e_rep_update_flows() assumes
at the moment. Fix the conditional to only skip encap flows update when
e->compl_result < 0.
Fixes: 2a1f1768fa ("net/mlx5e: Refactor neigh update for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Driver have function, which enable match criteria for misc parameters
in dependence of eswitch capabilities.
Fixes: 4f5d1beadc ("Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Termination tables are used for vlan push actions on uplink ports.
To support RoCE dual port the source port value was placed in a register.
Fix the code to use an API method returning the source port according to
the FW capabilities.
Fixes: 10caabdaad ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The union should contain the extended dest and counter list.
Remove the resevered 0x40 bits which is redundant.
This change doesn't break any functionally.
Everything works today because the code in fs_cmd.c is using
the correct structs if extended dest or the basic dest.
Fixes: 1b11549859 ("net/mlx5: Introduce extended destination fields")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vladimir Oltean says:
====================
VLAN fixes for Ocelot switch
This series addresses 2 issues with vlan_filtering=1:
- Untagged traffic gets dropped unless commands are run in a very
specific order.
- Untagged traffic starts being transmitted as tagged after adding
another untagged VID on the port.
Tested on NXP LS1028A-RDB board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch driver keeps a "vid" variable per port, which signifies _the_
VLAN ID that is stripped on that port's egress (aka the native VLAN on a
trunk port).
That is the way the hardware is designed (mostly). The port->vid is
programmed into REW:PORT:PORT_VLAN_CFG:PORT_VID and the rewriter is told
to send all traffic as tagged except the one having port->vid.
There exists a possibility of finer-grained egress untagging decisions:
using the VCAP IS1 engine, one rule can be added to match every
VLAN-tagged frame whose VLAN should be untagged, and set POP_CNT=1 as
action. However, the IS1 can hold at most 512 entries, and the VLANs are
in the order of 6 * 4096.
So the code is fine for now. But this sequence of commands:
$ bridge vlan add dev swp0 vid 1 pvid untagged
$ bridge vlan add dev swp0 vid 2 untagged
makes untagged and pvid-tagged traffic be sent out of swp0 as tagged
with VID 1, despite user's request.
Prevent that from happening. The user should temporarily remove the
existing untagged VLAN (1 in this case), add it back as tagged, and then
add the new untagged VLAN (2 in this case).
Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: 7142529f16 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Background information: the driver operates the hardware in a mode where
a single VLAN can be transmitted as untagged on a particular egress
port. That is the "native VLAN on trunk port" use case. Its value is
held in port->vid.
Consider the following command sequence (no network manager, all
interfaces are down, debugging prints added by me):
$ ip link add dev br0 type bridge vlan_filtering 1
$ ip link set dev swp0 master br0
Kernel code path during last command:
br_add_slave -> ocelot_netdevice_port_event (NETDEV_CHANGEUPPER):
[ 21.401901] ocelot_vlan_port_apply: port 0 vlan aware 0 pvid 0 vid 0
br_add_slave -> nbp_vlan_init -> switchdev_port_attr_set -> ocelot_port_attr_set (SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING):
[ 21.413335] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 0 vid 0
br_add_slave -> nbp_vlan_init -> nbp_vlan_add -> br_switchdev_port_vlan_add -> switchdev_port_obj_add -> ocelot_port_obj_add -> ocelot_vlan_vid_add
[ 21.667421] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 1
So far so good. The bridge has replaced the driver's default pvid used
in standalone mode (0) with its own default_pvid (1). The port's vid
(native VLAN) has also changed from 0 to 1.
$ ip link set dev swp0 up
[ 31.722956] 8021q: adding VLAN 0 to HW filter on device swp0
do_setlink -> dev_change_flags -> vlan_vid_add -> ocelot_vlan_rx_add_vid -> ocelot_vlan_vid_add:
[ 31.728700] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 0
The 8021q module uses the .ndo_vlan_rx_add_vid API on .ndo_open to make
ports be able to transmit and receive 802.1p-tagged traffic by default.
This API is supposed to offload a VLAN sub-interface, which for a switch
port means to add a VLAN that is not a pvid, and tagged on egress.
But the driver implementation of .ndo_vlan_rx_add_vid is wrong: it adds
back vid 0 as "egress untagged". Now back to the initial paragraph:
there is a single untagged VID that the driver keeps track of, and that
has just changed from 1 (the pvid) to 0. So this breaks the bridge
core's expectation, because it has changed vid 1 from untagged to
tagged, when what the user sees is.
$ bridge vlan
port vlan ids
swp0 1 PVID Egress Untagged
br0 1 PVID Egress Untagged
But curiously, instead of manifesting itself as "untagged and
pvid-tagged traffic gets sent as tagged on egress", the bug:
- is hidden when vlan_filtering=0
- manifests as dropped traffic when vlan_filtering=1, due to this setting:
if (port->vlan_aware && !port->vid)
/* If port is vlan-aware and tagged, drop untagged and priority
* tagged frames.
*/
val |= ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA |
ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA |
ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA;
which would have made sense if it weren't for this bug. The setting's
intention was "this is a trunk port with no native VLAN, so don't accept
untagged traffic". So the driver was never expecting to set VLAN 0 as
the value of the native VLAN, 0 was just encoding for "invalid".
So the fix is to not send 802.1p traffic as untagged, because that would
change the port's native vlan to 0, unbeknownst to the bridge, and
trigger unexpected code paths in the driver.
Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: 7142529f16 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the implementation of i2400m_op_rfkill_sw_toggle() the allocated
buffer for cmd should be released before returning. The
documentation for i2400m_msg_to_dev() says when it returns the buffer
can be reused. Meaning cmd should be released in either case. Move
kfree(cmd) before return to be reached by all execution paths.
Fixes: 2507e6ab7a ("wimax: i2400: fix memory leak")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that killing an application while faults are occurring
(particularly with a GPU in FPGA at a whopping 40MHz) can lead to
handling a lingering page fault after all the address space contexts
have already been freed. In this situation, the LRU list is empty so
addr_to_drm_mm_node() ends up dereferencing the list head as if it were
a struct panfrost_mmu entry; this leaves "mmu->as" actually pointing at
the pfdev->alloc_mask bitmap, which is also empty, and given that the
fault has a high likelihood of being in AS0, hilarity ensues.
Sadly, the cleanest solution seems to involve another goto. Oh well, at
least it's robust...
Fixes: 65e51e30d8 ("drm/panfrost: Prevent race when handling page fault")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/9a0b09e6b5851f0d4428b72dd6b8b4c0d0ef4206.1572293305.git.robin.murphy@arm.com
We get these warnings when build kernel W=1:
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:35:6: warning: no previous prototype for ‘panfrost_perfcnt_clean_cache_done’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:40:6: warning: no previous prototype for ‘panfrost_perfcnt_sample_done’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:190:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_enable’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:218:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_dump’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:250:6: warning: no previous prototype for ‘panfrost_perfcnt_close’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:264:5: warning: no previous prototype for ‘panfrost_perfcnt_init’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:320:6: warning: no previous prototype for ‘panfrost_perfcnt_fini’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_mmu.c:227:6: warning: no previous prototype for ‘panfrost_mmu_flush_range’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_mmu.c:435:5: warning: no previous prototype for ‘panfrost_mmu_map_fault_addr’ [-Wmissing-prototypes]
For file panfrost_mmu.c, make functions static to fix this.
For file panfrost_perfcnt.c, include header file can fix this.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: stable@vger.kernel.org
[robh: fixup function parameter alignment]
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1571967015-42854-1-git-send-email-wang.yi59@zte.com.cn
When rmmod hip04_eth.ko, we can get the following warning:
Task track: rmmod(1623)>bash(1591)>login(1581)>init(1)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1623 at kernel/irq/manage.c:1557 __free_irq+0xa4/0x2ac()
Trying to free already-free IRQ 200
Modules linked in: ping(O) pramdisk(O) cpuinfo(O) rtos_snapshot(O) interrupt_ctrl(O) mtdblock mtd_blkdevrtfs nfs_acl nfs lockd grace sunrpc xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables nf_reject_ipv
CPU: 0 PID: 1623 Comm: rmmod Tainted: G O 4.4.193 #1
Hardware name: Hisilicon A15
[<c020b408>] (rtos_unwind_backtrace) from [<c0206624>] (show_stack+0x10/0x14)
[<c0206624>] (show_stack) from [<c03f2be4>] (dump_stack+0xa0/0xd8)
[<c03f2be4>] (dump_stack) from [<c021a780>] (warn_slowpath_common+0x84/0xb0)
[<c021a780>] (warn_slowpath_common) from [<c021a7e8>] (warn_slowpath_fmt+0x3c/0x68)
[<c021a7e8>] (warn_slowpath_fmt) from [<c026876c>] (__free_irq+0xa4/0x2ac)
[<c026876c>] (__free_irq) from [<c0268a14>] (free_irq+0x60/0x7c)
[<c0268a14>] (free_irq) from [<c0469e80>] (release_nodes+0x1c4/0x1ec)
[<c0469e80>] (release_nodes) from [<c0466924>] (__device_release_driver+0xa8/0x104)
[<c0466924>] (__device_release_driver) from [<c0466a80>] (driver_detach+0xd0/0xf8)
[<c0466a80>] (driver_detach) from [<c0465e18>] (bus_remove_driver+0x64/0x8c)
[<c0465e18>] (bus_remove_driver) from [<c02935b0>] (SyS_delete_module+0x198/0x1e0)
[<c02935b0>] (SyS_delete_module) from [<c0202ed0>] (__sys_trace_return+0x0/0x10)
---[ end trace bb25d6123d849b44 ]---
Currently "rmmod hip04_eth.ko" call free_irq more than once
as devres_release_all and hip04_remove both call free_irq.
This results in a 'Trying to free already-free IRQ' warning.
To solve the problem free_irq has been moved out of hip04_remove.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of 'cce360b54ce6 ("arm64: capabilities: Filter the
entries based on a given mask")' the Qualcomm Falkor/Kryo errata 1003 is
no long applied.
The result of not applying errata 1003 is that MSM8996 runs into various
RCU stalls and fails to boot most of the times.
Give 1003 a "type" to ensure they are not filtered out in
update_cpu_capabilities().
Fixes: cce360b54c ("arm64: capabilities: Filter the entries based on a given mask")
Cc: stable@vger.kernel.org
Reported-by: Mark Brown <broonie@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
etnaviv_iommuv2_dump_size(..) returns the number of PTE * SZ_4K but
etnaviv_iommuv2_dump(..) increments buf pointer even if there is no PTE.
This results in a bad buf pointer which gets used for memcpy(..), when
copying the MMU state in the coredump buffer.
Fixes: afb7b3b1de ("drm/etnaviv: implement IOMMUv2 translation")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The switch to per-process address spaces erroneously dropped the check
which validated that the command buffer is mapped through the linear
apperture as required by the hardware. This turned a system
misconfiguration with a helpful error message into a very hard to
debug issue. Reinstate the check at the appropriate location.
Fixes: 17e4660ae3 (drm/etnaviv: implement per-process address spaces on MMUv2)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
The GPU coredump function violates the locking order by holding the MMU
context lock while trying to acquire the etnaviv_gem_object lock. This
results in a possible ABBA deadlock with other codepaths which follow
the established locking order.
Fortunately this is easy to fix by dropping the MMU context lock
earlier, as the BO dumping doesn't need the MMU context to be stable.
The only thing the BO dumping cares about are the BO mappings, which
are stable across the lifetime of the job.
Fixes: 27b67278e0 (drm/etnaviv: rework MMU handling)
[ Not really the first bad commit, but the one where this fix applies
cleanly. Stable kernels need a manual backport. ]
Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Pull fuse fixes from Miklos Szeredi:
"Mostly virtiofs fixes, but also fixes a regression and couple of
longstanding data/metadata writeback ordering issues"
* tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
fuse: Add changelog entries for protocols 7.1 - 7.8
fuse: truncate pending writes on O_TRUNC
fuse: flush dirty data/metadata before non-truncate setattr
virtiofs: Remove set but not used variable 'fc'
virtiofs: Retry request submission from worker context
virtiofs: Count pending forgets as in_flight forgets
virtiofs: Set FR_SENT flag only after request has been sent
virtiofs: No need to check fpq->connected state
virtiofs: Do not end request in submission context
fuse: don't advise readdirplus for negative lookup
fuse: don't dereference req->args on finished request
virtio-fs: don't show mount options
virtio-fs: Change module name to virtiofs.ko
Shared and writable mappings (__S.1.) should be clean (!dirty) initially
and made dirty on a subsequent write either through the hardware DBM
(dirty bit management) mechanism or through a write page fault. A clean
pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY
clear.
The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and
PTE_DIRTY clear. Prior to commit 73e86cb03c ("arm64: Move PTE_RDONLY
bit handling out of set_pte_at()"), it was the responsibility of
set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the
software PTE_DIRTY bit was not set. However, the above commit removed
the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in
set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions
unchanged. The result is that shared+writable mappings are now dirty by
default
Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}.
In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_*
attributes.
Fixes: 73e86cb03c ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
This fails because ctrl disconnects.
Therefore nvme_update_ns_ana_state() is not called
and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
nvme_read_ana_log(ctrl, groups_only=true).
However, nvme_update_ana_state() does not update namespaces
because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.
Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.
More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.
Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.
Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit e78a7614f3 ("idle: Prevent late-arriving interrupts from
disrupting offline") changes arch_cpu_idle_dead to be called with
interrupts disabled, which triggers the WARN in pnv_smp_cpu_kill_self.
Fix this by fixing up irq_happened after hard disabling, rather than
requiring there are no pending interrupts, similarly to what was done
done until commit 2525db04d1 ("powerpc/powernv: Simplify lazy IRQ
handling in CPU offline").
Fixes: e78a7614f3 ("idle: Prevent late-arriving interrupts from disrupting offline")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add unexpected_mask rather than checking for known bad values,
change the WARN_ON() to a WARN_ON_ONCE()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191022115814.22456-1-npiggin@gmail.com
While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity aware wakeups.
As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.
Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:
asym0 asym1
[ ][ ]
L L B L L B
$ cgcreate -g cpuset:asym0
$ cgset -r cpuset.cpus=0,1,3 asym0
$ cgset -r cpuset.mems=0 asym0
$ cgset -r cpuset.cpu_exclusive=1 asym0
$ cgcreate -g cpuset:asym1
$ cgset -r cpuset.cpus=2,4,5 asym1
$ cgset -r cpuset.mems=0 asym1
$ cgset -r cpuset.cpu_exclusive=1 asym1
$ cgset -r cpuset.sched_load_balance=0 .
(the CPU numbering may look odd because on the Juno LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)
If we make one of those SMP (IOW remove asymmetry) by e.g. hotplugging its
big core, we would end up with an SMP cpuset and an asymmetric cpuset - the
static key must remain set, because we still have one asymmetric root domain.
With the above example, this could be done with:
$ echo 0 > /sys/devices/system/cpu/cpu2/online
Which would result in:
asym0 asym1
[ ][ ]
L L B L L
When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:
per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL
Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: df054e8445 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Link: https://lkml.kernel.org/r/20191023153745.19515-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Endpoints with a maxpacket length of 0 are probably useless. They
can't transfer any data, and it's not at all unlikely that a UDC will
crash or hang when trying to handle a non-zero-length usb_request for
such an endpoint. Indeed, dummy-hcd gets a divide error when trying
to calculate the remainder of a transfer length by the maxpacket
value, as discovered by the syzbot fuzzer.
Currently the gadget core does not check for endpoints having a
maxpacket value of 0. This patch adds a check to usb_ep_enable(),
preventing such endpoints from being used.
As far as I know, none of the gadget drivers in the kernel tries to
create an endpoint with maxpacket = 0, but until now there has been
nothing to prevent userspace programs under gadgetfs or configfs from
doing it.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+8ab8bf161038a8768553@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <balbi@kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910281052370.1485-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ultravisor will do an integrity check of the kernel image but we
relocated it so the check will fail. Restore the original image by
relocating it back to the kernel virtual base address.
This works because during build vmlinux is linked with an expected
virtual runtime address of KERNELBASE.
Fixes: 6a9c930bd7 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Tested-by: Michael Anderson <andmike@linux.ibm.com>
[mpe: Add IS_ENABLED() to fix the CONFIG_RELOCATABLE=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911163433.12822-1-bauerman@linux.ibm.com
In shutdown/reboot paths, the timer is not stopped:
qla2x00_shutdown
pci_device_shutdown
device_shutdown
kernel_restart_prepare
kernel_restart
sys_reboot
This causes lockups (on powerpc) when firmware config space access calls
are interrupted by smp_send_stop later in reboot.
Fixes: e30d175648 ("[SCSI] qla2xxx: Addition of shutdown callback handler.")
Link: https://lore.kernel.org/r/20191024063804.14538-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After introducing "samples" to the calculation of wait time, the
driver might timeout at the regmap_field_read_poll_timeout call,
because the wait time could be longer than the 100000 usec limit
due to a large "samples" number.
So this patch sets the timeout limit to 2 times of the wait time
in order to fix this issue.
Fixes: 5c090abf94 ("hwmon: (ina3221) Add averaging mode support")
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Link: https://lore.kernel.org/r/20191022005922.30239-1-nicoleotsuka@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Simon Wunderlich says:
====================
Here are two batman-adv bugfixes:
* Fix free/alloc race for OGM and OGMv2, by Sven Eckelmann (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
An earlier bugfix introduced a dependency on CONFIG_NET_SCH_TAPRIO,
but this missed the case of NET_SCH_TAPRIO=m and NET_DSA_SJA1105=y,
which still causes a link error:
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio':
sja1105_tas.c:(.text+0x5c): undefined reference to `taprio_offload_free'
sja1105_tas.c:(.text+0x3b4): undefined reference to `taprio_offload_get'
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown':
sja1105_tas.c:(.text+0x6ec): undefined reference to `taprio_offload_free'
Change the dependency to only allow selecting the TAS code when it
can link against the taprio code.
Fixes: a8d570de0c ("net: dsa: sja1105: Add dependency for NET_DSA_SJA1105_TAS")
Fixes: 317ab5b86c ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are calling the checksum helper after the dma_map_single()
call to map the packet. This is incorrect as the checksumming
code will touch the packet from the CPU. This means the cache
won't be properly flushes (or the bounce buffering will leave
us with the unmodified packet to DMA).
This moves the calculation of the checksum & vlan tags to
before the DMA mapping.
This also has the side effect of fixing another bug: If the
checksum helper fails, we goto "drop" to drop the packet, which
will not unmap the DMA mapping.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixes: 05690d633f ("ftgmac100: Upgrade to NETIF_F_HW_CSUM")
Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
Tested-by: Vijay Khemka <vijaykhemka@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user. The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.
Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.
[2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
...
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
sock_xmit.isra.24+0xa1/0x170 [nbd]
nbd_send_cmd+0x1d2/0x690 [nbd]
nbd_queue_rq+0x1b5/0x3b0 [nbd]
__blk_mq_try_issue_directly+0x108/0x1b0
blk_mq_request_issue_directly+0xbd/0xe0
blk_mq_try_issue_list_directly+0x41/0xb0
blk_mq_sched_insert_requests+0xa2/0xe0
blk_mq_flush_plug_list+0x205/0x2a0
blk_flush_plug_list+0xc3/0xf0
[1] blk_finish_plug+0x21/0x2e
_xfs_buf_ioapply+0x313/0x460
__xfs_buf_submit+0x67/0x220
xfs_buf_read_map+0x113/0x1a0
xfs_trans_read_buf_map+0xbf/0x330
xfs_btree_read_buf_block.constprop.42+0x95/0xd0
xfs_btree_lookup_get_block+0x95/0x170
xfs_btree_lookup+0xcc/0x470
xfs_bmap_del_extent_real+0x254/0x9a0
__xfs_bunmapi+0x45c/0xab0
xfs_bunmapi+0x15/0x30
xfs_itruncate_extents_flags+0xca/0x250
xfs_free_eofblocks+0x181/0x1e0
xfs_fs_destroy_inode+0xa8/0x1b0
destroy_inode+0x38/0x70
dispose_list+0x35/0x50
prune_icache_sb+0x52/0x70
super_cache_scan+0x120/0x1a0
do_shrink_slab+0x120/0x290
shrink_slab+0x216/0x2b0
shrink_node+0x1b6/0x4a0
do_try_to_free_pages+0xc6/0x370
try_to_free_mem_cgroup_pages+0xe3/0x1e0
try_charge+0x29e/0x790
mem_cgroup_charge_skmem+0x6a/0x100
__sk_mem_raise_allocated+0x18e/0x390
__sk_mem_schedule+0x2a/0x40
[0] tcp_sendmsg_locked+0x8eb/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x26d/0x2b0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule(). It already calculated how many bytes can
be fit into current->page_frag. Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path. Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().
nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress. However, this
confused sk_page_frag() called from [2]. Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].
After [2] used current->page_frag, the offset would be increased by
the used amount. When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.
Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.
v2: Eric didn't like gfp flags being tested twice. Introduce a new
helper gfpflags_normal_context() and combine the two tests.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
KCSAN reported a data-race in udp_set_dev_scratch() [1]
The issue here is that we must not write over skb fields
if skb is shared. A similar issue has been fixed in commit
89c22d8c3b ("net: Fix skb csum races when peeking")
While we are at it, use a helper only dealing with
udp_skb_scratch(skb)->csum_unnecessary, as this allows
udp_set_dev_scratch() to be called once and thus inlined.
[1]
BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
__first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
udp_poll+0xea/0x110 net/ipv4/udp.c:2720
sock_poll+0xed/0x250 net/socket.c:1256
vfs_poll include/linux/poll.h:90 [inline]
do_select+0x7d0/0x1020 fs/select.c:534
core_sys_select+0x381/0x550 fs/select.c:677
do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
__do_sys_pselect6 fs/select.c:784 [inline]
__se_sys_pselect6 fs/select.c:769 [inline]
__x64_sys_pselect6+0x12e/0x170 fs/select.c:769
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
__sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
__do_sys_recvmmsg net/socket.c:2703 [inline]
__se_sys_recvmmsg net/socket.c:2696 [inline]
__x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 2276f58ac5 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style in
header files related to DPAA2 Ethernet driver supporting
Freescale SoCs with DPAA2. For C header files
Documentation/process/license-rules.rst mandates C-like comments
(opposed to C source files where C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: avoid KCSAN splats
Often times we use skb_queue_empty() without holding a lock,
meaning that other cpus (or interrupt) can change the queue
under us. This is fine, but we need to properly annotate
the lockless intent to make sure the compiler wont over
optimize things.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Busy polling usually runs without locks.
Let's use skb_queue_empty_lockless() instead of skb_queue_empty()
Also uses READ_ONCE() in __skb_try_recv_datagram() to address
a similar potential problem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many poll() handlers are lockless. Using skb_queue_empty_lockless()
instead of skb_queue_empty() is more appropriate.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
to not let the compiler do strange things, and avoid
KCSAN splats.
Adding a barrier in skb_queue_empty() might be overkill,
I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us finding real bugs.
The corresponding WRITE_ONCE() should add zero cost
for current compilers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARC fixes from Vineet Gupta:
"Small fixes for ARC:
- perf fix for Big Endian build [Alexey]
- hadk platform enable soem peripherals [Eugeniy]"
* tag 'arc-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: perf: Accommodate big-endian CPU
ARC: [plat-hsdk]: Enable on-boardi SPI ADC IC
ARC: [plat-hsdk]: Enable on-board SPI NOR flash IC
For legacy I/O BARs (non-MMIO BARs) to work correctly on RISC-V Linux,
we need to establish a reserved memory region for them, so that drivers
that wish to use the legacy I/O BARs can issue reads and writes against
a memory region that is mapped to the host PCIe controller's I/O BAR
mapping.
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Commit 3ae62a4209 ("UAS: fix alignment of scatter/gather segments"),
copying a similar commit for usb-storage, attempted to solve a problem
involving scatter-gather I/O and USB/IP by setting the
virt_boundary_mask for mass-storage devices.
However, it now turns out that the analogous change in usb-storage
interacted badly with commit 09324d32d2 ("block: force an unlimited
segment size on queues with a virt boundary"), which was added later.
A typical error message is:
ehci-pci 0000:00:13.2: swiotlb buffer is full (sz: 327680 bytes),
total 32768 (slots), used 97 (slots)
There is no longer any reason to keep the virt_boundary_mask setting
in the uas driver. It was needed in the first place only for
handling devices with a block size smaller than the maxpacket size and
where the host controller was not capable of fully general
scatter-gather operation (that is, able to merge two SG segments into
a single USB packet). But:
High-speed or slower connections never use a bulk maxpacket
value larger than 512;
The SCSI layer does not handle block devices with a block size
smaller than 512 bytes;
All the host controllers capable of SuperSpeed operation can
handle fully general SG;
Since commit ea44d19076 ("usbip: Implement SG support to
vhci-hcd and stub driver") was merged, the USB/IP driver can
also handle SG.
Therefore all supported device/controller combinations should be okay
with no need for any special virt_boundary_mask. So in order to head
off potential problems similar to those affecting usb-storage, this
patch reverts commit 3ae62a4209.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Oliver Neukum <oneukum@suse.com>
CC: <stable@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Fixes: 3ae62a4209 ("UAS: fix alignment of scatter/gather segments")
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910231132470.1878-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 747668dbc0 ("usb-storage: Set virt_boundary_mask to avoid SG
overflows") attempted to solve a problem involving scatter-gather I/O
and USB/IP by setting the virt_boundary_mask for mass-storage devices.
However, it now turns out that this interacts badly with commit
09324d32d2 ("block: force an unlimited segment size on queues with a
virt boundary"), which was added later. A typical error message is:
ehci-pci 0000:00:13.2: swiotlb buffer is full (sz: 327680 bytes),
total 32768 (slots), used 97 (slots)
There is no longer any reason to keep the virt_boundary_mask setting
for usb-storage. It was needed in the first place only for handling
devices with a block size smaller than the maxpacket size and where
the host controller was not capable of fully general scatter-gather
operation (that is, able to merge two SG segments into a single USB
packet). But:
High-speed or slower connections never use a bulk maxpacket
value larger than 512;
The SCSI layer does not handle block devices with a block size
smaller than 512 bytes;
All the host controllers capable of SuperSpeed operation can
handle fully general SG;
Since commit ea44d19076 ("usbip: Implement SG support to
vhci-hcd and stub driver") was merged, the USB/IP driver can
also handle SG.
Therefore all supported device/controller combinations should be okay
with no need for any special virt_boundary_mask. So in order to fix
the swiotlb problem, this patch reverts commit 747668dbc0.
Reported-and-tested-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Link: https://marc.info/?l=linux-usb&m=157134199501202&w=2
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Seth Bollinger <Seth.Bollinger@digi.com>
CC: <stable@vger.kernel.org>
Fixes: 747668dbc0 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
Acked-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910211145520.1673-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iso_buffer should be set to NULL after use and free in the while loop.
In the case of isochronous URB in the while loop, iso_buffer is
allocated and after sending it to server, buffer is deallocated. And
then, if the next URB in the while loop is not a isochronous pipe,
iso_buffer still holds the previously deallocated buffer address and
kfree tries to free wrong buffer address.
Fixes: ea44d19076 ("usbip: Implement SG support to vhci-hcd and stub driver")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20191022093017.8027-1-suwan.kim027@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It looks like some of the xhci debug code is passing u32 to functions
directly from __le32/__le64 fields.
Fix this by using le{32,64}_to_cpu() on these to fix the following
sparse warnings;
xhci-debugfs.c:205:62: warning: incorrect type in argument 1 (different base types)
xhci-debugfs.c:205:62: expected unsigned int [usertype] field0
xhci-debugfs.c:205:62: got restricted __le32
xhci-debugfs.c:206:62: warning: incorrect type in argument 2 (different base types)
xhci-debugfs.c:206:62: expected unsigned int [usertype] field1
xhci-debugfs.c:206:62: got restricted __le32
...
[Trim down commit message, sparse warnings were similar -Mathias]
Cc: <stable@vger.kernel.org> # 4.15+
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-4-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The arguments to queue_trb are always byteswapped to LE for placement in
the ring, but this should not happen in the case of immediate data; the
bytes copied out of transfer_buffer are already in the correct order.
Add a complementary byteswap so the bytes end up in the ring correctly.
This was observed on BE ppc64 with a "Texas Instruments TUSB73x0
SuperSpeed USB 3.0 xHCI Host Controller [104c:8241]" as a ch341
usb-serial adapter ("1a86:7523 QinHeng Electronics HL-340 USB-Serial
adapter") always transmitting the same character (generally NUL) over
the serial link regardless of the key pressed.
Cc: <stable@vger.kernel.org> # 5.2+
Fixes: 33e39350eb ("usb: xhci: add Immediate Data Transfer support")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-3-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef513be0a9 ("usb: xhci: Add Clear_TT_Buffer") schedules work
to clear TT buffer, but causes a use-after-free regression at the same time
Make sure hub_tt_work finishes before endpoint is disabled, otherwise
the work will dereference already freed endpoint and device related
pointers.
This was triggered when usb core failed to read the configuration
descriptor of a FS/LS device during enumeration.
xhci driver queued clear_tt_work while usb core freed and reallocated
a new device for the next enumeration attempt.
EHCI driver implents ehci_endpoint_disable() that makes sure
clear_tt_work has finished before it returns, but xhci lacks this support.
usb core will call hcd->driver->endpoint_disable() callback before
disabling endpoints, so we want this in xhci as well.
The added xhci_endpoint_disable() is based on ehci_endpoint_disable()
Fixes: ef513be0a9 ("usb: xhci: Add Clear_TT_Buffer")
Cc: <stable@vger.kernel.org> # v5.3
Reported-by: Johan Hovold <johan@kernel.org>
Suggested-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Tested-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-2-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent info-leak bug manifested itself along with warning about a
negative buffer overflow:
ldusb 1-1:0.28: Read buffer overflow, -131383859965943 bytes dropped
when it was really a rather large positive one.
A sanity check that prevents this has now been put in place, but let's
fix up the size format specifiers, which should all be unsigned.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191022143203.5260-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The custom ring-buffer implementation was merged without any locking or
explicit memory barriers, but a spinlock was later added by commit
9d33efd9a7 ("USB: ldusb bugfix").
The lock did not cover the update of the tail index once the entry had
been processed, something which could lead to memory corruption on
weakly ordered architectures or due to compiler optimisations.
Specifically, a completion handler running on another CPU might observe
the incremented tail index and update the entry before ld_usb_read() is
done with it.
Fixes: 2824bd250f ("[PATCH] USB: add ldusb driver")
Fixes: 9d33efd9a7 ("USB: ldusb bugfix")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191022143203.5260-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Endpoints with a maxpacket length of 0 are probably useless. They
can't transfer any data, and it's not at all unlikely that an HCD will
crash or hang when trying to handle an URB for such an endpoint.
Currently the USB core does not check for endpoints having a maxpacket
value of 0. This patch adds a check, printing a warning and skipping
over any endpoints it catches.
Now, the USB spec does not rule out endpoints having maxpacket = 0.
But since they wouldn't have any practical use, there doesn't seem to
be any good reason for us to accept them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910281050420.1485-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felipe writes:
USB: fixes for v5.4-rc5
Not much here, only 14 commits in different drivers.
As for the specifics, Roger Quadros fixed an important bug in cdns3
where the driver was making decisions about data pull-up management
behind the UDC framework's back.
The Atmel UDC got a fix for interrupt storm in FIFO mode, this was done
by Cristian Brisan.
Apart from these, we have the usual set of non-critical fixes.
Signed-off-by: Felipe Balbi <balbi@kernel.org>
* tag 'fixes-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: cdns3: gadget: Don't manage pullups
usb: dwc3: remove the call trace of USBx_GFLADJ
usb: gadget: configfs: fix concurrent issue between composite APIs
usb: dwc3: pci: prevent memory leak in dwc3_pci_probe
usb: gadget: composite: Fix possible double free memory bug
usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode.
usb: renesas_usbhs: fix type of buf
usb: renesas_usbhs: Fix warnings in usbhsg_recip_handler_std_set_device()
usb: gadget: udc: renesas_usb3: Fix __le16 warnings
usb: renesas_usbhs: fix __le16 warnings
usb: cdns3: include host-export,h for cdns3_host_init
usb: mtu3: fix missing include of mtu3_dr.h
usb: fsl: Check memory resource before releasing it
usb: dwc3: select CONFIG_REGMAP_MMIO
syzkaller reported an issue where it looks like a malicious app can
trigger a use-after-free of reading the ctx ->sq_array and ->rings
value right after having installed the ring fd in the process file
table.
Defer ring fd installation until after we're done reading those
values.
Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull HID fixes from Jiri Kosina:
- HID++ device support regression fixes (race condition during cleanup,
device detection fix, opps fix) from Andrey Smirnov
- disable PM on i2c-hid, as it's causing problems with a lot of
devices; other OSes apparently don't implement/enable it either; from
Kai-Heng Feng
- error handling fix in intel-ish driver, from Zhang Lixu
- syzbot fuzzer fix for HID core code from Alan Stern
- a few other tiny fixups (printk message cleanup, new device ID)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: i2c-hid: add Trekstor Primebook C11B to descriptor override
HID: logitech-hidpp: do all FF cleanup in hidpp_ff_destroy()
HID: logitech-hidpp: rework device validation
HID: logitech-hidpp: split g920_get_config()
HID: i2c-hid: Remove runtime power management
HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring()
HID: google: add magnemite/masterball USB ids
HID: Fix assumption that devices have inputs
HID: prodikeys: make array keys static const, makes object smaller
HID: fix error message in hid_open_report()
Pull virtio fixes from Michael Tsirkin:
"Some minor fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vringh: fix copy direction of vringh_iov_push_kern()
vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'
virtio_ring: fix stalls for packed rings
The events in the same group don't start or stop simultaneously.
Here is the ftrace when enabling event group for uncore_iio_0:
# perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}"
<idle>-0 [000] d.h. 8959.064832: read_msr: a41, value
b2b0b030 //Read counter reg of IIO unit0 counter0
<idle>-0 [000] d.h. 8959.064835: write_msr: a48, value
400001 //Write Ctrl reg of IIO unit0 counter0 to enable
counter0. <------ Although counter0 is enabled, Unit Ctrl is still
freezed. Nothing will count. We are still good here.
<idle>-0 [000] d.h. 8959.064836: read_msr: a40, value
30100 //Read Unit Ctrl reg of IIO unit0
<idle>-0 [000] d.h. 8959.064838: write_msr: a40, value
30000 //Write Unit Ctrl reg of IIO unit0 to enable all
counters in the unit by clear Freeze bit <------Unit0 is un-freezed.
Counter0 has been enabled. Now it starts counting. But counter1 has not
been enabled yet. The issue starts here.
<idle>-0 [000] d.h. 8959.064846: read_msr: a42, value 0
//Read counter reg of IIO unit0 counter1
<idle>-0 [000] d.h. 8959.064847: write_msr: a49, value
40000e //Write Ctrl reg of IIO unit0 counter1 to enable
counter1. <------ Now, counter1 just starts to count. Counter0 has
been running for a while.
Current code un-freezes the Unit Ctrl right after the first counter is
enabled. The subsequent group events always loses some counter values.
Implement pmu_enable and pmu_disable support for uncore, which can help
to batch hardware accesses.
No one uses uncore_enable_box and uncore_disable_box. Remove them.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-drivers-review@eclists.intel.com
Cc: linux-perf@eclists.intel.com
Fixes: 087bfbb032 ("perf/x86: Add generic Intel uncore PMU support")
Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We want to copy from iov to buf, so the direction was wrong.
Note: no real user for the helper, but it will be used by future
features.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The 'work' field was introduced with commit 06a8fc7836
("VSOCK: Introduce virtio_vsock_common.ko")
but it is never used in the code, so we can remove it to save
memory allocated in the per-packet 'struct virtio_vsock_pkt'
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can
use virtqueue_enable_cb_delayed_packed to reduce the number of device
interrupts. At the moment, this is the case for virtio-net when the
napi_tx module parameter is set to false.
In this case, the virtio driver selects an event offset and expects that
the device will send a notification when rolling over the event offset
in the ring. However, if this roll-over happens before the event
suppression structure update, the notification won't be sent. To address
this race condition the driver needs to check wether the device rolled
over the offset after updating the event suppression structure.
With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the
flags field of the descriptor at the specified offset.
Unfortunately, checking at the event offset isn't reliable: if
descriptors are chained (e.g. when INDIRECT is off) not all descriptors
are overwritten by the device, so it's possible that the device skipped
the specific descriptor driver is checking when writing out used
descriptors. If this happens, the driver won't detect the race condition
and will incorrectly expect the device to send a notification.
For virtio-net, the result will be a TX queue stall, with the
transmission getting blocked forever.
With the packed ring, it isn't easy to find a location which is
guaranteed to change upon the roll-over, except the next device
descriptor, as described in the spec:
Writes of device and driver descriptors can generally be
reordered, but each side (driver and device) are only required to
poll (or test) a single location in memory: the next device descriptor after
the one they processed previously, in circular order.
while this might be sub-optimal, let's do exactly this for now.
Cc: stable@vger.kernel.org
Cc: Jason Wang <jasowang@redhat.com>
Fixes: f51f982682 ("virtio_ring: leverage event idx in packed ring")
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rather than adding prototypes for C functions called only by assembly
code, mark them as __visible. This avoids adding prototypes that will
never be used by the callers. Resolves the following sparse warnings:
arch/riscv/kernel/irq.c:27:29: warning: symbol 'do_IRQ' was not declared. Should it be static?
arch/riscv/kernel/ptrace.c:151:6: warning: symbol 'do_syscall_trace_enter' was not declared. Should it be static?
arch/riscv/kernel/ptrace.c:165:6: warning: symbol 'do_syscall_trace_exit' was not declared. Should it be static?
arch/riscv/kernel/signal.c:295:17: warning: symbol 'do_notify_resume' was not declared. Should it be static?
arch/riscv/kernel/traps.c:92:1: warning: symbol 'do_trap_unknown' was not declared. Should it be static?
arch/riscv/kernel/traps.c:94:1: warning: symbol 'do_trap_insn_misaligned' was not declared. Should it be static?
arch/riscv/kernel/traps.c:96:1: warning: symbol 'do_trap_insn_fault' was not declared. Should it be static?
arch/riscv/kernel/traps.c:98:1: warning: symbol 'do_trap_insn_illegal' was not declared. Should it be static?
arch/riscv/kernel/traps.c:100:1: warning: symbol 'do_trap_load_misaligned' was not declared. Should it be static?
arch/riscv/kernel/traps.c:102:1: warning: symbol 'do_trap_load_fault' was not declared. Should it be static?
arch/riscv/kernel/traps.c:104:1: warning: symbol 'do_trap_store_misaligned' was not declared. Should it be static?
arch/riscv/kernel/traps.c:106:1: warning: symbol 'do_trap_store_fault' was not declared. Should it be static?
arch/riscv/kernel/traps.c:108:1: warning: symbol 'do_trap_ecall_u' was not declared. Should it be static?
arch/riscv/kernel/traps.c:110:1: warning: symbol 'do_trap_ecall_s' was not declared. Should it be static?
arch/riscv/kernel/traps.c:112:1: warning: symbol 'do_trap_ecall_m' was not declared. Should it be static?
arch/riscv/kernel/traps.c:124:17: warning: symbol 'do_trap_break' was not declared. Should it be static?
arch/riscv/kernel/smpboot.c:136:24: warning: symbol 'smp_callin' was not declared. Should it be static?
Based on a suggestion from Luc Van Oostenryck.
This version includes changes based on feedback from Christoph Hellwig
<hch@lst.de>.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de> # for do_syscall_trace_*
The __user annotations were removed from the {save,restore}_fp_state()
function signatures by commit 007f5c3589 ("Refactor FPU code in
signal setup/return procedures"), but should be present, and sparse
warns when they are not applied. Add them back in.
This change should have no functional impact.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Fixes: 007f5c3589 ("Refactor FPU code in signal setup/return procedures")
Cc: Alan Kao <alankao@andestech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
sparse identifies several missing prototypes caused by missing
preprocessor include directives:
arch/riscv/kernel/cpufeature.c:16:6: warning: symbol 'has_fpu' was not declared. Should it be static?
arch/riscv/kernel/process.c:26:6: warning: symbol 'arch_cpu_idle' was not declared. Should it be static?
arch/riscv/kernel/reset.c:15:6: warning: symbol 'pm_power_off' was not declared. Should it be static?
arch/riscv/kernel/syscall_table.c:15:6: warning: symbol 'sys_call_table' was not declared. Should it be static?
arch/riscv/kernel/traps.c:149:13: warning: symbol 'trap_init' was not declared. Should it be static?
arch/riscv/kernel/vdso.c:54:5: warning: symbol 'arch_setup_additional_pages' was not declared. Should it be static?
arch/riscv/kernel/smp.c:64:6: warning: symbol 'arch_match_cpu_phys_id' was not declared. Should it be static?
arch/riscv/kernel/module-sections.c:89:5: warning: symbol 'module_frob_arch_sections' was not declared. Should it be static?
arch/riscv/mm/context.c:42:6: warning: symbol 'switch_mm' was not declared. Should it be static?
Fix by including the appropriate header files in the appropriate
source files.
This patch should have no functional impact.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Several functions and arrays which are only used in the files in which
they are declared are missing "static" qualifiers. Warnings for these
symbols are reported by sparse:
arch/riscv/kernel/vdso.c:28:18: warning: symbol 'vdso_data' was not declared. Should it be static?
arch/riscv/mm/sifive_l2_cache.c:145:12: warning: symbol 'sifive_l2_init' was not declared. Should it be static?
Resolve these warnings by marking them as static.
This version incorporates feedback from Greentime Hu
<greentime.hu@sifive.com>.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Greentime Hu <greentime.hu@sifive.com>
sparse complains loudly when string literals associated with
preprocessor directives are split into multiple, separately quoted
strings across different lines:
arch/riscv/mm/init.c:341:9: error: Expected ; at the end of type declaration
arch/riscv/mm/init.c:341:9: error: got "not use absolute addressing."
arch/riscv/mm/init.c:358:9: error: Trying to use reserved word 'do' as identifier
arch/riscv/mm/init.c:358:9: error: Expected ; at end of declaration
[ ... ]
It turns out this doesn't compile. The existing Linux practice for
this situation is simply to use a single long line. So, fix by
concatenating the strings.
This patch should have no functional impact.
This version incorporates changes based on feedback from Luc Van
Oostenryck <luc.vanoostenryck@gmail.com>.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-riscv/CAAhSdy2nX2LwEEAZuMtW_ByGTkHO6KaUEvVxRnba_ENEjmFayQ@mail.gmail.com/T/#mc1a58bc864f71278123d19a7abc083a9c8e37033
Fixes: 387181dcdb ("RISC-V: Always compile mm/init.c with cmodel=medany and notrace")
Cc: Anup Patel <anup.patel@wdc.com>
Add prototypes for assembly language functions defined in head.S,
and include these prototypes into C source files that call those
functions.
This patch resolves the following warnings from sparse:
arch/riscv/kernel/setup.c:39:10: warning: symbol 'hart_lottery' was not declared. Should it be static?
arch/riscv/kernel/setup.c:42:13: warning: symbol 'parse_dtb' was not declared. Should it be static?
arch/riscv/kernel/smpboot.c:33:6: warning: symbol '__cpu_up_stack_pointer' was not declared. Should it be static?
arch/riscv/kernel/smpboot.c:34:6: warning: symbol '__cpu_up_task_pointer' was not declared. Should it be static?
arch/riscv/mm/fault.c:25:17: warning: symbol 'do_page_fault' was not declared. Should it be static?
This change should have no functional impact.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
io_queue_link_head() owns shadow_req after taking it as an argument.
By not freeing it in case of an error, it can leak the request along
with taken ctx->refs.
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix crash on flowtable due to race between garbage collection
and insertion.
2) Restore callback unbinding in netfilter offloads.
3) Fix races on IPVS module removal, from Davide Caratti.
4) Make old_secure_tcp per-netns to fix sysbot report,
from Eric Dumazet.
5) Validate matching length in netfilter offloads, from wenxu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Thomas Gleixner:
"Two fixes for the VMWare guest support:
- Unbreak VMWare platform detection which got wreckaged by converting
an integer constant to a string constant.
- Fix the clang build of the VMWAre hypercall by explicitely
specifying the ouput register for INL instead of using the short
form"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for time(keeping):
- Add a missing include to prevent compiler warnings.
- Make the VDSO implementation of clock_getres() POSIX compliant
again. A recent change dropped the NULL pointer guard which is
required as NULL is a valid pointer value for this function.
- Fix two function documentation typos"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Fix two trivial comments
timers/sched_clock: Include local timekeeping.h for missing declarations
lib/vdso: Make clock_getres() POSIX compliant again
Pull perf fixes from Thomas Gleixner:
"A set of perf fixes:
kernel:
- Unbreak the tracking of auxiliary buffer allocations which got
imbalanced causing recource limit failures.
- Fix the fallout of splitting of ToPA entries which missed to shift
the base entry PA correctly.
- Use the correct context to lookup the AUX event when unmapping the
associated AUX buffer so the event can be stopped and the buffer
reference dropped.
tools:
- Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
/proc/kcore
- Fix freeing id arrays in the event list so the correct event is
closed.
- Sync sched.h anc kvm.h headers with the kernel sources.
- Link jvmti against tools/lib/ctype.o to have weak strlcpy().
- Fix multiple memory and file descriptor leaks, found by coverity in
perf annotate.
- Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
by a static analysis tool"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Fix AUX output stopping
perf/aux: Fix tracking of auxiliary trace buffer allocation
perf/x86/intel/pt: Fix base for single entry topa
perf kmem: Fix memory leak in compact_gfp_flags()
tools headers UAPI: Sync sched.h with the kernel
tools headers kvm: Sync kvm.h headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
perf c2c: Fix memory leak in build_cl_output()
perf tools: Fix mode setting in copyfile_mode_ns()
perf annotate: Fix multiple memory and file descriptor leaks
perf tools: Fix resource leak of closedir() on the error paths
perf evlist: Fix fix for freed id arrays
perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
Pull irq fixes from Thomas Gleixner:
"Two fixes for interrupt controller drivers:
- Skip IRQ_M_EXT entries in the device tree when initializing the
RISCV PLIC controller to avoid a double init attempt.
- Use the correct ITS list when issuing the VMOVP synchronization
command so the operation works only on the ITS instances which are
associated to a VM"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Skip contexts except supervisor in plic_init()
irqchip/gic-v3-its: Use the exact ITSList for VMOVP
Pull cifs fixes from Steve French:
"Seven cifs/smb3 fixes, including three for stable"
* tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
CIFS: Fix use after free of file info structures
CIFS: Fix retry mid list corruption on reconnects
cifs: Fix missed free operations
CIFS: avoid using MID 0xFFFF
cifs: clarify comment about timestamp granularity for old servers
cifs: Handle -EINPROGRESS only when noblockcnt is set
Pull RISC-V fixes from Paul Walmsley:
"Several minor fixes and cleanups for v5.4-rc5:
- Three build fixes for various SPARSEMEM-related kernel
configurations
- Two cleanup patches for the kernel bug and breakpoint trap handler
code"
* tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: cleanup do_trap_break
riscv: cleanup <asm/bug.h>
riscv: Fix undefined reference to vmemmap_populate_basepages
riscv: Fix implicit declaration of 'page_to_section'
riscv: fix fs/proc/kcore.c compilation with sparsemem enabled
The USB gadget core is supposed to manage pullups
of the controller. Don't manage pullups from within
the controller driver. Otherwise, function drivers
are not able to keep the controller disconnected from
the bus till they are ready. (e.g. g_webcam)
Reviewed-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
layerscape board sometimes reported some usb call trace, that is due to
kernel sent LPM tokerns automatically when it has no pending transfers
and think that the link is idle enough to enter L1, which procedure will
ask usb register has a recovery,then kernel will compare USBx_GFLADJ and
set GFLADJ_30MHZ, GFLADJ_30MHZ_REG until GFLADJ_30MHZ is equal 0x20, if
the conditions were met then issue occur, but whatever the conditions
whether were met that usb is all need keep GFLADJ_30MHZ of value is 0x20
(xhci spec ask use GFLADJ_30MHZ to adjust any offset from clock source
that generates the clock that drives the SOF counter, 0x20 is default
value of it)That is normal logic, so need remove the call trace.
Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
In dwc3_pci_probe a call to platform_device_alloc allocates a device
which is correctly put in case of error except one case: when the call to
platform_device_add_properties fails it directly returns instead of
going to error handling. This commit replaces return with the goto.
Fixes: 1a7b12f69a ("usb: dwc3: pci: Supply device properties via driver data")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
composite_dev_cleanup call from the failure of configfs_composite_bind
frees up the cdev->os_desc_req and cdev->req. If the previous calls of
bind and unbind is successful these will carry stale values.
Consider the below sequence of function calls:
configfs_composite_bind()
composite_dev_prepare()
- Allocate cdev->req, cdev->req->buf
composite_os_desc_req_prepare()
- Allocate cdev->os_desc_req, cdev->os_desc_req->buf
configfs_composite_unbind()
composite_dev_cleanup()
- free the cdev->os_desc_req->buf and cdev->req->buf
Next composition switch
configfs_composite_bind()
- If it fails goto err_comp_cleanup will call the
composite_dev_cleanup() function
composite_dev_cleanup()
- calls kfree up with the stale values of cdev->req->buf and
cdev->os_desc_req from the previous configfs_composite_bind
call. The free call on these stale values leads to double free.
Hence, Fix this issue by setting request and buffer pointer to NULL after
kfree.
Signed-off-by: Chandana Kishori Chiluveru <cchiluve@codeaurora.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix interrupt storm generated by endpoints when working in FIFO mode.
The TX_COMPLETE interrupt is used only by control endpoints processing.
Do not enable it for other types of endpoints.
Fixes: 914a3f3b37 ("USB: add atmel_usba_udc driver")
Signed-off-by: Cristian Birsan <cristian.birsan@microchip.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix the type of buf in __usbhsg_recip_send_status to
be __le16 to avoid the following sparse warning:
drivers/usb/renesas_usbhs/mod_gadget.c:335:14: warning: incorrect type in assignment (different base types)
drivers/usb/renesas_usbhs/mod_gadget.c:335:14: expected unsigned short
drivers/usb/renesas_usbhs/mod_gadget.c:335:14: got restricted __le16 [usertype]
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
This patch fixes the following sparse warnings by shifting 8-bits after
le16_to_cpu().
drivers/usb/renesas_usbhs/mod_gadget.c:268:47: warning: restricted __le16 degrades to integer
drivers/usb/renesas_usbhs/mod_gadget.c:268:47: warning: cast to restricted __le16
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
This patch fixes the following sparse warnings by using
a macro and a suitable variable type.
drivers/usb/gadget/udc/renesas_usb3.c:1547:17: warning: restricted __le16 degrades to integer
drivers/usb/gadget/udc/renesas_usb3.c:1550:43: warning: incorrect type in argument 2 (different base types)
drivers/usb/gadget/udc/renesas_usb3.c:1550:43: expected unsigned short [usertype] addr
drivers/usb/gadget/udc/renesas_usb3.c:1550:43: got restricted __le16 [usertype] wValue
drivers/usb/gadget/udc/renesas_usb3.c:1607:24: warning: incorrect type in assignment (different base types)
drivers/usb/gadget/udc/renesas_usb3.c:1607:24: expected unsigned short [assigned] [usertype] status
drivers/usb/gadget/udc/renesas_usb3.c:1607:24: got restricted __le16 [usertype]
drivers/usb/gadget/udc/renesas_usb3.c:1775:17: warning: restricted __le16 degrades to integer
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix the warnings generated by casting to/from __le16 without
using the correct functions.
Fixes the following sparse warnings:
drivers/usb/renesas_usbhs/common.c:165:25: warning: incorrect type in assignment (different base types)
drivers/usb/renesas_usbhs/common.c:165:25: expected restricted __le16 [usertype] wValue
drivers/usb/renesas_usbhs/common.c:165:25: got unsigned short
drivers/usb/renesas_usbhs/common.c:166:25: warning: incorrect type in assignment (different base types)
drivers/usb/renesas_usbhs/common.c:166:25: expected restricted __le16 [usertype] wIndex
drivers/usb/renesas_usbhs/common.c:166:25: got unsigned short
drivers/usb/renesas_usbhs/common.c:167:25: warning: incorrect type in assignment (different base types)
drivers/usb/renesas_usbhs/common.c:167:25: expected restricted __le16 [usertype] wLength
drivers/usb/renesas_usbhs/common.c:167:25: got unsigned short
drivers/usb/renesas_usbhs/common.c:173:39: warning: incorrect type in argument 3 (different base types)
drivers/usb/renesas_usbhs/common.c:173:39: expected unsigned short [usertype] data
drivers/usb/renesas_usbhs/common.c:173:39: got restricted __le16 [usertype] wValue
drivers/usb/renesas_usbhs/common.c:174:39: warning: incorrect type in argument 3 (different base types)
drivers/usb/renesas_usbhs/common.c:174:39: expected unsigned short [usertype] data
drivers/usb/renesas_usbhs/common.c:174:39: got restricted __le16 [usertype] wIndex
drivers/usb/renesas_usbhs/common.c:175:39: warning: incorrect type in argument 3 (different base types)
drivers/usb/renesas_usbhs/common.c:175:39: expected unsigned short [usertype] data
Note. I belive this to be correct, and should be a no-op on arm.
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The cdns3_host_init() function is declared in host-export.h
but host.c does not include it. Add the include to have
the declaration present (and remove the declaration of
cdns3_host_exit which is now static).
Fixes the following sparse warning:
drivers/usb/cdns3/host.c:58:5: warning: symbol 'cdns3_host_init' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The declarations of ssusb_gadget_{init,exit} are
in the mtu3_dr.h file but the code does that implements
them does not include this. Add the include to fix the
following sparse warnigns:
drivers/usb/mtu3/mtu3_core.c:825:5: warning: symbol 'ssusb_gadget_init' was not declared. Should it be static?
drivers/usb/mtu3/mtu3_core.c:925:6: warning: symbol 'ssusb_gadget_exit' was not declared. Should it be static?
Acked-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
After many randconfig builds, one configuration caused a link
error with dwc3-meson-g12a lacking the regmap-mmio code:
drivers/usb/dwc3/dwc3-meson-g12a.o: In function `dwc3_meson_g12a_probe':
dwc3-meson-g12a.c:(.text+0x9f): undefined reference to `__devm_regmap_init_mmio_clk'
Add the select statement that we have for all other users
of that dependency.
Fixes: c99993376f ("usb: dwc3: Add Amlogic G12A DWC3 glue")
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-10-27
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 11 day(s) which contain
a total of 7 files changed, 66 insertions(+), 16 deletions(-).
The main changes are:
1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to
kallsyms, from Daniel Borkmann.
2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson.
3) Fix hang in netdev unregister for hash based devmap as well as another overflow
bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen.
4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst.
Also fix BPF selftests to use more compatible nc options, from Jiri Benc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes:
- Fix VDSO time-related function behavior for systems where we need
to fall back to syscalls, but were instead returning bogus results.
- A fix to TLB exception handlers for Cavium Octeon systems where
they would inadvertently clobber the $1/$at register.
- A build fix for bcm63xx configurations.
- Switch to using my @kernel.org email address"
* tag 'mips_fixes_5.4_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: tlbex: Fix build_restore_pagemask KScratch restore
MIPS: bmips: mark exception vectors as char arrays
mips: vdso: Fix __arch_get_hw_counter()
MAINTAINERS: Use @kernel.org address for Paul Burton
Pull tty/serial driver fix from Greg KH:
"Here is a single tty/serial driver fix for 5.4-rc5 that resolves a
reported issue.
It has been in linux-next for a while with no problems"
* tag 'tty-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
8250-men-mcb: fix error checking when get_num_ports returns -ENODEV
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix, for the wlan-ng driver, that
resolves a reported issue.
It is been in linux-next for a while with no reported issues"
* tag 'staging-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS
Pull driver core fix from Greg KH:
"Here is a single sysfs fix for 5.4-rc5.
It resolves an error if you actually try to use the __BIN_ATTR_WO()
macro, seems I never tested it properly before :(
This has been in linux-next for a while with no reported issues"
* tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Fixes __BIN_ATTR_WO() macro
Pull binder fix from Greg KH:
"This is a single binder fix to resolve a reported issue by Jann. It's
been in linux-next for a while with no reported issues"
* tag 'char-misc-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: Don't modify VMA bounds in ->mmap handler
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.4-rc5.
More "fun" with some of the misc USB drivers as found by syzbot, and
there are a number of other small bugfixes in here for reported
issues.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdns3: Error out if USB_DR_MODE_UNKNOWN in cdns3_core_init_role()
USB: ldusb: fix read info leaks
USB: serial: ti_usb_3410_5052: clean up serial data access
USB: serial: ti_usb_3410_5052: fix port-close races
USB: usblp: fix use-after-free on disconnect
usb: udc: lpc32xx: fix bad bit shift operation
usb: cdns3: Fix dequeue implementation.
USB: legousbtower: fix a signedness bug in tower_probe()
USB: legousbtower: fix memleak on disconnect
USB: ldusb: fix memleak on disconnect
Pull i2c fixes from Wolfram Sang:
"A few driver fixes for the I2C subsystem"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: stm32f7: remove warning when compiling with W=1
i2c: stm32f7: fix a race in slave mode with arbitration loss irq
i2c: stm32f7: fix first byte to send in slave mode
i2c: mt65xx: fix NULL ptr dereference
i2c: aspeed: fix master pending state handling
Pull block and io_uring fixes from Jens Axboe:
"A bit bigger than usual at this point in time, mostly due to some good
bug hunting work by Pavel that resulted in three io_uring fixes from
him and two from me. Anyway, this pull request contains:
- Revert of the submit-and-wait optimization for io_uring, it can't
always be done safely. It depends on commands always making
progress on their own, which isn't necessarily the case outside of
strict file IO. (me)
- Series of two patches from me and three from Pavel, fixing issues
with shared data and sequencing for io_uring.
- Lastly, two timeout sequence fixes for io_uring (zhangyi)
- Two nbd patches fixing races (Josef)
- libahci regulator_get_optional() fix (Mark)"
* tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block:
nbd: verify socket is supported during setup
ata: libahci_platform: Fix regulator_get_optional() misuse
nbd: handle racing with error'ed out commands
nbd: protect cmd->status with cmd->lock
io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
io_uring: used cached copies of sq->dropped and cq->overflow
io_uring: Fix race for sqes with userspace
io_uring: Fix broken links with offloading
io_uring: Fix corrupted user_data
io_uring: correct timeout req sequence when inserting a new entry
io_uring : correct timeout req sequence when waiting timeout
io_uring: revert "io_uring: optimize submit_and_wait API"
Paolo Abeni says:
====================
ipv4: fix route update on metric change.
This fixes connected route update on some edge cases for ip addr metric
change.
It additionally includes self tests for the covered scenarios. The new tests
fail on unpatched kernels and pass on the patched one.
v1 -> v2:
- add selftests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two more tests to ipv4_addr_metric_test() to
explicitly cover the scenarios fixed by the previous patch.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit af4d768ad2 ("net/ipv4: Add support for specifying metric
of connected routes"), when updating an IP address with a different metric,
the associated connected route is updated, too.
Still, the mentioned commit doesn't handle properly some corner cases:
$ ip addr add dev eth0 192.168.1.0/24
$ ip addr add dev eth0 192.168.2.1/32 peer 192.168.2.2
$ ip addr add dev eth0 192.168.3.1/24
$ ip addr change dev eth0 192.168.1.0/24 metric 10
$ ip addr change dev eth0 192.168.2.1/32 peer 192.168.2.2 metric 10
$ ip addr change dev eth0 192.168.3.1/24 metric 10
$ ip -4 route
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.0
192.168.2.2 dev eth0 proto kernel scope link src 192.168.2.1
192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.2.1 metric 10
Only the last route is correctly updated.
The problem is the current test in fib_modify_prefix_metric():
if (!(dev->flags & IFF_UP) ||
ifa->ifa_flags & (IFA_F_SECONDARY | IFA_F_NOPREFIXROUTE) ||
ipv4_is_zeronet(prefix) ||
prefix == ifa->ifa_local || ifa->ifa_prefixlen == 32)
Which should be the logical 'not' of the pre-existing test in
fib_add_ifaddr():
if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) &&
(prefix != addr || ifa->ifa_prefixlen < 32))
To properly negate the original expression, we need to change the last
logical 'or' to a logical 'and'.
Fixes: af4d768ad2 ("net/ipv4: Add support for specifying metric of connected routes")
Reported-and-suggested-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
memset() the structure ethtool_wolinfo that has padded bytes
but the padded bytes have not been zeroed out.
Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
IPVS fixes for v5.4
* Eric Dumazet resolves a race condition in switching the defense level
* Davide Caratti resolves a race condition in module removal
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull s390 fixes from Vasily Gorbik:
- Add R_390_GLOB_DAT relocation type support. This fixes boot problem
on linux-next.
- Fix memory leak in zcrypt
* tag 's390-5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kaslr: add support for R_390_GLOB_DAT relocation type
s390/zcrypt: fix memleak at release
Pull xen fixlet from Juergen Gross:
"Just one patch for issuing a deprecation warning for 32-bit Xen pv
guests"
* tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: issue deprecation warning for 32-bit pv guest
Pull dma-mapping fix from Christoph Hellwig:
"Fix a regression in the intel-iommu get_required_mask conversion
(Arvind Sankar)"
* tag 'dma-mapping-5.4-2' of git://git.infradead.org/users/hch/dma-mapping:
iommu/vt-d: Return the correct dma mask when we are bypassing the IOMMU
Pull dax fix from Dan Williams:
"Fix a performance regression that followed from a fix to the
conversion of the fsdax implementation to the xarray. v5.3 users
report that they stop seeing huge page mappings on an application +
filesystem layout that was seeing huge pages previously on v5.2"
* tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
fs/dax: Fix pmd vs pte conflict detection
For adapters which support the SGE Doorbell Queue Timer facility,
we configured the Ethernet TX Queues to send CIDX Updates to the
Associated Ethernet RX Response Queue with CPL_SGE_EGR_UPDATE
messages to allow us to respond more quickly to the CIDX Updates.
But, this was adding load to PCIe Link RX bandwidth and,
potentially, resulting in higher CPU Interrupt load.
This patch requests the HW to deliver the CIDX updates to the TX
queue status page rather than generating an ingress queue message
(as an interrupt). With this patch, the load on RX bandwidth is
reduced and a substantial improvement in BW is noticed at lower
IO sizes.
Fixes: d429005fdf ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
but there are a few paths calling rtnl_net_notifyid() from atomic
context or from RCU critical sections. The later also precludes the use
of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
call is wrong too, as it uses GFP_KERNEL unconditionally.
Therefore, we need to pass the GFP flags as parameter and propagate it
through function calls until the proper flags can be determined.
In most cases, GFP_KERNEL is fine. The exceptions are:
* openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
indirectly call rtnl_net_notifyid() from RCU critical section,
* rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
parameter.
Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
by nlmsg_new(). The function is allowed to sleep, so better make the
flags consistent with the ones used in the following
ovs_vport_cmd_fill_info() call.
Found by code inspection.
Fixes: 9a9634545c ("netns: notify netns id events")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style in
header file related to ethernet driver for Cortina Gemini
devices. For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul says:
====================
net/smc: fixes for -net
Fixes for the net tree, covering a memleak when closing
SMC fallback sockets and fix SMC-R connection establishment
when vlan-ids are used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Creating of an SMC-R connection with vlan-id fails, because
smc_listen_work() determines the vlan_id of the connection,
saves it in struct smc_init_info ini, but clears the ini area
again if SMC-D is not applicable.
This patch just resets the ISM device before investigating
SMC-R availability.
Fixes: bc36d2fc93 ("net/smc: consolidate function parameters")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For SMC sockets forced to fallback to TCP, the file is propagated
from the outer SMC to the internal TCP socket. When closing the SMC
socket, the internal TCP socket file pointer must be restored to the
original NULL value, otherwise memory leaks may show up (found with
CONFIG_DEBUG_KMEMLEAK).
The internal TCP socket is released in smc_clcsock_release(), which
calls __sock_release() function in net/socket.c. This calls the
needed iput(SOCK_INODE(sock)) only, if the file pointer has been reset
to the original NULL-value.
Fixes: 07603b2308 ("net/smc: propagate file from SMC to TCP socket")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI fixes from James Bottomley:
"Nine changes, eight to drivers (qla2xxx, hpsa, lpfc, alua, ch,
53c710[x2], target) and one core change that tries to close a race
between sysfs delete and module removal"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: remove left-over BUILD_NVME defines
scsi: core: try to get module before removing device
scsi: hpsa: add missing hunks in reset-patch
scsi: target: core: Do not overwrite CDB byte 1
scsi: ch: Make it possible to open a ch device multiple times again
scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE
scsi: sni_53c710: fix compilation error
scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions
scsi: qla2xxx: fix a potential NULL pointer dereference
If we always compile the get_break_insn_length inline function we can
remove the ifdefs and let dead code elimination take care of the warn
branch that is now unreadable because the report_bug stub always
returns BUG_TRAP_TYPE_BUG.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
If CONFIG_NET_HWBM is not set, then these stub functions in
<net/hwbm.h> should be declared static to avoid trying to
export them from any driver that includes this.
Fixes the following sparse warnings:
./include/net/hwbm.h:24:6: warning: symbol 'hwbm_buf_free' was not declared. Should it be static?
./include/net/hwbm.h:25:5: warning: symbol 'hwbm_pool_refill' was not declared. Should it be static?
./include/net/hwbm.h:26:5: warning: symbol 'hwbm_pool_add' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the CONFIG_MVNET_BA is not set, then make the stub functions
static inline to avoid trying to export them, and remove hte
following sparse warnings:
drivers/net/ethernet/marvell/mvneta_bm.h:163:6: warning: symbol 'mvneta_bm_pool_destroy' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:165:6: warning: symbol 'mvneta_bm_bufs_free' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:167:5: warning: symbol 'mvneta_bm_construct' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:168:5: warning: symbol 'mvneta_bm_pool_refill' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:170:23: warning: symbol 'mvneta_bm_pool_use' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:181:18: warning: symbol 'mvneta_bm_get' was not declared. Should it be static?
drivers/net/ethernet/marvell/mvneta_bm.h:182:6: warning: symbol 'mvneta_bm_put' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is networking hardware that isn't based on Ethernet for layers 1 and 2.
For example CAN.
CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.
While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not build with flow control or high
CAN frame drop rates in mind.
When using fq_codel, as soon as the queue reaches a certain delay based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packages and adjust the
bandwidth accordingly.
When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the package has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped, an error (-ENOBUFS) is propagated to user space. So that the user
space can slow down the package generation.
On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more then one lost CAN frame per
thousand frames.
As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".
[1] https://github.com/systemd/systemd/issues/9194
Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fix from Dmitry Torokhov:
"A fix for st1232 driver to properly report coordinates for 2nd and
subsequent fingers when more than one is on the surface"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: st1232 - fix reporting multitouch coordinates
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.
This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.
Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fc ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This driver is using regulator_get_optional() to handle all the supplies
that it handles, and only ever enables and disables all supplies en masse
without ever doing any other configuration of the device to handle missing
power. These are clear signs that the API is being misused - it should only
be used for supplies that may be physically absent from the system and in
these cases the hardware usually needs different configuration if the
supply is missing. Instead use normal regualtor_get(), if the supply is
not described in DT then the framework will substitute a dummy regulator in
so no special handling is needed by the consumer driver.
In the case of the PHY regulator the handling in the driver is a hack to
deal with integrated PHYs; the supplies are only optional in the sense
that that there's some confusion in the code about where they're bound to.
From a code point of view they function exactly as normal supplies so can
be treated as such. It'd probably be better to model this by instantiating
a PHY object for integrated PHYs.
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We hit the following warning in production
print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
blk_mq_free_request+0xb7/0xf0
blk_mq_complete_request+0x62/0xf0
recv_work+0x29/0xa1 [nbd]
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
? rescuer_thread+0x340/0x340
kthread+0x111/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---
This was preceded by us timing out everything and shutting down the
sockets for the device. The problem is we had a request in the queue at
the same time, so we completed the request twice. This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.
Fix this by checking cmd->status in nbd_read_stat. We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.
For the clear_req cause this is initiated by the user, so again is safe.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull modules fixes from Jessica Yu:
- Revert __ksymtab_$namespace.$symbol naming scheme back to
__ksymtab_$symbol, as it was causing issues with depmod.
Instead, have modpost extract a symbol's namespace from __kstrtabns
and __ksymtab_strings.
- Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
by unescaped '/'.
Use a different sed delimiter to avoid this problem.
* tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
scripts/nsdeps: use alternative sed delimiter
symbol namespaces: revert to previous __ksymtab name scheme
modpost: make updating the symbol namespace explicit
modpost: delegate updating namespaces to separate function
Pull ARM SoC fixes from Olof Johansson:
"A slightly larger set of fixes have accrued in the last two weeks.
Mostly a collection of the usual smaller fixes:
- Marvell Armada: USB phy setup issues on Turris Mox
- Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
maintainer updates.
- OMAP: Fixlets for display config, interrupt settings for wifi, some
clock/PM pieces. Also IOMMU regression fix and a ti-sysc
no-watchdog regression fix.
- i.MX: A few fixes around PM/settings, some devicetree fixlets and
catching up with config option changes in DRM
- Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
panel settings
... and some smaller fixes for Davinci (backlight, McBSP DMA),
Allwinner (phy regulators, PMU removal on A64, etc)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
MAINTAINERS: Update the Spreadtrum SoC maintainer
MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
bus: ti-sysc: Fix watchdog quirk handling
ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
ARM: davinci_all_defconfig: enable GPIO backlight
ARM: davinci: dm365: Fix McBSP dma_slave_map entry
ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
ARM: dts: imx7s: Correct GPT's ipg clock source
ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
arm64: dts: lx2160a: Correct CPU core idle state name
mailmap: Add Simon Arlott (replacement for expired email address)
arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
...
Pull KVM fixes from Paolo Bonzini:
"Bugfixes for ARM, PPC and x86, plus selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Don't leak L1 MMIO regions to L2
KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update
kvm: clear kvmclock MSR on reset
KVM: x86: fix bugon.cocci warnings
KVM: VMX: Remove specialized handling of unexpected exit-reasons
selftests: kvm: fix sync_regs_test with newer gccs
selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported
selftests: kvm: consolidate VMX support checks
selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice
KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled
selftests: kvm: synchronize .gitignore to Makefile
kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
KVM: arm64: pmu: Reset sample period on overflow handling
KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event
arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems
KVM: arm64: pmu: Fix cycle counter truncation
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
Pull drm fixes from Dave Airlie:
"Quiet week this week, which I suspect means some people just didn't
get around to sending me fixes pulls in time. This has 2 komeda and a
bunch of amdgpu fixes in it:
komeda:
- typo fixes
- flushing pipes fix
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl"
* tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm:
drm/komeda: Fix typos in komeda_splitter_validate
drm/komeda: Don't flush inactive pipes
drm/amdgpu/vce: fix allocation size in enc ring test
drm/amdgpu: fix error handling in amdgpu_bo_list_create
drm/amdgpu: fix potential VM faults
drm/amdgpu: user pages array memory leak fix
drm/amdgpu/vcn: fix allocation size in enc ring test
drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
We currently assume that submissions from the sqthread are successful,
and if IO polling is enabled, we use that value for knowing how many
completions to look for. But if we overflowed the CQ ring or some
requests simply got errored and already completed, they won't be
available for polling.
For the case of IO polling and SQTHREAD usage, look at the pending
poll list. If it ever hits empty then we know that we don't have
anymore pollable requests inflight. For that case, simply reset
the inflight count to zero.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Created in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe
Reorder them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_sq_thread() processes sqes by 8 without considering links. As a
result, links will be randomely subdivided.
The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes() as do io_ring_submit().
Downsides:
1. This removes optimisation of not grabbing mm_struct for fixed files
2. It submitting all sqes in one go, without finer-grained sheduling
with cq processing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a bug, where failed linked requests are returned not with
specified @user_data, but with garbage from a kernel stack.
The reason is that io_fail_links() uses req->user_data, which is
uninitialised when called from io_queue_sqe() on fail path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Support for the kernel as Xen 32-bit PV guest will soon be removed.
Issue a warning when booted as such.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull the second lot of irqchip updates for 5.4 from Marc Zyngier:
- Sifive PLIC: force driver to skip non-relevant contexts
- GICv4: Don't send VMOVP commands to ITSs that don't have
this vPE mapped
This reorganization will allow us to call kvm_arch_destroy_vm in the
event that kvm_create_vm fails after calling kvm_arch_init_vm.
Suggested-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recent cleanup in the way EEH support is added to a device causes a
kernel oops when the cxl driver probes a device and creates virtual
devices discovered on the FPGA:
BUG: Kernel NULL pointer dereference at 0x000000a0
Faulting instruction address: 0xc000000000048070
Oops: Kernel access of bad area, sig: 7 [#1]
...
NIP eeh_add_device_late.part.9+0x50/0x1e0
LR eeh_add_device_late.part.9+0x3c/0x1e0
Call Trace:
_dev_info+0x5c/0x6c (unreliable)
pnv_pcibios_bus_add_device+0x60/0xb0
pcibios_bus_add_device+0x40/0x60
pci_bus_add_device+0x30/0x100
pci_bus_add_devices+0x64/0xd0
cxl_pci_vphb_add+0xe0/0x130 [cxl]
cxl_probe+0x504/0x5b0 [cxl]
local_pci_probe+0x6c/0x110
work_for_cpu_fn+0x38/0x60
The root cause is that those cxl virtual devices don't have a
representation in the device tree and therefore no associated pci_dn
structure. In eeh_add_device_late(), pdn is NULL, so edev is NULL and
we oops.
We never had explicit support for EEH for those virtual devices.
Instead, EEH events are reported to the (real) pci device and handled
by the cxl driver. Which can then forward to the virtual devices and
handle dependencies. The fact that we try adding EEH support for the
virtual devices is new and a side-effect of the recent cleanup.
This patch fixes it by skipping adding EEH support on powernv for
devices which don't have a pci_dn structure.
The cxl driver doesn't create virtual devices on pseries so this patch
doesn't fix it there intentionally.
Fixes: b905f8cdca ("powerpc/eeh: EEH for pSeries hot plug")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016162833.22509-1-fbarrat@linux.ibm.com
Modify plic_init() to skip .dts interrupt contexts other
than supervisor external interrupt.
The .dts entry for plic may specify multiple interrupt contexts.
For example, it may assign two entries IRQ_M_EXT and IRQ_S_EXT,
in that order, to the same interrupt controller. This patch
modifies plic_init() to skip the IRQ_M_EXT context since
IRQ_S_EXT is currently the only supported context.
If IRQ_M_EXT is not skipped, plic_init() will report "handler
already present for context" when it comes across the IRQ_S_EXT
context in the next iteration of its loop.
Without this patch, .dts would have to be edited to replace the
value of IRQ_M_EXT with -1 for it to be skipped.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv
Link: https://lkml.kernel.org/r/1571933503-21504-1-git-send-email-alan.mikhak@sifive.com
The _PPC change notifications from the platform firmware are per-CPU,
so acpi_processor_ppc_init() needs to add a frequency QoS request
for each CPU covered by a cpufreq policy to take all of them into
account.
Even though ACPI thermal control of CPUs sets frequency limits
per processor package, it also needs a frequency QoS request for each
CPU in a cpufreq policy in case some of them are taken offline and
the frequency limit needs to be set through the remaining online
ones (this is slightly excessive, because all CPUs covered by one
cpufreq policy will set the same frequency limit through their QoS
requests, but it is not incorrect).
Modify the code in accordance with the above observations.
Fixes: d15ce41273 ("ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.
CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);
Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.
Fixes: 487317c994 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d2 ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the client hits reconnect it iterates over the mid
pending queue marking entries for retry and moving them
to a temporary list to issue callbacks later without holding
GlobalMid_Lock. In the same time there is no guarantee that
mids can't be removed from the temporary list or even
freed completely by another thread. It may cause a temporary
list corruption:
[ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[ 430.464668] ------------[ cut here ]------------
[ 430.466569] kernel BUG at lib/list_debug.c:51!
[ 430.468476] invalid opcode: 0000 [#1] SMP PTI
[ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
...
[ 430.510426] Call Trace:
[ 430.511500] cifs_reconnect+0x25e/0x610 [cifs]
[ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs]
[ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs]
[ 430.517452] ? try_to_wake_up+0x212/0x650
[ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs]
[ 430.521086] ? allocate_buffers+0x66/0x120 [cifs]
[ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[ 430.525116] kthread+0xfb/0x130
[ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs]
[ 430.528514] ? kthread_park+0x90/0x90
[ 430.530019] ret_from_fork+0x35/0x40
Fix this by obtaining extra references for mids being retried
and marking them as MID_DELETED which indicates that such a mid
has been dequeued from the pending list.
Also move mid cleanup logic from DeleteMidQEntry to
_cifs_mid_q_entry_release which is called when the last reference
to a particular mid is put. This allows to avoid any use-after-free
of response buffers.
The patch needs to be backported to stable kernels. A stable tag
is not mentioned below because the patch doesn't apply cleanly
to any actively maintained stable kernel.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.
[..]
rdma_nl_rcv()
mutex_lock()
[..]
rdma_nl_rcv_msg()
ib_get_client_nl_info()
request_module()
iw_cm_init()
rdma_nl_register()
mutex_lock(); <- Deadlock, acquiring mutex again
Due to above call sequence, following call trace and deadlock is observed.
kernel: __mutex_lock+0x35e/0x860
kernel: ? __mutex_lock+0x129/0x860
kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
kernel: rdma_nl_register+0x1a/0x90 [ib_core]
kernel: ? 0xffffffffc029b000
kernel: iw_cm_init+0x34/0x1000 [iw_cm]
kernel: do_one_initcall+0x67/0x2d4
kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
kernel: do_init_module+0x5a/0x223
kernel: load_module+0x1998/0x1e10
kernel: ? __symbol_put+0x60/0x60
kernel: __do_sys_finit_module+0x94/0xe0
kernel: do_syscall_64+0x5a/0x270
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
process stack trace:
[<0>] __request_module+0x1c9/0x460
[<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
[<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
[<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
[<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
[<0>] netlink_unicast+0x179/0x220
[<0>] netlink_sendmsg+0x2f6/0x3f0
[<0>] sock_sendmsg+0x30/0x40
[<0>] ___sys_sendmsg+0x27a/0x290
[<0>] __sys_sendmsg+0x58/0xa0
[<0>] do_syscall_64+0x5a/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.
1. Split the lock protecting the cb_table into a per-index lock, and make
it a rwlock. This lock is used to ensure no callbacks are running after
unregistration returns. Since a module will not be registered once it
is already running callbacks, this avoids the deadlock.
2. Use smp_store_release() to update the cb_table during registration so
that no lock is required. This avoids lockdep problems with thinking
all the rwsems are the same lock class.
Fixes: 0e2d00eb6f ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull Devicetree fixes from Rob Herring:
"A couple more DT fixes for 5.4: fix a ref count, memory leak, and
Risc-V cpu schema warnings"
* tag 'devicetree-fixes-for-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: reserved_mem: add missing of_node_put() for proper ref-counting
of: unittest: fix memory leak in unittest_data_add
dt-bindings: riscv: Fix CPU schema errors
Taehee Yoo says:
====================
net: fix nested device bugs
This patchset fixes several bugs that are related to nesting
device infrastructure.
Current nesting infrastructure code doesn't limit the depth level of
devices. nested devices could be handled recursively. at that moment,
it needs huge memory and stack overflow could occur.
Below devices type have same bug.
VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, and VXLAN.
But I couldn't test all interface types so there could be more device
types, which have similar problems.
Maybe qmi_wwan.c code could have same problem.
So, I would appreciate if someone test qmi_wwan.c and other modules.
Test commands:
ip link add dummy0 type dummy
ip link add vlan1 link dummy0 type vlan id 1
for i in {2..100}
do
let A=$i-1
ip link add name vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
1st patch actually fixes the root cause.
It adds new common variables {upper/lower}_level that represent
depth level. upper_level variable is depth of upper devices.
lower_level variable is depth of lower devices.
[U][L] [U][L]
vlan1 1 5 vlan4 1 4
vlan2 2 4 vlan5 2 3
vlan3 3 3 |
| |
+------------+
|
vlan6 4 2
dummy0 5 1
After this patch, the nesting infrastructure code uses this variable to
check the depth level.
2nd patch fixes Qdisc lockdep related problem.
Before this patch, devices use static lockdep map.
So, if devices that are same types are nested, lockdep will warn about
recursive situation.
These patches make these devices use dynamic lockdep key instead of
static lock or subclass.
3rd patch fixes unexpected IFF_BONDING bit unset.
When nested bonding interface scenario, bonding interface could lost it's
IFF_BONDING flag. This should not happen.
This patch adds a condition before unsetting IFF_BONDING.
4th patch fixes nested locking problem in bonding interface
Bonding interface has own lock and this uses static lock.
Bonding interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
5th patch fixes nested locking problem in team interface
Team interface has own lock and this uses static lock.
Team interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
6th patch fixes a refcnt leak in the macsec module.
When the macsec module is unloaded, refcnt leaks occur.
But actually, that holding refcnt is unnecessary.
So this patch just removes these code.
7th patch adds ignore flag to an adjacent structure.
In order to exchange an adjacent node safely, ignore flag is needed.
8th patch makes vxlan add an adjacent link to limit depth level.
Vxlan interface could set it's lower interface and these lower interfaces
are handled recursively.
So, if the depth of lower interfaces is too deep, stack overflow could
happen.
9th patch removes unnecessary variables and callback.
After 1st patch, subclass callback and variables are unnecessary.
This patch just removes these variables and callback.
10th patch fix refcnt leaks in the virt_wifi module
Like every nested interface, the upper interface should be deleted
before the lower interface is deleted.
In order to fix this, the notifier routine is added in this patch.
v4 -> v5 :
- Update log messages
- Move variables position, 1st patch
- Fix iterator routine, 1st patch
- Add generic lockdep key code, which replaces 2, 4, 5, 6, 7 patches.
- Log message update, 10th patch
- Fix wrong error value in error path of __init routine, 10th patch
- hold module refcnt when interface is created, 10th patch
v3 -> v4 :
- Add new 12th patch to fix refcnt leaks in the virt_wifi module
- Fix wrong usage netdev_upper_dev_link() in the vxlan.c
- Preserve reverse christmas tree variable ordering in the vxlan.c
- Add missing static keyword in the dev.c
- Expose netdev_adjacent_change_{prepare/commit/abort} instead of
netdev_adjacent_dev_{enable/disable}
v2 -> v3 :
- Modify nesting infrastructure code to use iterator instead of recursive.
v1 -> v2 :
- Make the 3rd patch do not add a new priv_flag.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to link an adjacent node, netdev_upper_dev_link() is used
and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used.
unlink operation does not fail, but link operation can fail.
In order to exchange adjacent nodes, we should unlink an old adjacent
node first. then, link a new adjacent node.
If link operation is failed, we should link an old adjacent node again.
But this link operation can fail too.
It eventually breaks the adjacent link relationship.
This patch adds an ignore flag into the netdev_adjacent structure.
If this flag is set, netdev_upper_dev_link() ignores an old adjacent
node for a moment.
This patch also adds new functions for other modules.
netdev_adjacent_change_prepare()
netdev_adjacent_change_commit()
netdev_adjacent_change_abort()
netdev_adjacent_change_prepare() inserts new device into adjacent list
but new device is not allowed to use immediately.
If netdev_adjacent_change_prepare() fails, it internally rollbacks
adjacent list so that we don't need any other action.
netdev_adjacent_change_commit() deletes old device in the adjacent list
and allows new device to use.
netdev_adjacent_change_abort() rollbacks adjacent list.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a macsec interface is created, it increases a refcnt to a lower
device(real device). when macsec interface is deleted, the refcnt is
decreased in macsec_free_netdev(), which is ->priv_destructor() of
macsec interface.
The problem scenario is this.
When nested macsec interfaces are exiting, the exit routine of the
macsec module makes refcnt leaks.
Test commands:
ip link add dummy0 type dummy
ip link add macsec0 link dummy0 type macsec
ip link add macsec1 link macsec0 type macsec
modprobe -rv macsec
[ 208.629433] unregister_netdevice: waiting for macsec0 to become free. Usage count = 1
Steps of exit routine of macsec module are below.
1. Calls ->dellink() in __rtnl_link_unregister().
2. Checks refcnt and wait refcnt to be 0 if refcnt is not 0 in
netdev_run_todo().
3. Calls ->priv_destruvtor() in netdev_run_todo().
Step2 checks refcnt, but step3 decreases refcnt.
So, step2 waits forever.
This patch makes the macsec module do not hold a refcnt of the lower
device because it already holds a refcnt of the lower device with
netdev_upper_dev_link().
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
team interface could be nested and it's lock variable could be nested too.
But this lock uses static lockdep key and there is no nested locking
handling code such as mutex_lock_nested() and so on.
so the Lockdep would warn about the circular locking scenario that
couldn't happen.
In order to fix, this patch makes the team module to use dynamic lock key
instead of static key.
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
ip link set team1 nomaster
Splat that looks like:
[ 40.364352] WARNING: possible recursive locking detected
[ 40.364964] 5.4.0-rc3+ #96 Not tainted
[ 40.365405] --------------------------------------------
[ 40.365973] ip/750 is trying to acquire lock:
[ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
[ 40.367689]
but task is already holding lock:
[ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.370280]
other info that might help us debug this:
[ 40.371159] Possible unsafe locking scenario:
[ 40.371942] CPU0
[ 40.372338] ----
[ 40.372673] lock(&team->lock);
[ 40.373115] lock(&team->lock);
[ 40.373549]
*** DEADLOCK ***
[ 40.374432] May be due to missing lock nesting notation
[ 40.375338] 2 locks held by ip/750:
[ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.377989]
stack backtrace:
[ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.380574] Call Trace:
[ 40.381208] dump_stack+0x7c/0xbb
[ 40.381959] __lock_acquire+0x269d/0x3de0
[ 40.382817] ? register_lock_class+0x14d0/0x14d0
[ 40.383784] ? check_chain_key+0x236/0x5d0
[ 40.384518] lock_acquire+0x164/0x3b0
[ 40.385074] ? team_set_mac_address+0x151/0x290 [team]
[ 40.385805] __mutex_lock+0x14d/0x14c0
[ 40.386371] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387038] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380
[ 40.388245] ? team_del_slave+0x60/0x60 [team]
[ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0
[ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 40.389819] ? lock_acquire+0x164/0x3b0
[ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20
[ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team]
[ 40.391353] ? __module_text_address+0x13/0x140
[ 40.391886] ? team_set_mac_address+0x151/0x290 [team]
[ 40.392547] team_set_mac_address+0x151/0x290 [team]
[ 40.393111] dev_set_mac_address+0x1f0/0x3f0
[ ... ]
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All bonding device has same lockdep key and subclass is initialized with
nest_level.
But actual nest_level value can be changed when a lower device is attached.
And at this moment, the subclass should be updated but it seems to be
unsafe.
So this patch makes bonding use dynamic lockdep key instead of the
subclass.
Test commands:
ip link add bond0 type bond
for i in {1..5}
do
let A=$i-1
ip link add bond$i type bond
ip link set bond$i master bond$A
done
ip link set bond5 master bond0
Splat looks like:
[ 307.992912] WARNING: possible recursive locking detected
[ 307.993656] 5.4.0-rc3+ #96 Tainted: G W
[ 307.994367] --------------------------------------------
[ 307.995092] ip/761 is trying to acquire lock:
[ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.997045]
but task is already holding lock:
[ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.999215]
other info that might help us debug this:
[ 308.000251] Possible unsafe locking scenario:
[ 308.001137] CPU0
[ 308.001533] ----
[ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.003302]
*** DEADLOCK ***
[ 308.004310] May be due to missing lock nesting notation
[ 308.005319] 3 locks held by ip/761:
[ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
[ 308.009422]
stack backtrace:
[ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96
[ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 308.012179] Call Trace:
[ 308.012601] dump_stack+0x7c/0xbb
[ 308.013089] __lock_acquire+0x269d/0x3de0
[ 308.013669] ? register_lock_class+0x14d0/0x14d0
[ 308.014318] lock_acquire+0x164/0x3b0
[ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.015520] _raw_spin_lock_nested+0x2e/0x60
[ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.017215] bond_get_stats+0xb8/0x500 [bonding]
[ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding]
[ 308.019710] ? rcu_read_lock_held+0x90/0xa0
[ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding]
[ 308.021953] dev_get_stats+0x1ec/0x270
[ 308.022508] bond_get_stats+0x1d1/0x500 [bonding]
Fixes: d3fff6c443 ("net: add netdev_lockdep_set_classes() helper")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this needs huge stack
memory. So, unlimited nested devices could make stack overflow.
This patch adds upper_level and lower_level, they are common variables
and represent maximum lower/upper depth.
When upper/lower device is attached or dettached,
{lower/upper}_level are updated. and if maximum depth is bigger than 8,
attach routine fails and returns -EMLINK.
In addition, this patch converts recursive routine of
netdev_walk_all_{lower/upper} to iterator routine.
Test commands:
ip link add dummy0 type dummy
ip link add link dummy0 name vlan1 type vlan id 1
ip link set vlan1 up
for i in {2..55}
do
let A=$i-1
ip link add vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
Splat looks like:
[ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
[ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
[ 155.515048][ T908]
[ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.517233][ T908] Call Trace:
[ 155.517627][ T908]
[ 155.517918][ T908] Allocated by task 0:
[ 155.518412][ T908] (stack is not available)
[ 155.518955][ T908]
[ 155.519228][ T908] Freed by task 0:
[ 155.519885][ T908] (stack is not available)
[ 155.520452][ T908]
[ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0
[ 155.520729][ T908] which belongs to the cache names_cache of size 4096
[ 155.522387][ T908] The buggy address is located 512 bytes inside of
[ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
[ 155.523920][ T908] The buggy address belongs to the page:
[ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
[ 155.525836][ T908] flags: 0x100000000010200(slab|head)
[ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
[ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 155.528429][ T908] page dumped because: kasan: bad access detected
[ 155.529158][ T908]
[ 155.529410][ T908] Memory state around the buggy address:
[ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
[ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.532806][ T908] ^
[ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
[ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ACPI fix from Rafael Wysocki:
"Fix locking issue in the error code path of a function that belongs to
the sysfs interface exposed by the ACPI NFIT handling code (Dan
Carpenter)"
* tag 'acpi-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: NFIT: Fix unlock on error in scrub_show()
Pull power management fixes from Rafael Wysocki:
"These fix problems related to frequency limits management in cpufreq
that were introduced during the 5.3 cycle (when PM QoS had started to
be used for that), fix a few issues in the OPP (operating performance
points) library code and fix up the recently added haltpoll cpuidle
driver.
The cpufreq changes are somewhat bigger that I would like them to be
at this stage of the cycle, but the problems fixed by them include
crashes on boot and shutdown in some cases (among other things) and in
my view it is better to address the root of the issue right away.
Specifics:
- Using device PM QoS of CPU devices for managing frequency limits in
cpufreq does not work, so introduce frequency QoS (based on the
original low-level PM QoS) for this purpose, switch cpufreq and
related code over to using it and fix a race involving deferred
updates of frequency limits on top of that (Rafael Wysocki, Sudeep
Holla).
- Avoid calling regulator_enable()/disable() from the OPP framework
to avoid side-effects on boot-enabled regulators that may change
their initial voltage due to performing initial voltage balancing
without all restrictions from the consumers (Marek Szyprowski).
- Avoid a kref management issue in the OPP library code and drop an
incorrectly added lockdep_assert_held() from it (Viresh Kumar).
- Make the recently added haltpoll cpuidle driver take the 'idle='
override into account as appropriate (Zhenzhong Duan)"
* tag 'pm-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
opp: Reinitialize the list_kref before adding the static OPPs again
cpufreq: Cancel policy update work scheduled before freeing
cpuidle: haltpoll: Take 'idle=' override into account
opp: core: Revert "add regulators enable and disable"
PM: QoS: Drop frequency QoS types from device PM QoS
cpufreq: Use per-policy frequency QoS
PM: QoS: Introduce frequency QoS
opp: of: drop incorrect lockdep_assert_held()
Pull gfs2 fix from Andreas Gruenbacher:
"Fix a memory leak introduced in -rc1"
* tag 'gfs2-v5.4-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix memory leak when gfs2meta's fs_context is freed
Remove the following warning:
drivers/i2c/busses/i2c-stm32f7.c:315:
warning: cannot understand function prototype:
'struct stm32f7_i2c_spec i2c_specs[] =
Replace a comment starting with /** by simply /* to avoid having
it interpreted as a kernel-doc comment.
Fixes: aeb068c572 ("i2c: i2c-stm32f7: add driver")
Signed-off-by: Alain Volmat <alain.volmat@st.com>
Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
When in slave mode, an arbitration loss (ARLO) may be detected before the
slave had a chance to detect the stop condition (STOPF in ISR).
This is seen when two master + slave adapters switch their roles. It
provokes the i2c bus to be stuck, busy as SCL line is stretched.
- the I2C_SLAVE_STOP event is never generated due to STOPF flag is set but
don't generate an irq (race with ARLO irq, STOPIE is masked). STOPF flag
remains set until next master xfer (e.g. when STOPIE irq get unmasked).
In this case, completion is generated too early: immediately upon new
transfer request (then it doesn't send all data).
- Some data get stuck in TXDR register. As a consequence, the controller
stretches the SCL line: the bus gets busy until a future master transfer
triggers the bus busy / recovery mechanism (this can take time... and
may never happen at all)
So choice is to let the STOPF being detected by the slave isr handler,
to properly handle this stop condition. E.g. don't mask IRQs in error
handler, when the slave is running.
Fixes: 60d609f30d ("i2c: i2c-stm32f7: Add slave support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Since commit abf4923e97 ("i2c: mediatek: disable zero-length transfers
for mt8183"), there is a NULL pointer dereference for all the SoCs
that don't have any quirk. mtk_i2c_functionality is not checking that
the quirks pointer is not NULL before starting to use it.
This commit add a call to i2c_check_quirks which will check whether
the quirks pointer is set, and if so will check if the IP has the
NO_ZERO_LEN quirk.
Fixes: abf4923e97 ("i2c: mediatek: disable zero-length transfers for mt8183")
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Reviewed-by: Cengiz Can <cengiz@kernel.wtf>
Reviewed-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Ulrich Hecht <uli@fpond.eu>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
On a system without Single VMOVP support (say GITS_TYPER.VMOVP == 0),
we will map vPEs only on ITSs that will actually control interrupts
for the given VM. And when moving a vPE, the VMOVP command will be
issued only for those ITSs.
But when issuing VMOVPs we seemed fail to present the exact ITSList
to ITSs who are actually included in the synchronization operation.
The its_list_map we're currently using includes all ITSs in the system,
even though some of them don't have the corresponding vPE mapping at all.
Introduce get_its_list() to get the per-VM its_list_map, to indicate
which ITSs have vPE mappings for the given VM, and use this map as
the expected ITSList when building VMOVP. This is hopefully a performance
gain not to do some synchronization with those unsuspecting ITSs.
And initialize the whole command descriptor to zero at beginning, since
the seq_num and its_list should be RES0 when GITS_TYPER.VMOVP == 1.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1571802386-2680-1-git-send-email-yuzenghui@huawei.com
gfs2 and gfs2meta share an ->init_fs_context function which allocates an
args structure stored in fc->fs_private. gfs2 registers a ->free
function to free this memory when the fs_context is cleaned up, but
there was not one registered for gfs2meta, causing a leak.
Register a ->free function for gfs2meta. The existing gfs2_fc_free
function does what we need.
Reported-by: syzbot+c2fdfd2b783754878fb6@syzkaller.appspotmail.com
Fixes: 1f52aa08d1 ("gfs2: Convert gfs2 to fs_context")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* pm-cpuidle:
cpuidle: haltpoll: Take 'idle=' override into account
* pm-opp:
opp: Reinitialize the list_kref before adding the static OPPs again
opp: core: Revert "add regulators enable and disable"
opp: of: drop incorrect lockdep_assert_held()
Payload offload rule should also check the length of the match.
Moreover, check for unsupported link-layer fields:
nft --debug=netlink add rule firewall zones vlan id 100
...
[ payload load 2b @ link header + 0 => reg 1 ]
this loads 2byte base on ll header and offset 0.
This also fixes unsupported raw payload match.
Fixes: 92ad6325cb ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull MFD fix from Lee Jones:
"Fix broken support for BananaPi-r2"
* tag 'mfd-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: mt6397: Fix probe after changing mt6397-core
Pull sound fixes from Takashi Iwai:
"This is a usual small bump in the middle, we've got a set of ASoC
fixes in this week as shown in diffstat.
The only change in the core stuff is about (somewhat minor) PCM
debugfs error handling. The major changes are rather for Intel SOF and
topology coverage, as well as other platform (rockchip, samsung, stm)
and codec fixes.
As non-ASoC changes, a couple of new HD-audio chip fixes and a typo
correction of USB-audio driver validation code are found"
* tag 'sound-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: hda: Add Tigerlake/Jasperlake PCI ID
ALSA: usb-audio: Fix copy&paste error in the validator
ALSA: hda/realtek - Add support for ALC711
ASoC: SOF: control: return true when kcontrol values change
ASoC: stm32: sai: fix sysclk management on shutdown
ASoC: Intel: sof-rt5682: add a check for devm_clk_get
ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting
ASoC: simple_card_utils.h: Fix potential multiple redefinition error
ASoC: msm8916-wcd-digital: add missing MIX2 path for RX1/2
ASoC: core: Fix pcm code debugfs error
ASoc: rockchip: i2s: Fix RPM imbalance
ASoC: wm_adsp: Don't generate kcontrols without READ flags
ASoC: intel: bytcr_rt5651: add null check to support_button_press
ASoC: intel: sof_rt5682: add remove function to disable jack
ASoC: rt5682: add NULL handler to set_jack function
ASoC: intel: sof_rt5682: use separate route map for dmic
ASoC: SOF: Intel: hda: Disable DMI L1 entry during capture
ASoC: SOF: Intel: initialise and verify FW crash dump data.
ASoC: SOF: Intel: hda: fix warnings during FW load
ASoC: SOF: pcm: harden PCM STOP sequence
...
syzbot reported the following issue :
BUG: KCSAN: data-race in update_defense_level / update_defense_level
read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events defense_work_handler
Indeed, old_secure_tcp is currently a static variable, while it
needs to be a per netns variable.
Fixes: a0840e2e16 ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Part 3 from this series [1] was not merged due to wrong splitting
and breaks mt6323 pmic on bananapi-r2
dmesg prints this line and at least switch is not initialized on bananapi-r2
mt6397 1000d000.pwrap:mt6323: unsupported chip: 0x0
this patch contains only the probe-changes and chip_data structs
from original part 3 by Hsin-Hsiung Wang
[1] https://patchwork.kernel.org/project/linux-mediatek/list/?series=164155
Fixes: a4872e80ce ("mfd: mt6397: Extract IRQ related code from core driver")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The LAN8740, like the 8720, also requires a reset after enabling clock.
The datasheet [1] 3.8.5.1 says:
"During a Hardware reset, an external clock must be supplied
to the XTAL1/CLKIN signal."
I have observed this issue on a custom i.MX6 based board with
the LAN8740A.
[1] http://ww1.microchip.com/downloads/en/DeviceDoc/8740a.pdf
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
build_restore_pagemask() will restore the value of register $1/$at when
its restore_scratch argument is non-zero, and aims to do so by filling a
branch delay slot. Commit 0b24cae4d5 ("MIPS: Add missing EHB in mtc0
-> mfc0 sequence.") added an EHB instruction (Execution Hazard Barrier)
prior to restoring $1 from a KScratch register, in order to resolve a
hazard that can result in stale values of the KScratch register being
observed. In particular, P-class CPUs from MIPS with out of order
execution pipelines such as the P5600 & P6600 are affected.
Unfortunately this EHB instruction was inserted in the branch delay slot
causing the MFC0 instruction which performs the restoration to no longer
execute along with the branch. The result is that the $1 register isn't
actually restored, ie. the TLB refill exception handler clobbers it -
which is exactly the problem the EHB is meant to avoid for the P-class
CPUs.
Similarly build_get_pgd_vmalloc() will restore the value of $1/$at when
its mode argument equals refill_scratch, and suffers from the same
problem.
Fix this by in both cases moving the EHB earlier in the emitted code.
There's no reason it needs to immediately precede the MFC0 - it simply
needs to be between the MTC0 & MFC0.
This bug only affects Cavium Octeon systems which use
build_fast_tlb_refill_handler().
Signed-off-by: Paul Burton <paulburton@kernel.org>
Fixes: 0b24cae4d5 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
Cc: Dmitry Korotin <dkorotin@wavecomp.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
The sequence number of the timeout req (req->sequence) indicate the
expected completion request. Because of each timeout req consume a
sequence number, so the sequence of each timeout req on the timeout
list shouldn't be the same. But now, we may get the same number (also
incorrect) if we insert a new entry before the last one, such as submit
such two timeout reqs on a new ring instance below.
req->sequence
req_1 (count = 2): 2
req_2 (count = 1): 2
Then, if we submit a nop req, req_2 will still timeout even the nop req
finished. This patch fix this problem by adjust the sequence number of
each reordered reqs when inserting a new entry.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The sequence number of reqs on the timeout_list before the timeout req
should be adjusted in io_timeout_fn(), because the current timeout req
will consumes a slot in the cq_ring and cq_tail pointer will be
increased, otherwise other timeout reqs may return in advance without
waiting for enough wait_nr.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are cases where it isn't always safe to block for submission,
even if the caller asked to wait for events as well. Revert the
previous optimization of doing that.
This reverts two commits:
bf7ec93c64c576666863
Fixes: c576666863 ("io_uring: optimize submit_and_wait API")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The vectors span more than one byte, so mark them as arrays.
Fixes the following build error when building when using GCC 8.3:
In file included from ./include/linux/string.h:19,
from ./include/linux/bitmap.h:9,
from ./include/linux/cpumask.h:12,
from ./arch/mips/include/asm/processor.h:15,
from ./arch/mips/include/asm/thread_info.h:16,
from ./include/linux/thread_info.h:38,
from ./include/asm-generic/preempt.h:5,
from ./arch/mips/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:81,
from ./include/linux/spinlock.h:51,
from ./include/linux/mmzone.h:8,
from ./include/linux/bootmem.h:8,
from arch/mips/bcm63xx/prom.c:10:
arch/mips/bcm63xx/prom.c: In function 'prom_init':
./arch/mips/include/asm/string.h:162:11: error: '__builtin_memcpy' forming offset [2, 32] is out of the bounds [0, 1] of object 'bmips_smp_movevec' with type 'char' [-Werror=array-bounds]
__ret = __builtin_memcpy((dst), (src), __len); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/bcm63xx/prom.c:97:3: note: in expansion of macro 'memcpy'
memcpy((void *)0xa0000200, &bmips_smp_movevec, 0x20);
^~~~~~
In file included from arch/mips/bcm63xx/prom.c:14:
./arch/mips/include/asm/bmips.h:80:13: note: 'bmips_smp_movevec' declared here
extern char bmips_smp_movevec;
Fixes: 18a1eef92d ("MIPS: BMIPS: Introduce bmips.h")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Having Rx-only AF_XDP sockets can potentially lead to a crash in the
system by a NULL pointer dereference in xsk_umem_consume_tx(). This
function iterates through a list of all sockets tied to a umem and
checks if there are any packets to send on the Tx ring. Rx-only
sockets do not have a Tx ring, so this will cause a NULL pointer
dereference. This will happen if you have registered one or more
Rx-only sockets to a umem and the driver is checking the Tx ring even
on Rx, or if the XDP_SHARED_UMEM mode is used and there is a mix of
Rx-only and other sockets tied to the same umem.
Fixed by only putting sockets with a Tx component on the list that
xsk_umem_consume_tx() iterates over.
Fixes: ac98d8aab6 ("xsk: wire upp Tx zero-copy functions")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1571645818-16244-1-git-send-email-magnus.karlsson@intel.com
UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.
Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.
Really, using jhash() to generate cookies sent on the wire
is a serious security concern.
Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.
Let's switch to siphash, as we did in commit df453700e8
("inet: switch IP ID generator to siphash")
Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.
Packet schedulers using skb_get_hash_perturb() benefit from this change.
Fixes: b56774163f ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Fixes: cb1ce2ef38 ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Berger <jonathann1@walla.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This pull request contains MAINTAINERS file updates for Broadcom SoCs
for the 5.5 kernel, please pull the following:
- Simon adds a .mailmap alias for his old email
- Stefan updates the existing BCM2835 with BCM2711 which is the chip
name for the Raspberry Pi 4
- Florian removes Gregory and Brian from the MAINTAINERS file for
BRCMSTB SoCs
* tag 'arm-soc/for-5.5/maintainers' of https://github.com/Broadcom/stblinux:
MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
mailmap: Add Simon Arlott (replacement for expired email address)
MAINTAINERS: Add BCM2711 to BCM2835 ARCH
Link: https://lore.kernel.org/r/20191023212814.30622-3-f.fainelli@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
Using CONFIG_SPARSEMEM_VMEMMAP instead of CONFIG_SPARSEMEM to fix
following build issue.
riscv64-linux-ld: arch/riscv/mm/init.o: in function 'vmemmap_populate':
init.c:(.meminit.text+0x8): undefined reference to 'vmemmap_populate_basepages'
Cc: Logan Gunthorpe <logang@deltatee.com>
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
With CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP,
arch/riscv/include/asm/pgtable.h: In function ‘mk_pte’:
include/asm-generic/memory_model.h:64:14: error: implicit declaration of function ‘page_to_section’; did you mean ‘present_section’? [-Werror=implicit-function-declaration]
int __sec = page_to_section(__pg); \
^~~~~~~~~~~~~~~
Fixed by changing mk_pte() from inline function to macro.
Cc: Logan Gunthorpe <logang@deltatee.com>
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
[paul.walmsley@sifive.com: fixed checkpatch errors]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Failed to compile Fedora/RISCV kernel (5.4-rc3+) with sparsemem enabled:
fs/proc/kcore.c: In function 'read_kcore':
fs/proc/kcore.c:510:8: error: implicit declaration of function 'kern_addr_valid'; did you mean 'virt_addr_valid'? [-Werror=implicit-function-declaration]
510 | if (kern_addr_valid(start)) {
| ^~~~~~~~~~~~~~~
| virt_addr_valid
Looking at other architectures I don't see kern_addr_valid being guarded by
CONFIG_FLATMEM.
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Tested-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Commit d698a38814 ("of: reserved-memory: ignore disabled memory-region
nodes") added an early return in of_reserved_mem_device_init_by_idx(), but
didn't call of_node_put() on a device_node whose ref-count was incremented
in the call to of_parse_phandle() preceding the early exit.
Fixes: d698a38814 ("of: reserved-memory: ignore disabled memory-region nodes")
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: stable@vger.kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Pull tracing fixes from Steven Rostedt:
"Two minor fixes:
- A race in perf trace initialization (missing mutexes)
- Minor fix to represent gfp_t in synthetic events as properly
signed"
* tag 'trace-v5.4-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix race in perf_trace_buf initialization
tracing: Fix "gfp_t" format for synthetic events
In unittest_data_add, a copy buffer is created via kmemdup. This buffer
is leaked if of_fdt_unflatten_tree fails. The release for the
unittest_data buffer is added.
Fixes: b951f9dc7f ("Enabling OF selftest to run without machine's devicetree")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Fix the errors in the RiscV CPU DT schema:
Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property
Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@1: 'timebase-frequency' is a required property
Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible:0: 'riscv' is not one of ['sifive,rocket0', 'sifive,e5', 'sifive,e51', 'sifive,u54-mc', 'sifive,u54', 'sifive,u5']
Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible: ['riscv'] is too short
Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property
The DT spec allows for 'timebase-frequency' to be in 'cpu' or 'cpus' node
and RiscV requires it in /cpus node, so make it disallowed in cpu
nodes.
Fixes: 4fd669a8c4 ("dt-bindings: riscv: convert cpu binding to json-schema")
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Acked-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Pull regulator fixes from Mark Brown:
"There are a few core fixes here around error handling and handling if
suspend mode configuration and some driver specific fixes here but the
most important change is the fix to the fixed-regulator DT schema
conversion introduced during the last merge window.
That fixes one of the last two errors preventing successful execution
of "make dt_binding_check" which will be enormously helpful for DT
schema development"
* tag 'regulator-fix-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix PMIC5 BoB min voltage
regulator: pfuze100-regulator: Variable "val" in pfuze100_regulator_probe() could be uninitialized
regulator: lochnagar: Add on_off_delay for VDDCORE
regulator: ti-abb: Fix timeout in ti_abb_wait_txdone/ti_abb_clear_all_txdone
regulator: da9062: fix suspend_enable/disable preparation
dt-bindings: fixed-regulator: fix compatible enum
regulator: fixed: Prevent NULL pointer dereference when !CONFIG_OF
regulator: core: make regulator_register() EPROBE_DEFER aware
regulator: of: fix suspend-min/max-voltage parsing
Three fixes for omaps for v5.4-rc cycle
Two regression fixes for omap3 iommu. I missed applying two omap3
related iommu pdata quirks patches earlier because the kbuild test
robot produced errors on them for missing dependencies.
Fix ti-sysc interconnect target module driver handling for watchdog
quirk. I must have tested this earlier only with watchdog service
running, but clearly it does not do what it needs to do.
* tag 'omap-for-v5.4/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
bus: ti-sysc: Fix watchdog quirk handling
ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
Link: https://lore.kernel.org/r/pull-1571848757-282222@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi
uart can fail for the first open of the uart unless connected using
a hub.
This is because the first dma call done by musb_ep_program() must wait
if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues
with other non-dma packets before the DMA transfer is started causing at
least ftdi uarts to fail to receive data.
Let's fix the issue by waking up cppi41 with PM runtime calls added to
cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we
have musb_ep_program() continue with PIO until cppi41 is awake.
Fixes: fdea2d09b9 ("dmaengine: cppi41: Add basic PM runtime support")
Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Cc: stable@vger.kernel.org # v4.9+
Link: https://lore.kernel.org/r/20191023153138.23442-1-tony@atomide.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
A number of fixes for this release, but mostly:
- A fixup for the A10 CSI DT binding merged during the 5.4-rc1 window
- A fix for a dt-binding error
- Addition of phy regulator delays
- The PMU on the A64 was found to be non-functional, so we've dropped it for now
* tag 'sunxi-fixes-for-5.4-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
ARM: dts: sun7i: Drop the module clock from the device tree
dt-bindings: media: sun4i-csi: Drop the module clock
media: dt-bindings: Fix building error for dt_binding_check
arm64: dts: allwinner: a64: sopine-baseboard: Add PHY regulator delay
arm64: dts: allwinner: a64: Drop PMU node
arm64: dts: allwinner: a64: pine64-plus: Add PHY regulator delay
Link: https://lore.kernel.org/r/80085a57-c40f-4bed-a9c3-19858d87564e.lettre@localhost
Signed-off-by: Olof Johansson <olof@lixom.net>
Recent changes modified the function arguments of
thread_group_sample_cputime() and task_cputimers_expired(), but forgot to
update the comments. Fix it up.
[ tglx: Changed the argument name of task_cputimers_expired() as the pointer
points to an array of samples. ]
Fixes: b7be4ef136 ("posix-cpu-timers: Switch thread group sampling to array")
Fixes: 001f797143 ("posix-cpu-timers: Make expiry checks array based")
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1571643852-21848-1-git-send-email-wang.yi59@zte.com.cn
Include the timekeeping.h header to get the declaration of the
sched_clock_{suspend,resume} functions. Fixes the following sparse
warnings:
kernel/time/sched_clock.c:275:5: warning: symbol 'sched_clock_suspend' was not declared. Should it be static?
kernel/time/sched_clock.c:286:6: warning: symbol 'sched_clock_resume' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191022131226.11465-1-ben.dooks@codethink.co.uk
A recent commit removed the NULL pointer check from the clock_getres()
implementation causing a test case to fault.
POSIX requires an explicit NULL pointer check for clock_getres() aside of
the validity check of the clock_id argument for obscure reasons.
Add it back for both 32bit and 64bit.
Note, this is only a partial revert of the offending commit which does not
bring back the broken fallback invocation in the the 32bit compat
implementations of clock_getres() and clock_gettime().
Fixes: a9446a906f ("lib/vdso/32: Remove inconsistent NULL pointer checks")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910211202260.1904@nanos.tec.linutronix.de
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If writeback cache is enabled, then writes might get reordered with
chmod/chown/utimes. The problem with this is that performing the write in
the fuse daemon might itself change some of these attributes. In such case
the following sequence of operations will result in file ending up with the
wrong mode, for example:
int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
write (fd, "1", 1);
fchown (fd, 0, 0);
fchmod (fd, 04755);
close (fd);
This patch fixes this by flushing pending writes before performing
chown/chmod/utimes.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In commit 8020919a9b ("mac80211: Properly handle SKB with radiotap
only"), buffers whose length is too short cause a WARN_ON(1) to be
executed. This change exposed a fault in rtlwifi drivers, which is fixed
by regarding packets with skb->len <= FCS_LEN as though they are in error
and dropping them. The test is now annotated as likely.
Cc: Stable <stable@vger.kernel.org> # v5.0+
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
When converting the wrong qu configurations in an earlier commit, I
accidentally swapped 0x2720 and 0x30DC. Instead of converting 0x2720,
I converted 0x30DC. Undo 0x30DC and convert 0x2720.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Add a workaround that forces power gating to be enabled on integrated
22000 devices. This improves power saving in certain situations.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
iwl_mvm_tvqm_enable_txq() can return an error, notably if unable
to allocate memory for the queue. Handle this error throughout,
avoiding storing the invalid value into a u16 which later leads
to a disable of an invalid queue ("queue 65524 not used", where
65524 is just -ENOMEM in a u16).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
A bunch of the entries for qnj were wrong. The 9460 device doesn't
exist, so update them to 9461 and 9462. There are still a bunch of
other occurrences of 9460, but that will be fixed separately.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Some entries for PCI ID 0x2720 were using iwl9260_2ac_cfg, but the
correct is to use iwl9260_2ac_cfg_soc. Fix that.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Nicolas Waisman noticed that even though noa_len is checked for
a compatible length it's still possible to overrun the buffers
of p2pinfo since there's no check on the upper bound of noa_num.
Bound noa_num against P2P_MAX_NOA_NUM.
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Two patches were sent out of order: one removed some conditions from
an if and the other moved the code elsewhere. When sending the patch
that moved the code, an older version of the original code was moved,
causing the "make QnJ exclusive" code to be essentially undone.
Fix that by removing the inclusive conditions from the check again.
Fixes: 809805a820 ("iwlwifi: pcie: move some cfg mangling from trans_pcie_alloc to probe")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
:Pull ARM fixes from Russell King:
- fix for alignment faults under high memory pressure
- use u32 for ARM instructions in fault handler
- mark functions that must always be inlined with __always_inline
- fix for nommu XIP
- fix ARMv7M switch to handler mode in reboot path
- fix the recently introduced AMBA reset control error paths
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8926/1: v7m: remove register save to stack before svc
ARM: 8914/1: NOMMU: Fix exc_ret for XIP
ARM: 8908/1: add __always_inline to functions called from __get_user_check()
ARM: mm: alignment: use "u32" for 32-bit instructions
ARM: mm: fix alignment handler faults under memory pressure
drivers/amba: fix reset control error handling
Pull EDAC fix from Borislav Petkov:
"Fix ghes_edac UAF case triggered by KASAN and DEBUG_TEST_DRIVER_REMOVE.
Future pending rework of the ghes_edac instances registration will do
away with the single memory controller per system model and that ugly
hackery there.
This is a minimal fix for stable@, courtesy of James Morse"
* tag 'edac_urgent_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Fix Use after free in ghes_edac remove path
Pull btrfs fixes from David Sterba:
- fixes of error handling cleanup of metadata accounting with qgroups
enabled
- fix swapped values for qgroup tracepoints
- fix race when handling full sync flag
- don't start unused worker thread, functionality removed already
* tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: check for the full sync flag while holding the inode lock during fsync
Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
btrfs: tracepoints: Fix bad entry members of qgroup events
btrfs: tracepoints: Fix wrong parameter order for qgroup events
btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
btrfs: don't needlessly create extent-refs kernel thread
btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
Btrfs: add missing extents release on file extent cluster relocation error
When doing an out of tree build with O=, the nsdeps script constructs
the absolute pathname of the module source file so that it can insert
MODULE_IMPORT_NS statements in the right place. However, ${srctree}
contains an unescaped path to the source tree, which, when used in a sed
substitution, makes sed complain:
++ sed 's/[^ ]* *//home/jeyu/jeyu-linux\/&/g'
sed: -e expression #1, char 12: unknown option to `s'
The sed substitution command 's' ends prematurely with the forward
slashes in the pathname, and sed errors out when it encounters the 'h',
which is an invalid sed substitution option. To avoid escaping forward
slashes ${srctree}, we can use '|' as an alternative delimiter for
sed instead to avoid this error.
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Tested-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Pull operating performance points (OPP) framework fixes for v5.4
from Viresh Kumar:
"This contains:
- Patch to revert addition of regulator enable/disable in OPP core
(Marek).
- Remove incorrect lockdep assert (Viresh).
- Fix a kref counting issue (Viresh)."
* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Reinitialize the list_kref before adding the static OPPs again
opp: core: Revert "add regulators enable and disable"
opp: of: drop incorrect lockdep_assert_held()
Fixes gcc '-Wunused-but-set-variable' warning:
fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]
It is not used since commit 7ee1e2e631 ("virtiofs: No need to check
fpq->connected state")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The list_kref reaches a count of 0 when all the static OPPs are removed,
for example when dev_pm_opp_of_cpumask_remove_table() is called, though
the actual OPP table may not get freed as it may still be referenced by
other parts of the kernel, like from a call to
dev_pm_opp_set_supported_hw(). And if we call
dev_pm_opp_of_cpumask_add_table() again at this point, we must
reinitialize the list_kref otherwise the kernel will hit a WARN() in
kref infrastructure for incrementing a kref with value 0.
Fixes: 11e1a16482 ("opp: Don't decrement uninitialized list_kref")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There is one more problematic case I noticed while recently fixing BPF kallsyms
handling in cd7455f101 ("bpf: Fix use after free in subprog's jited symbol
removal") and that is bpf_get_prog_name().
If BTF has been attached to the prog, then we may be able to fetch the function
signature type id in kallsyms through prog->aux->func_info[prog->aux->func_idx].type_id.
However, while the BTF object itself is torn down via RCU callback, the prog's
aux->func_info is immediately freed via kvfree(prog->aux->func_info) once the
prog's refcount either hit zero or when subprograms were already exposed via
kallsyms and we hit the error path added in 5482e9a93c ("bpf: Fix memleak in
aux->func_info and aux->btf").
This violates RCU as well since kallsyms could be walked in parallel where we
could access aux->func_info. Hence, defer kvfree() to after RCU grace period.
Looking at ba64e7d852 ("bpf: btf: support proper non-jit func info") there
is no reason/dependency where we couldn't defer the kvfree(aux->func_info) into
the RCU callback.
Fixes: 5482e9a93c ("bpf: Fix memleak in aux->func_info and aux->btf")
Fixes: ba64e7d852 ("bpf: btf: support proper non-jit func info")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/875f2906a7c1a0691f2d567b4d8e4ea2739b1e88.1571779205.git.daniel@iogearbox.net
This patch fixes issue with Gen7 adapter in a blade environment where one
of the ports will not be detected by driver. Firmware expects mailbox 11 to
be set or cleared by driver for newer ISP.
Following message is seen in the log file:
[ 18.810892] qla2xxx [0000:d8:00.0]-1820:1: **** Failed=102 mb[0]=4005 mb[1]=37 mb[2]=20 mb[3]=8
[ 18.819596] cmd=2 ****
[mkp: typos]
Link: https://lore.kernel.org/r/20191022193643.7076-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The initial lpfc_desc_set_adisc implementation in commit
dea3101e0a ("lpfc: add Emulex FC driver version 8.0.28") enabled ADISC if
cfg_use_adisc && RSCN_MODE && FCP_2_DEVICE
In commit 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of
SLI-3") this changed to
(cfg_use_adisc && RSC_MODE) || FCP_2_DEVICE
and later in commit ffc954936b ("[SCSI] lpfc 8.3.13: FC Discovery Fixes
and enhancements.") to
(cfg_use_adisc && RSC_MODE) || (FCP_2_DEVICE && FCP_TARGET)
A customer reports that after a devloss, an ADISC failure is logged. It
turns out the ADISC flag is set even the user explicitly set lpfc_use_adisc
= 0.
[Sat Dec 22 22:55:58 2018] lpfc 0000:82:00.0: 2:(0):0203 Devloss timeout on WWPN 50:01:43:80:12:8e:40:20 NPort x05df00 Data: x82000000 x8 xa
[Sat Dec 22 23:08:20 2018] lpfc 0000:82:00.0: 2:(0):2755 ADISC failure DID:05DF00 Status:x9/x70000
[mkp: fixed Hannes' email]
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Link: https://lore.kernel.org/r/20191022072112.132268-1-dwagner@suse.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Include <net/addrconf.h> for the missing declarations of
various functions. Fixes the following sparse warnings:
net/ipv6/addrconf_core.c:94:5: warning: symbol 'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:100:5: warning: symbol 'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:106:5: warning: symbol 'inet6addr_notifier_call_chain' was not declared. Should it be static?
net/ipv6/addrconf_core.c:112:5: warning: symbol 'register_inet6addr_validator_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:118:5: warning: symbol 'unregister_inet6addr_validator_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:125:5: warning: symbol 'inet6addr_validator_notifier_call_chain' was not declared. Should it be static?
net/ipv6/addrconf_core.c:237:6: warning: symbol 'in6_dev_finish_destroy' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Kernel test robot reported that the l2tp.sh test script failed:
# selftests: net: l2tp.sh
# Warning: file l2tp.sh is not executable, correct this.
Set executable bits.
Fixes: e858ef1cd4 ("selftests: Add l2tp tests")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
We get one warnings when build kernel W=1:
net/sched/sch_taprio.c:1155:6: warning: no previous prototype for ‘taprio_offload_config_changed’ [-Wmissing-prototypes]
Make the function static to fix this.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Michael Chan says:
====================
Devlink and error recovery bug fix patches.
Most of the work is by Vasundhara Volam.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
With the recently added error recovery logic, the device may already
be disabled if the firmware recovery is unsuccessful. In
bnxt_remove_one(), check that the device is still enabled first
before calling pci_disable_device().
Fixes: 3bc7d4a352 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
When firmware indicates that driver needs to invoke firmware reset
which is common for both error recovery and live firmware reset path,
driver needs a different time to wait before polling for firmware
readiness.
Modify the wait time to fw_reset_min_dsecs, which is initialised to
correct timeout for error recovery and firmware reset.
Fixes: 4037eb7156 ("bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The current code does not do endian swapping between the devlink
parameter and the internal NVRAM representation. Define a union to
represent the little endian NVRAM data and add 2 helper functions to
copy to and from the NVRAM data with the proper byte swapping.
Fixes: 782a624d00 ("bnxt_en: Add bnxt_en initial port params table and register it")
Cc: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The current code that rounds up the NVRAM parameter bit size to the next
byte size for the devlink parameter is not always correct. The MSIX
devlink parameters are 4 bytes and we don't get the correct size
using this method.
Fix it by adding a new dl_num_bytes member to the bnxt_dl_nvm_param
structure which statically provides bytesize information according
to the devlink parameter type definition.
Fixes: 782a624d00 ("bnxt_en: Add bnxt_en initial port params table and register it")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The ionic driver started using dymamic_hex_dump(), but
that is not always defined:
drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration]
Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG
is disabled, printing nothing.
Fixes: 938962d552 ("ionic: Add adminq action")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
syzkaller managed to trigger the following crash:
[...]
BUG: unable to handle page fault for address: ffffc90001923030
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
[...]
Call Trace:
kernel_text_address kernel/extable.c:147 [inline]
__kernel_text_address+0x9a/0x110 kernel/extable.c:102
unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
save_stack mm/kasan/common.c:69 [inline]
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc mm/slab.c:3319 [inline]
kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
getname_flags+0xba/0x640 fs/namei.c:138
getname+0x19/0x20 fs/namei.c:209
do_sys_open+0x261/0x560 fs/open.c:1091
__do_sys_open fs/open.c:1115 [inline]
__se_sys_open fs/open.c:1110 [inline]
__x64_sys_open+0x87/0x90 fs/open.c:1110
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[...]
After further debugging it turns out that we walk kallsyms while in parallel
we tear down a BPF program which contains subprograms that have been JITed
though the program itself has not been fully exposed and is eventually bailing
out with error.
The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
the symbols, however, bpf_prog_free() tears down the JIT memory too early via
scheduled work. Instead, it needs to properly respect RCU grace period as the
kallsyms walk for BPF is under RCU.
Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
path where we defer final destruction when we have subprogs in the program.
Fixes: 7d1982b4e3 ("bpf: fix panic in prog load calls cleanup")
Fixes: 1c2a088a66 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that:
if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {
But we don't check if "attr.comp_vector" is negative. It could
potentially lead to an array underflow. My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.
And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.
Fixes: 9ee79fce36 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the "virtualize APIC accesses" VM-execution control is set in the
VMCS, the APIC virtualization hardware is triggered when a page walk
in VMX non-root mode terminates at a PTE wherein the address of the 4k
page frame matches the APIC-access address specified in the VMCS. On
hardware, the APIC-access address may be any valid 4k-aligned physical
address.
KVM's nVMX implementation enforces the additional constraint that the
APIC-access address specified in the vmcs12 must be backed by
a "struct page" in L1. If not, L0 will simply clear the "virtualize
APIC accesses" VM-execution control in the vmcs02.
The problem with this approach is that the L1 guest has arranged the
vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
VM-execution control is clear in the vmcs12--so that the L2 guest
physical address(es)--or L2 guest linear address(es)--that reference
the L2 APIC map to the APIC-access address specified in the
vmcs12. Without the "virtualize APIC accesses" VM-execution control in
the vmcs02, the APIC accesses in the L2 guest will directly access the
APIC-access page in L1.
When there is no mapping whatsoever for the APIC-access address in L1,
the L2 VM just loses the intended APIC virtualization. However, when
the APIC-access address is mapped to an MMIO region in L1, the L2
guest gets direct access to the L1 MMIO device. For example, if the
APIC-access address specified in the vmcs12 is 0xfee00000, then L2
gets direct access to L1's APIC.
Since this vmcs12 configuration is something that KVM cannot
faithfully emulate, the appropriate response is to exit to userspace
with KVM_INTERNAL_ERROR_EMULATION.
Fixes: fe3ef05c75 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Reported-by: Dan Cross <dcross@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8-letter strings representing ARC perf events are stores in two
32-bit registers as ASCII characters like that: "IJMP", "IALL", "IJMPTAK" etc.
And the same order of bytes in the word is used regardless CPU endianness.
Which means in case of big-endian CPU core we need to swap bytes to get
the same order as if it was on little-endian CPU.
Otherwise we're seeing the following error message on boot:
------------------------->8----------------------
ARC perf : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
sysfs_warn_dup+0x46/0x58
sysfs_add_file_mode_ns+0xb2/0x168
create_files+0x70/0x2a0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
Failed to register pmu: arc_pct, reason -17
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
__warn+0x9c/0xd4
warn_slowpath_fmt+0x22/0x2c
perf_event_sysfs_init+0x70/0xa0
---[ end trace a75fb9a9837bd1ec ]---
------------------------->8----------------------
What happens here we're trying to register more than one raw perf event
with the same name "PMJI". Why? Because ARC perf events are 4 to 8 letters
and encoded into two 32-bit words. In this particular case we deal with 2
events:
* "IJMP____" which counts all jump & branch instructions
* "IJMPC___" which counts only conditional jumps & branches
Those strings are split in two 32-bit words this way "IJMP" + "____" &
"IJMP" + "C___" correspondingly. Now if we read them swapped due to CPU core
being big-endian then we read "PMJI" + "____" & "PMJI" + "___C".
And since we interpret read array of ASCII letters as a null-terminated string
on big-endian CPU we end up with 2 events of the same name "PMJI".
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case.
We may set the wrong physical id in avic_handle_ldr_update as we
always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page
instead.
Export and use kvm_xapic_id here and in avic_handle_apic_id_update
as suggested by Vitaly.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Scheduled policy update work may end up racing with the freeing of the
policy and unregistering the driver.
One possible race is as below, where the cpufreq_driver is unregistered,
but the scheduled work gets executed at later stage when, cpufreq_driver
is NULL (i.e. after freeing the policy and driver).
Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = (ptrval)
[0000001c] *pgd=80000080204003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86
Hardware name: ARM-Versatile Express
Workqueue: events handle_update
PC is at cpufreq_set_policy+0x58/0x228
LR is at dev_pm_qos_read_value+0x77/0xac
Control: 70c5387d Table: 80203000 DAC: fffffffd
Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval))
(cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48)
(refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38)
(handle_update) from (process_one_work+0x16d/0x3cc)
(process_one_work) from (worker_thread+0xff/0x414)
(worker_thread) from (kthread+0xff/0x100)
(kthread) from (ret_from_fork+0x11/0x28)
Fixes: 67d874c3b2 ("cpufreq: Register notifiers with the PM QoS framework")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ rjw: Cancel the work before dropping the QoS requests ]
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".
Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.
Fixes: 805bc0bc23 ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
If a process is interrupted while accessing the crypto device and the
global ap_perms_mutex is contented, release() could return early and
fail to free related resources.
Fixes: 00fab2350e ("s390/zcrypt: multiple zcrypt device nodes support")
Cc: <stable@vger.kernel.org> # 4.19
Cc: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit:
8a58ddae23 ("perf/core: Fix exclusive events' grouping")
allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:
Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
KVM/arm fixes for 5.4, take #2
Special PMU edition:
- Fix cycle counter truncation
- Fix cycle counter overflow limit on pure 64bit system
- Allow chained events to be actually functional
- Correct sample period after overflow
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is
not enabled. This can be confusing, so fix it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use BUG_ON instead of a if condition followed by BUG.
Generated by: scripts/coccinelle/misc/bugon.cocci
Fixes: 4b526de50e ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()")
CC: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit bf653b78f9 ("KVM: vmx: Introduce handle_unexpected_vmexit
and handle WAITPKG vmexit") introduced specialized handling of
specific exit-reasons that should not be raised by CPU because
KVM configures VMCS such that they should never be raised.
However, since commit 7396d337cf ("KVM: x86: Return to userspace
with internal error on unexpected exit reason"), VMX & SVM
exit handlers were modified to generically handle all unexpected
exit-reasons by returning to userspace with internal error.
Therefore, there is no need for specialized handling of specific
unexpected exit-reasons (This specialized handling also introduced
inconsistency for these exit-reasons to silently skip guest instruction
instead of return to userspace on internal-error).
Fixes: bf653b78f9 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 204c91eff7 ("KVM: selftests: do not blindly clobber registers in
guest asm") was intended to make test more gcc-proof, however, the result
is exactly the opposite: on newer gccs (e.g. 8.2.1) the test breaks with
==== Test Assertion Failure ====
x86_64/sync_regs_test.c:168: run->s.regs.regs.rbx == 0xBAD1DEA + 1
pid=14170 tid=14170 - Invalid argument
1 0x00000000004015b3: main at sync_regs_test.c:166 (discriminator 6)
2 0x00007f413fb66412: ?? ??:0
3 0x000000000040191d: _start at ??:?
rbx sync regs value incorrect 0x1.
Apparently, compile is still free to play games with registers even
when they have variables attached.
Re-write guest code with 'asm volatile' by embedding ucall there and
making sure rbx is preserved.
Fixes: 204c91eff7 ("KVM: selftests: do not blindly clobber registers in guest asm")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_dirty_log_test fails on AMD and this is no surprise as it is VMX
specific. Bail early when nested VMX is unsupported.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_* tests require VMX and three of them implement the same check. Move it
to vmx library.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_set_nested_state_test() checks if VMX is supported twice: in the very
beginning (and skips the whole test if it's not) and before doing
test_vmx_nested_state(). One should be enough.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the RDPID instruction is supported on the host, enumerate it in
KVM_GET_SUPPORTED_CPUID.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull pin control fixes from Linus Walleij:
"Here is a bunch of pin control fixes. I was lagging behind on this
one, some fixes should have come in earlier, sorry about that.
Anyways here it is, pretty straight-forward fixes, the Strago fix
stand out as something serious affecting a lot of machines.
Summary:
- Handle multiple instances of Intel chips without complaining.
- Restore the Intel Strago DMI workaround
- Make the Armada 37xx handle pins over 32
- Fix the polarity of the LED group on Armada 37xx
- Fix an off-by-one bug in the NS2 driver
- Fix error path for iproc's platform_get_irq()
- Fix error path on the STMFX driver
- Fix a typo in the Berlin AS370 driver
- Fix up misc errors in the Aspeed 2600 BMC support
- Fix a stray SPDX tag"
* tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups
pinctrl: aspeed-g6: Fix UART13 group pinmux
pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively
pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration
pinctrl: aspeed-g6: Fix I2C14 SDA description
pinctrl: aspeed-g6: Sort pins for sanity
dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups
pinctrl: berlin: as370: fix a typo s/spififib/spdifib
pinctrl: armada-37xx: swap polarity on LED group
pinctrl: stmfx: fix null pointer on remove
pinctrl: iproc: allow for error from platform_get_irq()
pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable()
pinctrl: bcm-iproc: Use SPDX header
pinctrl: armada-37xx: fix control of pins 32 and up
pinctrl: cherryview: restore Strago DMI workaround for all versions
pinctrl: intel: Allocate IRQ chip dynamic
Currenly haltpoll isn't aware of the 'idle=' override, the priority is
'idle=poll' > haltpoll > 'idle=halt'. When 'idle=poll' is used, cpuidle
driver is bypassed but current_driver in sys still shows 'haltpoll'.
When 'idle=halt' is used, haltpoll takes precedence and makes
'idle=halt' have no effect.
Add a check to prevent the haltpoll driver from loading if 'idle=' is
present.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.
Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b4dd4f6e36 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This pull request contains Broadcom ARM-based SoC Device Tree fixes for
5.4, please pull the following:
- Stefan removes the activity LED node from the CM3 DTS since there is
no driver for that LED yet and leds-gpio cannot drive it either
* tag 'arm-soc/for-5.4/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
Link: https://lore.kernel.org/r/20191021194302.21024-1-f.fainelli@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.4, please pull the following:
- Stefan fixes the MMC controller bus-width property for the Raspberry Pi
Zero Wireless which was incorrect after a prior refactoring
* tag 'arm-soc/for-5.4/devicetree-fixes' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
Link: https://lore.kernel.org/r/20191015172356.9650-1-f.fainelli@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
DaVinci fixes for v5.4
======================
* fix GPIO backlight support on DA850 by enabling the needed config
in davinci_all_defconfig. This is a fix because the driver and board
support got converted to use BACKLIGHT_GPIO driver, but defconfig update
is still missing in v5.4.
* fix for McBSP DMA on DM365
* tag 'davinci-fixes-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: davinci_all_defconfig: enable GPIO backlight
ARM: davinci: dm365: Fix McBSP dma_slave_map entry
Link: https://lore.kernel.org/r/7f3393f9-59be-a2d4-c1e1-ba6e407681d1@ti.com
Signed-off-by: Olof Johansson <olof@lixom.net>
A number of fixes for individual boards like the rockpro64, and Hugsun X99
as well as a fix for the Gru-Kevin display override and fixing the dt-
binding for Theobroma boards to the correct naming that is also actually
used in the wild.
* tag 'v5.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
arm64: dts: rockchip: Fix usb-c on Hugsun X99 TV Box
arm64: dts: rockchip: fix RockPro64 sdmmc settings
arm64: dts: rockchip: fix RockPro64 sdhci settings
arm64: dts: rockchip: fix RockPro64 vdd-log regulator settings
dt-bindings: arm: rockchip: fix Theobroma-System board bindings
arm64: dts: rockchip: fix Rockpro64 RK808 interrupt line
Link: https://lore.kernel.org/r/1599050.HRXuSXmxRg@phil
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX fixes for 5.4:
- Re-enable SNVS power key for imx6q-logicpd board which was accidentally
disabled by a SoC level change.
- Fix I2C switches on vf610-zii-scu4-aib board by specifying property
i2c-mux-idle-disconnect.
- A fix on imx-scu API that reads UID from firmware to avoid kernel NULL
pointer dump.
- A series from Anson to correct i.MX7 GPT and i.MX8 USDHC IPG clock.
- A fix on DRM_MSM Kconfig regression on i.MX5 by adding the option
explicitly into imx_v6_v7_defconfig.
- Fix ARM regulator states issue for zii-ultra board, which is impacting
stability of the board.
- A correction on CPU core idle state name for LayerScape LX2160A SoC.
* tag 'imx-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
ARM: dts: imx7s: Correct GPT's ipg clock source
ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
arm64: dts: lx2160a: Correct CPU core idle state name
arm64: dts: zii-ultra: fix ARM regulator states
soc: imx: imx-scu: Getting UID from SCU should have response
Link: https://lore.kernel.org/r/20191017141851.GA22506@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
Fixes for omaps for v5.4-rc cycle
More fixes for omap variants:
- Update more panel options in omap2plus_defconfig that got changed
as we moved to use generic LCD panels
- Remove unused twl_keypad for logicpd-torpedo-som to avoid boot
time warnings. This is only a cosmetic fix, but at least dmesg output
is now getting more readable after all the fixes to remove pointless
warnings
- Fix gpu_cm node name as we still have a non-standard node name
dependency for clocks. This should eventually get fixed by use
of domain specific compatible property
- Fix use of i2c-mux-idle-disconnect for m3874-iceboard
- Use level interrupt for omap4 & 5 wlcore to avoid lost edge
interrupts
* tag 'omap-for-v5.4/fixes-rc3-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Use level interrupt for omap4 & 5 wlcore
ARM: dts: am3874-iceboard: Fix 'i2c-mux-idle-disconnect' usage
ARM: dts: omap5: fix gpu_cm clock provider name
ARM: dts: logicpd-torpedo-som: Remove twl_keypad
ARM: omap2plus_defconfig: Fix selected panels after generic panel changes
Link: https://lore.kernel.org/r/pull-1571242890-118432@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
This device is sold as 'ThinkPad USB-C Dock Gen 2 (40AS)'.
Chipset is RTL8153 and works with r8152.
Without this, the generic cdc_ether grabs the device, and the device jam
connected networks up when the machine suspends.
Signed-off-by: Kazutoshi Noguchi <noguchi.kazutosi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the iph field from the state structure, which is not
properly initialized. Instead, add a new field to make the "do we want
to set DF" be the state bit and move the code to set the DF flag from
ip_frag_next().
Joint work with Pablo and Linus.
Fixes: 19c3401a91 ("net: ipv4: place control buffer handling away from fragmentation iterators")
Reported-by: Patrick Schönthaler <patrick@notvads.ovh>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
r0-r3 & r12 registers are saved & restored, before & after svc
respectively. Intention was to preserve those registers across thread to
handler mode switch.
On v7-M, hardware saves the register context upon exception in AAPCS
complaint way. Restoring r0-r3 & r12 is done from stack location where
hardware saves it, not from the location on stack where these registers
were saved.
To clarify, on stm32f429 discovery board:
1. before svc, sp - 0x90009ff8
2. r0-r3,r12 saved to 0x90009ff8 - 0x9000a00b
3. upon svc, h/w decrements sp by 32 & pushes registers onto stack
4. after svc, sp - 0x90009fd8
5. r0-r3,r12 restored from 0x90009fd8 - 0x90009feb
Above means r0-r3,r12 is not restored from the location where they are
saved, but since hardware pushes the registers onto stack, the registers
are restored correctly.
Note that during register saving to stack (step 2), it goes past
0x9000a000. And it seems, based on objdump, there are global symbols
residing there, and it perhaps can cause issues on a non-XIP Kernel
(on XIP, data section is setup later).
Based on the analysis above, manually saving registers onto stack is at
best no-op and at worst can cause data section corruption. Hence remove
storing of registers onto stack before svc.
Fixes: b70cd406d7 ("ARM: 8671/1: V7M: Preserve registers across switch from Thread to Handler mode")
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Saeed Mahameed says:
====================
Mellanox, mlx5 kTLS fixes 18-10-2019
This series introduces kTLS related fixes to mlx5 driver from Tariq,
and two misc memory leak fixes form Navid Emamdoost.
Please pull and let me know if there is any problem.
I would appreciate it if you queue up kTLS fixes from the list below to
stable kernel v5.3 !
For -stable v4.13:
nett/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that commit 01ccf903ed ("pwm: Let pwm_get_state() return
the last implemented state") causes backlight failures on a number of
boards. The reason is that some of the drivers do not write the full
state through to the hardware registers, which means that ->get_state()
subsequently does not return the correct state. Consumers which rely on
pwm_get_state() returning the current state will therefore get confused
and subsequently try to program a bad state.
Before this change can be made, existing drivers need to be more
carefully audited and fixed to behave as the framework expects. Until
then, keep the original behaviour of returning the software state that
was applied rather than reading the state back from hardware.
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
If regular request queue gets full, currently we sleep for a bit and
retrying submission in submitter's context. This assumes submitter is not
holding any spin lock. But this assumption is not true for background
requests. For background requests, we are called with fc->bg_lock held.
This can lead to deadlock where one thread is trying submission with
fc->bg_lock held while request completion thread has called
fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
request completion thread gets blocked, it does not make further progress
and that means queue does not get empty and submitter can't submit more
requests.
To solve this issue, retry submission with the help of a worker, instead of
retrying in submitter's context. We already do this for hiprio/forget
requests.
Reported-by: Chirantan Ekbote <chirantan@chromium.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If virtqueue is full, we put forget requests on a list and these forgets
are dispatched later using a worker. As of now we don't count these forgets
in fsvq->in_flight variable. This means when queue is being drained, we
have to have special logic to first drain these pending requests and then
wait for fsvq->in_flight to go to zero.
By counting pending forgets in fsvq->in_flight, we can get rid of special
logic and just wait for in_flight to go to zero. Worker thread will kick
and drain all the forgets anyway, leading in_flight to zero.
I also need similar logic for normal request queue in next patch where I am
about to defer request submission in the worker context if queue is full.
This simplifies the code a bit.
Also add two helper functions to inc/dec in_flight. Decrement in_flight
helper will later used to call completion when in_flight reaches zero.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
FR_SENT flag should be set when request has been sent successfully sent
over virtqueue. This is used by interrupt logic to figure out if interrupt
request should be sent or not.
Also add it to fqp->processing list after sending it successfully.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In virtiofs we keep per queue connected state in virtio_fs_vq->connected
and use that to end request if queue is not connected. And virtiofs does
not even touch fpq->connected state.
We probably need to merge these two at some point of time. For now,
simplify the code a bit and do not worry about checking state of
fpq->connected.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.
So put requests on a list and end requests from a worker thread.
I got following warning from deadlock detector.
[ 603.137138] WARNING: possible recursive locking detected
[ 603.137142] --------------------------------------------
[ 603.137144] blogbench/2036 is trying to acquire lock:
[ 603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[ 603.140701]
[ 603.140701] but task is already holding lock:
[ 603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[ 603.140713]
[ 603.140713] other info that might help us debug this:
[ 603.140714] Possible unsafe locking scenario:
[ 603.140714]
[ 603.140715] CPU0
[ 603.140716] ----
[ 603.140716] lock(&(&fc->bg_lock)->rlock);
[ 603.140718] lock(&(&fc->bg_lock)->rlock);
[ 603.140719]
[ 603.140719] *** DEADLOCK ***
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR. However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In case of master pending state, it should not trigger a master
command, otherwise data could be corrupted because this H/W shares
the same data buffer for slave and master operations. It also means
that H/W command queue handling is unreliable because of the buffer
sharing issue. To fix this issue, it clears command queue if a
master command is queued in pending state to use S/W solution
instead of H/W command queue handling. Also, it refines restarting
mechanism of the pending master command.
Fixes: 2e57b7cebb ("i2c: aspeed: Add multi-master use case support")
Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Tested-by: Tao Ren <taoren@fb.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
ASoC: Fixes for v5.4
A collection of fixes that have arrived since the merge window. There
are a small number of core fixes here but they are smaller ones around
error handling.
According to the App note[1] detailing the tuning algorithm, for
temperatures < -20C, the initial tuning value should be min(largest value
in LPW - 24, ceil(13/16 ratio of LPW)). The largest value in LPW is
(max_window + 4 * (max_len - 1)) and not (max_window + 4 * max_len) itself.
Fix this implementation.
[1] http://www.ti.com/lit/an/spraca9b/spraca9b.pdf
Fixes: 961de0a856 ("mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929)")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The following commit from the v5.4 merge window:
d44248a413 ("perf/core: Rework memory accounting in perf_mmap()")
... breaks auxiliary trace buffer tracking.
If I run command 'perf record -e rbd000' to record samples and saving
them in the **auxiliary** trace buffer then the value of 'locked_vm' becomes
negative after all trace buffers have been allocated and released:
During allocation the values increase:
[52.250027] perf_mmap user->locked_vm:0x87 pinned_vm:0x0 ret:0
[52.250115] perf_mmap user->locked_vm:0x107 pinned_vm:0x0 ret:0
[52.250251] perf_mmap user->locked_vm:0x188 pinned_vm:0x0 ret:0
[52.250326] perf_mmap user->locked_vm:0x208 pinned_vm:0x0 ret:0
[52.250441] perf_mmap user->locked_vm:0x289 pinned_vm:0x0 ret:0
[52.250498] perf_mmap user->locked_vm:0x309 pinned_vm:0x0 ret:0
[52.250613] perf_mmap user->locked_vm:0x38a pinned_vm:0x0 ret:0
[52.250715] perf_mmap user->locked_vm:0x408 pinned_vm:0x2 ret:0
[52.250834] perf_mmap user->locked_vm:0x408 pinned_vm:0x83 ret:0
[52.250915] perf_mmap user->locked_vm:0x408 pinned_vm:0x103 ret:0
[52.251061] perf_mmap user->locked_vm:0x408 pinned_vm:0x184 ret:0
[52.251146] perf_mmap user->locked_vm:0x408 pinned_vm:0x204 ret:0
[52.251299] perf_mmap user->locked_vm:0x408 pinned_vm:0x285 ret:0
[52.251383] perf_mmap user->locked_vm:0x408 pinned_vm:0x305 ret:0
[52.251544] perf_mmap user->locked_vm:0x408 pinned_vm:0x386 ret:0
[52.251634] perf_mmap user->locked_vm:0x408 pinned_vm:0x406 ret:0
[52.253018] perf_mmap user->locked_vm:0x408 pinned_vm:0x487 ret:0
[52.253197] perf_mmap user->locked_vm:0x408 pinned_vm:0x508 ret:0
[52.253374] perf_mmap user->locked_vm:0x408 pinned_vm:0x589 ret:0
[52.253550] perf_mmap user->locked_vm:0x408 pinned_vm:0x60a ret:0
[52.253726] perf_mmap user->locked_vm:0x408 pinned_vm:0x68b ret:0
[52.253903] perf_mmap user->locked_vm:0x408 pinned_vm:0x70c ret:0
[52.254084] perf_mmap user->locked_vm:0x408 pinned_vm:0x78d ret:0
[52.254263] perf_mmap user->locked_vm:0x408 pinned_vm:0x80e ret:0
The value of user->locked_vm increases to a limit then the memory
is tracked by pinned_vm.
During deallocation the size is subtracted from pinned_vm until
it hits a limit. Then a larger value is subtracted from locked_vm
leading to a large number (because of type unsigned):
[64.267797] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x78d
[64.267826] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x70c
[64.267848] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x68b
[64.267869] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x60a
[64.267891] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x589
[64.267911] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x508
[64.267933] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x487
[64.267952] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x406
[64.268883] perf_mmap_close mmap_user->locked_vm:0x307 pinned_vm:0x406
[64.269117] perf_mmap_close mmap_user->locked_vm:0x206 pinned_vm:0x406
[64.269433] perf_mmap_close mmap_user->locked_vm:0x105 pinned_vm:0x406
[64.269536] perf_mmap_close mmap_user->locked_vm:0x4 pinned_vm:0x404
[64.269797] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff84 pinned_vm:0x303
[64.270105] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff04 pinned_vm:0x202
[64.270374] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe84 pinned_vm:0x101
[64.270628] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe04 pinned_vm:0x0
This value sticks for the user until system is rebooted, causing
follow-on system calls using locked_vm resource limit to fail.
Note: There is no issue using the normal trace buffer.
In fact the issue is in perf_mmap_close(). During allocation auxiliary
trace buffer memory is either traced as 'extra' and added to 'pinned_vm'
or trace as 'user_extra' and added to 'locked_vm'. This applies for
normal trace buffers and auxiliary trace buffer.
However in function perf_mmap_close() all auxiliary trace buffer is
subtraced from 'locked_vm' and never from 'pinned_vm'. This breaks the
ballance.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: gor@linux.ibm.com
Cc: hechaol@fb.com
Cc: heiko.carstens@de.ibm.com
Cc: linux-perf-users@vger.kernel.org
Cc: songliubraving@fb.com
Fixes: d44248a413 ("perf/core: Rework memory accounting in perf_mmap()")
Link: https://lkml.kernel.org/r/20191021083354.67868-1-tmricht@linux.ibm.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf buildid-cache:
Adrian Hunter:
- Fix mode setting in copyfile_mode_ns() when copying /proc/kcore.
perf evlist:
Andi Kleen:
- Fix freeing id arrays.
tools headers:
- Sync sched.h anc kvm.h headers with the kernel sources.
perf jvmti:
Thomas Richter:
- Link against tools/lib/ctype.o to have weak strlcpy().
perf annotate:
Gustavo A. R. Silva:
- Fix multiple memory and file descriptor leaks, found by coverity.
perf c2c/kmem:
Yunfeng Ye:
- Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found by
internal static analysis tool.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All the drivers, which use the OPP framework control regulators, which
are already enabled. Typically those regulators are also system critical,
due to providing power to CPU core or system buses. It turned out that
there are cases, where calling regulator_enable() on such boot-enabled
regulator has side-effects and might change its initial voltage due to
performing initial voltage balancing without all restrictions from the
consumers. Until this issue becomes finally solved in regulator core,
avoid calling regulator_enable()/disable() from the OPP framework.
This reverts commit 7f93ff73f7.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.
CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b1 ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
It could be confusing why we set granularity to 1 seconds rather
than 2 seconds (1 second is the max the VFS allows) for these
mounts to very old servers ...
Signed-off-by: Steve French <stfrench@microsoft.com>
We only want to avoid blocking in connect when mounting SMB root
filesystems, otherwise bail out from generic_ip_connect() so cifs.ko
can perform any reconnect failover appropriately.
This fixes DFS failover/reconnection tests in upstream buildbot.
Fixes: 8eecd1c2e5 ("cifs: Add support for root file systems")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are no more active users of DEV_PM_QOS_MIN_FREQUENCY and
DEV_PM_QOS_MAX_FREQUENCY device PM QoS request types, so drop them
along with the code supporting them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.
Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).
In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.
The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.
After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.
Fixes: 67d874c3b2 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Introduce frequency QoS, based on the "raw" low-level PM QoS, to
represent min and max frequency requests and aggregate constraints.
The min and max frequency requests are to be represented by
struct freq_qos_request objects and the aggregate constraints are to
be represented by struct freq_constraints objects. The latter are
expected to be initialized with the help of freq_constraints_init().
The freq_qos_read_value() helper is defined to retrieve the aggregate
constraints values from a given struct freq_constraints object and
there are the freq_qos_add_request(), freq_qos_update_request() and
freq_qos_remove_request() helpers to manipulate the min and max
frequency requests. It is assumed that the the helpers will not
run concurrently with each other for the same struct freq_qos_request
object, so if that may be the case, their uses must ensure proper
synchronization between them (e.g. through locking).
In addition, freq_qos_add_notifier() and freq_qos_remove_notifier()
are provided to add and remove notifiers that will trigger on aggregate
constraint changes to and from a given struct freq_constraints object,
respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull more Kbuild fixes from Masahiro Yamada:
- fix a bashism of setlocalversion
- do not use the too new --sort option of tar
* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: substituting --sort in archive creation
scripts: setlocalversion: fix a bashism
kbuild: update comment about KBUILD_ALLDIRS
Voltage sensors overlap with external temperature sensors. Detect
the multi-function of voltage, thermal diode, thermistor and
reserved from register VT_ADC_MD_REG to set value of vsen_mask &
tcpu_mask & temp_mode in nct7904_data struct. If the value is
reserved, needs to disable the vsen_mask & tcpu_mask.
Signed-off-by: amy.shih <amy.shih@advantech.com.tw>
Link: https://lore.kernel.org/r/20191014082451.2895-1-Amy.Shih@advantech.com.tw
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Prevent a NULL pointer dereference in the X2APIC code in case of a
CPU hotplug failure.
- Prevent boot failures on HP superdome machines by invalidating the
level2 kernel pagetable entries outside of the kernel area as
invalid so BIOS reserved space won't be touched unintentionally.
Also ensure that memory holes are rounded up to the next PMD
boundary correctly.
- Enable X2APIC support on Hyper-V to prevent boot failures.
- Set the paravirt name when running on Hyper-V for consistency
- Move a function under the appropriate ifdef guard to prevent build
warnings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
x86/hyperv: Set pv_info.name to "Hyper-V"
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
x86/hyperv: Make vapic support x2apic mode
x86/boot/64: Round memory hole size up to next PMD page
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Pull irq fixes from Thomas Gleixner:
"A small set of irq chip driver fixes and updates:
- Update the SIFIVE PLIC interrupt driver to use the fasteoi handler
to address the shortcomings of the existing flow handling which was
prone to lose interrupts
- Use the proper limit for GIC interrupt line numbers
- Add retrigger support for the recently merged Anapurna Labs Fabric
interrupt controller to make it complete
- Enable the ATMEL AIC5 interrupt controller driver on the new
SAM9X60 SoC"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Switch to fasteoi flow
irqchip/gic-v3: Fix GIC_LINE_NR accessor
irqchip/atmel-aic5: Add support for sam9x60 irqchip
irqchip/al-fic: Add support for irq retrigger
Pull hrtimer fixlet from Thomas Gleixner:
"A single commit annotating the lockcless access to timer->base with
READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Annotate lockless access to timer->base
Pull stop-machine fix from Thomas Gleixner:
"A single fix, amending stop machine with WRITE/READ_ONCE() to address
the fallout of KCSAN"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Avoid potential race behaviour
The PMU emulation code uses the perf event sample period to trigger
the overflow detection. This works fine for the *first* overflow
handling, but results in a huge number of interrupts on the host,
unrelated to the number of interrupts handled in the guest (a x20
factor is pretty common for the cycle counter). On a slow system
(such as a SW model), this can result in the guest only making
forward progress at a glacial pace.
It turns out that the clue is in the name. The sample period is
exactly that: a period. And once the an overflow has occured,
the following period should be the full width of the associated
counter, instead of whatever the guest had initially programed.
Reset the sample period to the architected value in the overflow
handler, which now results in a number of host interrupts that is
much closer to the number of interrupts in the guest.
Fixes: b02386eb7d ("arm64: KVM: Add PMU overflow interrupt routing")
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The current convention for KVM to request a chained event from the
host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED).
But as it turns out, this bit gets set *after* we create the kernel
event that backs our virtual counter, meaning that we never get
a 64bit counter.
Moving the setting to an earlier point solves the problem.
Fixes: 80f393a23b ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Of PMCR_EL0.LC, the ARMv8 ARM says:
"In an AArch64 only implementation, this field is RES 1."
So be it.
Fixes: ab9468340d ("arm64: KVM: Add access handler for PMCR register")
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When a counter is disabled, its value is sampled before the event
is being disabled, and the value written back in the shadow register.
In that process, the value gets truncated to 32bit, which is adequate
for any counter but the cycle counter (defined as a 64bit counter).
This obviously results in a corrupted counter, and things like
"perf record -e cycles" not working at all when run in a guest...
A similar, but less critical bug exists in kvm_pmu_get_counter_value.
Make the truncation conditional on the counter not being the cycle
counter, which results in a minor code reorganisation.
Fixes: 80f393a23b ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reported-by: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
meanwhile, sorry about that.
Highlights:
1) Fix fd leak in various bpf selftests, from Brian Vazquez.
2) Fix crash in xsk when device doesn't support some methods, from
Magnus Karlsson.
3) Fix various leaks and use-after-free in rxrpc, from David Howells.
4) Fix several SKB leaks due to confusion of who owns an SKB and who
should release it in the llc code. From Eric Biggers.
5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.
6) Jumbo packets don't work after resume on r8169, as the BIOS resets
the chip into non-jumbo mode during suspend. From Heiner Kallweit.
7) Corrupt L2 header during MPLS push, from Davide Caratti.
8) Prevent possible infinite loop in tc_ctl_action, from Eric
Dumazet.
9) Get register bits right in bcmgenet driver, based upon chip
version. From Florian Fainelli.
10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
11) Cure race between route lookup and invalidation in ipv4, from Wei
Wang.
12) Fix performance regression due to false sharing in 'net'
structure, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
net: reorder 'struct net' fields to avoid false sharing
net: dsa: fix switch tree list
net: ethernet: dwmac-sun8i: show message only when switching to promisc
net: aquantia: add an error handling in aq_nic_set_multicast_list
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
macb: propagate errors when getting optional clocks
xen/netback: fix error path of xenvif_connect_data()
net: hns3: fix mis-counting IRQ vector numbers issue
net: usb: lan78xx: Connect PHY before registering MAC
vsock/virtio: discard packets if credit is not respected
vsock/virtio: send a credit update when buffer size is changed
mlxsw: spectrum_trap: Push Ethernet header before reporting trap
net: ensure correct skb->tstamp in various fragmenters
net: bcmgenet: reset 40nm EPHY on energy detect
net: bcmgenet: soft reset 40nm EPHYs before MAC init
net: phy: bcm7xxx: define soft_reset for 40nm EPHY
net: bcmgenet: don't set phydev->link from MAC
net: Update address for MediaTek ethernet driver in MAINTAINERS
ipv4: fix race condition between route lookup and invalidation
...
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.
Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)
So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.
We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.
We move in the first cache line fields that can be
dirtied often.
We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.
Fixes: 355b985537 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.
Fixes: 83c0afaec7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam. Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
net: netem: fix further issues with packet corruption
This set is fixing two more issues with the netem packet corruption.
First patch (which was previously posted) avoids NULL pointer dereference
if the first frame gets freed due to allocation or checksum failure.
v2 improves the clarity of the code a little as requested by Cong.
Second patch ensures we don't return SUCCESS if the frame was in fact
dropped. Thanks to this commit message for patch 1 no longer needs the
"this will still break with a single-frame failure" disclaimer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If packet corruption failed we jump to finish_segs and return
NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
increment its backlog, that's incorrect - we need to return
NET_XMIT_DROP.
Fixes: 6071bd1aa1 ("netem: Segment GSO packets on enqueue")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To corrupt a GSO frame we first perform segmentation. We then
proceed using the first segment instead of the full GSO skb and
requeue the rest of the segments as separate packets.
If there are any issues with processing the first segment we
still want to process the rest, therefore we jump to the
finish_segs label.
Commit 177b800746 ("net: netem: fix backlog accounting for
corrupted GSO frames") started using the pointer to the first
segment in the "rest of segments processing", but as mentioned
above the first segment may had already been freed at this point.
Backlog corrections for parent qdiscs have to be adjusted.
Fixes: 177b800746 ("net: netem: fix backlog accounting for corrupted GSO frames")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.
In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.
For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.
Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().
Remove the superfluous module_put().
Fixes: 279f438e36 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the num_msi_left means the vector numbers of NIC,
but if the PF supported RoCE, it contains the vector numbers
of NIC and RoCE(Not expected).
This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.
This patch adds a new variable num_nic_msi to store the vector
numbers of NIC, and adjust the default TQP numbers and rss_size
according to the value of num_nic_msi.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc fixes from Andrew Morton:
"Rather a lot of fixes, almost all affecting mm/"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
scripts/gdb: fix debugging modules on s390
kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
mm/thp: allow dropping THP from page cache
mm/vmscan.c: support removing arbitrary sized pages from mapping
mm/thp: fix node page state in split_huge_page_to_list()
proc/meminfo: fix output alignment
mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
mm: include <linux/huge_mm.h> for is_vma_temporary_stack
zram: fix race between backing_dev_show and backing_dev_store
mm/memcontrol: update lruvec counters in mem_cgroup_move_account
ocfs2: fix panic due to ocfs2_wq is null
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
mm: memblock: do not enforce current limit for memblock_phys* family
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
mm/gup: fix a misnamed "write" argument, and a related bug
mm/gup_benchmark: add a missing "w" to getopt string
ocfs2: fix error handling in ocfs2_setattr()
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
mm/memunmap: don't access uninitialized memmap in memunmap_pages()
...
Currently lx-symbols assumes that module text is always located at
module->core_layout->base, but s390 uses the following layout:
+------+ <- module->core_layout->base
| GOT |
+------+ <- module->core_layout->base + module->arch->plt_offset
| PLT |
+------+ <- module->core_layout->base + module->arch->plt_offset +
| TEXT | module->arch->plt_size
+------+
Therefore, when trying to debug modules on s390, all the symbol
addresses are skewed by plt_offset + plt_size.
Fix by adding plt_offset + plt_size to module_addr in
load_module_symbols().
Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries. On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.
However, the regroup is broken For perf_event based trace_uprobe. This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped
page table entry, which is not the desired behavior.
Fix this by only use FOLL_SPLIT_PMD for uprobe register case.
Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.
Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
Fixes: 5a52c9df62 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CPU0: CPU1:
backing_dev_show backing_dev_store
...... ......
file = zram->backing_dev;
down_read(&zram->init_lock); down_read(&zram->init_init_lock)
file_path(file, ...); zram->backing_dev = backing_dev;
up_read(&zram->init_lock); up_read(&zram->init_lock);
gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.
backtrace:
d_path+0xcc/0x174
file_path+0x10/0x18
backing_dev_show+0x40/0xb4
dev_attr_show+0x20/0x54
sysfs_kf_seq_show+0x9c/0x10c
kernfs_seq_show+0x28/0x30
seq_read+0x184/0x488
kernfs_fop_read+0x5c/0x1a4
__vfs_read+0x44/0x128
vfs_read+0xa0/0x138
SyS_read+0x54/0xb4
Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.
Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps. ZONE_DEVICE is not applicable.
page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps. This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.
Note: As explained by Michal in [1], alloc_contig_range() will verify
the range. So it boils down to the wrong access in this function.
[1] http://lkml.kernel.org/r/20180423000943.GO17484@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20191015120717.4858-1-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Until commit 92d12f9544 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address. The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.
This caused breakage of etnaviv GPU driver:
etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
etnaviv-gpu 130000.gpu: command buffer outside valid memory window
etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
etnaviv-gpu 134000.gpu: command buffer outside valid memory window
etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.
Link: http://lkml.kernel.org/r/1570915861-17633-1-git-send-email-rppt@kernel.org
Fixes: 92d12f9544 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Adam Ford <aford173@gmail.com>
Tested-by: Adam Ford <aford173@gmail.com> [imx6q-logicpd]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1a61ab8038 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.
Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels. The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.
Tim Chen said:
: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15% 3.08% pmbench [kernel.vmlinux] [k] lruvec_lru_size
: |
: |--3.07%--lruvec_lru_size
: | |
: | |--2.11%--cpumask_next
: | | |
: | | --1.66%--find_next_bit
: | |
: | --0.57%--call_function_interrupt
: | |
: | --0.55%--smp_call_function_interrupt
: |
: |--1.59%--0x441f0fc3d009
: | _ops_rdtsc_init_base_freq
: | access_histogram
: | page_fault
: | __do_page_fault
: | handle_mm_fault
: | __handle_mm_fault
: | |
: | --1.54%--do_swap_page
: | swapin_readahead
: | swap_cluster_readahead
: | |
: | --1.53%--read_swap_cache_async
: | __read_swap_cache_async
: | alloc_pages_vma
: | __alloc_pages_nodemask
: | __alloc_pages_slowpath
: | try_to_free_pages
: | do_try_to_free_pages
: | shrink_node
: | shrink_node_memcg
: | |
: | |--0.77%--lruvec_lru_size
: | |
: | --0.76%--inactive_list_is_low
: | |
: | --0.76%--lruvec_lru_size
: |
: --1.50%--measure_read
: page_fault
: __do_page_fault
: handle_mm_fault
: __handle_mm_fault
: do_swap_page
: swapin_readahead
: swap_cluster_readahead
: |
: --1.48%--read_swap_cache_async
: __read_swap_cache_async
: alloc_pages_vma
: __alloc_pages_nodemask
: __alloc_pages_slowpath
: try_to_free_pages
: do_try_to_free_pages
: shrink_node
: shrink_node_memcg
: |
: |--0.75%--inactive_list_is_low
: | |
: | --0.75%--lruvec_lru_size
: |
: --0.73%--lruvec_lru_size
The likely culprit is the cache traffic the lruvec_page_state_local
generates. Dave Hansen says:
: I was thinking purely of the cache footprint. If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only. 1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1. 500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles
Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.
Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
Fixes: 1a61ab8038 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In several routines, the "flags" argument is incorrectly named "write".
Change it to "flags".
Also, in one place, the misnaming led to an actual bug:
"flags & FOLL_WRITE" is required, rather than just "flags".
(That problem was flagged by krobot, in v1 of this patch.)
Also, change the flags argument from int, to unsigned int.
You can see that this was a simple oversight, because the
calling code passes "flags" to the fifth argument:
gup_pgd_range():
...
if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
PGDIR_SHIFT, next, flags, pages, nr))
...which, until this patch, the callees referred to as "write".
Also, change two lines to avoid checkpatch line length
complaints, and another line to fix another oversight
that checkpatch called out: missing "int" on pdshift.
Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
Fixes: b798bec474 ("mm/gup: change write parameter to flags in fast walk")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten reported the following panic in __free_slab() happening on a s390x
machine:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
Krnl Code: 00000000003cada6: e310a1400004 lg %r1,320(%r10)
00000000003cadac: c0e50046c286 brasl %r14,ca32b8
#00000000003cadb2: a7f4fe36 brc 15,3caa1e
>00000000003cadb6: e32060800024 stg %r2,128(%r6)
00000000003cadbc: a7f4fd9e brc 15,3ca8f8
00000000003cadc0: c0e50046790c brasl %r14,c99fd8
00000000003cadc6: a7f4fe2c brc 15,3caa
00000000003cadc6: a7f4fe2c brc 15,3caa1e
00000000003cadca: ecb1ffff00d9 aghik %r11,%r1,-1
Call Trace:
(<00000000003cabcc> __free_slab+0x49c/0x6b0)
<00000000001f5886> rcu_core+0x5a6/0x7e0
<0000000000ca2dea> __do_softirq+0xf2/0x5c0
<0000000000152644> irq_exit+0x104/0x130
<000000000010d222> do_IRQ+0x9a/0xf0
<0000000000ca2344> ext_int_handler+0x130/0x134
<0000000000103648> enabled_wait+0x58/0x128
(<0000000000103634> enabled_wait+0x44/0x128)
<0000000000103b00> arch_cpu_idle+0x40/0x58
<0000000000ca0544> default_idle_call+0x3c/0x68
<000000000018eaa4> do_idle+0xec/0x1c0
<000000000018ee0e> cpu_startup_entry+0x36/0x40
<000000000122df34> arch_call_rest_init+0x5c/0x88
<0000000000000000> 0x0
INFO: lockdep is turned off.
Last Breaking-Event-Address:
<00000000003ca8f4> __free_slab+0x1c4/0x6b0
Kernel panic - not syncing: Fatal exception in interrupt
The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.
These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag). The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works. It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.
To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params(). It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.
Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.
Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
Fixes: fb2f2b0adb ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.
This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory. Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).
We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum. Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.
--------------------------------------------------------------------------
Zones are now properly shrunk when offlining memory blocks or when
onlining failed. This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.
Example:
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
:/# echo "online_movable" > /sys/devices/system/memory/memory41/state
:/# echo "online_movable" > /sys/devices/system/memory/memory43/state
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 98304
present 65536
managed 65536
:/# echo 0 > /sys/devices/system/memory/memory43/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 32768
present 32768
managed 32768
:/# echo 0 > /sys/devices/system/memory/memory41/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
This patch (of 10):
With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.
This fixes a kernel crash that is observed when destroying a namespace:
kernel BUG at include/linux/mm.h:1107!
cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
pc: c0000000004b9728: memunmap_pages+0x238/0x340
lr: c0000000004b9724: memunmap_pages+0x234/0x340
...
pid = 3669, comm = ndctl
kernel BUG at include/linux/mm.h:1107!
devm_action_release+0x30/0x50
release_nodes+0x268/0x2d0
device_release_driver_internal+0x174/0x240
unbind_store+0x13c/0x190
drv_attr_store+0x44/0x60
sysfs_kf_write+0x70/0xa0
kernfs_fop_write+0x1ac/0x290
__vfs_write+0x3c/0x70
vfs_write+0xe4/0x200
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f48 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).
[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes: 2c2a5af6fe ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.
For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:
:/# echo 1 > /sys/devices/system/memory/memory40/online
:/# echo 1 > /sys/devices/system/memory/memory42/online
:/# cat /proc/pagetypeinfo > test.file
page:fffff2c585200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
There is not page extension available.
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1107!
invalid opcode: 0000 [#1] SMP NOPTI
Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).
[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_PRINTK_CALLER is set, struct printk_log contains an
additional member caller_id. This affects the offset of the log text.
Account for this by using the type information from gdb to determine all
the offsets instead of using hardcoded values.
This fixes following error:
(gdb) lx-dmesg
Python Exception <class 'ValueError'> embedded null character:
Error occurred in Python command: embedded null character
The read_u* utility functions now take an offset argument to make them
easier to use.
Link: http://lkml.kernel.org/r/20191011142500.2339-1-joel.colledge@linbit.com
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup
We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.
For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:
:/# cat /proc/kpagecount > tmp.test
BUG: unable to handle page fault for address: fffffffffffffffe
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
RIP: 0010:kpagecount_read+0xce/0x1e0
Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
Call Trace:
proc_reg_read+0x3c/0x60
vfs_read+0xc5/0x180
ksys_read+0x68/0xe0
do_syscall_64+0x5c/0xa0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this. To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now. The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.
The fundamental issue we have is:
if (pfn_to_online_page(pfn)) {
/* memmap initialized */
} else if (pfn_valid(pfn)) {
/*
* ???
* a) offline memory. memmap garbage.
* b) devmem: memmap initialized to ZONE_DEVICE.
* c) devmem: reserved for driver. memmap garbage.
* (d) devmem: memmap currently initializing - garbage)
*/
}
We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure. We now no
longer dump information about pages that are not in use anymore -
offline.
Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING. They should not get touched.
Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:
:/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
[ 23.097167] soft offline: 0x150000 page already poisoned
But the actual result depends on the garbage in the memmap.
soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE. Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.
Add a check against pfn_to_online_page() and similarly return -EIO.
Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith that address deadlocks, double resets,
memory leaks, and other regression.
- Fixup elv_support_iosched() for bio based devices (Damien)
- Fixup for the ahci PCS quirk (Dan)
- Socket O_NONBLOCK handling fix for io_uring (me)
- Timeout sequence io_uring fixes (yangerkun)
- MD warning fix for parameter default_layout (Song)
- blkcg activation fixes (Tejun)
- blk-rq-qos node deletion fix (Tejun)
* tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block:
nvme-pci: Set the prp2 correctly when using more than 4k page
io_uring: fix logic error in io_timeout
io_uring: fix up O_NONBLOCK handling for sockets
md/raid0: fix warning message for parameter default_layout
libata/ahci: Fix PCS quirk application
blk-rq-qos: fix first node deletion of rq_qos_del()
blkcg: Fix multiple bugs in blkcg_activate_policy()
io_uring: consider the overflow of sequence for timeout req
nvme-tcp: fix possible leakage during error flow
nvmet-loop: fix possible leakage during error flow
block: Fix elv_support_iosched()
nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
nvme: Wait for reset state when required
nvme: Prevent resets during paused controller state
nvme: Restart request timers in resetting state
nvme: Remove ADMIN_ONLY state
nvme-pci: Free tagset if no IO queues
nvme: retain split access workaround for capability reads
nvme: fix possible deadlock when nvme_update_formats fails
Pull RISC-V fixes from Paul Walmsley:
"Some RISC-V fixes:
- Fix the virtual memory layout so the fixaddr region doesn't overlap
with other regions. (This was originally intended to go in as part
of an earlier patch, but I inadvertently dropped it during a
rebase)
- Add the DT chosen/stdout-path property to the HiFive Unleashed DT
file. This is so "earlycon" can be specified with no arguments on
the kernel command line, and the correct UART will be automatically
selected.
And two cleanup patches:
- Simplify the code in our breakpoint trap handler.
- Drop a comment in our TLB flush code that has caused some
confusion"
* tag 'riscv/for-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: fix virtual address overlapped in FIXADDR_START and VMEMMAP_START
riscv: tlbflush: remove confusing comment on local_flush_tlb_all()
riscv: dts: HiFive Unleashed: add default chosen/stdout-path
riscv: remove the switch statement in do_trap_break()
This was always meant to be a temporary thing, just for testing and to
see if it actually ever triggered.
The only thing that reported it was syzbot doing disk image fuzzing, and
then that warning is expected. So let's just remove it before -rc4,
because the extra sanity testing should probably go to -stable, but we
don't want the warning to do so.
Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
Fixes: 8a23eb804c ("Make filldir[64]() verify the directory entry filename is valid")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ceph fixes from Ilya Dryomov:
"A future-proofing decoding fix from Jeff intended for stable and a
patch for a mostly benign race from Dongsheng"
* tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client:
rbd: cancel lock_dwork if the wait is interrupted
ceph: just skip unrecognized info in ceph_reply_info_extra
Pull device mapper fixes from Mike Snitzer:
- Fix DM snapshot deadlock that can occur due to COW throttling
preventing locks from being released.
- Fix DM cache's GFP_NOWAIT allocation failure error paths by switching
to GFP_NOIO.
- Make __hash_find() static in the DM clone target.
* tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix bugs when a GFP_NOWAIT allocation fails
dm snapshot: rework COW throttling to fix deadlock
dm snapshot: introduce account_start_copy() and account_end_copy()
dm clone: Make __hash_find static
Pull iommu fixes from Joerg Roedel:
- Fixes for page-table issues on Mali GPUs
- Missing free in an error path for ARM-SMMU
- PASID decoding in the AMD IOMMU Event log code
- Another update for the locking fixes in the AMD IOMMU driver
- Reduce the calls to platform_get_irq() in the IPMMU-VMSA and Rockchip
IOMMUs to get rid of the warning message added to this function
recently
* tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Check PM_LEVEL_SIZE() condition in locked section
iommu/amd: Fix incorrect PASID decoding from event log
iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory
iommu/rockchip: Don't use platform_get_irq to implicitly count irqs
iommu/io-pgtable-arm: Support all Mali configurations
iommu/io-pgtable-arm: Correct Mali attributes
iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context
Pull usercopy test fixlets from Christian Brauner:
"This contains two improvements for the copy_struct_from_user() tests:
- a coding style change to get rid of the ugly "if ((ret |= test()))"
pointed out when pulling the original patchset.
- avoid a soft lockups when running the usercopy tests on machines
with large page sizes by scanning only a 1024 byte region"
* tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
usercopy: Avoid soft lockups in test_check_nonzero_user()
lib: test_user_copy: style cleanup
The scsi async probe process is calling blk_pm_runtime_init for each lun,
and then those request queues are monitored by the block layer pm
engine (blk-pm.c). This is however, not the case for scsi-passthrough
queues, created by bsg_setup_queue().
So the ufs-bsg driver might send various commands, disregarding the pm
status of the device. This is wrong, regardless if its request queue is
pm-aware or not.
Fixes: df032bf27a (scsi: ufs: Add a bsg endpoint that supports UPIUs)
Link: https://lore.kernel.org/r/1570696267-8487-1-git-send-email-avri.altman@wdc.com
Reported-by: Yuliy Izrailov <yuliy.izrailov@wdc.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On some MIPS variants (e.g. MIPS r1), vDSO clock_mode is set to
VDSO_CLOCK_NONE.
When VDSO_CLOCK_NONE is set the expected kernel behavior is to fallback
on syscalls. To do that the generic vDSO library expects UULONG_MAX as
return value of __arch_get_hw_counter().
Fix __arch_get_hw_counter() on MIPS defining a __VDSO_USE_SYSCALL case
that addressed the described scenario.
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
In mlx5_fw_fatal_reporter_dump if mlx5_crdump_collect fails the
allocated memory for cr_data must be released otherwise there will be
memory leak. To fix this, this commit changes the return instruction
into goto error handling.
Fixes: 9b1f298236 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The completion queue consumer index increments upon a call to
mlx5_cqwq_pop().
When dumping an error CQE, the index is already incremented.
Decrease one for the print command.
Fixes: 16cc14d817 ("net/mlx5e: Dump xmit error completions")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Once the kTLS TX resync function is called, it used to return
a binary value, for success or failure.
However, in case the TLS SKB is a retransmission of the connection
handshake, it initiates the resync flow (as the tcp seq check holds),
while regular packet handle is expected.
In this patch, we identify this case and skip the resync operation
accordingly.
Counters:
- Add a counter (tls_skip_no_sync_data) to monitor this.
- Bump the dump counters up as they are used more frequently.
- Add a missing counter descriptor declaration for tls_resync_bytes
in sq_stats_desc.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Do not assume the crypto info is accessible during the
connection lifetime. Save a copy of it in the private
TX context.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cipher type is checked upon connection addition.
No need to recheck it per every TX resync invocation.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
HW expects the data size in DUMP WQEs to be up to MTU.
Make sure they are in range.
We elevate the frag page refcount by 'n-1', in addition to the
one obtained in tx_sync_info_get(), having an overall of 'n'
references. We bulk increments by using a single page_ref_add()
command, to optimize perfermance.
The refcounts are released one by one, by the corresponding completions.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Before posting the context params WQEs, make sure there is enough
contiguous room for them, and fill frag edge if needed.
When posting only a nop, no need for room check, as it needs a single
WQEBB, meaning no contiguity issue.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
All references for frag pages that are obtained in tx_sync_info_get()
should be released.
Release usually occurs in the corresponding CQE of the WQE.
In error flows, not all fragments have a WQE posted for them, hence
no matching CQE will be generated.
For these pages, release the reference in the error flow.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Access the record fragments only under the TLS ctx lock.
In the resync flow, save a copy of them to be used when
preparing and posting the required DUMP WQEs.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In TX resync flow where DUMP WQEs are posted, keep a pointer to
the fragment page to unref it upon completion, instead of saving
the whole fragment.
In addition, move it the end of the arguments list in tx_fill_wi().
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
No Eth segment, so no dynamic inline headers.
The size of a Dump WQE is fixed, use constants and remove
unnecessary checks.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A call to kTLS completion handler was missing in the TXQSQ release
flow. Add it.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Not all fields of WQE info are being written in the function,
having some leftovers from previous rounds.
Zero-memset it upon update.
Particularly, not nullifying the wi->resync_dump_frag field
will cause double free of the kTLS DUMPed frags.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cited patch removed the assumption only in datapath.
Here we remove it also form control/cleanup flow.
Fixes: 9ab0233728 ("net/mlx5e: Tx, Don't implicitly assume SKB-less wqe has one WQEBB")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
bcm2835-rpi.dtsi defines the behavior of the ACT LED, which is available
on all Raspberry Pi boards. But there is no driver for this particual
GPIO on CM3 in mainline yet, so this node was left incomplete without
the actual GPIO definition. Since commit 025bf37725 ("gpio: Fix return
value mismatch of function gpiod_get_from_of_node()") this causing probe
issues of the leds-gpio driver for users of the CM3 dtsi file.
leds-gpio: probe of leds failed with error -2
Until we have the necessary GPIO driver hide the ACT node for CM3
to avoid this.
Reported-by: Fredrik Yhlen <fredrik.yhlen@endian.se>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Fixes: a54fe8a6cf ("ARM: dts: add Raspberry Pi Compute Module 3 and IO board")
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Fix broken read implementation, which could be used to trigger slab info
leaks.
The driver failed to check if the custom ring buffer was still empty
when waking up after having waited for more data. This would happen on
every interrupt-in completion, even if no data had been added to the
ring buffer (e.g. on disconnect events).
Due to missing sanity checks and uninitialised (kmalloced) ring-buffer
entries, this meant that huge slab info leaks could easily be triggered.
Note that the empty-buffer check after wakeup is enough to fix the info
leak on disconnect, but let's clear the buffer on allocation and add a
sanity check to read() to prevent further leaks.
Fixes: 2824bd250f ("[PATCH] USB: add ldusb driver")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Reported-by: syzbot+6fe95b826644f7f12b0b@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191018151955.25135-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current code tries to derive VLAN ID and compares it with GID
attribute for matching entry. This raw search fails on macvlan
netdevice as its not a VLAN device, but its an upper device of a VLAN
netdevice.
Due to this limitation, incoming QP1 packets fail to match in the
GID table. Such packets are dropped.
Hence, to support it, use the existing rdma_read_gid_l2_fields()
that takes care of diffferent device types.
Fixes: dbf727de74 ("IB/core: Use GID table in AH creation and dmac resolution")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Johan writes:
USB-serial fixes for 5.4-rc4
Here's a fix for a long-standing locking bug in ti_usb_3410_5052 and
related clean up.
Both have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.4-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ti_usb_3410_5052: clean up serial data access
USB: serial: ti_usb_3410_5052: fix port-close races
In the format of synthetic events, the "gfp_t" is shown as "signed:1",
but in fact the "gfp_t" is "unsigned", should be shown as "signed:0".
The issue can be reproduced by the following commands:
echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
cat /sys/kernel/debug/tracing/events/synthetic/memlatency/format
name: memlatency
ID: 2233
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:u64 lat; offset:8; size:8; signed:0;
field:unsigned int order; offset:16; size:4; signed:0;
field:gfp_t gfp_flags; offset:24; size:4; signed:1;
field:int migratetype; offset:32; size:4; signed:1;
print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype
Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
kref release routines usually perform memory release operations,
hence, they should not be called with spinlocks held.
one such case is: SIW kref release routine siw_free_qp(), which
can sleep via vfree() while freeing queue memory.
Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks.
Fixes: 922a8e9fb2 ("RDMA: iWARP Connection Manager.")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
pass_accept_req() is using the same skb for handling accept request and
sending accept reply to HW. Here req and rpl structures are pointing to
same skb->data which is over written by INIT_TP_WR() and leads to
accessing corrupt req fields in accept_cr() while checking for ECN flags.
Reordered code in accept_cr() to fetch correct req fields.
Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
As soon as the netdev is registers, the kernel can start using the
interface. If the driver connects the MAC to the PHY after the netdev
is registered, there is a race condition where the interface can be
opened without having the PHY connected.
Change the order to close this race condition.
Fixes: 92571a1aae ("lan78xx: Connect phy early")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella says:
====================
vsock/virtio: make the credit mechanism more robust
This series makes the credit mechanism implemented in the
virtio-vsock devices more robust.
Patch 1 sends an update to the remote peer when the buf_alloc
change.
Patch 2 prevents a malicious peer (especially the guest) can
consume all the memory of the other peer, discarding packets
when the credit available is not respected.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the remote peer doesn't respect the credit information
(buf_alloc, fwd_cnt), sending more data than it can send,
we should drop the packets to prevent a malicious peer
from using all of our memory.
This is patch follows the VIRTIO spec: "VIRTIO_VSOCK_OP_RW data
packets MUST only be transmitted when the peer has sufficient
free buffer space for the payload"
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user application set a new buffer size value, we should
update the remote peer about this change, since it uses this
information to calculate the credit available.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink maintains packets and bytes statistics for each trap. Since
eth_type_trans() was called to set the skb's protocol, the data pointer
no longer points to the start of the packet and the bytes accounting is
off by 14 bytes.
Fix this by pushing the skb's data pointer to the start of the packet.
Fixes: b5ce611fd9 ("mlxsw: spectrum: Add devlink-trap support")
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit below, adds a call to sysclk callback on shutdown.
This introduces a regression in stm32 SAI driver, as some clock
services are called twice, leading to unbalanced calls.
Move processing related to mclk from shutdown to sysclk callback.
When requested frequency is 0, assume shutdown and release mclk.
Fixes: 2458adb8f9 ("SoC: simple-card-utils: set 0Hz to sysclk when shutdown")
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Link: https://lore.kernel.org/r/20191018082040.31022-1-olivier.moysan@st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Thomas found that some forwarded packets would be stuck
in FQ packet scheduler because their skb->tstamp contained
timestamps far in the future.
We thought we addressed this point in commit 8203e2d844
("net: clear skb->tstamp in forwarding paths") but there
is still an issue when/if a packet needs to be fragmented.
In order to meet EDT requirements, we have to make sure all
fragments get the original skb->tstamp.
Note that this original skb->tstamp should be zero in
forwarding path, but might have a non zero value in
output path if user decided so.
Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thomas Bartschies <Thomas.Bartschies@cvk.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-iproc: Prevent some spurious interrupts
- renesas_sdhi/sh_mmcif: Avoid false warnings about IRQs not found
MEMSTICK host:
- jmb38x_ms: Fix an error handling path at ->probe()"
* tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711
mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt
mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts
Doug Berger says:
====================
net: bcmgenet: restore internal EPHY support
I managed to get my hands on an old BCM97435SVMB board to do some
testing with the latest kernel and uncovered a number of things
that managed to get broken over the years (some by me ;).
This commit set attempts to correct the errors I observed in my
testing.
The first commit applies to all internal PHYs to restore proper
reporting of link status when a link comes up.
The second commit restores the soft reset to the initialization of
the older internal EPHYs used by 40nm Set-Top Box devices.
The third corrects a bug I introduced when removing excessive soft
resets by altering the initialization sequence in a way that keeps
the GENETv3 MAC interface happy.
Finally, I observed a number of issues when manually configuring
the network interface of the older EPHYs that appear to be resolved
by the fourth commit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The EPHY integrated into the 40nm Set-Top Box devices can falsely
detect energy when connected to a disabled peer interface. When the
peer interface is enabled the EPHY will detect and report the link
as active, but on occasion may get into a state where it is not
able to exchange data with the connected GENET MAC. This issue has
not been observed when the link parameters are auto-negotiated;
however, it has been observed with a manually configured link.
It has been empirically determined that issuing a soft reset to the
EPHY when energy is detected prevents it from getting into this bad
state.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that the "Workaround for putting the PHY in IDDQ mode"
used by the internal EPHYs on 40nm Set-Top Box chips when powering
down puts the interface to the GENET MAC in a state that can cause
subsequent MAC resets to be incomplete.
Rather than restore the forced soft reset when powering up internal
PHYs, this commit moves the invocation of phy_init_hw earlier in
the MAC initialization sequence to just before the MAC reset in the
open and resume functions. This allows the interface to be stable
and allows the MAC resets to be successful.
The bcmgenet_mii_probe() function is split in two to accommodate
this. The new function bcmgenet_mii_connect() handles the first
half of the functionality before the MAC initialization, and the
bcmgenet_mii_config() function is extended to provide the remaining
PHY configuration following the MAC initialization.
Fixes: 484bfa1507 ("Revert "net: bcmgenet: Software reset EPHY after power on"")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal 40nm EPHYs use a "Workaround for putting the PHY in
IDDQ mode." These PHYs require a soft reset to restore functionality
after they are powered back up.
This commit defines the soft_reset function to use genphy_soft_reset
during phy_init_hw to accommodate this.
Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When commit 28b2e0d2cd ("net: phy: remove parameter new_link from
phy_mac_interrupt()") removed the new_link parameter it set the
phydev->link state from the MAC before invoking phy_mac_interrupt().
However, once commit 88d6272aca ("net: phy: avoid unneeded MDIO
reads in genphy_read_status") was added this initialization prevents
the proper determination of the connection parameters by the function
genphy_read_status().
This commit removes that initialization to restore the proper
functionality.
Fixes: 88d6272aca ("net: phy: avoid unneeded MDIO reads in genphy_read_status")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sound fixes from Takashi Iwai:
"Just a few small fixes for the usual suspect, HD- and USB-audio:
enablement of runtime PM for Nvidia due to the recent PCI changes, a
fix for potential hangs with recent HD-audio platforms, and the rest
device-specific quirks"
* tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Force runtime PM on Nvidia HDMI codecs
ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
ALSA: hdac: clear link output stream mapping
ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
I noticed that when probed with ti-sysc, watchdog can trigger on am3, am4
and dra7 causing a device reset.
Turns out I made several mistakes implementing the watchdog quirk handling:
1. We must do both writes to spr register
2. We must also call the reset quirk on disable
3. On am3 and am4 we need to also set swsup quirk flag
I probably only tested this earlier with watchdog service running when the
watchdog never gets disabled.
Fixes: 4e23be473e ("bus: ti-sysc: Add support for module specific reset quirks")
Signed-off-by: Tony Lindgren <tony@atomide.com>
The OMAP3 ISP IOMMU does not have any reset lines, so it didn't
need any pdata previously. The OMAP IOMMU driver now requires the
platform data ops for device_enable/idle on all the IOMMU devices
after commit db8918f61d ("iommu/omap: streamline enable/disable
through runtime pm callbacks") to enable/disable the clocks properly
and maintain the reference count and the omap_hwmod state machine.
So, add these callbacks through iommu pdata quirks for the OMAP3
ISP IOMMU.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The OMAP IOMMU driver requires the device_enable/idle platform
data ops on all the IOMMU devices to be able to enable and disable
the clocks after commit db8918f61d ("iommu/omap: streamline
enable/disable through runtime pm callbacks"). Plug in these
pdata ops for all the existing IOMMUs through pdata quirks to
maintain functionality.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull ACPI fixes from Rafael Wysocki:
"Fix possible use-after-free in the ACPI CPPC support code (John Garry)
and prevent the ACPI HMAT parsing code from using possibly incorrect
data coming from the platform firmware (Daniel Black)"
* tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3
Pull power management fixes from Rafael Wysocki:
"These include a fix for a recent regression in the ACPI CPU
performance scaling code, a PCI device power management fix,
a system shutdown fix related to cpufreq, a removal of an ACPI
suspend-to-idle blacklist entry and a build warning fix.
Specifics:
- Fix possible NULL pointer dereference in the ACPI processor scaling
initialization code introduced by a recent cpufreq update (Rafael
Wysocki).
- Fix possible deadlock due to suspending cpufreq too late during
system shutdown (Rafael Wysocki).
- Make the PCI device system resume code path be more consistent with
its PM-runtime counterpart to fix an issue with missing delay on
transitions from D3cold to D0 during system resume from
suspend-to-idle on some systems (Rafael Wysocki).
- Drop Dell XPS13 9360 from the LPS0 Idle _DSM blacklist to make it
use suspend-to-idle by default (Mario Limonciello).
- Fix build warning in the core system suspend support code (Ben
Dooks)"
* tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Avoid NULL pointer dereferences at init time
PCI: PM: Fix pci_power_up()
PM: sleep: include <linux/pm_runtime.h> for pm_wq
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
We must return a mask covering the full physical RAM when bypassing the
IOMMU mapping. Also, in iommu_need_mapping, we need to check using
dma_direct_get_required_mask to ensure that the device's dma_mask can
cover physical RAM before deciding to bypass IOMMU mapping.
Based on an earlier patch from Christoph Hellwig.
Fixes: 249baa5479 ("dma-mapping: provide a better default ->get_required_mask")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull scsi fixes from Martin Petersen:
"These two commits were in a separate postmerge branch due to a
dependency on changes merged for 5.4 in the block tree.
They fix two issues in the intersection of the request cleanup changes
from block (b7e9e1fb7a) and the request batching changes
(8930a6c207) that were made to SCSI during the 5.4 cycle"
* tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: core: fix dh and multipathing for SCSI hosts without request batching
scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching
The increase_address_space() function has to check the PM_LEVEL_SIZE()
condition again under the domain->lock to avoid a false trigger of the
WARN_ON_ONCE() and to avoid that the address space is increase more
often than necessary.
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull NVMe updates from Keith:
"This is a collection of bug fixes committed since the previous pull
request that address deadlocks, double resets, memory leaks, and other
regression."
* 'nvme-5.4' of git://git.infradead.org/nvme:
nvme-pci: Set the prp2 correctly when using more than 4k page
nvme-tcp: fix possible leakage during error flow
nvmet-loop: fix possible leakage during error flow
nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
nvme: Wait for reset state when required
nvme: Prevent resets during paused controller state
nvme: Restart request timers in resetting state
nvme: Remove ADMIN_ONLY state
nvme-pci: Free tagset if no IO queues
nvme: retain split access workaround for capability reads
nvme: fix possible deadlock when nvme_update_formats fails
In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.
Fixes: dff824b2aa ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The Primebook C11B uses the SIPODEV SP1064 touchpad. There are 2 versions
of this 2-in-1 and the touchpad in the older version does not supply
descriptors, so it has to be added to the override list.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
The introduction of Symbol Namespaces changed the naming schema of the
__ksymtab entries from __kysmtab__symbol to __ksymtab_NAMESPACE.symbol.
That caused some breakages in tools that depend on the name layout in
either the binaries(vmlinux,*.ko) or in System.map. E.g. kmod's depmod
would not be able to read System.map without a patch to support symbol
namespaces. A warning reported by depmod for namespaced symbols would
look like
depmod: WARNING: [...]/uas.ko needs unknown symbol usb_stor_adjust_quirks
In order to address this issue, revert to the original naming scheme and
rather read the __kstrtabns_<symbol> entries and their corresponding
values from __ksymtab_strings to update the namespace values for
symbols. After having read all symbols and handled them in
handle_modversions(), the symbols are created. In a second pass, read
the __kstrtabns_ entries and update the namespaces accordingly.
Fixes: 8651ec01da ("module: add support for symbol namespaces.")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Setting the symbol namespace of a symbol within sym_add_exported feels
displaced and lead to issues in the current implementation of symbol
namespaces. This patch makes updating the namespace an explicit call to
decouple it from adding a symbol to the export list.
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Let the function 'sym_update_namespace' take care of updating the
namespace for a symbol. While this currently only replaces one single
location where namespaces are updated, in a following patch, this
function will get more call sites.
The function signature is intentionally close to sym_update_crc and
taking the name by char* seems like unnecessary work as the symbol has
to be looked up again. In a later patch of this series, this concern
will be addressed.
This function ensures that symbol::namespace is either NULL or has a
valid non-empty value. Previously, the empty string was considered 'no
namespace' as well and this lead to confusion.
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
G920 device only advertises REPORT_ID_HIDPP_LONG and
REPORT_ID_HIDPP_VERY_LONG in its HID report descriptor, so querying
for REPORT_ID_HIDPP_SHORT with optional=false will always fail and
prevent G920 to be recognized as a valid HID++ device.
To fix this and improve some other aspects, modify
hidpp_validate_device() as follows:
- Inline the code of hidpp_validate_report() to simplify
distingushing between non-present and invalid report descriptors
- Drop the check for id >= HID_MAX_IDS || id < 0 since all of our
IDs are static and known to satisfy that at compile time
- Change the algorithms to check all possible report
types (including very long report) and deem the device as a valid
HID++ device if it supports at least one
- Treat invalid report length as a hard stop for the validation
algorithm, meaning that if any of the supported reports has
invalid length we assume the worst and treat the device as a
generic HID device.
- Fold initialization of hidpp->very_long_report_length into
hidpp_validate_device() since it already fetches very long report
length and validates its value
Fixes: fe3ee1ec00 ("HID: logitech-hidpp: allow non HID++ devices to be handled by this module")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204191
Reported-by: Sam Bazely <sambazley@fastmail.com>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Henrik Rydberg <rydberg@bitmath.org>
Cc: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com>
Cc: Austin Palmer <austinp@valvesoftware.com>
Cc: linux-input@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 5.2+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Runtime power management in i2c-hid brings lots of issues, such as:
- When transitioning from display manager to desktop session, i2c-hid
was closed and opened, so the device was set to SLEEP and ON in a short
period. Vendors confirmed that their devices can't handle fast ON/SLEEP
command because Windows doesn't have this behavior.
- When rebooting, i2c-hid was closed, and the driver core put the device
back to full power before shutdown. This behavior also triggers a quick
SLEEP and ON commands that some devices can't handle, renders an
unusable touchpad after reboot.
- Most importantly, my power meter reports little to none energy saving
when i2c-hid is runtime suspended.
So let's remove runtime power management since there is no actual
benefit.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
When building with "EXTRA_CFLAGS=-Wall" gcc warns:
arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function]
get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these
config options is disabled has this issue.
Move the function under the same ifdef guard as the call site.
[ tglx: Add context to the changelog so it becomes useful ]
Fixes: 41fa1ee9c6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
* pm-cpufreq:
ACPI: processor: Avoid NULL pointer dereferences at init time
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
* pm-sleep:
PM: sleep: include <linux/pm_runtime.h> for pm_wq
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
bam_dma_terminate_all() will leak resources if any of the transactions are
committed to the hardware (present in the desc fifo), and not complete.
Since bam_dma_terminate_all() does not cause the hardware to be updated,
the hardware will still operate on any previously committed transactions.
This can cause memory corruption if the memory for the transaction has been
reassigned, and will cause a sync issue between the BAM and its client(s).
Fix this by properly updating the hardware in bam_dma_terminate_all().
Fixes: e7c0fe2a5c ("dmaengine: add Qualcomm BAM dma driver")
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191017152606.34120-1-jeffrey.l.hugo@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The BUILD_NVME define never got defined anywhere, causing NVMe commands to
be treated as SCSI commands when freeing the buffers. This was causing a
stuck discovery and a horrible crash in lpfc_set_rrq_active() later on.
Link: https://lore.kernel.org/r/20191017150019.75769-1-hare@suse.de
Fixes: c00f62e6c5 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.
And getting a invalid paging request BUG_ON as following:
[ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[ 34.629189] Oops: 0000 [#1] SMP PTI
[ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473
[ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[ 34.644545] Call Trace:
[ 34.644907] scsi_host_dev_release+0x6b/0x1f0
[ 34.645511] device_release+0x74/0x110
[ 34.646046] kobject_put+0x116/0x390
[ 34.646559] put_device+0x17/0x30
[ 34.647041] scsi_target_dev_release+0x2b/0x40
[ 34.647652] device_release+0x74/0x110
[ 34.648186] kobject_put+0x116/0x390
[ 34.648691] put_device+0x17/0x30
[ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360
[ 34.649953] execute_in_process_context+0x29/0x80
[ 34.650603] scsi_device_dev_release+0x20/0x30
[ 34.651221] device_release+0x74/0x110
[ 34.651732] kobject_put+0x116/0x390
[ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50
[ 34.652935] sdev_store_delete.cold.4+0x71/0x8f
[ 34.653579] dev_attr_store+0x1b/0x40
[ 34.654103] sysfs_kf_write+0x3d/0x60
[ 34.654603] kernfs_fop_write+0x174/0x250
[ 34.655165] __vfs_write+0x1f/0x60
[ 34.655639] vfs_write+0xc7/0x280
[ 34.656117] ksys_write+0x6d/0x140
[ 34.656591] __x64_sys_write+0x1e/0x30
[ 34.657114] do_syscall_64+0xb1/0x400
[ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.658335] RIP: 0033:0x7f156f337130
During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.
To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
passthrough_parse_cdb() - used by TCMU and PSCSI - attepts to reset the LUN
field of SCSI-2 CDBs (bits 5,6,7 of byte 1). The current code is wrong as
for newer commands not having the LUN field it overwrites relevant command
bits (e.g. for SECURITY PROTOCOL IN / OUT). We think this code was
unnecessary from the beginning or at least it is no longer useful. So we
remove it entirely.
Link: https://lore.kernel.org/r/12498eab-76fd-eaad-1316-c2827badb76a@ts.fujitsu.com
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update maintainers for MediaTek ethernet driver with Mark Lee.
He is familiar with MediaTek mt762x series ethernet devices and
will keep following maintenance from the vendor side.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Mark Lee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 fixes from Will Deacon:
"The main thing here is a long-awaited workaround for a CPU erratum on
ThunderX2 which we have developed in conjunction with engineers from
Cavium/Marvell.
At the moment, the workaround is unconditionally enabled for affected
CPUs at runtime but we may add a command-line option to disable it in
future if performance numbers show up indicating a significant cost
for real workloads.
Summary:
- Work around Cavium/Marvell ThunderX2 erratum #219
- Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses
- More fixes to the spurious kernel fault detection logic
- Fix pathological preemption race when enabling some CPU features at boot
- Drop broken kcore macros in favour of generic implementations
- Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled
- Avoid NULL dereference on allocation failure during hibernation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tags: Preserve tags for addresses translated via TTBR1
arm64: mm: fix inverted PAR_EL1.F check
arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
arm64: hibernate: check pgd table allocation
arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
arm64: Fix kcore macros after 52-bit virtual addressing fallout
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
Jesse and Ido reported the following race condition:
<CPU A, t0> - Received packet A is forwarded and cached dst entry is
taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()
<t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
from multiple ISPs"), route is added / deleted and rt_cache_flush() is
called
<CPU B, t2> - Received packet B tries to use the same cached dst entry
from t0, but rt_cache_valid() is no longer true and it is replaced in
rt_cache_route() by the newer one. This calls dst_dev_put() on the
original dst entry which assigns the blackhole netdev to 'dst->dev'
<CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
to 'dst->dev' being the blackhole netdev
There are 2 issues in the v4 routing code:
1. A per-netns counter is used to do the validation of the route. That
means whenever a route is changed in the netns, users of all routes in
the netns needs to redo lookup. v6 has an implementation of only
updating fn_sernum for routes that are affected.
2. When rt_cache_valid() returns false, rt_cache_route() is called to
throw away the current cache, and create a new one. This seems
unnecessary because as long as this route does not change, the route
cache does not need to be recreated.
To fully solve the above 2 issues, it probably needs quite some code
changes and requires careful testing, and does not suite for net branch.
So this patch only tries to add the deleted cached rt into the uncached
list, so user could still be able to use it to receive packets until
it's done.
Fixes: 95c47f9cf5 ("ipv4: call dst_dev_put() properly")
Signed-off-by: Wei Wang <weiwan@google.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Xtensa fixes from Max Filippov:
- fix {get,put}_user() for 64bit values
- fix warning about static EXPORT_SYMBOL from modpost
- fix PCI IO ports mapping for the virt board
- fix pasto in change_bit for exclusive access option
* tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix change_bit in exclusive access option
xtensa: virt: fix PCI IO ports mapping
xtensa: drop EXPORT_SYMBOL for outs*/ins*
xtensa: fix type conversion in __get_user_[no]check
xtensa: clean up assembly arguments in uaccess macros
xtensa: fix {get,put}_user() for 64bit values
...instead of -EINVAL. An issue was found with older kernel versions
while unplugging a NFS client with pending RPCs, and the wrong error
code here prevented it from recovering once link is back up with a
configured address.
Incidentally, this is not an issue anymore since commit 4f8943f808
("SUNRPC: Replace direct task wakeups from softirq context"), included
in 5.2-rc7, had the effect of decoupling the forwarding of this error
by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin
Coddington.
To the best of my knowledge, this isn't currently causing any further
issue, but the error code doesn't look appropriate anyway, and we
might hit this in other paths as well.
In detail, as analysed by Gonzalo Siero, once the route is deleted
because the interface is down, and can't be resolved and we return
-EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(),
as the socket error seen by tcp_write_err(), called by
tcp_retransmit_timer().
In turn, tcp_write_err() indirectly calls xs_error_report(), which
wakes up the RPC pending tasks with a status of -EINVAL. This is then
seen by call_status() in the SUN RPC implementation, which aborts the
RPC call calling rpc_exit(), instead of handling this as a
potentially temporary condition, i.e. as a timeout.
Return -EINVAL only if the input parameters passed to
ip_route_output_key_hash_rcu() are actually invalid (this is the case
if the specified source address is multicast, limited broadcast or
all zeroes), but return -ENETUNREACH in all cases where, at the given
moment, the given source address doesn't allow resolving the route.
While at it, drop the initialisation of err to -ENETUNREACH, which
was added to __ip_route_output_key() back then by commit
0315e38270 ("net: Fix behaviour of unreachable, blackhole and
prohibit routes"), but actually had no effect, as it was, and is,
overwritten by the fib_lookup() return code assignment, and anyway
ignored in all other branches, including the if (fl4->saddr) one:
I find this rather confusing, as it would look like -ENETUNREACH is
the "default" error, while that statement has no effect.
Also note that after commit fc75fc8339 ("ipv4: dont create routes
on down devices"), we would get -ENETUNREACH if the device is down,
but -EINVAL if the source address is specified and we can't resolve
the route, and this appears to be rather inconsistent.
Reported-by: Stefan Walter <walteste@inf.ethz.ch>
Analysed-by: Benjamin Coddington <bcodding@redhat.com>
Analysed-by: Gonzalo Siero <gsierohu@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The KSZ8051 PHY and the KSZ8794/KSZ8795/KSZ8765 switch share exactly the
same PHY ID. Since KSZ8051 is higher in the ksphy_driver[] list of PHYs
in the micrel PHY driver, it is used even with the KSZ87xx switch. This
is wrong, since the KSZ8051 configures registers of the PHY which are
not present on the simplified KSZ87xx switch PHYs and misconfigures
other registers of the KSZ87xx switch PHYs.
Fortunatelly, it is possible to tell apart the KSZ8051 PHY from the
KSZ87xx switch by checking the Basic Status register Bit 0, which is
read-only and indicates presence of the Extended Capability Registers.
The KSZ8051 PHY has those registers while the KSZ87xx switch does not.
This patch implements simple check for the presence of this bit for
both the KSZ8051 PHY and KSZ87xx switch, to let both use the correct
PHY driver instance.
Fixes: 9d162ed69f ("net: phy: micrel: add support for KSZ8795")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not
tmp_nxt.
Fixes: 5da0fb1ab3 ("io_uring: consider the overflow of sequence for timeout req")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We've got two issues with the non-regular file handling for non-blocking
IO:
1) We don't want to re-do a short read in full for a non-regular file,
as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
we need to punt to async context even if the file is opened as
non-blocking. Otherwise the caller always gets -EAGAIN.
Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.
Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull xfs fix from Darrick Wong:
"The single fix converts the seconds field in the recently added XFS
bulkstat structure to a signed 64-bit quantity.
The structure layout doesn't change and so far there are no users of
the ioctl to break because we only publish xfs ioctl interfaces
through the XFS userspace development libraries, and we're still
working on a 5.3 release"
* tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: change the seconds fields in xfs_bulkstat to signed
user_pages array should always be freed after validation regardless if
user pages are changed after bo is created because with HMM change parse
bo always allocate user pages array to get user pages for userptr bo.
v2: remove unused local variable and amend commit
v3: add back get user pages in gem_userptr_ioctl, to detect application
bug where an userptr VMA is not ananymous memory and reject it.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Joe Barnett <thejoe@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.3
Pull drm fixes from Dave Airlie:
"This is this weeks fixes for drm.
The dma-resv one is probably the more important one a fair few people
have reported it, besides that it's a couple of panfrost, a few i915
and a few amdgpu fixes.
One radeon patch to fix some ppc64 related issues caused an x86
regression so is getting reverted for now.
Summary:
dma-resv:
- shared fences for lima/panfrost
ttm:
- prefault regression fix
- lifetime fix
panfrost:
- stopped job timeout fix
- missing register values
amdgpu:
- smu7 powerplay fix
- bail earlier for cik/si detection
- navi SDMA fix
radeon:
- revert a ppc64 shutdown fix that broke x86
i915:
- VBT information handling fix
- Circular locking fix
- preemption vs resubmission virtual requests fix"
* tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request
drm/i915/userptr: Never allow userptr into the mappable GGTT
drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin
drm/i915/execlists: Refactor -EIO markup of hung requests
drm/panfrost: Handle resetting on timeout better
drm/panfrost: Add missing GPU feature registers
drm/ttm: fix handling in ttm_bo_add_mem_to_lru
drm/ttm: Restore ttm prefaulting
drm/ttm: fix busy reference in ttm_mem_evict_first
drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync
drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
Revert "drm/radeon: Fix EEH during kexec"
drm/msm/dsi: Implement reset correctly
dma-buf/resv: fix exclusive fence get
drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
drm/tiny: Kconfig: Remove always-y THERMAL dep. from TINYDRM_REPAPER
drm/amdgpu/powerplay: fix typo in mvdd table setup
Workaround for Cavium/Marvell ThunderX2 erratum #219.
* errata/tx2-219:
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.
However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.
This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.
Fixes: 9905bf06e8 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
The KSZ driver uses one regmap per register width (8/16/32), each with
it's own lock, but accessing the same set of registers. In theory, it
is possible to create a race condition between these regmaps, although
the underlying bus (SPI or I2C) locking should assure nothing bad will
really happen and the accesses would be correct.
To make the driver do the right thing, add one single shared mutex for
all the regmaps used by the driver instead. This assures that even if
some future hardware is on a bus which does not serialize the accesses
the same way SPI or I2C does, nothing bad will happen.
Note that the status_mutex was unused and only initied, hence it was
renamed and repurposed as the regmap mutex.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stmmac_pcs_ctrl_ane() expects a register address as
argument 1, but for some reason the mac_device_info is
being passed.
Fix the warning (and possible bug) from sparse:
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17: expected void [noderef] <asn:2> *ioaddr
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17: got struct mac_device_info *hw
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-eth: misc fixes
This patch set adds a couple of fixes around updating configuration on MAC
change. Depending on when MC connects the DPNI to a MAC, both the MAC
address and TX FQIDs should be updated everytime there is a change in
configuration.
Changes in v2:
- used reverse christmas tree ordering in patch 2/2
Changes in v3:
- add a missing new line
- go back to FQ based enqueueing after a transient error
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on when MC connects the DPNI to a MAC, Tx FQIDs may
not be available during probe time.
Read the FQIDs each time the link goes up to avoid using invalid
values. In case an error occurs or an invalid value is retrieved,
fall back to QDID-based enqueueing.
Fixes: 1fa0f68c92 ("dpaa2-eth: Use FQ-based DPIO enqueue API")
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IRQ for the DPNI endpoint change event, resolving the issue
when a dynamically created DPNI gets a randomly generated hw address
when the endpoint is a DPMAC object.
Signed-off-by: Florin Chiculita <florinlaurentiu.chiculita@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The serial state information must not be embedded into another
data structure, as this interferes with cache handling for DMA
on architectures without cache coherence..
That would result in data corruption on some architectures
Allocating it separately.
v2: fix syntax error
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were checking for the full fsync flag in the inode before locking the
inode, which is racy, since at that that time it might not be set but
after we acquire the inode lock some other task set it. One case where
this can happen is on a system low on memory and some concurrent task
failed to allocate an extent map and therefore set the full sync flag on
the inode, to force the next fsync to work in full mode.
A consequence of missing the full fsync flag set is hitting the problems
fixed by commit 0c713cbab6 ("Btrfs: fix race between ranged fsync and
writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
of weird inconsistencies after replaying a log due to file extents items
representing ranges that overlap.
So just move the check such that it's done after locking the inode and
before starting writeback again.
Fixes: 0c713cbab6 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull input fixes from Dmitry Torokhov:
"The main change is that we are reverting blanket enablement of SMBus
mode for devices with Elan touchpads that report BIOS release date as
2018+ because there are older boxes with updated BIOSes that still do
not work well in SMbus mode.
We will have to establish whitelist for SMBus mode it looks like"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: elantech - enable SMBus on new (2018+) systems"
Input: synaptics-rmi4 - avoid processing unknown IRQs
Input: soc_button_array - partial revert of support for newer surface devices
Input: goodix - add support for 9-bytes reports
Input: da9063 - fix capability and drop KEY_SLEEP
If we fail to reserve metadata for delalloc operations we end up releasing
the previously reserved qgroup amount twice, once explicitly under the
'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once
again, under label 'out_fail', by calling btrfs_inode_rsv_release() with a
value of 'true' for its 'qgroup_free' argument, which results in
btrfs_qgroup_free_meta_prealloc() being called again, so we end up having
a double free.
Also if we fail to reserve the necessary qgroup amount, we jump to the
label 'out_fail', which calls btrfs_inode_rsv_release() and that in turns
calls btrfs_qgroup_free_meta_prealloc(), even though we weren't able to
reserve any qgroup amount. So we freed some amount we never reserved.
So fix this by removing the call to btrfs_inode_rsv_release() in the
failure path, since it's not necessary at all as we haven't changed the
inode's block reserve in any way at this point.
Fixes: c8eaeac7b7 ("btrfs: reserve delalloc metadata differently")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While it is useful for new drivers to use devm_platform_ioremap_resource,
this script is currently used to spam maintainers, often updating very
old drivers. The net benefit is the removal of 2 lines of code in the
driver but the review load for the maintainers is huge. As of now, more
that 560 patches have been sent, some of them obviously broken, as in:
https://lore.kernel.org/lkml/9bbcce19c777583815c92ce3c2ff2586@www.loen.fr/
Remove the script to reduce the spam.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Przemysław Kopa reports that since commit b516ea586d ("PCI: Enable
NVIDIA HDA controllers"), the discrete GPU Nvidia GeForce GT 540M on his
2011 Samsung laptop refuses to runtime suspend, resulting in a power
regression and excessive heat.
Rivera Valdez witnesses the same issue with a GeForce GT 525M (GF108M)
of the same era, as does another Arch Linux user named "R0AR" with a
more recent GeForce GTX 1050 Ti (GP107M).
The commit exposes the discrete GPU's HDA controller and all four codecs
on the controller do not set the CLKSTOP and EPSS bits in the Supported
Power States Response. They also do not set the PS-ClkStopOk bit in the
Get Power State Response. hda_codec_runtime_suspend() therefore does
not call snd_hdac_codec_link_down(), which prevents each codec and the
PCI device from runtime suspending.
The same issue is present on some AMD discrete GPUs and we addressed it
by forcing runtime PM despite the bits not being set, see commit
57cb54e53b ("ALSA: hda - Force to link down at runtime suspend on
ATI/AMD HDMI").
Do the same for Nvidia HDMI codecs.
Fixes: b516ea586d ("PCI: Enable NVIDIA HDA controllers")
Link: https://bbs.archlinux.org/viewtopic.php?pid=1865512
Link: https://bugs.freedesktop.org/show_bug.cgi?id=75985#c81
Reported-by: Przemysław Kopa <prymoo@gmail.com>
Reported-by: Rivera Valdez <riveravaldez@ysinembargo.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Drake <dan@reactivated.net>
Cc: stable@vger.kernel.org # v5.3+
Link: https://lore.kernel.org/r/3086bc75135c1e3567c5bc4f3cc4ff5cbf7a56c2.1571324194.git.lukas@wunner.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull x86 platform driver fixes from Andy Shevchenko:
- Users of Intel P-Unit IPC driver might be surprised by harmless
warning. Thus, switch to API which doesn't issue a warning at all.
- I²C multi-instantiate driver continues to add slave devices even when
IRQ resource is not found. For devices in the market IRQ resource is
mandatory, so, fail the ->probe() of the parent driver to avoid
slaves being probed.
- Avoid compiler warning due to unused variable in Classmate laptop
driver.
* tag 'platform-drivers-x86-v5.4-3' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided
platform/x86: intel_punit_ipc: Avoid error message when retrieving IRQ
platform/x86: classmate-laptop: remove unused variable
GFP_NOWAIT allocation can fail anytime - it doesn't wait for memory being
available and it fails if the mempool is exhausted and there is not enough
memory.
If we go down this path:
map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)
we can see that map_bio() doesn't check the return value of mg_start(),
and the bio is leaked.
If we go down this path:
map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
mg_lock_writes -> mg_complete
the bio is ended with an error - it is unacceptable because it could
cause filesystem corruption if the machine ran out of memory
temporarily.
Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
wait until memory becomes available. mempool_alloc with GFP_NOIO can't
fail, so remove the code paths that deal with allocation failure.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull GPIO fixes from Linus Walleij:
"The fixes pertain to a problem with initializing the Intel GPIO
irqchips when adding gpiochips.
Andy fixed it up elegantly by adding a hardware initialization
callback to the struct gpio_irq_chip so let's use this. Tested and
verified on the target hardware"
* tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: lynxpoint: set default handler to be handle_bad_irq()
gpio: merrifield: Move hardware initialization to callback
gpio: lynxpoint: Move hardware initialization to callback
gpio: intel-mid: Move hardware initialization to callback
gpiolib: Initialize the hardware with a callback
gpio: merrifield: Restore use of irq_base
DA850 EVM has been converted to use GPIO backlight device
for display backlight GPIO control.
Enable the GPIO backlight module in davinci_all_defconfig
to keep backlight support working.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[nsekhar@ti.com: edits to commit message for context]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
dm365 have only single McBSP, so the device name is without .0
Fixes: 0c750e1fe4 ("ARM: davinci: dm365: Add dma_slave_map to edma")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
binder_mmap() tries to prevent the creation of overly big binder mappings
by silently truncating the size of the VMA to 4MiB. However, this violates
the API contract of mmap(). If userspace attempts to create a large binder
VMA, and later attempts to unmap that VMA, it will call munmap() on a range
beyond the end of the VMA, which may have been allocated to another VMA in
the meantime. This can lead to userspace memory corruption.
The following sequence of calls leads to a segfault without this commit:
int main(void) {
int binder_fd = open("/dev/binder", O_RDWR);
if (binder_fd == -1) err(1, "open binder");
void *binder_mapping = mmap(NULL, 0x800000UL, PROT_READ, MAP_SHARED,
binder_fd, 0);
if (binder_mapping == MAP_FAILED) err(1, "mmap binder");
void *data_mapping = mmap(NULL, 0x400000UL, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (data_mapping == MAP_FAILED) err(1, "mmap data");
munmap(binder_mapping, 0x800000UL);
*(char*)data_mapping = 1;
return 0;
}
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191016150119.154756-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:
qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2
The @type can be completely garbage, as DATA type is not possible for
trace_qgroup_meta_reserve() trace event.
[CAUSE]
Ther are several problems related to qgroup trace events:
- Unassigned entry member
Member entry::type of trace_qgroup_update_reserve() and
trace_qgourp_meta_reserve() is not assigned
- Redundant entry member
Member entry::type is completely useless in
trace_qgroup_meta_convert()
Fixes: 4ee0d8832c ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:
qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
The diff should always be alinged to sector size (4k), so there is
definitely something wrong.
[CAUSE]
For the wrong @diff, it's caused by wrong parameter order.
The correct parameters are:
struct btrfs_root, s64 diff, int type.
However the parameters used are:
struct btrfs_root, int type, s64 diff.
Fixes: 4ee0d8832c ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Both multi_cpu_stop() and set_state() access multi_stop_data::state
racily using plain accesses. These are subject to compiler
transformations which could break the intended behaviour of the code,
and this situation is detected by KCSAN on both arm64 and x86 (splats
below).
Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the
compiler cannot elide, replay, or tear loads and stores.
In multi_cpu_stop() the two loads of multi_stop_data::state are expected to
be a consistent value, so snapshot the value into a temporary variable to
ensure this.
The state transitions are serialized by atomic manipulation of
multi_stop_data::num_threads, and other fields in multi_stop_data are not
modified while subject to concurrent reads.
KCSAN splat on arm64:
| BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
|
| write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
| set_state+0x80/0xb0
| multi_cpu_stop+0x16c/0x198
| cpu_stopper_thread+0x170/0x298
| smpboot_thread_fn+0x40c/0x560
| kthread+0x1a8/0x1b0
| ret_from_fork+0x10/0x18
|
| read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
| multi_cpu_stop+0xa8/0x198
| cpu_stopper_thread+0x170/0x298
| smpboot_thread_fn+0x40c/0x560
| kthread+0x1a8/0x1b0
| ret_from_fork+0x10/0x18
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
| Hardware name: linux,dummy-virt (DT)
KCSAN splat on x86:
| write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
| set_state kernel/stop_machine.c:170 [inline]
| ack_state kernel/stop_machine.c:177 [inline]
| multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
| cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
| smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
| kthread+0x1b5/0x200 kernel/kthread.c:255
| ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
| multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
| cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
| smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
| kthread+0x1b5/0x200 kernel/kthread.c:255
| ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20191007104536.27276-1-mark.rutland@arm.com
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.
ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.
When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.
But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.
This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.
Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.
ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.
This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.
This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
[ bp: merge into a single patch. ]
Fixes: 0fe5f281f7 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
On Asus MJ401TA (with Realtek ALC256), the headset mic is connected to
pin 0x19, with default configuration value 0x411111f0 (indicating no
physical connection).
Enable this by quirking the pin. Mic jack detection was also tested and
found to be working.
This enables use of the headset mic on this product.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191017081501.17135-1-drake@endlessm.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The option --sort=ORDER was only introduced in tar 1.28 (2014), which
is rather new and might not be available in some setups.
This patch tries to replicate the previous behaviour as closely as
possible to fix the kheaders build for older environments. It does
not produce identical archives compared to the previous version due
to minor sorting differences but produces reproducible results itself
in my tests.
Reported-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Dmitry Goldin <dgoldin+lkml@protonmail.ch>
Tested-by: Andreas Schwab <schwab@suse.de>
Tested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This pull request clarifies maintainership of the BCM2711 and adds a replacement
mail address for a former contributor.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
disable ptp_ref_clk in suspend flow, and enable it in resume flow.
Fixes: f573c0b9c4 ("stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform structure")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers just call phy_ethtool_ksettings_set() to set the
links, for those phy drivers that use genphy_read_status(), if
autoneg is on, and the link is up, than execute "ethtool -s
ethx autoneg on" will cause "link partner" information disappear.
The call trace is phy_ethtool_ksettings_set()->phy_start_aneg()
->linkmode_zero(phydev->lp_advertising)->genphy_read_status(),
the link didn't change, so genphy_read_status() just return, and
phydev->lp_advertising is zero now.
This patch moves the clear operation of lp_advertising from
phy_start_aneg() to genphy_read_lpa()/genphy_c45_read_lpa(), and
if autoneg on and autoneg not complete, just clear what the
generic functions care about.
Fixes: 88d6272aca ("net: phy: avoid unneeded MDIO reads in genphy_read_status")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to extend the rcu_read_lock() section in rxrpc_error_report()
and use rcu_dereference_sk_user_data() instead of plain access
to sk->sk_user_data to make sure all rules are respected.
The compiler wont reload sk->sk_user_data at will, and RCU rules
prevent memory beeing freed too soon.
Fixes: f0308fb070 ("rxrpc: Fix possible NULL pointer access in ICMP handling")
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As preempt-to-busy leaves the request on the HW as the resubmission is
processed, that request may complete in the background and even cause a
second virtual request to enter queue. This second virtual request
breaks our "single request in the virtual pipeline" assumptions.
Furthermore, as the virtual request may be completed and retired, we
lose the reference the virtual engine assumes is held. Normally, just
removing the request from the scheduler queue removes it from the
engine, but the virtual engine keeps track of its singleton request via
its ve->request. This pointer needs protecting with a reference.
v2: Drop unnecessary motion of rq->engine = owner
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk
(cherry picked from commit b647c7df01)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This reverts commit b0818f80c8.
Started seeing weird behavior after this patch especially in
the IPv6 code path. Haven't root caused it, but since this was
applied to net branch, taking a precautionary measure to revert
it and look / analyze those failures
Revert this now and I'll send a better fix after analysing / fixing
the weirdness observed.
CC: Eric Dumazet <edumazet@google.com>
CC: Wei Wang <weiwan@google.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sign-extending TTBR1 addresses when converting to an untagged address
breaks the documented POSIX semantics for mlock() in some obscure error
cases where we end up returning -EINVAL instead of -ENOMEM as a direct
result of rewriting the upper address bits.
Rework the untagged_addr() macro to preserve the upper address bits for
TTBR1 addresses and only clear the tag bits for user addresses. This
matches the behaviour of the 'clear_address_tag' assembly macro, so
rename that and align the implementations at the same time so that they
use the same instruction sequences for the tag manipulation.
Link: https://lore.kernel.org/stable/20191014162651.GF19200@arrakis.emea.arm.com/
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
When detecting a spurious EL1 translation fault, we have the CPU retry
the translation using an AT S1E1R instruction, and inspect PAR_EL1 to
determine if the fault was spurious.
When PAR_EL1.F == 0, the AT instruction successfully translated the
address without a fault, which implies the original fault was spurious.
However, in this case we return false and treat the original fault as if
it was not spurious.
Invert the return value so that we treat such a case as spurious.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 42f91093b0 ("arm64: mm: Ignore spurious translation faults taken from the kernel")
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The 'F' field of the PAR_EL1 register lives in bit 0, not bit 1.
Fix the broken definition in 'sysreg.h'.
Fixes: e8620cff99 ("arm64: sysreg: Add some field definitions for PAR_EL1")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
PAN == 0 ---| |---------------
| |<- return from IRQ, PSTATE.PAN = 0
| <- IRQ |
+--------+ <- preempt() +--
^
|
reschedule Task 1, PSTATE.PAN == 1
* Init:
--------------------+------------------------
^
|
enable_cpu_features
set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The message should match the parameter, i.e. raid0.default_layout.
Fixes: c84a1372df ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: NeilBrown <neilb@suse.de>
Reported-by: Ivan Topolsky <doktor.yak@gmail.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
The __kthread_queue_delayed_work is not exported so
make it static, to avoid the following sparse warning:
kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signal descriptors can represent multi-bit bitfields and so have
explicit "enable" and "disable" states. However many descriptor
instances only describe a single bit, and so the SIG_DESC_SET() macro is
provides an abstraction for the single-bit cases: Its expansion
configures the "enable" state to set the bit and "disable" to clear.
SIG_DESC_CLEAR() was introduced to provide a similar single-bit
abstraction for for descriptors to clear the bit of interest. However
its behaviour was defined as the literal inverse of SIG_DESC_SET() - the
impact is the bit of interest is set in the disable path. This behaviour
isn't intuitive and doesn't align with how we want to use the macro in
practice, so make it clear the bit for both the enable and disable
paths.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20191008044153.12734-6-andrew@aj.id.au
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The I2C function the pin participated in was incorrectly named SDA14
which lead to a failure to mux:
[ 6.884344] No function I2C14 found on pin 7 (7). Found signal(s) MACLINK4, SDA14, GPIOA7 for function(s) MACLINK4, SDA14, GPIOA7
Fixes: 58dc52ad00a0 ("pinctrl: aspeed: Add AST2600 pinmux support")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20191008044153.12734-4-andrew@aj.id.au
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Rename SD3 functions and groups to EMMC to better reflect their intended
use before the binding escapes too far into the wild. Also clean up the
SD3 pin groups to eliminate some silliness that slipped through the
cracks (SD3DAT[4-7]) by unifying them into three new groups: EMMCG1,
EMMCG4 and EMMCG8 for 1, 4 and 8-bit data buses respectively.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20191008044153.12734-2-andrew@aj.id.au
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
On a machine with a 64K PAGE_SIZE, the nested for loops in
test_check_nonzero_user() can lead to soft lockups, eg:
watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [modprobe:611]
Modules linked in: test_user_copy(+) vmx_crypto gf128mul crc32c_vpmsum virtio_balloon ip_tables x_tables autofs4
CPU: 4 PID: 611 Comm: modprobe Tainted: G L 5.4.0-rc1-gcc-8.2.0-00001-gf5a1a536fa14-dirty #1151
...
NIP __might_sleep+0x20/0xc0
LR __might_fault+0x40/0x60
Call Trace:
check_zeroed_user+0x12c/0x200
test_user_copy_init+0x67c/0x1210 [test_user_copy]
do_one_initcall+0x60/0x340
do_init_module+0x7c/0x2f0
load_module+0x2d94/0x30e0
__do_sys_finit_module+0xc8/0x150
system_call+0x5c/0x68
Even with a 4K PAGE_SIZE the test takes multiple seconds. Instead
tweak it to only scan a 1024 byte region, but make it cross the
page boundary.
Fixes: f5a1a536fa ("lib: introduce copy_struct_from_user() helper")
Suggested-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191016122732.13467-1-mpe@ellerman.id.au
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
If there are neither processor objects nor processor device objects
in the ACPI tables, the per-CPU processors table will not be
initialized and attempting to dereference pointers from there will
cause the kernel to crash. This happens in acpi_processor_ppc_init()
and acpi_thermal_cpufreq_init() after commit d15ce41273 ("ACPI:
cpufreq: Switch to QoS requests instead of cpufreq notifier")
which didn't add the requisite NULL pointer checks in there.
Add the NULL pointer checks to acpi_processor_ppc_init() and
acpi_thermal_cpufreq_init(), and to the corresponding "exit"
routines.
While at it, drop redundant return instructions from
acpi_processor_ppc_init() and acpi_thermal_cpufreq_init().
Fixes: d15ce41273 ("ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Use the tdev pointer directly instead of going through the port data
when accessing the serial data in close().
Signed-off-by: Johan Hovold <johan@kernel.org>
Fix races between closing a port and opening or closing another port on
the same device which could lead to a failure to start or stop the
shared interrupt URB. The latter could potentially cause a
use-after-free or worse in the completion handler on driver unbind.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
change_bit implementation for XCHAL_HAVE_EXCLUSIVE case changes all bits
except the one required due to copy-paste error from clear_bit.
Cc: stable@vger.kernel.org # v5.2+
Fixes: f7c34874f0 ("xtensa: add exclusive atomics support")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This patch fixes the virtual address layout in pgtable.h. The virtual
address of FIXADDR_START and VMEMMAP_START should not be overlapped.
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
[paul.walmsley@sifive.com: fixed patch description]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
The RGMII_MODE_EN bit value was 0 for GENET versions 1 through 3, and
became 6 for GENET v4 and above, account for that difference.
Fixes: aa09677cba ("net: bcmgenet: add MDIO routines")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tc_flow_parsers is not used outside of the driver, so
make it static to avoid the following sparse warning:
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:516:3: warning: symbol 'tc_flow_parsers' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cpdma_chan_split_pool() function is not used outside of
the driver, so make it static to avoid the following sparse
warning:
drivers/net/ethernet/ti/davinci_cpdma.c:725:5: warning: symbol 'cpdma_chan_split_pool' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7f683b9204 ("i825xx: switch to switch to dma_alloc_attrs")
switched dma allocation over to dma_alloc_attr, but didn't convert
the SNI part to request consistent DMA memory. This broke sni_82596
since driver doesn't do dma_cache_sync for performance reasons.
Fix this by using different DMA_ATTRs for lasi_82596 and sni_82596.
Fixes: 7f683b9204 ("i825xx: switch to switch to dma_alloc_attrs")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a memory leak:
BUG: memory leak, unreferenced object 0xffff888120b3d380 (size 64):
backtrace:
[...] slab_alloc mm/slab.c:3319 [inline]
[...] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
[...] sctp_bucket_create net/sctp/socket.c:8523 [inline]
[...] sctp_get_port_local+0x189/0x5a0 net/sctp/socket.c:8270
[...] sctp_do_bind+0xcc/0x200 net/sctp/socket.c:402
[...] sctp_bindx_add+0x4b/0xd0 net/sctp/socket.c:497
[...] sctp_setsockopt_bindx+0x156/0x1b0 net/sctp/socket.c:1022
[...] sctp_setsockopt net/sctp/socket.c:4641 [inline]
[...] sctp_setsockopt+0xaea/0x2dc0 net/sctp/socket.c:4611
[...] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3147
[...] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
[...] __do_sys_setsockopt net/socket.c:2100 [inline]
It was caused by when sending msgs without binding a port, in the path:
inet_sendmsg() -> inet_send_prepare() -> inet_autobind() ->
.get_port/sctp_get_port(), sp->bind_hash will be set while bp->port is
not. Later when binding another port by sctp_setsockopt_bindx(), a new
bucket will be created as bp->port is not set.
sctp's autobind is supposed to call sctp_autobind() where it does all
things including setting bp->port. Since sctp_autobind() is called in
sctp_sendmsg() if the sk is not yet bound, it should have skipped the
auto bind.
THis patch is to avoid calling inet_autobind() in inet_send_prepare()
by changing sctp_prot .no_autobind with true, also remove the unused
.get_port.
Reported-by: syzbot+d44f7bbebdea49dbc84a@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a application sends many packets with the same txtime, they may
be transmitted out of order (different from the order in which they
were enqueued).
This happens because when inserting elements into the tree, when the
txtime of two packets are the same, the new packet is inserted at the
left side of the tree, causing the reordering. The only effect of this
change should be that packets with the same txtime will be transmitted
in the order they are enqueued.
The application in question (the AVTP GStreamer plugin, still in
development) is sending video traffic, in which each video frame have
a single presentation time, the problem is that when packetizing,
multiple packets end up with the same txtime.
The receiving side was rejecting packets because they were being
received out of order.
Fixes: 25db26a913 ("net/sched: Introduce the ETF Qdisc")
Reported-by: Ederson de Souza <ederson.desouza@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style
in header files related to Distributed Switch Architecture
drivers for NXP SJA1105 series Ethernet switch support.
It uses an expilict block comment for the SPDX License
Identifier.
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot found that if __inet_inherit_port() returns an error,
we call tcp_done() after inet_csk_prepare_forced_close(),
meaning the socket lock is no longer held.
We might fix this in a different way in net-next, but
for 5.4 it seems safer to relax the lockdep check.
Fixes: d983ea6f16 ("tcp: add rcu protection around tp->fastopen_rsk")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MarkLee says:
====================
Update MT7629 to support PHYLINK API
This patch set has two goals :
1. Fix mt7629 GMII mode issue after apply mediatek
PHYLINK support patch.
2. Update mt7629 dts to reflect the latest dt-binding
with PHYLINK support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* Removes mediatek,physpeed property from dtsi that is useless in PHYLINK
* Use the fixed-link property speed = <2500> to set the phy in 2.5Gbit.
* Set gmac1 to gmii mode that connect to a internal gphy
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the original design, mtk_phy_connect function will set ge_mode=1
if phy-mode is GMII(PHY_INTERFACE_MODE_GMII) and then set the correct
ge_mode to ETHSYS_SYSCFG0 register. This logic was broken after apply
mediatek PHYLINK patch(Fixes tag), the new mtk_mac_config function will
not set ge_mode=1 for GMII mode hence the final ETHSYS_SYSCFG0 setting
will be incorrect for mt7629 GMII mode. This patch add the missing logic
back to fix it.
Fixes: b8fc9f3082 ("net: ethernet: mediatek: Add basic PHYLINK support")
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti says:
====================
net/sched: fix wrong behavior of MPLS push/pop action
this series contains two fixes for TC 'act_mpls', that try to address
two problems that can be observed configuring simple 'push' / 'pop'
operations:
- patch 1/2 avoids dropping non-MPLS packets that pass through the MPLS
'pop' action.
- patch 2/2 fixes corruption of the L2 header that occurs when 'push'
or 'pop' actions are configured in TC egress path.
v2: - change commit message in patch 1/2 to better describe that the
patch impacts only TC, thanks to Simon Horman
- fix missing documentation of 'mac_len' in patch 2/2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress protocol ip matchall \
> action mpls push protocol mpls_uc label 0x355aa bos 1
causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.
v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
in skb_mpls_pop(), reported by kbuild test robot
CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress matchall action mpls pop
implicitly makes the kernel drop all packets transmitted by eth0, if they
don't have a MPLS header. This behavior is uncommon: other encapsulations
(like VLAN) just let the packet pass unmodified. Since the result of MPLS
'pop' operation would be the same regardless of the presence / absence of
MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
when dealing with non-MPLS packets.
For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
already ensures that MPLS 'pop' operation only occurs with packets having
an MPLS Ethernet type (and there are no other callers in current code, so
the semantic change should be ok).
v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
Horman
Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style
in header files related to Cavium Ethernet drivers.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style
in header files related to Distributed Switch Architecture
drivers for Microchip KSZ series switch support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an arbitrary difference between the system resume and
runtime resume code paths for PCI devices regarding the delay to
apply when switching the devices from D3cold to D0.
Namely, pci_restore_standard_config() used in the runtime resume
code path calls pci_set_power_state() which in turn invokes
__pci_start_power_transition() to power up the device through the
platform firmware and that function applies the transition delay
(as per PCI Express Base Specification Revision 2.0, Section 6.6.1).
However, pci_pm_default_resume_early() used in the system resume
code path calls pci_power_up() which doesn't apply the delay at
all and that causes issues to occur during resume from
suspend-to-idle on some systems where the delay is required.
Since there is no reason for that difference to exist, modify
pci_power_up() to follow pci_set_power_state() more closely and
invoke __pci_start_power_transition() from there to call the
platform firmware to power up the device (in case that's necessary).
Fixes: db288c9c5f ("PCI / PM: restore the original behavior of pci_set_power_state()")
Reported-by: Daniel Drake <drake@endlessm.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Pull virtio fixes from Michael Tsirkin:
"Some minor bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/test: stop device before reset
tools/virtio: xen stub
tools/virtio: more stubs
virt device tree incorrectly uses 0xf0000000 on both sides of PCI IO
ports address space mapping. This results in incorrect port address
assignment in PCI IO BARs and subsequent crash on attempt to access
them. Use 0 as base address in PCI IO ports address space.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Commit c312ef1763 "libata/ahci: Drop PCS quirk for Denverton and
beyond" got the polarity wrong on the check for which board-ids should
have the quirk applied. The board type board_ahci_pcs7 is defined at the
end of the list such that "pcs7" boards can be special cased in the
future if they need the quirk. All prior Intel board ids "<
board_ahci_pcs7" should proceed with applying the quirk.
Reported-by: Andreas Friedrich <afrie@gmx.net>
Reported-by: Stephen Douthit <stephend@silicom-usa.com>
Fixes: c312ef1763 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After enabling CONFIG_IOMMU_DMA on X86 a new warning appears when
compiling vfio:
drivers/vfio/vfio_iommu_type1.c: In function ‘vfio_iommu_type1_attach_group’:
drivers/vfio/vfio_iommu_type1.c:1827:7: warning: ‘resv_msi_base’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The warning is a false positive, because the call to iommu_get_msi_cookie()
only happens when vfio_iommu_has_sw_msi() returned true. And that only
happens when it also set resv_msi_base.
But initialize the variable anyway to get rid of the warning.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The current checking for failure on the number of ports fails when
-ENODEV is returned from the call to get_num_ports. Fix this by making
num_ports and loop counter i signed rather than unsigned ints. Also
add check for num_ports being less than zero to check for -ve error
returns.
Addresses-Coverity: ("Unsigned compared against 0")
Fixes: e2fea54e45 ("8250-men-mcb: add support for 16z025 and 16z057")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Michael Moese <mmoese@suse.de>
Link: https://lore.kernel.org/r/20191013220016.9369-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull SCSI fixes from James Bottomley:
"Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core.
The last two are mostly about removing incorrect messages from the
kernel log: the resid message is definitely wrong and the sync cache
on protected drive problem is arguably wrong"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: MAINTAINERS: Update qla2xxx driver
scsi: zfcp: fix reaction on bit error threshold notification
scsi: core: save/restore command resid for error handling
scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
scsi: sd: Ignore a failure to sync cache due to lack of authorization
It seems that the right variable to use in this case is *i*, instead of
*n*, otherwise there is an undefined behavior when right shifiting by more
than 31 bits when multiplying n by 8; notice that *n* can take values
equal or greater than 4 (4, 8, 16, ...).
Also, notice that under the current conditions (bl = 3), we are skiping
the handling of bytes 3, 7, 31... So, fix this by updating this logic
and limit *bl* up to 4 instead of up to 3.
This fix is based on function udc_stuff_fifo().
Addresses-Coverity-ID: 1454834 ("Bad bit shift operation")
Fixes: 24a28e4283 ("USB: gadget driver for LPC32xx")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Link: https://lore.kernel.org/r/20191014191830.GA10721@embeddedor
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dequeuing implementation in cdns3_gadget_ep_dequeue gets first request from
deferred_req_list and changed TRB associated with it to LINK TRB.
This approach is incorrect because deferred_req_list contains requests
that have not been placed on hardware RING. In this case driver should
just giveback this request to gadget driver.
The patch implements new approach that first checks where dequeuing
request is located and only when it's on Transfer Ring then changes TRB
associated with it to LINK TRB.
During processing completed transfers such LINK TRB will be ignored.
Reported-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/1570958420-22196-1-git-send-email-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use my kernel.org address for all entries in MAINTAINERS.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phydev->dev_flags is entirely dependent on the PHY device driver which
is going to be used, setting the internal GENET PHY revision in those
bits only makes sense when drivers/net/phy/bcm7xxx.c is the PHY driver
being used.
Fixes: 487320c541 ("net: bcmgenet: communicate integrated PHY revision to PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While invalidating the dst, we assign backhole_netdev instead of
loopback device. However, this device does not have idev pointer
and hence no ip6_ptr even if IPv6 is enabled. Possibly this has
triggered the syzbot reported crash.
The syzbot report does not have reproducer, however, this is the
only device that doesn't have matching idev created.
Crash instruction is :
static inline bool ip6_ignore_linkdown(const struct net_device *dev)
{
const struct inet6_dev *idev = __in6_dev_get(dev);
return !!idev->cnf.ignore_routes_with_linkdown; <= crash
}
Also ipv6 always assumes presence of idev and never checks for it
being NULL (as does the above referenced code). So adding a idev
for the blackhole_netdev to avoid this class of crashes in the future.
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit e7774049ff ("ARM: dts: bcm283x: Define MMC interfaces at
board level") accidently dropped the bus width for the sdhci on the
RPi Zero W, because the board file was relying on the defaults
from bcm2835-rpi.dtsi. So fix this performance regression by adding
the bus width to the board file.
Fixes: e7774049ff ("ARM: dts: bcm283x: Define MMC interfaces at board level")
Reported-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
HAVE_FAST_GUP enables the lockless quick page table walker for simple
cases, and is a nice optimization for some random loads that can then
use get_user_pages_fast() rather than the more careful page walker.
However, for some unexplained reason, it seems to be subtly broken on
sparc64. The breakage is only with some compiler versions and some
hardware, and nobody seems to have figured out what triggers it,
although there's a simple reprodicer for the problem when it does
trigger.
The problem was introduced with the conversion to the generic GUP code
in commit 7b9afb86b6 ("sparc64: use the generic get_user_pages_fast
code"), but nothing looks obviously wrong in that conversion. It may be
a compiler bug that just hits us with the code reorganization. Or it
may be something very specific to sparc64.
This disables HAVE_FAST_GUP entirely. That makes things like futexes a
bit slower, but at least they work. If we can figure out the trigger,
that would be lovely, but it's been three months already..
Link: https://lore.kernel.org/lkml/20190717215956.GA30369@altlinux.org/
Fixes: 7b9afb86b6 ("sparc64: use the generic get_user_pages_fast code")
Reported-by: Dmitry V Levin <ldv@altlinux.org>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Requested-by: Meelis Roos <mroos@linux.ee>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Background]
Btrfs qgroup uses two types of reserved space for METADATA space,
PERTRANS and PREALLOC.
PERTRANS is metadata space reserved for each transaction started by
btrfs_start_transaction().
While PREALLOC is for delalloc, where we reserve space before joining a
transaction, and finally it will be converted to PERTRANS after the
writeback is done.
[Inconsistency]
However there is inconsistency in how we handle PREALLOC metadata space.
The most obvious one is:
In btrfs_buffered_write():
btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
We always free qgroup PREALLOC meta space.
While in btrfs_truncate_block():
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
We only free qgroup PREALLOC meta space when something went wrong.
[The Correct Behavior]
The correct behavior should be the one in btrfs_buffered_write(), we
should always free PREALLOC metadata space.
The reason is, the btrfs_delalloc_* mechanism works by:
- Reserve metadata first, even it's not necessary
In btrfs_delalloc_reserve_metadata()
- Free the unused metadata space
Normally in:
btrfs_delalloc_release_extents()
|- btrfs_inode_rsv_release()
Here we do calculation on whether we should release or not.
E.g. for 64K buffered write, the metadata rsv works like:
/* The first page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=0
total: num_bytes=calc_inode_reservations()
/* The first page caused one outstanding extent, thus needs metadata
rsv */
/* The 2nd page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed
/* The 2nd page doesn't cause new outstanding extent, needs no new meta
rsv, so we free what we have reserved */
/* The 3rd~16th pages */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed (still space for one outstanding extent)
This means, if btrfs_delalloc_release_extents() determines to free some
space, then those space should be freed NOW.
So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
than btrfs_qgroup_convert_reserved_meta().
The good news is:
- The callers are not that hot
The hottest caller is in btrfs_buffered_write(), which is already
fixed by commit 336a8bb8e3 ("btrfs: Fix wrong
btrfs_delalloc_release_extents parameter"). Thus it's not that
easy to cause false EDQUOT.
- The trans commit in advance for qgroup would hide the bug
Since commit f5fef45936 ("btrfs: qgroup: Make qgroup async transaction
commit more aggressive"), when btrfs qgroup metadata free space is slow,
it will try to commit transaction and free the wrongly converted
PERTRANS space, so it's not that easy to hit such bug.
[FIX]
So to fix the problem, remove the @qgroup_free parameter for
btrfs_delalloc_release_extents(), and always pass true to
btrfs_inode_rsv_release().
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 43b18595d6 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Kalle Valo says:
====================
wireless-drivers fixes for 5.4
Second set of fixes for 5.4. ath10k regression and iwlwifi BAD_COMMAND
bug are the ones getting most reports at the moment.
ath10k
* fix throughput regression on QCA98XX
iwlwifi
* fix initialization of 3168 devices (the infamous BAD_COMMAND bug)
* other smaller fixes
rt2x00
* don't include input-polldev.h header
* fix hw reset to work during first 5 minutes of system run
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Panfrost uses multiple schedulers (one for each slot, so 2 in reality),
and on a timeout has to stop all the schedulers to safely perform a
reset. However more than one scheduler can trigger a timeout at the same
time. This race condition results in jobs being freed while they are
still in use.
When stopping other slots use cancel_delayed_work_sync() to ensure that
any timeout started for that slot has completed. Also use
mutex_trylock() to obtain reset_lock. This means that only one thread
attempts the reset, the other threads will simply complete without doing
anything (the first thread will wait for this in the call to
cancel_delayed_work_sync()).
While we're here and since the function is already dependent on
sched_job not being NULL, let's remove the unnecessary checks.
Fixes: aa20236784 ("drm/panfrost: Prevent concurrent resets")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009094456.9704-1-steven.price@arm.com
Pull parisc fixes from Helge Deller:
- Fix a parisc-specific fallout of Christoph's
dma_set_mask_and_coherent() patches (Sven)
- Fix a vmap memory leak in ioremap()/ioremap() (Helge)
- Some minor cleanups and documentation updates (Nick, Helge)
* 'parisc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Remove 32-bit DMA enforcement from sba_iommu
parisc: Fix vmap memory leak in ioremap()/iounmap()
parisc: prefer __section from compiler_attributes.h
parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define
MAINTAINERS: Add hp_sdc drivers to parisc arch
Pull dmi fix from Jean Delvare.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices
rq_qos_del() incorrectly assigns the node being deleted to the head if
it was the first on the list in the !prev path. Fix it by iterating
with ** instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg_activate_policy() has the following bugs.
* cf09a8ee19 ("blkcg: pass @q and @blkcg into
blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
blkcg_activate_policy() ends up using pd's allocated for the root
blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
can be passed in pd's which are allocated for the root blkcg.
For blk-iocost, this means that ->pd_init_fn() can write beyond the
end of the allocated object as it determines the length of the flex
array at the end based on the blkcg's nesting level.
* Each pd is initialized as they get allocated. If alloc fails, the
policy will get freed with pd's initialized on it.
* After the above partial failure, the partial pds are not freed.
This patch fixes all the above issues by
* Restructuring blkcg_activate_policy() so that alloc and init passes
are separate. Init takes place only after all allocs succeeded and
on failure all allocated pds are freed.
* Unifying and fixing the cleanup of the remaining pd_prealloc.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: cf09a8ee19 ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
64-bit time is a signed quantity in the kernel, so the bulkstat
structure should reflect that. Note that the structure size stays
the same and that we have not yet published userspace headers for this
new ioctl so there are no users to break.
Fixes: 7035f9724f ("xfs: introduce new v5 bulkstat structure")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There is a warning message in my test with below steps:
# rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test &
# sleep 5
# pkill -9 rbd
# rbd map test &
# sleep 5
# pkill rbd
The reason is that the rbd_add_acquire_lock() is interruptable,
that means, when we kill the waiting on ->acquire_wait, the lock_dwork
could be still running.
1. do_rbd_add() 2. lock_dwork
rbd_add_acquire_lock()
- queue_delayed_work()
lock_dwork queued
- wait_for_completion_killable_timeout() <-- kill happen
rbd_dev_image_unlock() <-- UNLOCKED now, nothing to do.
rbd_dev_device_release()
rbd_dev_image_release()
- ...
lock successed here
- cancel_delayed_work_sync(&rbd_dev->lock_dwork)
Then when we reach the rbd_dev_free(), WARN_ON is triggered because
lock_state is not RBD_LOCK_STATE_UNLOCKED.
To fix it, this commit make sure the lock_dwork was finished before
calling rbd_dev_image_unlock().
On the other hand, this would not happend in do_rbd_remove(), because
after rbd mapped, lock_dwork will only be queued for IO request, and
request will continue unless lock_dwork finished. when we call
rbd_dev_image_unlock() in do_rbd_remove(), all requests are done.
That means, lock_state should not be locked again after
rbd_dev_image_unlock().
[ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is
interrupted. ]
Fixes: 637cd06053 ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In the future, we're going to want to extend the ceph_reply_info_extra
for create replies. Currently though, the kernel code doesn't accept an
extra blob that is larger than the expected data.
Change the code to skip over any unrecognized fields at the end of the
extra blob, rather than returning -EIO.
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
To pick the changes in:
344c6c8047 ("KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH")
dee04eee91 ("KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface")
These trigger the rebuild of this object:
CC /tmp/build/perf/trace/beauty/ioctl.o
But do not result in any change in tooling, as the additions are not
being used in any table generatator.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Anup Patel <Anup.Patel@wdc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lkml.kernel.org/n/tip-d1v48a0qfoe98u5v9tn3mu5u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
0cb8410b90 ("kvm: svm: Intercept RDPRU")
That trigger a rebuild in too in tooling:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
But this time around no changes in tooling results, as SVM_EXIT_RDPRU
wasn't added to SVM_EXIT_REASONS, that is used in kvm-stat.c.
And addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/n/tip-pqzkt1hmfpqph3ts8i6zzmim@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
bf653b78f9 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
That trigger these changes in tooling:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
INSTALL GTK UI
DESCEND plugins
make[3]: Nothing to be done for '/tmp/build/perf/plugins/libtraceevent-dynamic-list'.
INSTALL trace_plugins
LD /tmp/build/perf/arch/x86/util/perf-in.o
LD /tmp/build/perf/arch/x86/perf-in.o
LD /tmp/build/perf/arch/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
And this is not just because that header is included, kvm-stat.c
uses the VMX_EXIT_REASONS define and it got changed by the above cset.
And addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tao Xu <tao3.xu@intel.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-gr1eel0hckmi5l3p2ewdpfxh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now we recalculate the sequence of timeout with 'req->sequence =
ctx->cached_sq_head + count - 1', judge the right place to insert
for timeout_list by compare the number of request we still expected for
completion. But we have not consider about the situation of overflow:
1. ctx->cached_sq_head + count - 1 may overflow. And a bigger count for
the new timeout req can have a small req->sequence.
2. cached_sq_head of now may overflow compare with before req. And it
will lead the timeout req with small req->sequence.
This overflow will lead to the misorder of timeout_list, which can lead
to the wrong order of the completion of timeout_list. Fix it by reuse
req->submit.sequence to store the count, and change the logic of
inserting sort in io_timeout.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the earlier fix for the memory overrun of id arrays I managed to typo
the wrong event in the fix.
Of course we need to close the current event in the loop, not the
original failing event.
The same test case as in the original patch still passes.
Fixes: 7834fa948b ("perf evlist: Fix access of freed id arrays")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20191011182140.8353-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The build of file libperf-jvmti.so succeeds but the resulting
object fails to load:
# ~/linux/tools/perf/perf record -k mono -- java \
-XX:+PreserveFramePointer \
-agentpath:/root/linux/tools/perf/libperf-jvmti.so \
hog 100000 123450
Error occurred during initialization of VM
Could not find agent library /root/linux/tools/perf/libperf-jvmti.so
in absolute path, with error:
/root/linux/tools/perf/libperf-jvmti.so: undefined symbol: _ctype
Add the missing _ctype symbol into the build script.
Fixes: 79743bc927 ("perf jvmti: Link against tools/lib/string.o to have weak strlcpy()")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20191008093841.59387-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix bashism reported by checkbashisms by using only one '=':
possible bashism in scripts/setlocalversion line 96 (should be 'b = a'):
if [ "`hg log -r . --template '{latesttagdistance}'`" == "1" ]; then
Fixes: 38b3439d84 ("setlocalversion: update mercurial tag parsing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mike Crowe <mcrowe@zipitwireless.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
since it's symmetric to nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
it's symmetric to nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The patch 32b593bfcb ("Btrfs: remove no longer used function to run
delayed refs asynchronously") removed the async delayed refs but the
thread has been created, without any use. Remove it to avoid resource
consumption.
Fixes: 32b593bfcb ("Btrfs: remove no longer used function to run delayed refs asynchronously")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
IOMMU Event Log encodes 20-bit PASID for events:
ILLEGAL_DEV_TABLE_ENTRY
IO_PAGE_FAULT
PAGE_TAB_HARDWARE_ERROR
INVALID_DEVICE_REQUEST
as:
PASID[15:0] = bit 47:32
PASID[19:16] = bit 19:16
Note that INVALID_PPR_REQUEST event has different encoding
from the rest of the events as the following:
PASID[15:0] = bit 31:16
PASID[19:16] = bit 45:42
So, fixes the decoding logic.
Fixes: d64c0486ed ("iommu/amd: Update the PASID information printed to the system log")
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
As platform_get_irq() now prints an error when the interrupt does not
exist, calling it gratuitously causes scary messages like:
ipmmu-vmsa e6740000.mmu: IRQ index 0 not found
Fix this by moving the call to platform_get_irq() down, where the
existence of the interrupt is mandatory.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Till now the Rockchip iommu driver walked through the irq list via
platform_get_irq() until it encountered an ENXIO error. With the
recent change to add a central error message, this always results
in such an error for each iommu on probe and shutdown.
To not confuse people, switch to platform_count_irqs() to get the
actual number of interrupts before walking through them.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If we terminate the channel to free all descriptors associated with this
channel, we will leak the memory of current descriptor if the current
descriptor is not completed, since it had been deteled from the desc_issued
list and have not been added into the desc_completed list.
Thus we should check if current descriptor is completed or not, when freeing
the descriptors associated with one channel, if not, we should free it to
avoid this issue.
Fixes: 9b3b8171f7 ("dmaengine: sprd: Add Spreadtrum DMA driver")
Reported-by: Zhenfang Wang <zhenfang.wang@unisoc.com>
Tested-by: Zhenfang Wang <zhenfang.wang@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Link: https://lore.kernel.org/r/170dbbc6d5366b6fa974ce2d366652e23a334251.1570609788.git.baolin.wang@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit. The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.
smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:x2apic_dead_cpu+0x1a/0x30
Call Trace:
cpuhp_invoke_callback+0x9a/0x580
_cpu_up+0x10d/0x140
do_cpu_up+0x69/0xb0
smp_init+0x63/0xa9
kernel_init_freeable+0xd7/0x229
? rest_init+0xa0/0xa0
kernel_init+0xa/0x100
ret_from_fork+0x35/0x40
Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.
However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.
As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.
According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors. The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.
Fixes: 29217a4746 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f83 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
There is a bug in create_safe_exec_page(), when page table is allocated
it is not checked that table is allocated successfully:
But it is dereferenced in: pgd_none(READ_ONCE(*pgdp)). Check that
allocation was successful.
Fixes: 82869ac57b ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Will Deacon <will@kernel.org>
If CONFIG_ARM64_SVE=n then we fail to report ID_AA64ZFR0_EL1 as 0 when
read by userspace, despite being required by the architecture. Although
this is theoretically a change in ABI, userspace will first check for
the presence of SVE via the HWCAP or the ID_AA64PFR0_EL1.SVE field
before probing the ID_AA64ZFR0_EL1 register. Given that these are
reported correctly for this configuration, we can safely tighten up the
current behaviour.
Ensure ID_AA64ZFR0_EL1 is treated as RAZ when CONFIG_ARM64_SVE=n.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Fixes: 06a916feca ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Will Deacon <will@kernel.org>
Igor Russkikh says:
====================
Aquantia/Marvell AQtion atlantic driver fixes 10/2019
Here is a set of various bugfixes, to be considered for stable as well.
V2: double space removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan and multicast handling is now mixed up.
The explicit issue is that macvlan interface gets broken (no traffic)
after clearing MULTICAST flag on the real interface.
We now do separate logic and consider both ALLMULTI and MULTICAST
flags on the device.
Fixes: 11ba961c91 ("net: aquantia: Fix IFF_ALLMULTI flag functionality")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Individual descriptors on LRO TCP session should be checked
for CRC errors. It was discovered that HW recalculates
L4 checksums on LRO session and does not break it up on bad L4
csum.
Thus, driver should aggregate HW LRO L4 statuses from all individual
buffers of LRO session and drop packet if one of the buffers has bad
L4 checksum.
Fixes: f38f1ee8ae ("net: aquantia: check rx csum for all packets in LRO session")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
>From HW specification to correctly reset HW caches (this is a required
workaround when stopping the device), register bit should actually
be toggled.
It was previosly always just set. Due to the way driver stops HW this
never actually caused any issues, but it still may, so cleaning this up.
Fixes: 7a1bb49461 ("net: aquantia: fix potential IOMMU fault after driver unbind")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chip temperature is a two byte word, colocated internally with cable
length data. We do all readouts from HW memory by dwords, thus
we should clear extra high bytes, otherwise temperature output
gets weird as soon as we attach a cable to the NIC.
Fixes: 8f89401186 ("net: aquantia: add infrastructure to readout chip temperature")
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge more fixes from Andrew Morton:
"The usual shower of hotfixes and some followups to the recently merged
page_owner enhancements"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
mm/slab.c: fix kernel-doc warning for __ksize()
xarray.h: fix kernel-doc warning
bitmap.h: fix kernel-doc warning and typo
fs/fs-writeback.c: fix kernel-doc warning
fs/libfs.c: fix kernel-doc warning
fs/direct-io.c: fix kernel-doc warning
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
mm, hugetlb: allow hugepage allocations to reclaim as needed
lib/test_meminit: add a kmem_cache_alloc_bulk() test
mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
lib/generic-radix-tree.c: add kmemleak annotations
mm/slub: fix a deadlock in show_slab_objects()
mm, page_owner: rename flag indicating that page is allocated
mm, page_owner: decouple freeing stack trace from debug_pagealloc
mm, page_owner: fix off-by-one error in __set_page_owner_handle()
We switch the default handler to be handle_bad_irq() instead of
handle_simple_irq() (which was not correct anyway).
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 7b1e889436 ("gpio: lynxpoint: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.
Fixes: 8069e69a97 ("gpio: intel-mid: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.
Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.
This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
During conversion to internal IRQ chip initialization the commit
8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
lost the irq_base assignment.
drivers/gpio/gpio-merrifield.c: In function ‘mrfld_gpio_probe’:
drivers/gpio/gpio-merrifield.c:405:17: warning: variable ‘irq_base’ set but not used [-Wunused-but-set-variable]
Assign the girq->first to it.
Fixes: 8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Custom outs*/ins* implementations are long gone from the xtensa port,
remove matching EXPORT_SYMBOLs.
This fixes the following build warnings issued by modpost since commit
15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions"):
WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL
Cc: stable@vger.kernel.org
Fixes: d38efc1f15 ("xtensa: adopt generic io routines")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings. The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.
Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.
Details -
ndctl inject-error --block=10 --count=1 namespace6.0
./read_poison -x dax6.0 -o 5120 -m 2
mmaped address 0x7f5bb6600000
mmaped address 0x7f3cf3600000
doing local read at address 0x7f3cf3601400
Killed
Console messages in instrumented kernel -
mce: Uncorrected hardware memory error in user-access at edbe201400
Memory failure: tk->addr = 7f5bb6601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
dev_pagemap_mapping_shift: page edbe201: no PUD
Memory failure: tk->size_shift == 0
Memory failure: Unable to find user space address edbe201 in read_poison
Memory failure: tk->addr = 7f3cf3601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
Memory failure: tk->size_shift = 21
Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
=> to deliver SIGKILL
Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
=> to deliver SIGBUS
Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn(). While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.
One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page. This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.
The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock. The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.
[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/
Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c820 ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b39d0ee263 ("mm, page_alloc: avoid expensive reclaim when
compaction may not succeed") has chnaged the allocator to bail out from
the allocator early to prevent from a potentially excessive memory
reclaim. __GFP_RETRY_MAYFAIL is designed to retry the allocation,
reclaim and compaction loop as long as there is a reasonable chance to
make forward progress. Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at
the INIT_COMPACT_PRIORITY compaction attempt gives this feedback.
The most obvious affected subsystem is hugetlbfs which allocates huge
pages based on an admin request (or via admin configured overcommit). I
have done a simple test which tries to allocate half of the memory for
hugetlb pages while the memory is full of a clean page cache. This is
not an unusual situation because we try to cache as much of the memory
as possible and sysctl/sysfs interface to allocate huge pages is there
for flexibility to allocate hugetlb pages at any time.
System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages
after the memory is prefilled by a clean page cache:
root@test1:~# cat hugetlb_test.sh
set -x
echo 0 > /proc/sys/vm/nr_hugepages
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
TS=$(date +%s)
echo 256 > /proc/sys/vm/nr_hugepages
cat /proc/sys/vm/nr_hugepages
The results for 2 consecutive runs on clean 5.3
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
+ date +%s
+ TS=1569905284
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
+ date +%s
+ TS=1569905311
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
Now with b39d0ee263 applied
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
+ date +%s
+ TS=1569905516
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
11
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
+ date +%s
+ TS=1569905541
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
12
The success rate went down by factor of 20!
Although hugetlb allocation requests might fail and it is reasonable to
expect them to under extremely fragmented memory or when the memory is
under a heavy pressure but the above situation is not that case.
Fix the regression by reverting back to the previous behavior for
__GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for
those requests.
Mike said:
: hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
: boot where this may not be as much of an issue. However, I am aware of at
: least three use cases where allocations are made after the system has been
: up and running for quite some time:
:
: - DB reconfiguration. If sysctl/sysfs fails to get required number of
: huge pages, system is rebooted to perform allocation after boot.
:
: - VM provisioning. If unable get required number of huge pages, fall
: back to base pages.
:
: - An application that does not preallocate pool, but rather allocates
: pages at fault time for optimal NUMA locality.
:
: In all cases, I would expect b39d0ee263 to cause regressions and
: noticable behavior changes.
:
: My quick/limited testing in
: https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com
: was insufficient. It was also mentioned that if something like
: b39d0ee263 went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
: requests as in this patch.
[mhocko@suse.com: reworded changelog]
Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org
Fixes: b39d0ee263 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
slab_alloc_node() already zeroed out the freelist pointer if
init_on_free was on. Thibaut Sautereau noticed that the same needs to
be done for kmem_cache_alloc_bulk(), which performs the allocations
separately.
kmem_cache_alloc_bulk() is currently used in two places in the kernel,
so this change is unlikely to have a major performance impact.
SLAB doesn't require a similar change, as auto-initialization makes the
allocator store the freelist pointers off-slab.
Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
Fixes: 6471384af2 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
Reported-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.
Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.
WARNING: possible circular locking dependency detected
------------------------------------------------------
cat/5224 is trying to acquire lock:
ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
show_slab_objects+0x94/0x3a8
but task is already holding lock:
b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (kn->count#45){++++}:
lock_acquire+0x31c/0x360
__kernfs_remove+0x290/0x490
kernfs_remove+0x30/0x44
sysfs_remove_dir+0x70/0x88
kobject_del+0x50/0xb0
sysfs_slab_unlink+0x2c/0x38
shutdown_cache+0xa0/0xf0
kmemcg_cache_shutdown_fn+0x1c/0x34
kmemcg_workfn+0x44/0x64
process_one_work+0x4f4/0x950
worker_thread+0x390/0x4bc
kthread+0x1cc/0x1e8
ret_from_fork+0x10/0x18
-> #1 (slab_mutex){+.+.}:
lock_acquire+0x31c/0x360
__mutex_lock_common+0x16c/0xf78
mutex_lock_nested+0x40/0x50
memcg_create_kmem_cache+0x38/0x16c
memcg_kmem_cache_create_func+0x3c/0x70
process_one_work+0x4f4/0x950
worker_thread+0x390/0x4bc
kthread+0x1cc/0x1e8
ret_from_fork+0x10/0x18
-> #0 (mem_hotplug_lock.rw_sem){++++}:
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
get_online_mems+0x54/0x150
show_slab_objects+0x94/0x3a8
total_objects_show+0x28/0x34
slab_attr_show+0x38/0x54
sysfs_kf_seq_show+0x198/0x2d4
kernfs_seq_show+0xa4/0xcc
seq_read+0x30c/0x8a8
kernfs_fop_read+0xa8/0x314
__vfs_read+0x88/0x20c
vfs_read+0xd8/0x10c
ksys_read+0xb0/0x120
__arm64_sys_read+0x54/0x88
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
other info that might help us debug this:
Chain exists of:
mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(kn->count#45);
lock(slab_mutex);
lock(kn->count#45);
lock(mem_hotplug_lock.rw_sem);
*** DEADLOCK ***
3 locks held by cat/5224:
#0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
#1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
#2: b8ff009693eee398 (kn->count#45){++++}, at:
kernfs_seq_start+0x44/0xf0
stack backtrace:
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0xd0/0x140
print_circular_bug+0x368/0x380
check_noncircular+0x248/0x250
validate_chain+0xd10/0x2bcc
__lock_acquire+0x7f4/0xb8c
lock_acquire+0x31c/0x360
get_online_mems+0x54/0x150
show_slab_objects+0x94/0x3a8
total_objects_show+0x28/0x34
slab_attr_show+0x38/0x54
sysfs_kf_seq_show+0x198/0x2d4
kernfs_seq_show+0xa4/0xcc
seq_read+0x30c/0x8a8
kernfs_fop_read+0xa8/0x314
__vfs_read+0x88/0x20c
vfs_read+0xd8/0x10c
ksys_read+0xb0/0x120
__arm64_sys_read+0x54/0x88
el0_svc_handler+0x170/0x240
el0_svc+0x8/0xc
I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free. There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.
[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html
[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bcba ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8974558f49 ("mm, page_owner, debug_pagealloc: save and dump
freeing stack trace") enhanced page_owner to also store freeing stack
trace, when debug_pagealloc is also enabled. KASAN would also like to
do this [1] to improve error reports to debug e.g. UAF issues.
Kirill has suggested that the freeing stack trace saving should be also
possible to be enabled separately from KASAN or debug_pagealloc, i.e.
with an extra boot option. Qian argued that we have enough options
already, and avoiding the extra overhead is not worth the complications
in the case of a debugging option. Kirill noted that the extra stack
handle in struct page_owner requires 0.1% of memory.
This patch therefore enables free stack saving whenever page_owner is
enabled, regardless of whether debug_pagealloc or KASAN is also enabled.
KASAN kernels booted with page_owner=on will thus benefit from the
improved error reports.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=203967
[vbabka@suse.cz: v3]
Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz
Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Qian Cai <cai@lca.pw>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "followups to debug_pagealloc improvements through
page_owner", v3.
These are followups to [1] which made it to Linus meanwhile. Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).
This patch (of 3):
As noted by Kirill, commit 7e2f2a0cd1 ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages. As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.
Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage. Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.
Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0cd1 ("mm, page_owner: record page owner for each subpage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Reported-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__get_user_[no]check uses temporary buffer of type long to store result
of __get_user_size and do sign extension on it when necessary. This
doesn't work correctly for 64-bit data. Fix it by moving temporary
buffer/sign extension logic to __get_user_asm.
Don't do assignment of __get_user_bad result to (x) as it may not always
be integer-compatible now and issue warning even when it's going to be
optimized. Instead do (x) = 0; and call __get_user_bad separately.
Zero initialize __x in __get_user_asm and use '+' constraint for its
assembly argument, so that its value is preserved in error cases. This
may add at most 1 cycle to the fast path, but saves an instruction and
two padding bytes in the fixup section for each use of this macro and
works for both misaligned store and store exception.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Numeric assembly arguments are hard to understand and assembly code that
uses them is hard to modify. Use named arguments in __check_align_*,
__get_user_asm and __put_user_asm. Modify macro parameter names so that
they don't affect argument names. Use '+' constraint for the [err]
argument instead of having it as both input and output.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
A BIO based request queue does not have a tag_set, which prevent testing
for the flag BLK_MQ_F_NO_SCHED indicating that the queue does not
require an elevator. This leads to an incorrect initialization of a
default elevator in some cases such as BIO based null_blk
(queue_mode == BIO) with zoned mode enabled as the default elevator in
this case is mq-deadline instead of "none".
Fix this by testing for a NULL queue mq_ops field which indicates that
the queue is BIO based and should not have an elevator.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This breaks booting from sata_sil24 with the recent DMA change.
According to James Bottomley this was in to improve performance by
kicking the device into 32 bit descriptors, which are usually more
efficient, especially with older dual descriptor format cards like we
have on parisc systems.
Remove it for now to make DMA working again.
Fixes: dcc02c19cc ("sata_sil24: use dma_set_mask_and_coherent")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Sven noticed that calling ioremap() and iounmap() multiple times leads
to a vmap memory leak:
vmap allocation for size 4198400 failed:
use vmalloc=<size> to increase size
It seems we missed calling vunmap() in iounmap().
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org> # v3.16+
Before reading the Extended Size field, we should ensure it fits in
the DMI record. There is already a record length check but it does
not cover that field.
It would take a seriously corrupted DMI table to hit that bug, so no
need to worry, but we should still fix it.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 6deae96b42 ("firmware, DMI: Add function to look up a handle and return DIMM size")
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Remove a confusing comment on our local_flush_tlb_all()
implementation. Per an internal discussion with Andrew, while it's
true that the fence.i is not necessary, it's not the case that an
sfence.vma implies a fence.i. We also drop the section about
"flush[ing] the entire local TLB" to better align with the language in
section 4.2.1 "Supervisor Memory-Management Fence Instruction" of the
RISC-V Privileged Specification v20190608.
Fixes: c901e45a99 ("RISC-V: `sfence.vma` orderes the instruction cache")
Reported-by: Alan Kao <alankao@andestech.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Andrew Waterman <andrew@sifive.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Add a default "stdout-path" to the kernel DTS file, as is present in many
of the board DTS files elsewhere in the kernel tree. With this line
present, earlyconsole can be enabled by simply passing "earlycon" on the
kernel command line. No specific device details are necessary, since the
kernel will use the stdout-path as the default.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The dst in bpf_input() has lwtstate field set. As it is of the
LWTUNNEL_ENCAP_BPF type, lwtstate->data is struct bpf_lwt. When the bpf
program returns BPF_LWT_REROUTE, ip_route_input_noref is directly called on
this skb. This causes invalid memory access, as ip_route_input_slow calls
skb_tunnel_info(skb) that expects the dst->lwstate->data to be
struct ip_tunnel_info. This results to struct bpf_lwt being accessed as
struct ip_tunnel_info.
Drop the dst before calling the IP route input functions (both for IPv4 and
IPv6).
Reported by KASAN.
Fixes: 3bd0b15281 ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Oskolkov <posk@google.com>
Link: https://lore.kernel.org/bpf/111664d58fe4e9dd9c8014bb3d0b2dab93086a9e.1570609794.git.jbenc@redhat.com
First of all, on short copies __copy_{to,from}_user() return the amount
of bytes left uncopied, *not* -EFAULT. get_user() and put_user() are
expected to return -EFAULT on failure.
Another problem is get_user(v32, (__u64 __user *)p); that should
fetch 64bit value and the assign it to v32, truncating it in process.
Current code, OTOH, reads 8 bytes of data and stores them at the
address of v32, stomping on the 4 bytes that follow v32 itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Pull irqchip fixes from Marc Zyngier:
- Add retrigger support to Amazon's al-fic driver
- Add SAM9X60 support to Atmel's AIC5 irqchip
- Fix GICv3 maximum interrupt calculation
- Convert SiFive's PLIC to the fasteoi IRQ flow
In case of an error (e.g. memory pool too small), kmemleak disables
itself and cleans up the already allocated metadata objects. However, if
this happens early before the RCU callback mechanism is available,
put_object() skips call_rcu() and frees the object directly. This is not
safe with the RCU list traversal in __kmemleak_do_cleanup().
Change the list traversal in __kmemleak_do_cleanup() to
list_for_each_entry_safe() and remove the rcu_read_{lock,unlock} since
the kmemleak is already disabled at this point. In addition, avoid an
unnecessary metadata object rb-tree look-up since it already has the
struct kmemleak_object pointer.
Fixes: c566586818 ("mm: kmemleak: use the memory pool for early allocations")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Reported-by: Ted Ts'o <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The access to sk->sk_ll_usec should be hidden behind
CONFIG_NET_RX_BUSY_POLL like the definition of sk_ll_usec.
Put access to ->sk_ll_usec behind CONFIG_NET_RX_BUSY_POLL.
Fixes: 1a9460cef5 ("nvme-tcp: support simple polling")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Prevent simultaneous controller disabling/enabling tasks from interfering
with each other through a function to wait until the task successfully
transitioned the controller to the RESETTING state. This ensures disabling
the controller will not be interrupted by another reset path, otherwise
a concurrent reset may leave the controller in the wrong state.
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
A paused controller is doing critical internal activation work in the
background. Prevent subsequent controller resets from occurring during
this period by setting the controller state to RESETTING first. A helper
function, nvme_try_sched_reset_work(), is introduced for these paths so
they may continue with scheduling the reset_work after they've completed
their uninterruptible critical section.
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
A controller in the resetting state has not yet completed its recovery
actions. The pci and fc transports were already handling this, so update
the remaining transports to not attempt additional recovery in this
state. Instead, just restart the request timer.
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The admin only state was intended to fence off actions that don't
apply to a non-IO capable controller. The only actual user of this is
the scan_work, and pci was the only transport to ever set this state.
The consequence of having this state is placing an additional burden on
every other action that applies to both live and admin only controllers.
Remove the admin only state and place the admin only burden on the only
place that actually cares: scan_work.
This also prepares to make it easier to temporarily pause a LIVE state
so that we don't need to remember which state the controller had been in
prior to the pause.
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If a controller becomes degraded after a reset, we will not be able to
perform any IO. We currently teardown previously created request
queues and namespaces, but we had kept the unusable tagset. Free
it after all queues using it have been released.
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Followup to commit dd2261ed45 ("hrtimer: Protect lockless access
to timer->base")
lock_hrtimer_base() fetches timer->base without lock exclusion.
Compiler is allowed to read timer->base twice (even if considered dumb)
which could end up trying to lock migration_base and return
&migration_base.
base = timer->base;
if (likely(base != &migration_base)) {
/* compiler reads timer->base again, and now (base == &migration_base)
raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
if (likely(base == timer->base))
return base; /* == &migration_base ! */
Similarly the write sides must use WRITE_ONCE() to avoid store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008173204.180879-1-edumazet@google.com
Currently the exit return path when sme->key_idx >= NUM_WEPKEYS is via
label 'exit' and this checks if result is non-zero, however result has
not been initialized and contains garbage. Fix this by replacing the
goto with a return with the error code.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 0ca6d8e744 ("Staging: wlan-ng: replace switch-case statements with macro")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191014110201.9874-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 2eba69071b ("drm/msm: Remove Kconfig default") the
CONFIG_DRM_MSM option is no longer selected by default on i.MX5.
Explicitly select CONFIG_DRM_MSM so that we can get GPU support
by default on i.MX51 and i.MX53.
Fixes: 2eba69071b ("drm/msm: Remove Kconfig default")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
On i.MX8MQ, usdhc's ipg clock is from IMX8MQ_CLK_IPG_ROOT,
assign it explicitly instead of using IMX8MQ_CLK_DUMMY.
Fixes: 748f908cc8 ("arm64: add basic DTS for i.MX8MQ")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
For APIC case of interrupt we don't fail a ->probe() of the driver,
which makes kernel to print a lot of warnings from the children.
We have two options here:
- switch to platform_get_irq_optional(), though it won't stop children
to be probed and failed
- fail the ->probe() of i2c-multi-instantiate
Since the in reality we never had devices in the wild where IRQ resource
is optional, the latter solution suits the best.
Fixes: 799d3379a6 ("platform/x86: i2c-multi-instantiate: Introduce IOAPIC IRQ support")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
i.MX7S/D's GPT ipg clock should be from GPT clock root and
controlled by CCM's GPT CCGR, using correct clock source for
GPT ipg clock instead of IMX7D_CLK_DUMMY.
Fixes: 3ef79ca6bd ("ARM: dts: imx7d: use imx7s.dtsi as base device tree")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Commit 4daa4fba3a ("gpu: drm: ttm: Adding new return type vm_fault_t")
broke TTM prefaulting. Since vmf_insert_mixed() typically always returns
VM_FAULT_NOPAGE, prefaulting stops after the second PTE.
Restore (almost) the original behaviour. Unfortunately we can no longer
with the new vm_fault_t return type determine whether a prefaulting
PTE insertion hit an already populated PTE, and terminate the insertion
loop. Instead we continue with the pre-determined number of prefaults.
Fixes: 4daa4fba3a ("gpu: drm: ttm: Adding new return type vm_fault_t")
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/330387/
A previous patch disabled the SNVS power key by default which
breaks the ability for the imx6q-logicpd board to wake from sleep.
This patch re-enables this feature for this board.
Fixes: 770856f0da ("ARM: dts: imx6qdl: Enable SNVS power key according to board design")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
(kvalo: cherry picked from commit 1340cc631b in
wireless-drivers-next to wireless-drivers as this a frequently reported
regression)
Bad latency is found on QCA988x, the issue was introduced by
commit 4504f0e5b5 ("ath10k: sdio: workaround firmware UART
pin configuration bug"). If uart_pin_workaround is false, this
change will set uart pin even if uart_print is false.
Tested HW: QCA9880
Tested FW: 10.2.4-1.0-00037
Fixes: 4504f0e5b5 ("ath10k: sdio: workaround firmware UART pin configuration bug")
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".
Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.
------------[ cut here ]------------
request_module fs-virtio_fs succeeded, but still no fs?
WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1
So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.
It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.
Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Illegal memory will be touch if SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3
(41) exceed the size of structure sdma_script_start_addrs(40),
thus cause memory corrupt such as slob block header so that kernel
trap into while() loop forever in slob_free(). Please refer to below
code piece in imx-sdma.c:
for (i = 0; i < sdma->script_number; i++)
if (addr_arr[i] > 0)
saddr_arr[i] = addr_arr[i]; /* memory corrupt here */
That issue was brought by commit a572460be9 ("dmaengine: imx-sdma: Add
support for version 3 firmware") because SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3
(38->41 3 scripts added) not align with script number added in
sdma_script_start_addrs(2 scripts).
Fixes: a572460be9 ("dmaengine: imx-sdma: Add support for version 3 firmware")
Cc: stable@vger.kernel
Link: https://www.spinics.net/lists/arm-kernel/msg754895.html
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Reported-by: Jurgen Lambrecht <J.Lambrecht@TELEVIC.com>
Link: https://lore.kernel.org/r/1569347584-3478-1-git-send-email-yibin.gong@nxp.com
[vkoul: update the patch title]
Signed-off-by: Vinod Koul <vkoul@kernel.org>
>From Tegra186 onwards OUTSTANDING_REQUESTS field is added in channel
configuration register(bits 7:4) which defines the maximum number of reads
from the source and writes to the destination that may be outstanding at
any given point of time. This field must be programmed with a value
between 1 and 8. A value of 0 will prevent any transfers from happening.
Thus added 'has_outstanding_reqs' bool member in chip data structure and is
set to false for Tegra210, since the field is not applicable. For Tegra186
it is set to true and channel configuration is updated with maximum
outstanding requests.
Fixes: 433de642a7 ("dmaengine: tegra210-adma: add support for Tegra186/Tegra194")
Cc: stable@vger.kernel.org
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/1568626513-16541-1-git-send-email-spujar@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
lx2160a support PW15 but not PW20, correct name to avoid confusing.
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Fixes: 00c5ce8ac0 ("arm64: dts: lx2160a: add cpu idle support")
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
We will set the link-list pointer register point to next link-list
configuration's physical address, which can load DMA configuration
from the link-list node automatically.
But the link-list node's physical address can be larger than 32bits,
and now Spreadtrum DMA driver only supports 32bits physical address,
which may cause loading a incorrect DMA configuration when starting
the link-list transfer mode. According to the DMA datasheet, we can
use SRC_BLK_STEP register (bit28 - bit31) to save the high bits of the
link-list node's physical address to fix this issue.
Fixes: 4ac6954647 ("dmaengine: sprd: Support DMA link-list mode")
Signed-off-by: Zhenfang Wang <zhenfang.wang@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Link: https://lore.kernel.org/r/eadfe9295499efa003e1c344e67e2890f9d1d780.1568267061.git.baolin.wang@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Remove lockdown from tracefs itself and moved it to the trace
directory. Have the open functions there do the lockdown checks.
- Fix a few races with opening an instance file and the instance
being deleted (Discovered during the lockdown updates). Kept
separate from the clean up code such that they can be backported to
stable easier.
- Clean up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fix a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq"
* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
tracing/hwlat: Report total time spent in all NMIs during the sample
recordmcount: Fix nop_mcount() function
tracing: Do not create tracefs files if tracefs lockdown is in effect
tracing: Add locked_down checks to the open calls of files created for tracefs
tracing: Add tracing_check_open_get_tr()
tracing: Have trace events system open call tracing_open_generic_tr()
tracing: Get trace_array reference for available_tracers files
ftrace: Get a reference counter for the trace_array on filter files
tracefs: Revert ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
Each slave interface of an B.A.T.M.A.N. IV virtual interface has an OGM
packet buffer which is initialized using data from netdevice notifier and
other rtnetlink related hooks. It is sent regularly via various slave
interfaces of the batadv virtual interface and in this process also
modified (realloced) to integrate additional state information via TVLV
containers.
It must be avoided that the worker item is executed without a common lock
with the netdevice notifier/rtnetlink helpers. Otherwise it can either
happen that half modified/freed data is sent out or functions modifying the
OGM buffer try to access already freed memory regions.
Reported-by: syzbot+0cc629f19ccb8534935b@syzkaller.appspotmail.com
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
A B.A.T.M.A.N. V virtual interface has an OGM2 packet buffer which is
initialized using data from the netdevice notifier and other rtnetlink
related hooks. It is sent regularly via various slave interfaces of the
batadv virtual interface and in this process also modified (realloced) to
integrate additional state information via TVLV containers.
It must be avoided that the worker item is executed without a common lock
with the netdevice notifier/rtnetlink helpers. Otherwise it can either
happen that half modified data is sent out or the functions modifying the
OGM2 buffer try to access already freed memory regions.
Fixes: 0da0035942 ("batman-adv: OGMv2 - add basic infrastructure")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
pSeries machines on POWER9 processors can run with the XICS (legacy)
interrupt mode or with the XIVE exploitation interrupt mode. These
interrupt contollers have different interfaces for interrupt
management : XICS uses hcalls and XIVE loads and stores on a page.
H_EOI being a XICS interface the enable_scrq_irq() routine can fail
when the machine runs in XIVE mode.
Fix that by calling the EOI handler of the interrupt chip.
Fixes: f23e0643cd ("ibmvnic: Clear pending interrupt after device reset")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
__lpc_eth_shutdown is called after __lpc_eth_reset but it is already
calling __lpc_eth_reset. Avoid resetting the IP twice.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: address KCSAN reports in tcp_poll() (part I)
This all started with a KCSAN report (included
in "tcp: annotate tp->rcv_nxt lockless reads" changelog)
tcp_poll() runs in a lockless way. This means that about
all accesses of tcp socket fields done in tcp_poll() context
need annotations otherwise KCSAN will complain about data-races.
While doing this detective work, I found a more serious bug,
addressed by the first patch ("tcp: add rcu protection around
tp->fastopen_rsk").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_wmem_queued while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
sk_wmem_queued_add() helper is added so that we can in
the future convert to ADD_ONCE() or equivalent if/when
available.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_sndbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_rcvbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There two places where we fetch tp->urg_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write side use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->snd_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->write_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->copied_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->rcv_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
syzbot reported :
BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
sock_poll+0xed/0x250 net/socket.c:1256
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
ep_send_events fs/eventpoll.c:1793 [inline]
ep_poll+0xe3/0x900 fs/eventpoll.c:1930
do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
__do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
__se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
__x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :
fastopen = tp->fastopen_rsk;
snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.
Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull hwmon fixes from Guenter Roeck:
- Update/fix inspur-ipsps1 and k10temp Documentation
- Fix nct7904 driver
- Fix HWMON_P_MIN_ALARM mask in hwmon core
* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: docs: Extend inspur-ipsps1 title underline
hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
docs: hwmon: Include 'inspur-ipsps1.rst' into docs
hwmon: Fix HWMON_P_MIN_ALARM mask
hwmon: (k10temp) Update documentation and add temp2_input info
hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
Pull MTD fixes from Richard Weinberger:
"Two fixes for MTD:
- spi-nor: Fix for a regression in write_sr()
- rawnand: Regression fix for the au1550nd driver"
* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
mtd: spi-nor: Fix direction of the write_sr() transfer
Pull io_uring fix from Jens Axboe:
"Single small fix for a regression in the sequence logic for linked
commands"
* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
io_uring: fix sequence logic for timeout requests
When device stop was moved out of reset, test device wasn't updated to
stop before reset, this resulted in a use after free. Fix by invoking
stop appropriately.
Fixes: b211616d71 ("vhost: move -net specific code out")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.
Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c862501 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The removal of the longjmp code in recordmcount.c mistakenly made the return
of make_nop() being negative an exit of nop_mcount(). It should not exit the
routine, but instead just not process that part of the code. By exiting with
an error code, it would cause the update of recordmcount to fail some files
which would fail the build if ftrace function tracing was enabled.
Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 3f1df12019 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.
Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.
Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.
As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).
Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having the trace events system open call open code the taking of
the trace_array descriptor (with trace_array_get()) and then calling
trace_open_generic(), have it use the tracing_open_generic_tr() that does
the combination of the two. This requires making tracing_open_generic_tr()
global.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.
Cc: stable@vger.kernel.org
Fixes: 607e2ea167 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.
It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.
Cc: stable@vger.kernel.org
Fixes: 673feb9d76 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Running the latest kernel through my "make instances" stress tests, I
triggered the following bug (with KASAN and kmemleak enabled):
mkdir invoked oom-killer:
gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
oom_score_adj=0
CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
dump_stack+0x64/0x8c
dump_header+0x43/0x3b7
? trace_hardirqs_on+0x48/0x4a
oom_kill_process+0x68/0x2d5
out_of_memory+0x2aa/0x2d0
__alloc_pages_nodemask+0x96d/0xb67
__alloc_pages_node+0x19/0x1e
alloc_slab_page+0x17/0x45
new_slab+0xd0/0x234
___slab_alloc.constprop.86+0x18f/0x336
? alloc_inode+0x2c/0x74
? irq_trace+0x12/0x1e
? tracer_hardirqs_off+0x1d/0xd7
? __slab_alloc.constprop.85+0x21/0x53
__slab_alloc.constprop.85+0x31/0x53
? __slab_alloc.constprop.85+0x31/0x53
? alloc_inode+0x2c/0x74
kmem_cache_alloc+0x50/0x179
? alloc_inode+0x2c/0x74
alloc_inode+0x2c/0x74
new_inode_pseudo+0xf/0x48
new_inode+0x15/0x25
tracefs_get_inode+0x23/0x7c
? lookup_one_len+0x54/0x6c
tracefs_create_file+0x53/0x11d
trace_create_file+0x15/0x33
event_create_dir+0x2a3/0x34b
__trace_add_new_event+0x1c/0x26
event_trace_add_tracer+0x56/0x86
trace_array_create+0x13e/0x1e1
instance_mkdir+0x8/0x17
tracefs_syscall_mkdir+0x39/0x50
? get_dname+0x31/0x31
vfs_mkdir+0x78/0xa3
do_mkdirat+0x71/0xb0
sys_mkdir+0x19/0x1b
do_fast_syscall_32+0xb0/0xed
I bisected this down to the addition of the proxy_ops into tracefs for
lockdown. It appears that the allocation of the proxy_ops and then freeing
it in the destroy_inode callback, is causing havoc with the memory system.
Reading the documentation about destroy_inode and talking with Linus about
this, this is buggy and wrong. When defining the destroy_inode() method, it
is expected that the destroy_inode() will also free the inode, and not just
the extra allocations done in the creation of the inode. The faulty commit
causes a memory leak of the inode data structure when they are deleted.
Instead of allocating the proxy_ops (and then having to free it) the checks
should be done by the open functions themselves, and not hack into the
tracefs directory. First revert the tracefs updates for locked_down and then
later we can add the locked_down checks in the kernel/trace files.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Fixes: ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.4-rc3.
Nothing huge here. Some binder driver fixes (although it is still
being discussed if these all fix the reported issues or not, so more
might be coming later), some mei device ids and fixes, and a google
firmware driver bugfix that fixes a regression, as well as some other
tiny fixes.
All have been in linux-next with no reported issues"
* tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
firmware: google: increment VPD key_len properly
w1: ds250x: Fix build error without CRC16
virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr
binder: Fix comment headers on binder_alloc_prepare_to_free()
binder: prevent UAF read in print_binder_transaction_log_entry()
misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
mei: avoid FW version request on Ibex Peak and earlier
mei: me: add comet point (lake) LP device ids
Pull staging/IIO driver fixes from Greg KH:
"Here are some staging and IIO driver fixes for 5.4-rc3.
The "biggest" thing here is a removal of the fbtft device and flexfb
code as they have been abandoned by their authors and are no longer
needed for that hardware.
Other than that, the usual amount of staging driver and iio driver
fixes for reported issues, and some speakup sysfs file documentation,
which has been long awaited for.
All have been in linux-next with no reported issues"
* tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
iio: Fix an undefied reference error in noa1305_probe
iio: light: opt3001: fix mutex unlock race
iio: adc: ad799x: fix probe error handling
iio: light: add missing vcnl4040 of_compatible
iio: light: fix vcnl4000 devicetree hooks
iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
iio: adc: axp288: Override TS pin bias current for some models
iio: imu: adis16400: fix memory leak
iio: imu: adis16400: release allocated memory on failure
iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
iio: adc: stm32-adc: move registers definitions
iio: accel: adxl372: Perform a reset at start up
iio: accel: adxl372: Fix push to buffers lost samples
iio: accel: adxl372: Fix/remove limitation for FIFO samples
iio: adc: hx711: fix bug in sampling of data
staging: vt6655: Fix memory leak in vt6655_probe
staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info
Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
staging: speakup: document sysfs attributes
staging: rtl8188eu: fix HighestRate check in odm_ARFBRefresh_8188E()
...
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.4-rc3 that
resolve a number of reported issues and regressions.
None of these are huge, full details are in the shortlog. There's also
a MAINTAINERS update that I think you might have already taken in your
tree already, but git should handle that merge easily.
All have been in linux-next with no reported issues"
* tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
serial: fix kernel-doc warning in comments
serial: 8250_omap: Fix gpio check for auto RTS/CTS
serial: mctrl_gpio: Check for NULL pointer
tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
tty: serial: Fix PORT_LINFLEXUART definition
tty: n_hdlc: fix build on SPARC
serial: uartps: Fix uartps_major handling
serial: uartlite: fix exit path null pointer
tty: serial: linflexuart: Fix magic SysRq handling
serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
dt-bindings: serial: sh-sci: Document r8a774b1 bindings
serial/sifive: select SERIAL_EARLYCON
tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
Pull USB fixes from Greg KH:
"Here are a lot of small USB driver fixes for 5.4-rc3.
syzbot has stepped up its testing of the USB driver stack, now able to
trigger fun race conditions between disconnect and probe functions.
Because of that we have a lot of fixes in here from Johan and others
fixing these reported issues that have been around since almost all
time.
We also are just deleting the rio500 driver, making all of the syzbot
bugs found in it moot as it turns out no one has been using it for
years as there is a userspace version that is being used instead.
There are also a number of other small fixes in here, all resolving
reported issues or regressions.
All have been in linux-next without any reported issues"
* tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
USB: yurex: fix NULL-derefs on disconnect
USB: iowarrior: use pr_err()
USB: iowarrior: drop redundant iowarrior mutex
USB: iowarrior: drop redundant disconnect mutex
USB: iowarrior: fix use-after-free after driver unbind
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: chaoskey: fix use-after-free on release
USB: adutux: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
USB: legousbtower: fix use-after-free on release
usb: cdns3: Fix for incorrect DMA mask.
usb: cdns3: fix cdns3_core_init_role()
usb: cdns3: gadget: Fix full-speed mode
USB: usb-skeleton: drop redundant in-urb check
USB: usb-skeleton: fix use-after-free after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
usb:cdns3: Fix for CV CH9 running with g_zero driver.
usb: dwc3: Remove dev_err() on platform_get_irq() failure
usb: dwc3: Switch to platform_get_irq_byname_optional()
...
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a guest-cputime accounting fix, and a cgroup bandwidth
quota precision fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/vtime: Fix guest/system mis-accounting on task switch
sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also a couple of updates for new Intel
models (which are technically hw-enablement, but to users it's a fix
to perf behavior on those new CPUs - hope this is fine), an AUX
inheritance fix, event time-sharing fix, and a fix for lost non-perf
NMI events on AMD systems"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf/x86/cstate: Add Tiger Lake CPU support
perf/x86/msr: Add Tiger Lake CPU support
perf/x86/intel: Add Tiger Lake CPU support
perf/x86/cstate: Update C-state counters for Ice Lake
perf/x86/msr: Add new CPU model numbers for Ice Lake
perf/x86/cstate: Add Comet Lake CPU support
perf/x86/msr: Add Comet Lake CPU support
perf/x86/intel: Add Comet Lake CPU support
perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
perf/core: Fix corner case in perf_rotate_context()
perf/core: Rework memory accounting in perf_mmap()
perf/core: Fix inheritance of aux_output groups
perf annotate: Don't return -1 for error when doing BPF disassembly
perf annotate: Return appropriate error code for allocation failures
perf annotate: Fix arch specific ->init() failure errors
perf annotate: Propagate the symbol__annotate() error return
perf annotate: Fix the signedness of failure returns
perf annotate: Propagate perf_env__arch() error
perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
perf tools: Propagate get_cpuid() error
...
Pull EFI fixes from Ingo Molnar:
"Misc EFI fixes all across the map: CPER error report fixes, fixes to
TPM event log parsing, fix for a kexec hang, a Sparse fix and other
fixes"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
efi/x86: Do not clean dummy variable in kexec path
efi: Make unexported efi_rci2_sysfs_init() static
efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
efi/tpm: Don't traverse an event log with no events
efi/tpm: Don't access event->count when it isn't mapped
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
efi/cper: Fix endianness of PCIe class code
Pull x86 fixes from Ingo Molnar:
"A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
guest support fix when built under Clang, and new CPU model number
definitions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Comet Lake to the Intel CPU models header
lib/string: Make memzero_explicit() inline instead of external
x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
x86/asm: Fix MWAITX C-state hint value
Pull x86 license tag fixlets from Ingo Molnar:
"Fix a couple of SPDX tags in x86 headers to follow the canonical
pattern"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Use the correct SPDX License Identifier in headers
Pull RISC-V fixes from Paul Walmsley:
- Fix several bugs in the breakpoint trap handler
- Drop an unnecessary loop around calls to preempt_schedule_irq()
* tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: entry: Remove unneeded need_resched() loop
riscv: Correct the handling of unexpected ebreak in do_trap_break()
riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
riscv: avoid kernel hangs when trapped in BUG()
Pull MIPS fixes from Paul Burton:
- Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
compiler may choose not to inline __xchg() & __cmpxchg().
- A build fix for Loongson configurations with GCC 9.x.
- Expose some extra HWCAP bits to indicate support for various
instruction set extensions to userland.
- Fix bad stack access in firmware handling code for old SNI
RM200/300/400 machines.
* tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Disable Loongson MMI instructions for kernel build
MIPS: elf_hwcap: Export userspace ASEs
MIPS: fw: sni: Fix out of bounds init of o32 stack
MIPS: include: Mark __xchg as __always_inline
MIPS: include: Mark __cmpxchg as __always_inline
Pull powerpc fixes from Michael Ellerman:
"Fix a kernel crash in spufs_create_root() on Cell machines, since the
new mount API went in.
Fix a regression in our KVM code caused by our recent PCR changes.
Avoid a warning message about a failing hypervisor API on systems that
don't have that API.
A couple of minor build fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Desnes A. Nunes do
Rosario, Emmanuel Nicolet, Jordan Niethe, Laurent Dufour, Stephen
Rothwell"
* tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
spufs: fix a crash in spufs_create_root()
powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
powerpc/pseries: Remove confusing warning message.
powerpc/64s/radix: Fix build failure with RADIX_MMU=n
Pull xen fixes from Juergen Gross:
- correct panic handling when running as a Xen guest
- cleanup the Xen grant driver to remove printing a pointer being
always NULL
- remove a soon to be wrong call of of_dma_configure()
* tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Stop abusing DT of_dma_configure API
xen/grant-table: remove unnecessary printing
x86/xen: Return from panic notifier
Alexei Starovoitov says:
====================
pull-request: bpf 2019-10-12
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) a bunch of small fixes. Nothing critical.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Comet Lake is the new 10th Gen Intel processor. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Kaby Lake.
Share hswult_cstates with Kaby Lake.
Update the comments for Comet Lake.
Kaby Lake is missed in the comments for some Residency Counters. Update
the comments for Kaby Lake as well.
The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.
The patch has been tested on real hardware.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add replacement email address for the one on my expired domain.
Signed-off-by: Simon Arlott <simon@octiron.net>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
the UDP socket is being shut down, rxrpc_error_report() may get called to
deal with it after sk_user_data on the UDP socket has been cleared, leading
to a NULL pointer access when this local endpoint record gets accessed.
Fix this by just returning immediately if sk_user_data was NULL.
The oops looks like the following:
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
...
Call Trace:
? sock_queue_err_skb+0xbd/0xde
? __udp4_lib_err+0x313/0x34d
__udp4_lib_err+0x313/0x34d
icmp_unreach+0x1ee/0x207
icmp_rcv+0x25b/0x28f
ip_protocol_deliver_rcu+0x95/0x10e
ip_local_deliver+0xe9/0x148
__netif_receive_skb_one_core+0x52/0x6e
process_backlog+0xdc/0x177
net_rx_action+0xf9/0x270
__do_softirq+0x1b6/0x39a
? smpboot_register_percpu_thread+0xce/0xce
run_ksoftirqd+0x1d/0x42
smpboot_thread_fn+0x19e/0x1b3
kthread+0xf1/0xf6
? kthread_delayed_work_timer_fn+0x83/0x83
ret_from_fork+0x24/0x30
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rmi_process_interrupt_requests() calls handle_nested_irq() for
each interrupt status bit it finds. If the irq domain mapping for
this bit had not yet been set up, then it ends up calling
handle_nested_irq(0), which causes a NULL pointer dereference.
There's already code that masks the irq_status bits coming out of the
hardware with current_irq_mask, presumably to avoid this situation.
However current_irq_mask seems to more reflect the actual mask set
in the hardware rather than the IRQs software has set up and registered
for. For example, in rmi_driver_reset_handler(), the current_irq_mask
is initialized based on what is read from the hardware. If the reset
value of this mask enables IRQs that Linux has not set up yet, then
we end up in this situation.
There appears to be a third unused bitmask that used to serve this
purpose, fn_irq_bits. Use that bitmask instead of current_irq_mask
to avoid calling handle_nested_irq() on IRQs that have not yet been
set up.
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Andrew Duggan <aduggan@synaptics.com>
Link: https://lore.kernel.org/r/20191008223657.163366-1-evgreen@chromium.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
Pull cifs fixes from Steve French:
"Eight small SMB3 fixes, four for stable, and important fix for the
recent regression introduced by filesystem timestamp range patches"
* tag '5.4-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Force reval dentry if LOOKUP_REVAL flag is set
CIFS: Force revalidate inode when dentry is stale
smb3: Fix regression in time handling
smb3: remove noisy debug message and minor cleanup
CIFS: Gracefully handle QueryInfo errors during open
cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
fs: cifs: mute -Wunused-const-variable message
smb3: cleanup some recent endian errors spotted by updated sparse
In btrfs_read_block_groups(), if we have an invalid block group which
has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
feature, we error out without freeing the block group cache.
This patch will add the missing btrfs_put_block_group() to prevent
memory leak.
Note for stable backports: the file to patch in versions <= 5.3 is
fs/btrfs/extent-tree.c
Fixes: 49303381f1 ("Btrfs: bail out if block group has different mixed flag")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On msm8998, vblank timeouts are observed because the DSI controller is not
reset properly, which ends up stalling the MDP. This is because the reset
logic is not correct per the hardware documentation.
The documentation states that after asserting reset, software should wait
some time (no indication of how long), or poll the status register until it
returns 0 before deasserting reset.
wmb() is insufficient for this purpose since it just ensures ordering, not
timing between writes. Since asserting and deasserting reset occurs on the
same register, ordering is already guaranteed by the architecture, making
the wmb extraneous.
Since we would define a timeout for polling the status register to avoid a
possible infinite loop, lets just use a static delay of 20 ms, since 16.666
ms is the time available to process one frame at 60 fps.
Fixes: a689554ba6 ("drm/msm: Initial add DSI connector support")
Cc: Hai Li <hali@codeaurora.org>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Reviewed-by: Sean Paul <sean@poorly.run>
[seanpaul renamed RESET_DELAY to DSI_RESET_TOGGLE_DELAY_MS]
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011133939.16551-1-jeffrey.l.hugo@gmail.com
If we error out when finding a page at relocate_file_extent_cluster(), we
need to release the outstanding extents counter on the relocation inode,
set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
the inode's block reserve size can never decrease to zero and metadata
space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
in case we can't find the target page.
Fixes: 8b62f87bad ("Btrfs: rework outstanding_extents")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull module fixes from Jessica Yu:
"Code cleanups and kbuild/namespace related fixups from Masahiro.
Most importantly, it fixes a namespace-related modpost issue for
external module builds
- Fix broken external module builds due to a modpost bug in
read_dump(), where the namespace was not being strdup'd and
sym->namespace would be set to bogus data.
- Various namespace-related kbuild fixes and cleanups thanks to
Masahiro Yamada"
* tag 'modules-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
doc: move namespaces.rst from kbuild/ to core-api/
nsdeps: make generated patches independent of locale
nsdeps: fix hashbang of scripts/nsdeps
kbuild: fix build error of 'make nsdeps' in clean tree
module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict
modpost: fix broken sym->namespace for external module builds
module: swap the order of symbol.namespace
scripts: add_namespace: Fix coccicheck failed
Pull Hyper-V fixes from Sasha Levin:
"Two fixes from Dexuan Cui:
- Fix a (harmless) warning when building vmbus without
CONFIG_PM_SLEEP
- Fix for a memory leak (and optimization) in the hyperv mouse code"
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
HID: hyperv: Use in-place iterator API in the channel callback
Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.
Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.
Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR. But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses. So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.
The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.
This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).
The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.
[ bp: Touchups. ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
We export the entire kernel address space (i.e. the whole of the TTBR1
address range) via /proc/kcore. The kc_vaddr_to_offset() and
kc_offset_to_vaddr() macros are intended to convert between a kernel
virtual address and its offset relative to the start of the TTBR1
address space.
Prior to commit:
14c127c957 ("arm64: mm: Flip kernel VA space")
... the offset was calculated relative to VA_START, which at the time
was the start of the TTBR1 address space. At this time, PAGE_OFFSET
pointed to the high half of the TTBR1 address space where arm64's
linear map lived.
That commit swapped the position of VA_START and PAGE_OFFSET, but
failed to update kc_vaddr_to_offset() or kc_offset_to_vaddr(), so
since then the two macros behave incorrectly.
Note that VA_START was subsequently renamed to PAGE_END in commit:
77ad4ce693 ("arm64: memory: rename VA_START to PAGE_END")
As the generic implementations of the two macros calculate the offset
relative to PAGE_OFFSET (which is now the start of the TTBR1 address
space), we can delete the arm64 implementation and use those.
Fixes: 14c127c957 ("arm64: mm: Flip kernel VA space")
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
Describe the fallthrough pseudo-keyword.
Convert the coding-style.rst example to the keyword style.
Add description and links to deprecated.rst.
Miguel Ojeda comments on the eventual [[fallthrough]] syntax:
"Note that C17/C18 does not have [[fallthrough]].
C++17 introduced it, as it is mentioned above. I would keep the
__attribute__((fallthrough)) -> [[fallthrough]] change you did,
though, since that is indeed the standard syntax (given the paragraph
references C++17).
I was told by Aaron Ballman (who is proposing them for C) that it is
more or less likely that it becomes standardized in C2x. However, it
is still not added to the draft (other attributes are already,
though). See N2268 and N2269:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf (fallthrough)
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2269.pdf (attributes in general)"
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reserve the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to appear to be an
actual reserved word with the same gcc case block missing fallthrough
warning capability.
All switch/case blocks now should end in one of:
break;
fallthrough;
goto <label>;
return [expression];
continue;
In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.
fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"The regular fixes pull for rc3. The i915 team found some fixes they
(or I) missed for rc1, which is why this is a bit bigger than usual,
otherwise there is a single amdgpu fix, some spi panel aliases, and a
bridge fix.
i915:
- execlist access fixes
- list deletion fix
- CML display fix
- HSW workaround extension to GT2
- chicken bit whitelist
- GGTT resume issue
- SKL GPU hangs for Vulkan compute
amdgpu:
- memory leak fix
panel:
- spi aliases
tc358767:
- bridge artifacts fix"
* tag 'drm-fixes-2019-10-11' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/bridge: tc358767: fix max_tu_symbol value
drm/i915/gt: execlists->active is serialised by the tasklet
drm/i915/execlists: Protect peeking at execlists->active
drm/i915: Fixup preempt-to-busy vs reset of a virtual request
drm/i915: Only enqueue already completed requests
drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)
drm/i915/cml: Add second PCH ID for CMP
drm/amdgpu: fix memory leak
drm/panel: tpo-td043mtea1: Fix SPI alias
drm/panel: tpo-td028ttec1: Fix SPI alias
drm/panel: sony-acx565akm: Fix SPI alias
drm/panel: nec-nl8048hl11: Fix SPI alias
drm/panel: lg-lb035q02: Fix SPI alias
drm/i915: Mark contents as dirty on a write fault
drm/i915: Prevent bonded requests from overtaking each other on preemption
drm/i915: Bump skl+ max plane width to 5k for linear/x-tiled
drm/i915: Verify the engine after acquiring the active.lock
drm/i915: Extend Haswell GT1 PSMI workaround to all
drm/i915: Don't mix srcu tag and negative error codes
drm/i915: Whitelist COMMON_SLICE_CHICKEN2
...
Pull block fixes from Jens Axboe:
- Fix wbt performance regression introduced with the blk-rq-qos
refactoring (Harshad)
- Fix io_uring fileset removal inadvertently killing the workqueue (me)
- Fix io_uring typo in linked command nonblock submission (Pavel)
- Remove spurious io_uring wakeups on request free (Pavel)
- Fix null_blk zoned command error return (Keith)
- Don't use freezable workqueues for backing_dev, also means we can
revert a previous libata hack (Mika)
- Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)
* tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
nbd: fix possible sysfs duplicate warning
null_blk: Fix zoned command return code
io_uring: only flush workqueues on fileset removal
io_uring: remove wait loop spurious wakeups
blk-wbt: fix performance regression in wbt scale_up/scale_down
Revert "libata, freezer: avoid block device removal while system is frozen"
bdi: Do not use freezable workqueue
io_uring: fix reversed nonblock flag for link submission
mvebu fixes for 5.4 (part 1)
Fix regression on USB for Turris Mox (Armada 3720 based board)
* tag 'mvebu-fixes-5.4-1' of git://git.infradead.org/linux-mvebu:
arm64: dts: armada-3720-turris-mox: convert usb-phy to phy-supply
Link: https://lore.kernel.org/r/87blunsm43.fsf@FE-laptop
Signed-off-by: Olof Johansson <olof@lixom.net>
Depending on inlining decisions by the compiler, __get/put_user_fn
might become out of line. Then the compiler is no longer able to tell
that size can only be 1,2,4 or 8 due to the check in __get/put_user
resulting in false positives like
./arch/s390/include/asm/uaccess.h: In function ‘__put_user_fn’:
./arch/s390/include/asm/uaccess.h:113:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
113 | return rc;
| ^~
./arch/s390/include/asm/uaccess.h: In function ‘__get_user_fn’:
./arch/s390/include/asm/uaccess.h:143:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
143 | return rc;
| ^~
These functions are supposed to be always inlined. Mark it as such.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit 4b708b7b1a ("firmware: google: check if size is valid when
decoding VPD data") adds length checks, but the new vpd_decode_entry()
function botched the logic -- it adds the key length twice, instead of
adding the key and value lengths separately.
On my local system, this means vpd.c's vpd_section_create_attribs() hits
an error case after the first attribute it parses, since it's no longer
looking at the correct offset. With this patch, I'm back to seeing all
the correct attributes in /sys/firmware/vpd/...
Fixes: 4b708b7b1a ("firmware: google: check if size is valid when decoding VPD data")
Cc: <stable@vger.kernel.org>
Cc: Hung-Te Lin <hungte@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20190930214522.240680-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have two ways a request can be deferred:
1) It's a regular request that depends on another one
2) It's a timeout that tracks completions
We have a shared helper to determine whether to defer, and that
attempts to make the right decision based on the request. But we
only have some of this information in the caller. Un-share the
two timeout/defer helpers so the caller can use the right one.
Fixes: 5262f56798 ("io_uring: IORING_OP_TIMEOUT support")
Reported-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Karsten Graul says:
====================
Fixes for -net, addressing two races in SMC receive path and
add a missing cleanup when the link group creating fails with
ISM devices and a VLAN id.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
smc_rx_recvmsg() first checks if data is available, and then if
RCV_SHUTDOWN is set. There is a race when smc_cdc_msg_recv_action() runs
in between these 2 checks, receives data and sets RCV_SHUTDOWN.
In that case smc_rx_recvmsg() would return from receive without to
process the available data.
Fix that with a final check for data available if RCV_SHUTDOWN is set.
Move the check for data into a function and call it twice.
And use the existing helper smc_rx_data_available().
Fixes: 952310ccf2 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
smc_cdc_rxed_any_close_or_senddone() is used as an end condition for the
receive loop. This conflicts with smc_cdc_msg_recv_action() which could
run in parallel and set the bits checked by
smc_cdc_rxed_any_close_or_senddone() before the receive is processed.
In that case we could return from receive with no data, although data is
available. The same applies to smc_rx_wait().
Fix this by checking for RCV_SHUTDOWN only, which is set in
smc_cdc_msg_recv_action() after the receive was actually processed.
Fixes: 952310ccf2 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If creation of an SMCD link group with VLAN id fails, the initial
smc_ism_get_vlan() step has to be reverted as well.
Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
- Fix CML display by adding a missing ID.
- Drop redundant list_del_init
- Only enqueue already completed requests to avoid races
- Fixup preempt-to-busy vs reset of a virtual request
- Protect peeking at execlists->active
- execlists->active is serialised by the tasklet
drm-intel-next-fixes-2019-09-19:
- Extend old HSW workaround to fix some GPU hangs on Haswell GT2
- Fix return error code on GEM mmap.
- White list a chicken bit register for push constants legacy mode on Mesa
- Fix resume issue related to GGTT restore
- Remove incorrect BUG_ON on execlist's schedule-out
- Fix unrecoverable GPU hangs with Vulkan compute workloads on SKL
drm-intel-next-fixes-2019-09-26:
- Fix concurrence on cases where requests where getting retired at same time as resubmitted to HW
- Fix gen9 display resolutions by setting the right max plane width
- Fix GPU hang on preemption
- Mark contents as dirty on a write fault. This was breaking cursor sprite with dumb buffers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010143039.GA15313@intel.com
Commit 8960b38932 ("linux/dim: Rename externally used net_dim
members") renamed the net_dim API, removing the "net_" prefix from the
structures and functions. The patch didn't update the net_dim.txt
documentation file.
Fix the documentation so that its examples match the current code.
Fixes: 8960b38932 ("linux/dim: Rename externally used net_dim members", 2019-06-25)
Fixes: c002bd529d ("linux/dim: Rename externally exposed macros", 2019-06-25)
Fixes: 4f75da3666 ("linux/dim: Move implementation to .c files")
Cc: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Mariusz reported that invalid packets are sent after resume from
suspend if jumbo packets are active. It turned out that his BIOS
resets chip settings to non-jumbo on resume. Most chip settings are
re-initialized on resume from suspend by calling rtl_hw_start(),
so let's add configuring jumbo to this function.
There's nothing wrong with the commit marked as fixed, it's just
the first one where the patch applies cleanly.
Fixes: 7366016d2d ("r8169: read common register for PCI commit")
Reported-by: Mariusz Bialonczyk <manio@skyboo.net>
Tested-by: Mariusz Bialonczyk <manio@skyboo.net>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
intel-pinctrl fixes for v5.4
This includes two fixes for Intel pinctrl drivers:
- Fix warning about shared irqchip
- Restore Strago DMI workaround for all versions
It was reported that 72cd4064fc "NOMMU: Toggle only bits in
EXC_RETURN we are really care of" breaks NOMMU+XIP combination.
It happens because saved EXC_RETURN gets overwritten when data
section is relocated.
The fix is to propagate EXC_RETURN via register and let relocation
code to commit that value into memory.
Fixes: 72cd4064fc ("ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of")
Reported-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Tested-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
KernelCI reports that bcm2835_defconfig is no longer booting since
commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") (https://lkml.org/lkml/2019/9/26/825).
I also received a regression report from Nicolas Saenz Julienne
(https://lkml.org/lkml/2019/9/27/263).
This problem has cropped up on bcm2835_defconfig because it enables
CONFIG_CC_OPTIMIZE_FOR_SIZE. The compiler tends to prefer not inlining
functions with -Os. I was able to reproduce it with other boards and
defconfig files by manually enabling CONFIG_CC_OPTIMIZE_FOR_SIZE.
The __get_user_check() specifically uses r0, r1, r2 registers.
So, uaccess_save_and_enable() and uaccess_restore() must be inlined.
Otherwise, those register assignments would be entirely dropped,
according to my analysis of the disassembly.
Prior to commit 9012d01166 ("compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING"), the 'inline' marker was always enough for
inlining functions, except on x86.
Since that commit, all architectures can enable CONFIG_OPTIMIZE_INLINING.
So, __always_inline is now the only guaranteed way of forcible inlining.
I added __always_inline to 4 functions in the call-graph from the
__get_user_check() macro.
Fixes: 9012d01166 ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reported-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Since commit 4f8943f808 ("SUNRPC: Replace direct task wakeups from
softirq context") there has been a race to the value of the sk_err if both
XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case,
we may end up losing the sk_err value that existed when xs_error_report was
called.
Fix this by reverting to the previous behavior: instead of using SO_ERROR
to retrieve the value at a later time (which might also return sk_err_soft),
copy the sk_err value onto struct sock_xprt, and use that value to wake
pending tasks.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 4f8943f808 ("SUNRPC: Replace direct task wakeups from softirq context")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
GCC 9.x automatically enables support for Loongson MMI instructions when
using some -march= flags, and then errors out when -msoft-float is
specified with:
cc1: error: ‘-mloongson-mmi’ must be used with ‘-mhard-float’
The kernel shouldn't be using these MMI instructions anyway, just as it
doesn't use floating point instructions. Explicitly disable them in
order to fix the build with GCC 9.x.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 3702bba5eb ("MIPS: Loongson: Add GCC 4.4 support for Loongson2E")
Fixes: 6f7a251a25 ("MIPS: Loongson: Add basic Loongson 2F support")
Fixes: 5188129b8c ("MIPS: Loongson-3: Improve -march option and move it to Platform")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: stable@vger.kernel.org # v2.6.32+
Cc: linux-mips@vger.kernel.org
Pull xfs fixes from Darrick Wong:
"A couple of small code cleanups and bug fixes for rounding errors,
metadata logging errors, and an extra layer of safeguards against
leaking memory contents.
- Fix a rounding error in the fallocate code
- Minor code cleanups
- Make sure to zero memory buffers before formatting metadata blocks
- Fix a few places where we forgot to log an inode metadata update
- Remove broken error handling that tried to clean up after a failure
but still got it wrong"
* tag 'xfs-5.4-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: move local to extent inode logging into bmap helper
xfs: remove broken error handling on failed attr sf to leaf change
xfs: log the inode on directory sf to block format change
xfs: assure zeroed memory buffers for certain kmem allocations
xfs: removed unused error variable from xchk_refcountbt_rec
xfs: remove unused flags arg from xfs_get_aghdr_buf()
xfs: Fix tail rounding in xfs_alloc_file_space()
1. nbd_put takes the mutex and drops nbd->ref to 0. It then does
idr_remove and drops the mutex.
2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails
to find an existing device, so it does nbd_dev_add.
3. just before the nbd_put could call nbd_dev_remove or not finished
totally, but if nbd_dev_add try to add_disk, we can hit:
debugfs: Directory 'nbd1' with parent 'block' already present!
This patch will make sure all the disk add/remove stuff are done
by holding the nbd_index_mutex lock.
Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull btrfs fixes from David Sterba:
"A few more stabitly fixes, one build warning fix.
- fix inode allocation under NOFS context
- fix leak in fiemap due to concurrent append writes
- fix log-root tree updates
- fix balance convert of single profile on 32bit architectures
- silence false positive warning on old GCCs (code moved in rc1)"
* tag 'for-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: silence maybe-uninitialized warning in clone_range
btrfs: fix uninitialized ret in ref-verify
btrfs: allocate new inode in NOFS context
btrfs: fix balance convert to single on 32-bit host CPUs
btrfs: fix incorrect updating of log root tree
Btrfs: fix memory leak due to concurrent append writes with fiemap
Pull dcache_readdir() fixes from Al Viro:
"The couple of patches you'd been OK with; no hlist conversion yet, and
cursors are still in the list of children"
[ Al is referring to future work to avoid some nasty O(n**2) behavior
with the readdir cursors when you have lots of concurrent readdirs.
This is just a fix for a race with a trivial cleanup - Linus ]
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
libfs: take cursors out of list when moving past the end of directory
Fix the locking in dcache_readdir() and friends
Pull mount fixes from Al Viro:
"A couple of regressions from the mount series"
* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add missing blkdev_put() in get_tree_bdev()
shmem: fix LSM options parsing
At the end of the v5.3 upstream kernel development cycle, Simon stepped
down from his role as Renesas SoC maintainer.
Remove his maintainership, git repository, and branch from the
MAINTAINERS file, and add an entry to the CREDITS file to honor his
work.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen_auto_xlat_grant_frames.vaddr is definitely NULL in this case.
So the address printing is unnecessary.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Commit 721b1d98fb ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.
The implementation of this throttling mechanism is prone to a deadlock:
1. One or more threads write to the origin device causing COW, which is
performed by kcopyd.
2. At some point some of these threads might reach the s->cow_count
semaphore limit and block in down(&s->cow_count), holding a read lock
on _origins_lock.
3. Someone tries to acquire a write lock on _origins_lock, e.g.,
snapshot_ctr(), which blocks because the threads at step (2) already
hold a read lock on it.
4. A COW operation completes and kcopyd runs dm-snapshot's completion
callback, which ends up calling pending_complete().
pending_complete() tries to resubmit any deferred origin bios. This
requires acquiring a read lock on _origins_lock, which blocks.
This happens because the read-write semaphore implementation gives
priority to writers, meaning that as soon as a writer tries to enter
the critical section, no readers will be allowed in, until all
writers have completed their work.
So, pending_complete() waits for the writer at step (3) to acquire
and release the lock. This writer waits for the readers at step (2)
to release the read lock and those readers wait for
pending_complete() (the kcopyd thread) to signal the s->cow_count
semaphore: DEADLOCK.
The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html
Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.
Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Tested-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This simple refactoring moves code for modifying the semaphore cow_count
into separate functions to prepare for changes that will extend these
methods to provide for a more sophisticated mechanism for COW
throttling.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If CRC16 is not set, building will fails:
drivers/w1/slaves/w1_ds250x.o: In function `w1_ds2505_read_page':
w1_ds250x.c:(.text+0x82f): undefined reference to `crc16'
w1_ds250x.c:(.text+0x90a): undefined reference to `crc16'
w1_ds250x.c:(.text+0x91a): undefined reference to `crc16'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 25ec8710d9 ("w1: add DS2501, DS2502, DS2505 EPROM device driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190920060318.35020-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[cherry-picked to drm-misc-fixes: drm-misc-next commit dfef959803]
Commit 554b3529fe ("thermal/drivers/core: Remove the module Kconfig's
option") changed the type of THERMAL from tristate to bool, so
THERMAL || !THERMAL is now always y. Remove the redundant dependency.
Discovered through Kconfiglib detecting a dependency loop. The C tools
simplify the expression to y before running dependency loop detection,
and so don't see it. Changing the type of THERMAL back to tristate makes
the C tools detect the same loop.
Not sure if running dep. loop detection after simplification can be
called a bug. Fixing this nit unbreaks Kconfiglib on the kernel at
least.
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190927174218.GA32085@huvuddator
In hgcm_call_preprocess_linaddr memory is allocated for bounce_buf but
is not released if copy_form_user fails. In order to prevent memory leak
in case of failure, the assignment to bounce_buf_ret is moved before the
error check. This way the allocated bounce_buf will be released by the
caller.
Fixes: 579db9d45c ("virt: Add vboxguest VMMDEV communication code")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20190930204223.3660-1-navid.emamdoost@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL without making sure all
code paths that used it were done with it.
Before commit ef61eb43ad ("USB: yurex: Fix protection fault after
device removal") this included the interrupt-in completion handler, but
there are further accesses in dev_err and dev_dbg statements in
yurex_write() and the driver-data destructor (sic!).
Fix this by unconditionally stopping also the control URB at disconnect
and by using a dedicated disconnected flag.
Note that we need to take a reference to the struct usb_interface to
avoid a use-after-free in the destructor whenever the device was
disconnected while the character device was still open.
Fixes: aadd6472d9 ("USB: yurex.c: remove dbg() usage")
Fixes: 45714104b9 ("USB: yurex.c: remove err() usage")
Cc: stable <stable@vger.kernel.org> # 3.5: ef61eb43ad
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-6-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the commit
7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
the platform_get_irq() started issuing an error message which is not
what we want here.
Switch to platform_get_irq_optional() to have only warning message
provided by the driver.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/platform/x86/classmate-laptop.c: In function cmpc_accel_remove_v4:
drivers/platform/x86/classmate-laptop.c:424:21: warning: variable accel
set but not used [-Wunused-but-set-variable]
drivers/platform/x86/classmate-laptop.c: In function cmpc_accel_remove:
drivers/platform/x86/classmate-laptop.c:660:21: warning: variable accel
set but not used [-Wunused-but-set-variable]
In function cmpc_accel_remove_v4 and cmpc_accel_remove, variable accel is
set but not used, so it can be removed. In that case, variable inputdev is
set but not used and can be removed.
Fixes: 7125587df4 ("classmate-laptop: Add support for Classmate V4 accelerometer.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
All i.MX SoCs except i.MX1 have ONLY one necessary IRQ, use
platform_get_irq_optional() to get second/third IRQ which are
optional to avoid below error message during probe:
[ 0.726219] imx-uart 30860000.serial: IRQ index 1 not found
[ 0.731329] imx-uart 30860000.serial: IRQ index 2 not found
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Link: https://lore.kernel.org/r/1570614559-11900-1-git-send-email-Anson.Huang@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
_find_opp_of_np() doesn't traverse the list of OPP tables but instead
just the entries within an OPP table and so only requires to lock the
OPP table itself.
The lockdep_assert_held() was added there by mistake and isn't really
required.
Fixes: 5d6d106fa4 ("OPP: Populate required opp tables from "required-opps" property")
Cc: v5.0+ <stable@vger.kernel.org> # v5.0+
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Drop the redundant disconnect mutex which was introduced after the
open-disconnect race had been addressed generally in USB core by commit
d4ead16f50 ("USB: prevent char device open/deregister race").
Specifically, the rw-semaphore in core guarantees that all calls to
open() will have completed and that no new calls to open() will occur
after usb_deregister_dev() returns. Hence there is no need use the
driver data as an inverted disconnected flag.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009104846.5925-5-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent fix addressing a deadlock on disconnect introduced a new bug
by moving the present flag out of the critical section protected by the
driver-data mutex. This could lead to a racing release() freeing the
driver data before disconnect() is done with it.
Due to insufficient locking a related use-after-free could be triggered
also before the above mentioned commit. Specifically, the driver needs
to hold the driver-data mutex also while checking the opened flag at
disconnect().
Fixes: c468a8aa79 ("usb: iowarrior: fix deadlock on disconnect")
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Reported-by: syzbot+0761012cebf7bdb38137@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009104846.5925-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg, dev_warn and dev_err statements in
the completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 2824bd250f ("[PATCH] USB: add ldusb driver")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At startup we should trigger the HW state machine
only if it is OTG mode. Otherwise we should just
start the respective role.
Initialize idle role by default. If we don't do this then
cdns3_idle_role_stop() is not called when switching to
host/device role and so lane switch mechanism
doesn't work. This results to super-speed device not working
in one orientation if it was plugged before driver probe.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Link: https://lore.kernel.org/r/20191007121601.25996-2-rogerq@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag and was setting it to NULL before making sure all
completion handlers had run. This could lead to NULL-pointer
dereferences in the dev_err() statements in the completion handlers
which relies on said pointer.
Fix this by using a dedicated disconnected flag.
Note that this is also addresses a NULL-pointer dereference at release()
and a struct usb_interface reference leak introduced by a recent runtime
PM fix, which depends on and should have been submitted together with
this patch.
Fixes: 4212cd74ca ("USB: usb-skeleton.c: remove err() usage")
Fixes: 5c290a5e42 ("USB: usb-skeleton: fix runtime PM after driver unbind")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009170944.30057-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
First set of IIO fixes for the 5.4 cycle.
* adis16400
- Make sure to free memory on a few failure paths.
* adxl372
- Fix wrong fifo depth
- Fix wrong indexing of data from the fifo.
- Perform a reset at startup to avoid a problem with inconsistent state.
* axp288
- This is a fix for a fix. The original fix made sure we kept the
configuration from some firmwares to preserve a bias current.
Unfortunately it appears the previous behaviour was working around
a buggy firmware by overwriting the wrong value it had. Hence
a regression was seen.
* bmc150
- Fix the centre temperature. This was due to an error in one of the
datasheets.
* hx711
- Fix an issue where a badly timed interrupt could lead to a control
line being high long enough to put the device into a low power state.
* meson_sar_adc
- Fix a case where the irq was enabled before everything it uses was
allocated.
* st_lsm6dsx
- Ensure we don't set the sensor sensitivity to 0 as it will force
all readings to 0.
- Fix a wait time for the slave i2c controller when the accelerometer
is not enabled.
* stm32-adc
- Precursor for fix. Move a set of register definitions to a header.
- Fix a race when several ADCs are in use with some using interrupts
to control the dataflow and some using DMA.
* vcnl4000
- Fix a garbage of_match_table in which a string was passed instead
of the intended enum.
* tag 'iio-fixes-for-5.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: Fix an undefied reference error in noa1305_probe
iio: light: opt3001: fix mutex unlock race
iio: adc: ad799x: fix probe error handling
iio: light: add missing vcnl4040 of_compatible
iio: light: fix vcnl4000 devicetree hooks
iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
iio: adc: axp288: Override TS pin bias current for some models
iio: imu: adis16400: fix memory leak
iio: imu: adis16400: release allocated memory on failure
iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
iio: adc: stm32-adc: move registers definitions
iio: accel: adxl372: Perform a reset at start up
iio: accel: adxl372: Fix push to buffers lost samples
iio: accel: adxl372: Fix/remove limitation for FIFO samples
iio: adc: hx711: fix bug in sampling of data
iio: fix center temperature of bmc150-accel-core
iio: imu: st_lsm6dsx: forbid 0 sensor sensitivity
iio: adc: meson_saradc: Fix memory allocation order
max_tu_symbol was programmed to TU_SIZE_RECOMMENDED - 1, which is not
what the spec says. The spec says:
roundup ((input active video bandwidth in bytes/output active video
bandwidth in bytes) * tu_size)
It is not quite clear what the above means, but calculating
max_tu_symbol = (input Bps / output Bps) * tu_size seems to work and
fixes the issues seen.
This fixes artifacts in some videomodes (e.g. 1024x768@60 on 2-lanes &
1.62Gbps was pretty bad for me).
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924131702.9988-1-tomi.valkeinen@ti.com
Include the <linux/runtime_pm.h> for the definition of
pm_wq to avoid the following warning:
kernel/power/main.c:890:25: warning: symbol 'pm_wq' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.
The latter issue has been present since commit 45975c7d21 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.
Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.
Fixes: 45975c7d21 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
This reverts part of commit 71630b7a83 ("ACPI / PM: Blacklist Low
Power S0 Idle _DSM for Dell XPS13 9360") to remove the S0ix blacklist
for the XPS 9360.
The problems with this system occurred in one possible NVME SSD when
putting system into s0ix. As the NVME sleep behavior has been adjusted
in commit d916b1be94 ("nvme-pci: use host managed power state for
suspend") this is expected to be now resolved.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=196907
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 37db8985b2 ("s390/cio: add basic protected virtualization
support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non
Protected Virtualization (PV) guests. The problem is that the dma_mask
of the ccw device, which is used by virtio core, gets changed from 64 to
31 bit, because some of the DMA allocations do require 31 bit
addressable memory. For PV the only drawback is that some of the virtio
structures must end up in ZONE_DMA because we have the bounce the
buffers mapped via DMA API anyway.
But for non PV guests we have a problem: because of the 31 bit mask
guests bigger than 2G are likely to try bouncing buffers. The swiotlb
however is only initialized for PV guests, because we don't want to
bounce anything for non PV guests. The first such map kills the guest.
Since the DMA API won't allow us to specify for each allocation whether
we need memory from ZONE_DMA (31 bit addressable) or any DMA capable
memory will do, let us use coherent_dma_mask (which is used for
allocations) to force allocating form ZONE_DMA while changing dma_mask
to DMA_BIT_MASK(64) so that at least the streaming API will regard
the whole memory DMA capable.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Fixes: 37db8985b2 ("s390/cio: add basic protected virtualization support")
Link: https://lore.kernel.org/lkml/20190930153803.7958-1-pasic@linux.ibm.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
sk->sk_backlog.len can be written by BH handlers, and read
from process contexts in a lockless way.
Note the write side should also use WRITE_ONCE() or a variant.
We need some agreement about the best way to do this.
syzbot reported :
BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
sk_add_backlog include/net/sock.h:934 [inline]
tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
tcp_space include/net/tcp.h:1373 [inline]
tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2427
release_sock+0x61/0x160 net/core/sock.c:2943
tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket
lock for example from tcp_poll().
Use READ_ONCE() to document the fact that other cpus might change
sk->sk_rcvlowat under us and avoid KCSAN splats.
Use WRITE_ONCE() on write sides too.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
sk_add_backlog() callers usually read sk->sk_rcvbuf without
owning the socket lock. This means sk_rcvbuf value can
be changed by other cpus, and KCSAN complains.
Add READ_ONCE() annotations to document the lockless nature
of these reads.
Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
but this will be done in separate patches to ease stable
backports (if we decide this is relevant for stable trees).
BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
__sk_add_backlog include/net/sock.h:902 [inline]
sk_add_backlog include/net/sock.h:933 [inline]
tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
ksys_read+0xd5/0x1b0 fs/read_write.c:587
__do_sys_read fs/read_write.c:597 [inline]
__se_sys_read fs/read_write.c:595 [inline]
__x64_sys_read+0x4c/0x60 fs/read_write.c:595
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
tcp_memory_pressure is read without holding any lock,
and its value could be changed on other cpus.
Use READ_ONCE() to annotate these lockless reads.
The write side is already using atomic ops.
Fixes: b8da51ebb1 ("tcp: introduce tcp_under_memory_pressure()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
reqsk_queue_empty() is called from inet_csk_listen_poll() while
other cpus might write ->rskq_accept_head value.
Use {READ|WRITE}_ONCE() to avoid compiler tricks
and potential KCSAN splats.
Fixes: fff1f3001c ("tcp: add a spinlock to protect struct request_sock_queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The flag NLM_F_ECHO aims to reply to the user the message notified to all
listeners.
It was not the case with the command RTM_NEWNSID, let's fix this.
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Clearing ch->device in ch_release() is wrong because that pointer must
remain valid until ch_remove() is called. This patch fixes the following
crash the second time a ch device is opened:
BUG: kernel NULL pointer dereference, address: 0000000000000790
RIP: 0010:scsi_device_get+0x5/0x60
Call Trace:
ch_open+0x4c/0xa0 [ch]
chrdev_open+0xa2/0x1c0
do_dentry_open+0x13a/0x380
path_openat+0x591/0x1470
do_filp_open+0x91/0x100
do_sys_open+0x184/0x220
do_syscall_64+0x5f/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 085e56766f ("scsi: ch: add refcounting")
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191009173536.247889-1-bvanassche@acm.org
Reported-by: Rob Turk <robtu@rtist.nl>
Suggested-by: Rob Turk <robtu@rtist.nl>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building a kernel with SCSI_SNI_53C710 enabled, Kconfig warns:
WARNING: unmet direct dependencies detected for 53C700_LE_ON_BE
Depends on [n]: SCSI_LOWLEVEL [=y] && SCSI [=y] && SCSI_LASI700 [=n]
Selected by [y]:
- SCSI_SNI_53C710 [=y] && SCSI_LOWLEVEL [=y] && SNI_RM [=y] && SCSI [=y]
Add the missing depends SCSI_SNI_53C710 to 53C700_LE_ON_BE to fix it.
Link: https://lore.kernel.org/r/20191009151128.32411-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some arrays are not capable of returning RTPG data during state
transitioning, but rather return an 'LUN not accessible, asymmetric access
state transition' sense code. In these cases we can set the state to
'transitioning' directly and don't need to evaluate the RTPG data (which we
won't have anyway).
Link: https://lore.kernel.org/r/20191007135701.32389-1-hare@suse.de
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The return code from null_handle_zoned() sets the cmd->error value.
Returning OK status when an error occured overwrites the intended
cmd->error. Return the appropriate error code instead of setting the
error in the cmd.
Fixes: fceb5d1b19 ("null_blk: create a helper for zoned devices")
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If tcf_register_action failed, mirred_device_notifier
should be unregistered.
Fixes: 3b87956ea6 ("net sched: fix race in mirred device removal")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
When configuring a taprio instance if "flags" is not specified (or
it's zero), taprio currently replies with an "Invalid argument" error.
So, set the return value to zero after we are done with all the
checks.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-10-08
Alexandra fixes two issues in the initialization code for vnicc cmds.
One is an uninitialized variable when a cmd fails, the other that we
wouldn't recover correctly if the device's supported features changed.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Without this patch, a command bit in the supported commands mask is only
ever set to unsupported during set online. If a command is ever marked as
unsupported (e.g. because of error during qeth_l2_vnicc_query_cmds),
subsequent successful initialization (offline/online) would not bring it
back.
Fixes: caa1f0b10d ("s390/qeth: add VNICC enable/disable support")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Smatch discovered the use of uninitialized variable sup_cmds
in error paths.
Fixes: caa1f0b10d ("s390/qeth: add VNICC enable/disable support")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fix kernel-doc warnings in phylink.c:
../drivers/net/phy/phylink.c:595: warning: Function parameter or member 'config' not described in 'phylink_create'
../drivers/net/phy/phylink.c:595: warning: Excess function parameter 'ndev' description in 'phylink_create'
Fixes: 8796c8923d ("phylink: add documentation for kernel APIs")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
This patch is to fix a NULL-ptr deref in selinux_socket_connect_helper:
[...] kasan: GPF could be caused by NULL-ptr deref or user memory access
[...] RIP: 0010:selinux_socket_connect_helper+0x94/0x460
[...] Call Trace:
[...] selinux_sctp_bind_connect+0x16a/0x1d0
[...] security_sctp_bind_connect+0x58/0x90
[...] sctp_process_asconf+0xa52/0xfd0 [sctp]
[...] sctp_sf_do_asconf+0x785/0x980 [sctp]
[...] sctp_do_sm+0x175/0x5a0 [sctp]
[...] sctp_assoc_bh_rcv+0x285/0x5b0 [sctp]
[...] sctp_backlog_rcv+0x482/0x910 [sctp]
[...] __release_sock+0x11e/0x310
[...] release_sock+0x4f/0x180
[...] sctp_accept+0x3f9/0x5a0 [sctp]
[...] inet_accept+0xe7/0x720
It was caused by that the 'newsk' sk_socket was not set before going to
security sctp hook when processing asconf chunk with SCTP_PARAM_ADD_IP
or SCTP_PARAM_SET_PRIMARY:
inet_accept()->
sctp_accept():
lock_sock():
lock listening 'sk'
do_softirq():
sctp_rcv(): <-- [1]
asconf chunk arrives and
enqueued in 'sk' backlog
sctp_sock_migrate():
set asoc's sk to 'newsk'
release_sock():
sctp_backlog_rcv():
lock 'newsk'
sctp_process_asconf() <-- [2]
unlock 'newsk'
sock_graft():
set sk_socket <-- [3]
As it shows, at [1] the asconf chunk would be put into the listening 'sk'
backlog, as accept() was holding its sock lock. Then at [2] asconf would
get processed with 'newsk' as asoc's sk had been set to 'newsk'. However,
'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect()
would deref it, then kernel crashed.
Here to fix it by adding the chunk to sk_backlog until newsk sk_socket is
set when .accept() is done.
Note that sk->sk_socket can be NULL when the sock is closed, so SOCK_DEAD
flag is also needed to check in sctp_newsk_ready().
Thanks to Ondrej for reviewing the code.
Fixes: d452930fd3 ("selinux: Add SCTP support")
Reported-by: Ying Xu <yinxu@redhat.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Accordingly to Synopsys documentation [1] and [2], when bit PPSEN0
in register MAC_PPS_CONTROL is set it selects the functionality
command in the same register, otherwise selects the functionality
control.
Command functionality is required to either enable (command 0x2)
and disable (command 0x5) the flexible PPS output, but the bit
PPSEN0 is currently set only for enabling.
Set the bit PPSEN0 to properly disable flexible PPS output.
Tested on STM32MP15x, based on dwmac 4.10a.
[1] DWC Ethernet QoS Databook 4.10a October 2014
[2] DWC Ethernet QoS Databook 5.00a September 2017
Signed-off-by: Antonio Borneo <antonio.borneo@st.com>
Fixes: 9a8a02c9d4 ("net: stmmac: Add Flexible PPS support")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The field "name" in struct ptp_clock_info has a fixed size of 16
chars and is used as zero terminated string by clock_name_show()
in drivers/ptp/ptp_sysfs.c
The current initialization value requires 17 chars to fit also the
null termination, and this causes overflow to the next bytes in
the struct when the string is read as null terminated:
hexdump -C /sys/class/ptp/ptp0/clock_name
00000000 73 74 6d 6d 61 63 5f 70 74 70 5f 63 6c 6f 63 6b |stmmac_ptp_clock|
00000010 a0 ac b9 03 0a |.....|
where the extra 4 bytes (excluding the newline) after the string
represent the integer 0x03b9aca0 = 62500000 assigned to the field
"max_adj" that follows "name" in the same struct.
There is no strict requirement for the "name" content and in the
comment in ptp_clock_kernel.h it's reported it should just be 'A
short "friendly name" to identify the clock'.
Replace it with "stmmac ptp".
Signed-off-by: Antonio Borneo <antonio.borneo@st.com>
Fixes: 92ba688851 ("stmmac: add the support for PTP hw clock driver")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Rather than using "unsigned long", use "u32" for 32-bit instructions in
the alignment fault handler.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When the system has high memory pressure, the page containing the
instruction may be paged out. Using probe_kernel_address() means that
if the page is swapped out, the resulting page fault will not be
handled because page faults are disabled by this function.
Use get_user() to read the instruction instead.
Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Fixes: b255188f90 ("ARM: fix scheduling while atomic warning in alignment handling code")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Commit 572cf7d7b0 ("ARM: dts: Improve omap l4per idling with wlcore edge
sensitive interrupt") changed wlcore interrupts to use edge interrupt based
on what's specified in the wl1835mod.pdf data sheet.
However, there are still cases where we can have lost interrupts as
described in omap_gpio_unidle(). And using a level interrupt instead of edge
interrupt helps as we avoid the check for untriggered GPIO interrupts in
omap_gpio_unidle().
And with commit e6818d29ea ("gpio: gpio-omap: configure edge detection
for level IRQs for idle wakeup") GPIOs idle just fine with level interrupts.
Let's change omap4 and 5 wlcore users back to using level interrupt
instead of edge interrupt. Let's not change the others as I've only seen
this on omap4 and 5, probably because the other SoCs don't have l4per idle
independent of the CPUs.
Fixes: 572cf7d7b0 ("ARM: dts: Improve omap l4per idling with wlcore edge sensitive interrupt")
Depends-on: e6818d29ea ("gpio: gpio-omap: configure edge detection for level IRQs for idle wakeup")
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Eyal Reizer <eyalr@ti.com>
Cc: Guy Mishol <guym@ti.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
With commit 79bdcb202a ("ARM: 8906/1: drivers/amba: add reset control
to amba bus probe") it is possible for the the amba bus driver to defer
probing the device for its IDs because the reset driver may be probed
later.
However when a subsequent probe occurs, the call to request_resource()
in the driver returns -EBUSY as the driver has not released the resource
from the initial probe attempt - or cleaned up any of the preceding
actions.
Fix this both for the deferred probe case as well as a failure to get
the reset.
Fixes: 79bdcb202a ("ARM: 8906/1: drivers/amba: add reset control to amba bus probe")
Reported-by: Dinh Nguyen <dinguyen@kernel.org>
Tested-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
(cherry picked from commit cb2377a919)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We should not remove the workqueue, we just need to ensure that the
workqueues are synced. The workqueues are torn down on ctx removal.
Cc: stable@vger.kernel.org
Fixes: 6b06314c47 ("io_uring: add file set registration")
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.
v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
(cherry picked from commit c0bb487dc1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
I hit the following error when compile the kernel.
drivers/iio/light/noa1305.o: In function `noa1305_probe':
noa1305.c:(.text+0x65): undefined reference to `__devm_regmap_init_i2c'
make: *** [vmlinux] Error 1
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When an end-of-conversion interrupt is received after performing a
single-shot reading of the light sensor, the driver was waking up the
result ready queue before checking opt->ok_to_ignore_lock to determine
if it should unlock the mutex. The problem occurred in the case where
the other thread woke up and changed the value of opt->ok_to_ignore_lock
to false prior to the interrupt thread performing its read of the
variable. In this case, the mutex would be unlocked twice.
Signed-off-by: David Frey <dpfrey@gmail.com>
Reviewed-by: Andreas Dannenberg <dannenberg@ti.com>
Fixes: 94a9b7b180 ("iio: light: add support for TI's opt3001 light sensor")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit 0f7ddcc1bf ("iio:adc:ad799x: Write default config on probe
and reset alert status on probe") the error path is wrong since it
leaves the vref regulator on. Fix this by disabling both regulators.
Fixes: 0f7ddcc1bf ("iio:adc:ad799x: Write default config on probe and reset alert status on probe")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Commit 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040
proximity and light sensor") added the support for the vcnl4040 but
forgot to add the of_compatible. Fix this by adding it now.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Fixes: 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040 proximity and light sensor")
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
the of_match_table is supported but the data shouldn't be a string.
Instead it shall be one of 'enum vcnl4000_device_ids'. Also the matching
logic for the vcnl4020 was wrong. Since the data retrieve mechanism is
still based on the i2c_device_id no failures did appeared till now.
Fixes: ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
i2c controller available in st_lsm6dsx series performs i2c slave
configuration using accel clock as trigger.
st_lsm6dsx_shub_wait_complete routine is used to wait the controller has
carried out the requested configuration. However if the accel sensor is not
enabled we should not use its configured odr to estimate a proper timeout
Fixes: c91c1c844e ("iio: imu: st_lsm6dsx: add i2c embedded controller support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling") we
preserve the bias current set by the firmware at boot. This fixes issues
we were seeing on various models, but it seems our old hardcoded 80ųA bias
current was working around a firmware bug on at least one model laptop.
In order to both have our cake and eat it, this commit adds a dmi based
list of models where we need to override the firmware set bias current and
adds the one model we now know needs this to it: The Lenovo Ideapad 100S
(11 inch version).
Fixes: 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203829
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
End of conversion may be handled by using IRQ or DMA. There may be a
race when two conversions complete at the same time on several ADCs.
EOC can be read as 'set' for several ADCs, with:
- an ADC configured to use IRQs. EOCIE bit is set. The handler is normally
called in this case.
- an ADC configured to use DMA. EOCIE bit isn't set. EOC triggers the DMA
request instead. It's then automatically cleared by DMA read. But the
handler gets called due to status bit is temporarily set (IRQ triggered
by the other ADC).
So both EOC status bit in CSR and EOCIE control bit must be checked
before invoking the interrupt handler (e.g. call ISR only for
IRQ-enabled ADCs).
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Move STM32 ADC registers definitions to common header.
This is precursor patch to:
- iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
It keeps registers definitions as a whole block, to ease readability and
allow simple access path to EOC bits (readl) in stm32-adc-core driver.
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
We need to perform a reset a start up to make sure that the chip is in a
consistent state. This reset also disables all the interrupts which
should only be enabled together with the iio buffer. Not doing this, was
sometimes causing unwanted interrupts to trigger.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
One in two sample sets was lost by multiplying fifo_set_size with
sizeof(u16). Also, the double number of available samples were pushed to
the iio buffers.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Currently, the driver sets the FIFO_SAMPLES register with the number of
sample sets (maximum of 170 for 3 axis data, 256 for 2-axis and 512 for
single axis). However, the FIFO_SAMPLES register should store the number
of samples, regardless of how the FIFO format is configured.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fix bug in sampling function hx711_cycle() when interrupt occures while
PD_SCK is high. If PD_SCK is high for at least 60 us power down mode of
the sensor is entered which in turn leads to a wrong measurement.
Switch off interrupts during a PD_SCK high period and move query of DOUT
to the latest point of time which is at the end of PD_SCK low period.
This bug exists in the driver since it's initial addition. The more
interrupts on the system the higher is the probability that it happens.
Fixes: c3b2fdd0ea ("iio: adc: hx711: Add IIO driver for AVIA HX711")
Signed-off-by: Andreas Klinger <ak@it-klinger.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
First batch of fixes intended for v5.4
* fix for an ACPI table parsing bug;
* a fix for a NULL pointer dereference in the cfg with specific
devices;
* fix the rb_allocator;
* prevent multiple phy configuration with new devices;
* fix a race-condition in the rx queue;
* prevent a couple of memory leaks;
* fix initialization of 3168 devices (the infamous BAD_COMMAND bug);
* fix recognition of some newer systems with integrated MAC;
This patch adds missing MIX2 path on RX1/2 which take IIR1 and
IIR2 as inputs.
Without this patch sound card fails to intialize with below warning:
ASoC: no sink widget found for RX1 MIX2 INP1
ASoC: Failed to add route IIR1 -> IIR1 -> RX1 MIX2 INP1
ASoC: no sink widget found for RX2 MIX2 INP1
ASoC: Failed to add route IIR1 -> IIR1 -> RX2 MIX2 INP1
ASoC: no sink widget found for RX1 MIX2 INP1
ASoC: Failed to add route IIR2 -> IIR2 -> RX1 MIX2 INP1
ASoC: no sink widget found for RX2 MIX2 INP1
ASoC: Failed to add route IIR2 -> IIR2 -> RX2 MIX2 INP1
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20191009111944.28069-1-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull rdma fixes from Jason Gunthorpe:
"The usual collection of driver bug fixes, and a few regressions from
the merge window. Nothing particularly worrisome.
- Various missed memory frees and error unwind bugs
- Fix regressions in a few iwarp drivers from 5.4 patches
- A few regressions added in past kernels
- Squash a number of races in mlx5 ODP code"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Add missing synchronize_srcu() for MW cases
RDMA/mlx5: Put live in the correct place for ODP MRs
RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()
RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR
RDMA/mlx5: Do not allow rereg of a ODP MR
IB/core: Fix wrong iterating on ports
RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
RDMA/cxgb4: Do not dma memory off of the stack
RDMA/cm: Fix memory leak in cm_add/remove_one
RDMA/core: Fix an error handling path in 'res_get_common_doit()'
RDMA/i40iw: Associate ibdev to netdev before IB device registration
RDMA/iwcm: Fix a lock inversion issue
RDMA/iw_cxgb4: fix SRQ access from dump_qp()
RDMA/hfi1: Prevent memory leak in sdma_init
RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port
RDMA/siw: Fix serialization issue in write_space()
RDMA/vmw_pvrdma: Free SRQ only once
Polaris and vegam use count for the value rather than
level. This looks like a copy paste typo from when
the code was adapted from previous asics.
I'm not sure that the SMU actually uses this value, so
I don't know that it actually is a bug per se.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=108609
Reported-by: Robert Strube <rstrube@gmail.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
cleanup error handling code and make sure temporary info array
with the handles are freed by amdgpu_bo_list_put() on
idr_replace()'s failure.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull arm64 fixes from Will Deacon:
"A larger-than-usual batch of arm64 fixes for -rc3.
The bulk of the fixes are dealing with a bunch of issues with the
build system from the compat vDSO, which unfortunately led to some
significant Makefile rework to manage the horrible combinations of
toolchains that we can end up needing to drive simultaneously.
We came close to disabling the thing entirely, but Vincenzo was quick
to spin up some patches and I ended up picking up most of the bits
that were left [*]. Future work will look at disentangling the header
files properly.
Other than that, we have some important fixes all over, including one
papering over the miscompilation fallout from forcing
CONFIG_OPTIMIZE_INLINING=y, which I'm still unhappy about. Harumph.
We've still got a couple of open issues, so I'm expecting to have some
more fixes later this cycle.
Summary:
- Numerous fixes to the compat vDSO build system, especially when
combining gcc and clang
- Fix parsing of PAR_EL1 in spurious kernel fault detection
- Partial workaround for Neoverse-N1 erratum #1542419
- Fix IRQ priority masking on entry from compat syscalls
- Fix advertisment of FRINT HWCAP to userspace
- Attempt to workaround inlining breakage with '__always_inline'
- Fix accidental freeing of parent SVE state on fork() error path
- Add some missing NULL pointer checks in instruction emulation init
- Some formatting and comment fixes"
[*] Will's final fixes were
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
but they were already in linux-next by then and he didn't rebase
just to add those.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (21 commits)
arm64: armv8_deprecated: Checking return value for memory allocation
arm64: Kconfig: Make CONFIG_COMPAT_VDSO a proper Kconfig option
arm64: vdso32: Rename COMPATCC to CC_COMPAT
arm64: vdso32: Pass '--target' option to clang via VDSO_CAFLAGS
arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
arm64: vdso32: Move definition of COMPATCC into vdso32/Makefile
arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
lib: vdso: Remove CROSS_COMPILE_COMPAT_VDSO
arm64: vdso32: Remove jump label config option in Makefile
arm64: vdso32: Detect binutils support for dmb ishld
arm64: vdso: Remove stale files from old assembly implementation
arm64: vdso32: Fix broken compat vDSO build warnings
arm64: mm: fix spurious fault detection
arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
arm64: Fix incorrect irqflag restore for priority masking for compat
arm64: mm: avoid virt_to_phys(init_mm.pgd)
arm64: cpufeature: Effectively expose FRINT capability to userspace
arm64: Mark functions using explicit register variables as '__always_inline'
docs: arm64: Fix indentation and doc formatting
arm64/sve: Fix wrong free for task->thread.sve_state
...
The callers of xfs_bmap_local_to_extents_empty() log the inode
external to the function, yet this function is where the on-disk
format value is updated. Push the inode logging down into the
function itself to help prevent future mistakes.
Note that internal bmap callers track the inode logging flags
independently and thus may log the inode core twice due to this
change. This is harmless, so leave this code around for consistency
with the other attr fork conversion functions.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_attr_shortform_to_leaf() attempts to put the shortform fork back
together after a failed attempt to convert from shortform to leaf
format. While this code reallocates and copies back the shortform
attr fork data, it never resets the inode format field back to local
format. Further, now that the inode is properly logged after the
initial switch from local format, any error that triggers the
recovery code will eventually abort the transaction and shutdown the
fs. Therefore, remove the broken and unnecessary error handling
code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When a directory changes from shortform (sf) to block format, the sf
format is copied to a temporary buffer, the inode format is modified
and the updated format filled with the dentries from the temporary
buffer. If the inode format is modified and attempt to grow the
inode fails (due to I/O error, for example), it is possible to
return an error while leaving the directory in an inconsistent state
and with an otherwise clean transaction. This results in corruption
of the associated directory and leads to xfs_dabuf_map() errors as
subsequent lookups cannot accurately determine the format of the
directory. This problem is reproduced occasionally by generic/475.
The fundamental problem is that xfs_dir2_sf_to_block() changes the
on-disk inode format without logging the inode. The inode is
eventually logged by the bmapi layer in the common case, but error
checking introduces the possibility of failing the high level
request before this happens.
Update both of the dir2 and attr callers of
xfs_bmap_local_to_extents_empty() to log the inode core as
consistent with the bmap local to extent format change codepath.
This ensures that any subsequent errors after the format has changed
cause the transaction to abort.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We no longer need the extra mirror length tracking in the O_DIRECT code,
as we are able to track the maximum contiguous length in dreq->max_count.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.
Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.
Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
It turns out that the NMI latency workaround from commit:
6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
ends up being too conservative and results in the perf NMI handler claiming
NMIs too easily on AMD hardware when the NMI watchdog is active.
This has an impact, for example, on the hpwdt (HPE watchdog timer) module.
This module can produce an NMI that is used to reset the system. It
registers an NMI handler for the NMI_UNKNOWN type and relies on the fact
that nothing has claimed an NMI so that its handler will be invoked when
the watchdog device produces an NMI. After the referenced commit, the
hpwdt module is unable to process its generated NMI if the NMI watchdog is
active, because the current NMI latency mitigation results in the NMI
being claimed by the perf NMI handler.
Update the AMD perf NMI latency mitigation workaround to, instead, use a
window of time. Whenever a PMC is handled in the perf NMI handler, set a
timestamp which will act as a perf NMI window. Any NMIs arriving within
that window will be claimed by perf. Anything outside that window will
not be claimed by perf. The value for the NMI window is set to 100 msecs.
This is a conservative value that easily covers any NMI latency in the
hardware. While this still results in a window in which the hpwdt module
will not receive its NMI, the window is now much, much smaller.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In perf_rotate_context(), when the first cpu flexible event fail to
schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
perf_event_sched_in() will skip all cpu flexible events because of the
EVENT_FLEXIBLE bit.
In the next call of perf_rotate_context(), cpu_rotate stays 1, and
cpu_event stays NULL, so this process repeats. The end result is, flexible
events on this cpu will not be scheduled (until another event being added
to the cpuctx).
Here is an easy repro of this issue. On Intel CPUs, where ref-cycles
could only use one counter, run one pinned event for ref-cycles, one
flexible event for ref-cycles, and one flexible event for cycles. The
flexible ref-cycles is never scheduled, which is expected. However,
because of this issue, the cycles event is never scheduled either.
$ perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000
time counts unit events
1.000152973 15,412,480 ref-cycles:D
1.000152973 <not counted> ref-cycles (0.00%)
1.000152973 <not counted> cycles (0.00%)
2.000486957 18,263,120 ref-cycles:D
2.000486957 <not counted> ref-cycles (0.00%)
2.000486957 <not counted> cycles (0.00%)
To fix this, when the flexible_active list is empty, try rotate the
first event in the flexible_groups. Also, rename ctx_first_active() to
ctx_event_to_rotate(), which is more accurate.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8d5bce0c37 ("perf/core: Optimize perf_rotate_context() event scheduling")
Link: https://lkml.kernel.org/r/20191008165949.920548-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
vtime_account_system() assumes that the target task to account cputime
to is always the current task. This is most often true indeed except on
task switch where we call:
vtime_common_task_switch(prev)
vtime_account_system(prev)
Here prev is the scheduling-out task where we account the cputime to. It
doesn't match current that is already the scheduling-in task at this
stage of the context switch.
So we end up checking the wrong task flags to determine if we are
accounting guest or system time to the previous task.
As a result the wrong task is used to check if the target is running in
guest mode. We may then spuriously account or leak either system or
guest time on task switch.
Fix this assumption and also turn vtime_guest_enter/exit() to use the
task passed in parameter as well to avoid future similar issues.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Fixes: 2a42eb9594 ("sched/cputime: Accumulate vtime on top of nsec clocksource")
Link: https://lkml.kernel.org/r/20190925214242.21873-1-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:
normalized_cfs_quota() = [(quota_us << 20) / period_us]
If the quota/period ratio was changed during this scaling due to
precision loss, it will cause inconsistency between parent and child
task groups.
See below example:
A userspace container manager (kubelet) does three operations:
1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
2) Create a few children cgroups.
3) Set quota to 1,000us and period to 10,000us on a child cgroup.
These operations are expected to succeed. However, if the scaling of
147/128 happens before step 3, quota and period of the parent cgroup
will be changed:
new_quota: 1148437ns, 1148us
new_period: 11484375ns, 11484us
And when step 3 comes in, the ratio of the child cgroup will be
104857, which will be larger than the parent cgroup ratio (104821),
and will fail.
Scaling them by a factor of 2 will fix the problem.
Tested-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 2e8e192263 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were a bunch of devices with qu and jf that were loading the
configuration with pu and jf, which is wrong. Fix them all
accordingly. Additionally, remove 0x1010 and 0x1210 subsytem IDs from
the list, since they are obviously wrong, and 0x0044 and 0x0244, which
were duplicate.
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We currently support two NICs in FW version 29, namely 7265D and 3168.
Out of these, only 7265D supports GEO SAR, so adjust the function that
checks for it accordingly.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Fixes: f5a47fae6a ("iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In iwl_pcie_ctxt_info_gen3_init there are cases that the allocated dma
memory is leaked in case of error.
DMA memories prph_scratch, prph_info, and ctxt_info_gen3 are allocated
and initialized to be later assigned to trans_pcie. But in any error case
before such assignment the allocated memories should be released.
First of such error cases happens when iwl_pcie_init_fw_sec fails.
Current implementation correctly releases prph_scratch. But in two
sunsequent error cases where dma_alloc_coherent may fail, such
releases are missing.
This commit adds release for prph_scratch when allocation for
prph_info fails, and adds releases for prph_scratch and prph_info when
allocation for ctxt_info_gen3 fails.
Fixes: 2ee8240262 ("iwlwifi: pcie: support context information for 22560 devices")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We don't handle failures in the rb_allocator workqueue allocation
correctly. To fix that, move the code earlier so the cleanup is
easier and we don't have to undo all the interrupt allocations in
this case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We got a crash in iwl_trans_pcie_get_cmdlen(), while the TFD was
being accessed to sum up the lengths.
We want to access the TFD here, which is the information for the
hardware. We always only allocate 32 buffers for the cmd queue,
but on newer hardware (using TFH) we can also allocate only a
shorter hardware array, also only 32 TFDs. Prior to the TFH, we
had to allocate a bigger TFD array but would make those point to
a smaller set of buffers.
Additionally, now max_tfd_queue_size is up to 65536, so we can
access *way* out of bounds of a really only 32-entry array, so
it crashes.
Fix this by making the TFD index depend on which hardware we are
using right now.
While changing the calculation, also fix it to not use void ptr
arithmetic, but cast to u8 * before.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Consider the following flow:
1. Driver starts to sync the rx queues due to a delba.
mvm->queue_sync_cookie=1.
This rx-queues-sync is synchronous, so it doesn't increment the
cookie until all rx queues handle the notification from FW.
2. During this time, driver starts to sync rx queues due to nssn sync
required.
The cookie's value is still 1, but it doesn't matter since this
rx-queue-sync is non-synchronous so in the notification handler the
cookie is ignored.
What _does_ matter is that this flow increments the cookie to 2
immediately.
Remember though that the FW won't start servicing this command until
it's done with the previous one.
3. FW is still handling the first command, so it sends a notification
with internal_notif->sync=1, and internal_notif->cookie=0, which
triggers a WARN_ONCE.
The solution for this race is to only use the mvm->queue_sync_cookie in
case of a synchronous sync-rx-queues. This way in step 2 the cookie's
value won't change so we avoid the WARN.
The commit in the "fixes" field is the first commit to introduce
non-synchronous sending of this command to FW.
Fixes: 3c514bf831 ("iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues")
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The PHY is initialized during device initialization, but devices with
the tx_siso_diversity flag set need to send PHY_CONFIGURATION_CMD first,
otherwise the PHY would be reinitialized, causing a SYSASSERT.
To fix this, use a bit that tells the FW not to complete the PHY
initialization before a PHY_CONFIGURATION_CMD is received.
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We can't check for the ACPI table revision validity in the same if
where we check if the package was read correctly, because we return
PTR_ERR(pkg) and if the table is not valid but the pointer is, we
would return a valid pointer as an error. Fix that by moving the
table checks to a separate if and return -EINVAL if it's not valid.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We copy cfg->trans to trans->trans_cfg at the very beginning, so don't
try to access it via cfg->trans anymore, because the cfg may be unset
in later cases.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If 'jmb38x_ms_count_slots()' returns 0, we must undo the previous
'pci_request_regions()' call.
Goto 'err_out_int' to fix it.
Fixes: 60fdd931d5 ("memstick: add support for JMicron jmb38x MemoryStick host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Update Turris Mox device tree to use the phy-supply property of the
generic PHY framework instead of the legacy usb-phy property.
This is needed since it caused a regression on Turris Mox since "usb:
host: xhci-plat: Prevent an abnormally restrictive PHY init skipping".
Signed-off-by: Marek Behún <marek.behun@nic.cz>
Fixes: eb6c2eb6c7 ("usb: host: xhci-plat: Prevent an abnormally restrictive PHY init skipping")
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S
needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to
continue. This happens after resetting the PCR. Before commit
13c7bb3c57 ("powerpc/64s: Set reserved PCR bits"), r0 would always
be 0 before it was stored to kvmppc_vcore->in_guest. However because
of this change in the commit:
/* Reset PCR */
ld r0, VCORE_PCR(r5)
- cmpdi r0, 0
+ LOAD_REG_IMMEDIATE(r6, PCR_MASK)
+ cmpld r0, r6
beq 18f
- li r0, 0
- mtspr SPRN_PCR, r0
+ mtspr SPRN_PCR, r6
18:
/* Signal secondary CPUs to continue */
stb r0,VCORE_IN_GUEST(r5)
We are no longer comparing r0 against 0 and loading it with 0 if it
contains something else. Hence when we store r0 to
kvmppc_vcore->in_guest, it might not be 0. This means that secondary
CPUs will not be signalled to continue. Those CPUs get stuck and
errors like the following are logged:
KVM: CPU 1 seems to be stuck
KVM: CPU 2 seems to be stuck
KVM: CPU 3 seems to be stuck
KVM: CPU 4 seems to be stuck
KVM: CPU 5 seems to be stuck
KVM: CPU 6 seems to be stuck
KVM: CPU 7 seems to be stuck
This can be reproduced with:
$ for i in `seq 1 7` ; do chcpu -d $i ; done ;
$ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \
-M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \
-kernel vmlinux -initrd initrd.cpio.xz
Fix by making sure r0 is 0 before storing it to
kvmppc_vcore->in_guest.
Fixes: 13c7bb3c57 ("powerpc/64s: Set reserved PCR bits")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
Newer versions of GCC (>= 9) demand that the size of the string to be
copied must be explicitly smaller than the size of the destination.
Thus, the NULL char has to be taken into account on strncpy.
This will avoid the following compiling error:
tlbie_test.c: In function 'main':
tlbie_test.c:639:4: error: 'strncpy' specified bound 100 equals destination size
strncpy(logdir, optarg, LOGDIR_NAME_SIZE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191003211010.9711-1-desnesn@linux.ibm.com
Since commit 1211ee61b4 ("powerpc/pseries: Read TLB Block Invalidate
Characteristics"), a warning message is displayed when booting a guest
on top of KVM:
lpar: arch/powerpc/platforms/pseries/lpar.c pseries_lpar_read_hblkrm_characteristics Error calling get-system-parameter (0xfffffffd)
This message is displayed because this hypervisor is not supporting
the H_BLOCK_REMOVE hcall and thus is not exposing the corresponding
feature.
Reading the TLB Block Invalidate Characteristics should not be done if
the feature is not exposed.
Fixes: 1211ee61b4 ("powerpc/pseries: Read TLB Block Invalidate Characteristics")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001132928.72555-1-ldufour@linux.ibm.com
After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:
arch/powerpc/mm/book3s64/pgtable.c:216:3:
error: implicit declaration of function 'radix__flush_all_lpid_guest'
radix__flush_all_lpid_guest() is only declared for
CONFIG_PPC_RADIX_MMU which is not set for this build.
Fix it by adding an empty version for the RADIX_MMU=n case, which
should never be called.
Fixes: 99161de3a2 ("powerpc/64s/radix: tidy up TLB flushing code")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Munge change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190930101342.36c1afa0@canb.auug.org.au
Mark inode for force revalidation if LOOKUP_REVAL flag is set.
This tells the client to actually send a QueryInfo request to
the server to obtain the latest metadata in case a directory
or a file were changed remotely. Only do that if the client
doesn't have a lease for the file to avoid unneeded round
trips to the server.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently the client indicates that a dentry is stale when inode
numbers or type types between a local inode and a remote file
don't match. If this is the case attributes is not being copied
from remote to local, so, it is already known that the local copy
has stale metadata. That's why the inode needs to be marked for
revalidation in order to tell the VFS to lookup the dentry again
before openning a file. This prevents unexpected stale errors
to be returned to the user space when openning a file.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fixes: cb7a69e605 ("cifs: Initialize filesystem timestamp ranges")
Only very old servers (e.g. OS/2 and DOS) did not support
DCE TIME (100 nanosecond granularity). Fix the checks used
to set minimum and maximum times.
Fixes xfstest generic/258 (on 5.4-rc1 and later)
CC: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
ip6erspan driver calls ether_setup(), after commit 61e84623ac
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ip6erspan tap
device.
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Johannes Berg says:
====================
A number of fixes:
* allow scanning when operating on radar channels in
ETSI regdomains
* accept deauth frames in IBSS - we have code to parse
and handle them, but were dropping them early
* fix an allocation failure path in hwsim
* fix a failure path memory leak in nl80211 FTM code
* fix RCU handling & locking in multi-BSSID parsing
* reject malformed SSID in mac80211 (this shouldn't
really be able to happen, but defense in depth)
* avoid userspace buffer overrun in ancient wext code
if SSID was too long
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Message was intended only for developer temporary build
In addition cleanup two minor warnings noticed by Coverity
and a trivial change to workaround a sparse warning
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Commit c394159310 ("Input: soc_button_array - add support for newer
surface devices") not only added support for the MSHW0040 ACPI HID,
but for some reason it also makes changes to the error handling of the
soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally
this seamingly unrelated change would have been made in a separate commit,
with a message explaining the what and why of this change.
I guess this change may have been added to deal with -EPROBE_DEFER errors,
but in case of the existing support for PNP0C40 devices, treating
-EPROBE_DEFER as any other error is deliberate, see the comment this
commit adds for why.
The actual returning of -EPROBE_DEFER to the caller of soc_button_probe()
introduced by the new error checking causes a serious regression:
On devices with so called virtual GPIOs soc_button_lookup_gpio() will
always return -EPROBE_DEFER for these fake GPIOs, when this happens
during the second call of soc_button_device_create() we already have
successfully registered our first child. This causes the kernel to think
we are making progress with probing things even though we unregister the
child before again before we return the -EPROBE_DEFER. Since we are making
progress the kernel will retry deferred-probes again immediately ending
up stuck in a loop with the following showing in dmesg:
[ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537
[ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538
[ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539
[ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540
[ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541
[ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542
[ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543
[ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544
[ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545
[ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546
[ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547
<continues on and on and on>
And 1 CPU core being stuck at 100% and udev hanging since it is waiting
for the modprobe of soc_button_array to return.
This patch reverts the soc_button_lookup_gpio() error handling changes,
fixing this regression.
Fixes: c394159310 ("Input: soc_button_array - add support for newer surface devices")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205031
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
Acked-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20191005105551.353273-1-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
For TCA_ACT_KIND, we have to keep the backward compatibility too,
and rely on nla_strlcpy() to check and terminate the string with
a NUL.
Note for TC actions, nla_strcmp() is already used to compare kind
strings, so we don't need to fix other places.
Fixes: 199ce850ce ("net_sched: add policy validation for action attributes")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Marcelo noticed a backward compatibility issue of TCA_KIND
after we move from NLA_STRING to NLA_NUL_STRING, so it is probably
too late to change it.
Instead, to make everyone happy, we can just insert a NUL to
terminate the string with nla_strlcpy() like we do for TC actions.
Fixes: 62794fc4fb ("net_sched: add max len check for TCA_KIND")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Duplicate rules were not allowed to be configured with SW steering.
This restriction caused failures with the replace rule logic done by
upper layers.
This fix allows for multiple rules with the same match values, in
such case the first inserted rules will match.
Fixes: 41d0707415 ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Pull LED fixes from Jacek Anaszewski:
- fix a leftover from earlier stage of development in the documentation
of recently added led_compose_name() and fix old mistake in the
documentation of led_set_brightness_sync() parameter name.
- MAINTAINERS: add pointer to Pavel Machek's linux-leds.git tree.
Pavel is going to take over LED tree maintainership from myself.
* tag 'led-fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
Add my linux-leds branch to MAINTAINERS
leds: core: Fix leds.h structure documentation
Eric Biggers says:
====================
Patches 1-2 fix the memory leaks that syzbot has reported in net/llc:
memory leak in llc_ui_create (2)
memory leak in llc_ui_sendmsg
memory leak in llc_conn_ac_send_sabme_cmd_p_set_x
Patches 3-4 fix related bugs that I noticed while reading this code.
Note: I've tested that this fixes the syzbot bugs, but otherwise I don't
know of any way to test this code.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If llc_conn_state_process() sees that llc_conn_service() put the skb on
a list, it will drop one fewer references to it. This is wrong because
the current behavior is that llc_conn_service() never consumes a
reference to the skb.
The code also makes the number of skb references being dropped
conditional on which of ind_prim and cfm_prim are nonzero, yet neither
of these affects how many references are *acquired*. So there is extra
code that tries to fix this up by sometimes taking another reference.
Remove the unnecessary/broken refcounting logic and instead just add an
skb_get() before the only two places where an extra reference is
actually consumed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
All callers of llc_conn_state_process() except llc_build_and_send_pkt()
(via llc_ui_sendmsg() -> llc_ui_send_data()) assume that it always
consumes a reference to the skb. Fix this caller to do the same.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
syzbot reported:
BUG: memory leak
unreferenced object 0xffff888116270800 (size 224):
comm "syz-executor641", pid 7047, jiffies 4294947360 (age 13.860s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 20 e1 2a 81 88 ff ff 00 40 3d 2a 81 88 ff ff . .*.....@=*....
backtrace:
[<000000004d41b4cc>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
[<000000004d41b4cc>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<000000004d41b4cc>] slab_alloc_node mm/slab.c:3269 [inline]
[<000000004d41b4cc>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
[<00000000506a5965>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
[<000000001ba5a161>] alloc_skb include/linux/skbuff.h:1058 [inline]
[<000000001ba5a161>] alloc_skb_with_frags+0x5f/0x250 net/core/skbuff.c:5327
[<0000000047d9c78b>] sock_alloc_send_pskb+0x269/0x2a0 net/core/sock.c:2225
[<000000003828fe54>] sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
[<00000000e34d94f9>] llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
[<00000000de2de3fb>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<00000000de2de3fb>] sock_sendmsg+0x54/0x70 net/socket.c:671
[<000000008fe16e7a>] __sys_sendto+0x148/0x1f0 net/socket.c:1964
[...]
The bug is that llc_sap_state_process() always takes an extra reference
to the skb, but sometimes neither llc_sap_next_state() nor
llc_sap_state_process() itself drops this reference.
Fix it by changing llc_sap_next_state() to never consume a reference to
the skb, rather than sometimes do so and sometimes not. Then remove the
extra skb_get() and kfree_skb() from llc_sap_state_process().
Reported-by: syzbot+6bf095f9becf5efef645@syzkaller.appspotmail.com
Reported-by: syzbot+31c16aa4202dace3812e@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Add pointer to my git tree to MAINTAINERS. I'd like to maintain
linux-leds for-next branch for 5.5.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Update the leds.h structure documentation to define the
correct arguments.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
drivers/md/dm-clone-target.c:594:34: warning:
symbol '__hash_find' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull GPIO fixes from Linus Walleij:
- don't clear FLAG_IS_OUT when emulating open drain/source in gpiolib
- fix up the usage of nonexclusive GPIO descriptors from device trees
- fix the incorrect IEC offset when toggling trigger edge in the
Spreadtrum driver
- use the correct unit for debounce settings in the MAX77620 driver
* tag 'gpio-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: max77620: Use correct unit for debounce times
gpio: eic: sprd: Fix the incorrect EIC offset when toggling
gpio: fix getting nonexclusive gpiods from DT
gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
Pull selinuxfix from Paul Moore:
"One patch to ensure we don't copy bad memory up into userspace"
* tag 'selinux-pr-20191007' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix context string corruption in convert_context()
Pull Kselftest fixes from Shuah Khan:
"Fixes for existing tests and the framework.
Cristian Marussi's patches add the ability to skip targets (tests) and
exclude tests that didn't build from run-list. These patches improve
the Kselftest results. Ability to skip targets helps avoid running
tests that aren't supported in certain environments. As an example,
bpf tests from mainline aren't supported on stable kernels and have
dependency on bleeding edge llvm. Being able to skip bpf on systems
that can't meet this llvm dependency will be helpful.
Kselftest can be built and installed from the main Makefile. This
change help simplify Kselftest use-cases which addresses request from
users.
Kees Cook added per test timeout support to limit individual test
run-time"
* tag 'linux-kselftest-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: watchdog: Add command line option to show watchdog_info
selftests: watchdog: Validate optional file argument
selftests/kselftest/runner.sh: Add 45 second timeout per test
kselftest: exclude failed TARGETS from runlist
kselftest: add capability to skip chosen TARGETS
selftests: Add kselftest-all and kselftest-install targets
The driver does not use input subsystem so we do not need this header,
and it is being removed, so stop pulling it in.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We discussed a better location for this file, and agreed that
core-api/ is a good fit. Rename it to symbol-namespaces.rst
for disambiguation, and also add it to index.rst and MAINTAINERS.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The clkctrl code searches for the parent clockdomain based on the name
of the CM provider node. The introduction of SGX node for omap5 made
the node name for the gpu_cm to be clock-controller. There is no
clockdomain named like this, so the lookup fails. Fix by changing
the node name properly.
Fixes: 394534cb07 ("ARM: dts: Configure sgx for omap5")
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
There are no return value checking when using kzalloc() and kcalloc() for
memory allocation. so add it.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Allow the user to select the workaround for TX2-219, and update
the silicon-errata.rst file to reflect this.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
As a PRFM instruction racing against a TTBR update can have undesirable
effects on TX2, NOP-out such PRFM on cores that are affected by
the TX2-219 erratum.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
It appears that the only case where we need to apply the TX2_219_TVM
mitigation is when the core is in SMT mode. So let's condition the
enabling on detecting a CPU whose MPIDR_EL1.Aff0 is non-zero.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to workaround the TX2-219 erratum, it is necessary to trap
TTBRx_EL1 accesses to EL2. This is done by setting HCR_EL2.TVM on
guest entry, which has the side effect of trapping all the other
VM-related sysregs as well.
To minimize the overhead, a fast path is used so that we don't
have to go all the way back to the main sysreg handling code,
unless the rest of the hypervisor expects to see these accesses.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
GCC throws warning message as below:
‘clone_src_i_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
^
fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
u64 clone_src_i_size;
^
The clone_src_i_size is only used as call-by-reference
in a call to get_inode_info().
Silence the warning by initializing clone_src_i_size to 0.
Note that the warning is a false positive and reported by older versions
of GCC (eg. 7.x) but not eg 9.x. As there have been numerous people, the
patch is applied. Setting clone_src_i_size to 0 does not otherwise make
sense and would not do any action in case the code changes in the future.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
Any changes interesting to tasks waiting in io_cqring_wait() are
commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
also tries to do that but with no reason, that means spurious wakeups
every io_free_req() and io_uring_enter().
Just use percpu_ref_put() instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge misc fixes from Andrew Morton:
"The usual shower of hotfixes.
Chris's memcg patches aren't actually fixes - they're mature but a few
niggling review issues were late to arrive.
The ocfs2 fixes are quite old - those took some time to get reviewer
attention.
Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
mm/slab-generic"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
mm, sl[ou]b: improve memory accounting
mm, memcg: make scan aggression always exclude protection
mm, memcg: make memory.emin the baseline for utilisation determination
mm, memcg: proportional memory.{low,min} reclaim
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
mm/page_alloc.c: fix a crash in free_pages_prepare()
mm/z3fold.c: claim page in the beginning of free
kernel/sysctl.c: do not override max_threads provided by userspace
memcg: only record foreign writebacks with dirty pages when memcg is not disabled
mm: fix -Wmissing-prototypes warnings
writeback: fix use-after-free in finish_writeback_work()
mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
panic: ensure preemption is disabled during panic()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
ocfs2: clear zero in unaligned direct IO
In most configurations, kmalloc() happens to return naturally aligned
(i.e. aligned to the block size itself) blocks for power of two sizes.
That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].
The topic has been discussed at LSF/MM 2019 [3]. Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment. For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it. That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).
Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations. What this means for the three available allocators?
* SLAB object layout happens to be mostly unchanged by the patch. The
implicitly provided alignment could be compromised with
CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
caches with alignment larger than unsigned long long. Practically on at
least x86 this includes kmalloc caches as they use cache line alignment,
which is larger than that. Still, this patch ensures alignment on all
arches and cache sizes.
* SLUB layout is also unchanged unless redzoning is enabled through
CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
With this patch, explicit alignment is guaranteed with redzoning as
well. This will result in more memory being wasted, but that should be
acceptable in a debugging scenario.
* SLOB has no implicit alignment so this patch adds it explicitly for
kmalloc(). The potential downside is increased fragmentation. While
pathological allocation scenarios are certainly possible, in my testing,
after booting a x86_64 kernel+userspace with virtme, around 16MB memory
was consumed by slab pages both before and after the patch, with
difference in the noise.
[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/
[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "guarantee natural alignment for kmalloc()", v2.
This patch (of 2):
SLOB currently doesn't account its pages at all, so in /proc/meminfo the
Slab field shows zero. Modifying a counter on page allocation and
freeing should be acceptable even for the small system scenarios SLOB is
intended for. Since reclaimable caches are not separated in SLOB,
account everything as unreclaimable.
SLUB currently doesn't account kmalloc() and kmalloc_node() allocations
larger than order-1 page, that are passed directly to the page
allocator. As they also don't appear in /proc/slabinfo, it might look
like a memory leak. For consistency, account them as well. (SLAB
doesn't actually use page allocator directly, so no change there).
Ideally SLOB and SLUB would be handled in separate patches, but due to
the shared kmalloc_order() function and different kfree()
implementations, it's easier to patch both at once to prevent
inconsistencies.
Link: http://lkml.kernel.org/r/20190826111627.7505-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is an incremental improvement on the existing
memory.{low,min} relative reclaim work to base its scan pressure
calculations on how much protection is available compared to the current
usage, rather than how much the current usage is over some protection
threshold.
This change doesn't change the experience for the user in the normal
case too much. One benefit is that it replaces the (somewhat arbitrary)
100% cutoff with an indefinite slope, which makes it easier to ballpark
a memory.low value.
As well as this, the old methodology doesn't quite apply generically to
machines with varying amounts of physical memory. Let's say we have a
top level cgroup, workload.slice, and another top level cgroup,
system-management.slice. We want to roughly give 12G to
system-management.slice, so on a 32GB machine we set memory.low to 20GB
in workload.slice, and on a 64GB machine we set memory.low to 52GB.
However, because these are relative amounts to the total machine size,
while the amount of memory we want to generally be willing to yield to
system.slice is absolute (12G), we end up putting more pressure on
system.slice just because we have a larger machine and a larger workload
to fill it, which seems fairly unintuitive. With this new behaviour, we
don't end up with this unintended side effect.
Previously the way that memory.low protection works is that if you are
50% over a certain baseline, you get 50% of your normal scan pressure.
This is certainly better than the previous cliff-edge behaviour, but it
can be improved even further by always considering memory under the
currently enforced protection threshold to be out of bounds. This means
that we can set relatively low memory.low thresholds for variable or
bursty workloads while still getting a reasonable level of protection,
whereas with the previous version we may still trivially hit the 100%
clamp. The previous 100% clamp is also somewhat arbitrary, whereas this
one is more concretely based on the currently enforced protection
threshold, which is likely easier to reason about.
There is also a subtle issue with the way that proportional reclaim
worked previously -- it promotes having no memory.low, since it makes
pressure higher during low reclaim. This happens because we base our
scan pressure modulation on how far memory.current is between memory.min
and memory.low, but if memory.low is unset, we only use the overage
method. In most cromulent configurations, this then means that we end
up with *more* pressure than with no memory.low at all when we're in low
reclaim, which is not really very usable or expected.
With this patch, memory.low and memory.min affect reclaim pressure in a
more understandable and composable way. For example, from a user
standpoint, "protected" memory now remains untouchable from a reclaim
aggression standpoint, and users can also have more confidence that
bursty workloads will still receive some amount of guaranteed
protection.
Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman points out that when when we do the low reclaim pass, we scale the
reclaim pressure relative to position between 0 and the maximum
protection threshold.
However, if the maximum protection is based on memory.elow, and
memory.emin is above zero, this means we still may get binary behaviour
on second-pass low reclaim. This is because we scale starting at 0, not
starting at memory.emin, and since we don't scan at all below emin, we
end up with cliff behaviour.
This should be a fairly uncommon case since usually we don't go into the
second pass, but it makes sense to scale our low reclaim pressure
starting at emin.
You can test this by catting two large sparse files, one in a cgroup
with emin set to some moderate size compared to physical RAM, and
another cgroup without any emin. In both cgroups, set an elow larger
than 50% of physical RAM. The one with emin will have less page
scanning, as reclaim pressure is lower.
Rebase on top of and apply the same idea as what was applied to handle
cgroup_memory=disable properly for the original proportional patch
http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name ("mm,
memcg: Handle cgroup_disable=memory when getting memcg protection").
Link: http://lkml.kernel.org/r/20190201051810.GA18895@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Suggested-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cgroup v2 introduces two memory protection thresholds: memory.low
(best-effort) and memory.min (hard protection). While they generally do
what they say on the tin, there is a limitation in their implementation
that makes them difficult to use effectively: that cliff behaviour often
manifests when they become eligible for reclaim. This patch implements
more intuitive and usable behaviour, where we gradually mount more
reclaim pressure as cgroups further and further exceed their protection
thresholds.
This cliff edge behaviour happens because we only choose whether or not
to reclaim based on whether the memcg is within its protection limits
(see the use of mem_cgroup_protected in shrink_node), but we don't vary
our reclaim behaviour based on this information. Imagine the following
timeline, with the numbers the lruvec size in this zone:
1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
scanned. (?!)
* Of course, we won't usually scan all available pages in the zone even
without this patch because of scan control priority, over-reclaim
protection, etc. However, as shown by the tests at the end, these
techniques don't sufficiently throttle such an extreme change in input,
so cliff-like behaviour isn't really averted by their existence alone.
Here's an example of how this plays out in practice. At Facebook, we are
trying to protect various workloads from "system" software, like
configuration management tools, metric collectors, etc (see this[0] case
study). In order to find a suitable memory.low value, we start by
determining the expected memory range within which the workload will be
comfortable operating. This isn't an exact science -- memory usage deemed
"comfortable" will vary over time due to user behaviour, differences in
composition of work, etc, etc. As such we need to ballpark memory.low,
but doing this is currently problematic:
1. If we end up setting it too low for the workload, it won't have
*any* effect (see discussion above). The group will receive the full
weight of reclaim and won't have any priority while competing with the
less important system software, as if we had no memory.low configured
at all.
2. Because of this behaviour, we end up erring on the side of setting
it too high, such that the comfort range is reliably covered. However,
protected memory is completely unavailable to the rest of the system,
so we might cause undue memory and IO pressure there when we *know* we
have some elasticity in the workload.
3. Even if we get the value totally right, smack in the middle of the
comfort zone, we get extreme jumps between no pressure and full
pressure that cause unpredictable pressure spikes in the workload due
to the current binary reclaim behaviour.
With this patch, we can set it to our ballpark estimation without too much
worry. Any undesirable behaviour, such as too much or too little reclaim
pressure on the workload or system will be proportional to how far our
estimation is off. This means we can set memory.low much more
conservatively and thus waste less resources *without* the risk of the
workload falling off a cliff if we overshoot.
As a more abstract technical description, this unintuitive behaviour
results in having to give high-priority workloads a large protection
buffer on top of their expected usage to function reliably, as otherwise
we have abrupt periods of dramatically increased memory pressure which
hamper performance. Having to set these thresholds so high wastes
resources and generally works against the principle of work conservation.
In addition, having proportional memory reclaim behaviour has other
benefits. Most notably, before this patch it's basically mandatory to set
memory.low to a higher than desirable value because otherwise as soon as
you exceed memory.low, all protection is lost, and all pages are eligible
to scan again. By contrast, having a gradual ramp in reclaim pressure
means that you now still get some protection when thresholds are exceeded,
which means that one can now be more comfortable setting memory.low to
lower values without worrying that all protection will be lost. This is
important because workingset size is really hard to know exactly,
especially with variable workloads, so at least getting *some* protection
if your workingset size grows larger than you expect increases user
confidence in setting memory.low without a huge buffer on top being
needed.
Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
assistance in thinking about how to make this work better.
In testing these changes, I intended to verify that:
1. Changes in page scanning become gradual and proportional instead of
binary.
To test this, I experimented stepping further and further down
memory.low protection on a workload that floats around 19G workingset
when under memory.low protection, watching page scan rates for the
workload cgroup:
+------------+-----------------+--------------------+--------------+
| memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
+------------+-----------------+--------------------+--------------+
| 21G | 0 | 0 | N/A |
| 17G | 867 | 3799 | 23% |
| 12G | 1203 | 3543 | 34% |
| 8G | 2534 | 3979 | 64% |
| 4G | 3980 | 4147 | 96% |
| 0 | 3799 | 3980 | 95% |
+------------+-----------------+--------------------+--------------+
As you can see, the test kernel (with a kernel containing this
patch) ramps up page scanning significantly more gradually than the
control kernel (without this patch).
2. More gradual ramp up in reclaim aggression doesn't result in
premature OOMs.
To test this, I wrote a script that slowly increments the number of
pages held by stress(1)'s --vm-keep mode until a production system
entered severe overall memory contention. This script runs in a highly
protected slice taking up the majority of available system memory.
Watching vmstat revealed that page scanning continued essentially
nominally between test and control, without causing forward reclaim
progress to become arrested.
[0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project
[akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
[chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.
set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.
Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.
This became more severe when we switched x86 from 4k to 8k kernel
stacks. Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.
In the particular case
3.12
kernel.threads-max = 515561
4.4
kernel.threads-max = 200000
Neither of the two values is really insane on 32GB machine.
I am not sure we want/need to tune the max_thread value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
for saving memory. Now kdump kernel will always panic when dump vmcore
to local disk:
BUG: kernel NULL pointer dereference, address: 0000000000000ab8
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
Call Trace:
__set_page_dirty+0x52/0xc0
iomap_set_page_dirty+0x50/0x90
iomap_write_end+0x6e/0x270
iomap_write_actor+0xce/0x170
iomap_apply+0xba/0x11e
iomap_file_buffered_write+0x62/0x90
xfs_file_buffered_aio_write+0xca/0x320 [xfs]
new_sync_write+0x12d/0x1d0
vfs_write+0xa5/0x1a0
ksys_write+0x59/0xd0
do_syscall_64+0x59/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.
Via the trace and with debugging, it is pointing to commit 97b27821b4
("writeback, memcg: Implement foreign dirty flushing") which introduced
this regression. Disabling memcg causes the null pointer dereference at
uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().
Fix it by returning directly if memcg is disabled, but not trying to
record the foreign writebacks with dirty pages.
Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
Fixes: 97b27821b4 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SECTION_SIZE and SECTION_MASK macros are not getting used anymore. But
they do conflict with existing definitions on arm64 platform causing
following warning during build. Lets drop these unused macros.
mm/memremap.c:16: warning: "SECTION_MASK" redefined
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
arch/arm64/include/asm/pgtable-hwdef.h:79: note: this is the location of the previous definition
#define SECTION_MASK (~(SECTION_SIZE-1))
mm/memremap.c:17: warning: "SECTION_SIZE" redefined
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
arch/arm64/include/asm/pgtable-hwdef.h:78: note: this is the location of the previous definition
#define SECTION_SIZE (_AC(1, UL) << SECTION_SHIFT)
Link: http://lkml.kernel.org/r/1569312010-31313-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.
This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:
| sysrq: Trigger a crash
| Kernel panic - not syncing: sysrq triggered crash
| CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xa0/0xc4
| panic+0x140/0x32c
| sysrq_handle_reboot+0x0/0x20
| __handle_sysrq+0x124/0x190
| write_sysrq_trigger+0x64/0x88
| proc_reg_write+0x60/0xa8
| __vfs_write+0x18/0x40
| vfs_write+0xa4/0x1b8
| ksys_write+0x64/0xf0
| __arm64_sys_write+0x14/0x20
| el0_svc_common.constprop.0+0xb0/0x168
| el0_svc_handler+0x28/0x78
| el0_svc+0x8/0xc
| Kernel Offset: disabled
| CPU features: 0x0002,24002004
| Memory Limit: none
| ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
| Ping 2!
| Ping 1!
| Ping 1!
| Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:
if (loc->xl_entry)
When loc->xl_entry is NULL, it is used on line 2158:
ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
and line 2164:
ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;
Thus, possible null-pointer dereferences may occur.
To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.
These bugs are found by a static analysis tool STCheck written by us.
[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unused portion of a part-written fs-block-sized block is not set to zero
in unaligned append direct write.This can lead to serious data
inconsistencies.
Ocfs2 manage disk with cluster size(for example, 1M), part-written in
one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
VFS(function dio_zero_block) doesn't do the cleaning because bh's state
is not set to NEW in function ocfs2_dio_wr_get_block when we write a
WRITTEN cluster. For example, the cluster size is 1M, file size is 8k
and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
contain dirty data.
We have to deal with two cases:
1.The starting position of direct write is outside the file.
2.The starting position of direct write is located in the file.
We need set bh's state to NEW in the first case. In the second case, we
need mapped twice because bh's state of area out file should be set to
NEW while area in file not.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.
This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.
There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.
Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.
Reported-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
In non-ETSI regulatory domains scan is blocked when operating channel
is a DFS channel. For ETSI, however, once DFS channel is marked as
available after the CAC, this channel will remain available (for some
time) even after leaving this channel.
Therefore a scan can be done without any impact on the availability
of the DFS channel as no new CAC is required after the scan.
Enable scan in mac80211 in these cases.
Signed-off-by: Aaron Komisar <aaron.komisar@tandemg.com>
Link: https://lore.kernel.org/r/1570024728-17284-1-git-send-email-aaron.komisar@tandemg.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For the kernel space, all ebreak instructions are determined at compile
time because the kernel space debugging module is currently unsupported.
Hence, it should be treated as a bug if an ebreak instruction which does
not belong to BUG_TRAP_TYPE_WARN or BUG_TRAP_TYPE_BUG is executed in
kernel space. For the userspace, debugging module or user problem may
intentionally insert an ebreak instruction to trigger a SIGTRAP signal.
To approach the above two situations, the do_trap_break() will direct
the BUG_TRAP_TYPE_NONE ebreak exception issued in kernel space to die()
and will send a SIGTRAP to the trapped process only when the ebreak is
in userspace.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed checkpatch issue]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
On RISC-V, when the kernel runs code on behalf of a user thread, and the
kernel executes a WARN() or WARN_ON(), the user thread will be sent
a bogus SIGTRAP. Fix the RISC-V kernel code to not send a SIGTRAP when
a WARN()/WARN_ON() is executed.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed subject]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
When the CONFIG_GENERIC_BUG is disabled by disabling CONFIG_BUG, if a
kernel thread is trapped by BUG(), the whole system will be in the
loop that infinitely handles the ebreak exception instead of entering the
die function. To fix this problem, the do_trap_break() will always call
the die() to deal with the break exception as the type of break is
BUG_TRAP_TYPE_BUG.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
In commit 9f79b78ef7 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.
But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.
Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally. Which
makes that "copy using put_user() in a loop" not work for them at all.
I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.
So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).
Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().
That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).
It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.
Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.
Fixes: 9f79b78ef7 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cfg80211_update_notlisted_nontrans() leaves the RCU critical session
too early, while still using nontrans_ssid which is RCU protected. In
addition, it performs a bunch of RCU pointer update operations such
as rcu_access_pointer and rcu_assign_pointer.
The caller, cfg80211_inform_bss_frame_data(), also accesses the RCU
pointer without holding the lock.
Just wrap all of this with bss_lock.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20191004123706.15768-3-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]
We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
(cherry picked from commit e2144503bf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The officially validated plane width limit is 4k on skl+, however
we already had people using 5k displays before we started to enforce
the limit. Also it seems Windows allows 5k resolutions as well
(though not sure if they do it with one plane or two).
According to hw folks 5k should work with the possible
exception of the following features:
- Ytile (already limited to 4k)
- FP16 (already limited to 4k)
- render compression (already limited to 4k)
- KVMR sprite and cursor (don't care)
- horizontal panning (need to verify this)
- pipe and plane scaling (need to verify this)
So apart from last two items on that list we are already
fine. We should really verify what happens with those last
two items but I don't have a 5k display on hand atm so it'll
have to wait.
In the meantime let's just bump the limit back up to 5k since
several users have already been using it without apparent issues.
At least we'll be no worse off than we were prior to lowering
the limits.
Cc: stable@vger.kernel.org
Cc: Sean Paul <sean@poorly.run>
Cc: José Roberto de Souza <jose.souza@intel.com>
Tested-by: Leho Kraav <leho@kraav.com>
Fixes: 372b9ffb57 ("drm/i915: Fix skl+ max plane width")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111501
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905135044.2001-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit bed34ef544)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)
Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."
Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk
(cherry picked from commit d810583fc2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In cace of cmpxchg this would cause to reference function
__cmpxchg_called_with_bad_pointer, which is a error case
for catching bugs and will not happen for correct code, if
__cmpxchg is inlined.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
[paul.burton@mips.com: s/__cmpxchd/__cmpxchg in subject]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
scripts/nsdeps automatically generates a patch to add MODULE_IMPORT_NS
tags, and what is nicer, it sorts the lines alphabetically with the
'sort' command. However, the output from the 'sort' command depends on
locale.
For example, I got this:
$ { echo usbstorage; echo usb_storage; } | LANG=en_US.UTF-8 sort
usbstorage
usb_storage
$ { echo usbstorage; echo usb_storage; } | LANG=C sort
usb_storage
usbstorage
So, this means people might potentially send different patches.
This kind of issue was reported in the past, for example,
commit f55f2328bb ("kbuild: make sorting initramfs contents
independent of locale").
Adding 'LANG=C' is a conventional way of fixing when a deterministic
result is desirable.
I added 'LANG=C' very close to the 'sort' command since changing
locale affects the language of error messages etc. We should respect
users' choice as much as possible.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
This script does not use bash-extension. I am guessing this hashbang
was copied from scripts/coccicheck, which really uses bash-extension.
/bin/sh is enough for this script.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Running 'make nsdeps' in a clean source tree fails as follows:
$ make -s clean; make -s defconfig; make nsdeps
[ snip ]
awk: fatal: cannot open file `init/modules.order' for reading (No such file or directory)
make: *** [Makefile;1307: modules.order] Error 2
make: *** Deleting file 'modules.order'
make: *** Waiting for unfinished jobs....
The cause of the error is 'make nsdeps' does not build modules at all.
Set KBUILD_MODULES to fix it.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The module namespace produces __strtab_ns_<sym> symbols to store
namespace strings, but it does not guarantee the name uniqueness.
This is a potential problem because we have exported symbols starting
with "ns_".
For example, kernel/capability.c exports the following symbols:
EXPORT_SYMBOL(ns_capable);
EXPORT_SYMBOL(capable);
Assume a situation where those are converted as follows:
EXPORT_SYMBOL_NS(ns_capable, some_namespace);
EXPORT_SYMBOL_NS(capable, some_namespace);
The former expands to "__kstrtab_ns_capable" and "__kstrtab_ns_ns_capable",
and the latter to "__kstrtab_capable" and "__kstrtab_ns_capable".
Then, we have the duplicated "__kstrtab_ns_capable".
To ensure the uniqueness, rename "__kstrtab_ns_*" to "__kstrtabns_*".
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Currently, external module builds produce tons of false-positives:
WARNING: module <mod> uses symbol <sym> from namespace <ns>, but does not import it.
Here, the <ns> part shows a random string.
When you build external modules, the symbol info of vmlinux and
in-kernel modules are read from $(objtree)/Module.symvers, but
read_dump() is buggy in multiple ways:
[1] When the modpost is run for vmlinux and in-kernel modules,
sym_extract_namespace() allocates memory for the namespace. On the
other hand, read_dump() does not, then sym->namespace will point to
somewhere in the line buffer of get_next_line(). The data in the
buffer will be replaced soon, and sym->namespace will end up with
pointing to unrelated data. As a result, check_exports() will show
random strings in the warning messages.
[2] When there is no namespace, sym_extract_namespace() returns NULL.
On the other hand, read_dump() sets namespace to an empty string "".
(but, it will be later replaced with unrelated data due to bug [1].)
The check_exports() shows a warning unless exp->namespace is NULL,
so every symbol read from read_dump() emits the warning, which is
mostly false positive.
To address [1], sym_add_exported() calls strdup() for s->namespace.
The namespace from sym_extract_namespace() must be freed to avoid
memory leak.
For [2], I changed the if-conditional in check_exports().
This commit also fixes sym_add_exported() to set s->namespace correctly
when the symbol is preloaded.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Currently, EXPORT_SYMBOL_NS(_GPL) constructs the kernel symbol as
follows:
__ksymtab_SYMBOL.NAMESPACE
The sym_extract_namespace() in modpost allocates memory for the part
SYMBOL.NAMESPACE when '.' is contained. One problem is that the pointer
returned by strdup() is lost because the symbol name will be copied to
malloc'ed memory by alloc_symbol(). No one will keep track of the
pointer of strdup'ed memory.
sym->namespace still points to the NAMESPACE part. So, you can free it
with complicated code like this:
free(sym->namespace - strlen(sym->name) - 1);
It complicates memory free.
To fix it elegantly, I swapped the order of the symbol and the
namespace as follows:
__ksymtab_NAMESPACE.SYMBOL
then, simplified sym_extract_namespace() so that it allocates memory
only for the NAMESPACE part.
I prefer this order because it is intuitive and also matches to major
languages. For example, NAMESPACE::NAME in C++, MODULE.NAME in Python.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Now all scripts in scripts/coccinelle to be automatically called
by coccicheck. However new adding add_namespace.cocci does not
support report mode, which make coccicheck failed.
This add "virtual report" to make the coccicheck go ahead smoothly.
Fixes: eb8305aecb ("scripts: Coccinelle script for namespace dependencies.")
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Matthias Maennich <maennich@google.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
When the netdev is down, the queues and their debug stats
do not exist, so don't try using a pointer to them when
when printing the ethtool stats.
Fixes: e470355bd9 ("ionic: Add driver stats")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
If __calc_tpm2_event_size() fails to parse an event it will return 0,
resulting tpm2_calc_event_log_size() returning -1. Currently there is
no check of this return value, and 'efi_tpm_final_log_size' can end up
being set to this negative value resulting in a crash like this one:
BUG: unable to handle page fault for address: ffffbc8fc00866ad
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:memcpy_erms+0x6/0x10
Call Trace:
tpm_read_log_efi()
tpm_bios_log_setup()
tpm_chip_register()
tpm_tis_core_init.cold.9+0x28c/0x466
tpm_tis_plat_probe()
platform_drv_probe()
...
Also __calc_tpm2_event_size() returns a size of 0 when it fails
to parse an event, so update function documentation to reflect this.
The root cause of the issue that caused the failure of event parsing
in this case is resolved by Peter Jone's patchset dealing with large
event logs where crossing over a page boundary causes the page with
the event count to be unmapped.
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: c46f340569 ("tpm: Reserve the TPM final events table")
Link: https://lkml.kernel.org/r/20191002165904.8819-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf script:
Andi Kleen:
- Fix recovery from LBR/binary mismatch in the "brstackinsn" --field.
perf annotate:
Arnaldo Carvalho de Melo:
- Propagate errors so that meaningful messages can be presented to the
user in case of problems.
perf map:
Steve MacLean:
- Fix handling of maps partially overlapped, resolving symbols in the
ranges not replaced by new mmaps.
perf tests:
Ian Rogers:
- Use raise() instead of NULL derefs to avoid causing a SIGILL rather than a
SIGSEGV for optimized builds that turn NULL derefs into ud2 instructions.
perf LLVM:
Ian Rogers:
- Don't access out-of-scope array.
perf inject:
Steve MacLean:
- Fix JIT_CODE_MOVE filename, that was having a u64 truncaded into a 32-bit
snprintf format and also a missing ".so" suffix in another case.
libsubcmd:
Ian Rogers:
- Make _FORTIFY_SOURCE defines dependent on the feature, avoiding
false positives with with memory sanitizers such as LLVM's ASan.
Vendor specific events:
Intel:
Andi Kleen:
- Fix period for Intel fixed counters.
s390:
Thomas Richter (2):
- Fix some event details transaction for machine type 8561.
tools headers UAPI:
Arnaldo Carvalho de Melo:
- Sync headers with the kernel, catching new usbdevfs ioctls and
madvise behaviours to properly decode in 'perf trace' output.
Documentation:
Steve MacLean:
- Correct and clarify jitdump spec.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
David Howells says:
====================
rxrpc: Syzbot-inspired fixes
Here's a series of patches that fix a number of issues found by syzbot:
(1) A reference leak on rxrpc_call structs in a sendmsg error path.
(2) A tracepoint that looked in the rxrpc_peer record after putting it.
Analogous with this, though not presently detected, the same bug is
also fixed in relation to rxrpc_connection and rxrpc_call records.
(3) Peer records don't pin local endpoint records, despite accessing them.
(4) Access to connection crypto ops to clean up a call after the call's
ref on that connection has been put.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sphinx is generating a build warning as the title underline
of this file is too short.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
We can have 2 dcpm-s with the same backend and frontend name
(capture + playback pair), this causes the following debugfs error
on Intel Bay Trail systems:
[ 298.969049] debugfs: Directory 'SSP2-Codec' with parent 'Baytrail Audio Port' already present!
This commit adds a ":playback" or ":capture" postfix to the debugfs dir
name fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20191005212202.5206-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
CONFIG_COMPAT_VDSO is defined by passing '-DCONFIG_COMPAT_VDSO' to the
compiler when the generic compat vDSO code is in use. It's much cleaner
and simpler to expose this as a proper Kconfig option (like x86 does),
so do that and remove the bodge.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
For consistency with CROSS_COMPILE_COMPAT, mechanically rename COMPATCC
to CC_COMPAT so that specifying aspects of the compat vDSO toolchain in
the environment isn't needlessly confusing.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Directly passing the '--target' option to clang by appending to
COMPATCC does not work if COMPATCC has been specified explicitly as
an argument to Make unless the 'override' directive is used, which is
ugly and different to what is done in the top-level Makefile.
Move the '--target' option for clang out of COMPATCC and into
VDSO_CAFLAGS, where it will be picked up when compiling and assembling
the 32-bit vDSO under clang.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
KBUILD_CPPFLAGS is defined differently depending on whether the main
compiler is clang or not. This means that it is not possible to build
the compat vDSO with GCC if the rest of the kernel is built with clang.
Define VDSO_CPPFLAGS directly to break this dependency and allow a clang
kernel to build a compat vDSO with GCC:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- CC=clang \
COMPATCC=arm-linux-gnueabihf-gcc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
There's no need to export COMPATCC, so just define it locally in the
vdso32/Makefile, which is the only place where it is used.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Rather than force the use of GCC for the compat cross-compiler, instead
extract the target from CROSS_COMPILE_COMPAT and pass it to clang if the
main compiler is clang.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
What we thought would be the module clock is actually the clock meant to be
used by the sensors, and play no role in the CSI controller. Now that the
binding has been updated to reflect that, let's update the device tree too.
Fixes: d2b9c64443 ("ARM: dts: sun7i: Add CSI0 controller")
Reported-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
It turns out that what was thought to be the module clock was actually the
clock meant to be used by the sensor, and isn't playing any role with the
CSI controller itself. Let's drop that clock from our binding.
Fixes: c5e8f4ccd7 ("media: dt-bindings: media: Add Allwinner A10 CSI binding")
Reported-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
There are two checks to see if the manual gpio is configured, but
these the check is seeing if the structure is NULL instead it
should check to see if there are CTS and/or RTS pins defined.
This patch uses checks for those individual pins instead of
checking for the structure itself to restore auto RTS/CTS.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20191006163314.23191-2-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch fixes issue with Halt Endnpoint Test observed during using g_zero
driver as DUT. Bug occurred only on some testing board.
Endpoint can defer transition to Halted state if endpoint has pending
requests.
Patch add additional condition that allows to return correct endpoint
status during Get Endpoint Status request even if the halting endpoint
is in progress.
Reported-by: Rahul Kumar <kurahul@cadence.com>
Signed-off-by: Rahul Kumar <kurahul@cadence.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Tested-by: Roger Quadros <rogerq@ti.com>
Link: https://lore.kernel.org/r/1570430355-26118-1-git-send-email-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arm64 was the last architecture using CROSS_COMPILE_COMPAT_VDSO config
option. With this patch series the dependency in the architecture has
been removed.
Remove CROSS_COMPILE_COMPAT_VDSO from the Unified vDSO library code.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Older versions of binutils (prior to 2.24) do not support the "ISHLD"
option for memory barrier instructions, which leads to a build failure
when assembling the vdso32 library.
Add a compilation time mechanism that detects if binutils supports those
instructions and configure the kernel accordingly.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The .config file and the generated include/config/auto.conf can
end up out of sync after a set of commands since
CONFIG_CROSS_COMPILE_COMPAT_VDSO is not updated correctly.
The sequence can be reproduced as follows:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
[...]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- menuconfig
[set CONFIG_CROSS_COMPILE_COMPAT_VDSO="arm-linux-gnueabihf-"]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Which results in:
arch/arm64/Makefile:62: CROSS_COMPILE_COMPAT not defined or empty,
the compat vDSO will not be built
even though the compat vDSO has been built:
$ file arch/arm64/kernel/vdso32/vdso.so
arch/arm64/kernel/vdso32/vdso.so: ELF 32-bit LSB pie executable, ARM,
EABI5 version 1 (SYSV), dynamically linked,
BuildID[sha1]=c67f6c786f2d2d6f86c71f708595594aa25247f6, stripped
A similar case that involves changing the configuration parameter
multiple times can be reconducted to the same family of problems.
Remove the use of CONFIG_CROSS_COMPILE_COMPAT_VDSO altogether and
instead rely on the cross-compiler prefix coming from the environment
via CROSS_COMPILE_COMPAT, much like we do for the rest of the kernel.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When detecting a spurious EL1 translation fault, we attempt to compare
ESR_EL1.DFSC with PAR_EL1.FST. We erroneously use FIELD_PREP() to
extract PAR_EL1.FST, when we should be using FIELD_GET().
In the wise words of Robin Murphy:
| FIELD_GET() is a UBFX, FIELD_PREP() is a BFI
Using FIELD_PREP() means that that dfsc & ESR_ELx_FSC_TYPE is always
zero, and hence not equal to ESR_ELx_FSC_FAULT. Thus we detect any
unhandled translation fault as spurious.
... so let's use FIELD_GET() to ensure we don't decide all translation
faults are spurious. ESR_EL1.DFSC occupies bits [5:0], and requires no
shifting.
Fixes: 42f91093b0 ("arm64: mm: Ignore spurious translation faults taken from the kernel")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Fix the cleanup of the crypto state on a call after the call has been
disconnected. As the call has been disconnected, its connection ref has
been discarded and so we can't go through that to get to the security ops
table.
Fix this by caching the security ops pointer in the rxrpc_call struct and
using that when freeing the call security state. Also use this in other
places we're dealing with call-specific security.
The symptoms look like:
BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
net/rxrpc/call_object.c:481
Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764
Fixes: 1db88c5343 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
The rxrpc_peer record needs to hold a reference on the rxrpc_local record
it points as the peer is used as a base to access information in the
rxrpc_local record.
This can cause problems in __rxrpc_put_peer(), where we need the network
namespace pointer, and in rxrpc_send_keepalive(), where we need to access
the UDP socket, leading to symptoms like:
BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
[inline]
BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
net/rxrpc/peer_object.c:435
Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
Fix this by taking a ref on the local record for the peer record.
Fixes: ace45bec6d ("rxrpc: Fix firewall route keepalive")
Fixes: 2baec2c3f8 ("rxrpc: Support network namespacing")
Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_call() calls trace_rxrpc_call() after it has done the decrement
of the refcount - which looks at the debug_id in the call record. But
unless the refcount was reduced to zero, we no longer have the right to
look in the record and, indeed, it may be deleted by some other thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
Fixes: e34d4234b0 ("rxrpc: Trace rxrpc_call usage")
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the
decrement of the refcount - which looks at the debug_id in the connection
record. But unless the refcount was reduced to zero, we no longer have the
right to look in the record and, indeed, it may be deleted by some other
thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
Fixes: 363deeab6d ("rxrpc: Add connection tracepoint and client conn state tracepoint")
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_peer() calls trace_rxrpc_peer() after it has done the decrement
of the refcount - which looks at the debug_id in the peer record. But
unless the refcount was reduced to zero, we no longer have the right to
look in the record and, indeed, it may be deleted by some other thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
This can cause the following symptoms:
BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
[inline]
BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
net/rxrpc/peer_object.c:435
Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
Fixes: 1159d4b496 ("rxrpc: Add a tracepoint to track rxrpc_peer refcounting")
Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
$id doesn't match the actual filename, so update the $id
Fixes: c5e8f4ccd7 ("media: dt-bindings: media: Add Allwinner A10 CSI binding")
Signed-off-by: Pragnesh Patel <pragnesh.patel@sifive.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
When sendmsg() finds a call to continue on with, if the call is in an
inappropriate state, it doesn't release the ref it just got on that call
before returning an error.
This causes the following symptom to show up with kasan:
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0x8a2/0x940
net/rxrpc/output.c:635
Read of size 8 at addr ffff888064219698 by task kworker/0:3/11077
where line 635 is:
whdr.epoch = htonl(peer->local->rxnet->epoch);
The local endpoint (which cannot be pinned by the call) has been released,
but not the peer (which is pinned by the call).
Fix this by releasing the call in the error path.
Fixes: 37411cad63 ("rxrpc: Fix potential NULL-pointer exception")
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Commit 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to
chip->read_xxx() hooks") modified the prototype of the struct nand_chip
read_buf function pointer. In the au1550nd driver we have 2
implementations of read_buf. The previously mentioned commit modified
the au_read_buf() implementation to match the function pointer, but not
au_read_buf16(). This results in a compiler warning for MIPS
db1xxx_defconfig builds:
drivers/mtd/nand/raw/au1550nd.c:443:57:
warning: pointer type mismatch in conditional expression
Fix this by updating the prototype of au_read_buf16() to take a struct
nand_chip pointer as its first argument, as is expected after commit
7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx()
hooks").
Note that this shouldn't have caused any functional issues at runtime,
since the offset of the struct mtd_info within struct nand_chip is 0
making mtd_to_nand() effectively a type-cast.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks")
Cc: stable@vger.kernel.org # v4.20+
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Currently if the client identifies problems when processing
metadata returned in CREATE response, the open handle is being
leaked. This causes multiple problems like a file missing a lease
break by that client which causes high latencies to other clients
accessing the file. Another side-effect of this is that the file
can't be deleted.
Fix this by closing the file after the client hits an error after
the file was opened and the open descriptor wasn't returned to
the user space. Also convert -ESTALE to -EOPENSTALE to allow
the VFS to revalidate a dentry and retry the open.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After 'Initial git repository build' commit,
'mapping_table_ERRHRD' variable has not been used.
So 'mapping_table_ERRHRD' const variable could be removed
to mute below warning message:
fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable]
static const struct smb_to_posix_error mapping_table_ERRHRD[] = {
^
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now that sparse has been fixed, it spotted a couple recent minor
endian errors (and removed one additional sparse warning).
Thanks to Luc Van Oostenryck for his help fixing sparse.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Headphone on XPS 9350/9360 produces a background white noise. The The
noise level somehow correlates with "Headphone Mic Boost", when it sets
to 1 the noise disappears. However, doing this has a side effect, which
also decreases the overall headphone volume so I didn't send the patch
upstream.
The noise was bearable back then, but after commit 717f43d81a ("ALSA:
hda/realtek - Update headset mode for ALC256") the noise exacerbates to
a point it starts hurting ears.
So let's use the workaround to set "Headphone Mic Boost" to 1 and lock
it so it's not touchable by userspace.
Fixes: 717f43d81a ("ALSA: hda/realtek - Update headset mode for ALC256")
BugLink: https://bugs.launchpad.net/bugs/1654448
BugLink: https://bugs.launchpad.net/bugs/1845810
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20191003043919.10960-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While writing the tests for copy_struct_from_user(), I used a construct
that Linus doesn't appear to be too fond of:
On 2019-10-04, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Hmm. That code is ugly, both before and after the fix.
>
> This just doesn't make sense for so many reasons:
>
> if ((ret |= test(umem_src == NULL, "kmalloc failed")))
>
> where the insanity comes from
>
> - why "|=" when you know that "ret" was zero before (and it had to
> be, for the test to make sense)
>
> - why do this as a single line anyway?
>
> - don't do the stupid "double parenthesis" to hide a warning. Make it
> use an actual comparison if you add a layer of parentheses.
So instead, use a bog-standard check that isn't nearly as ugly.
Fixes: 341115822f ("usercopy: Add parentheses around assignment in test_copy_struct_from_user")
Fixes: f5a1a536fa ("lib: introduce copy_struct_from_user() helper")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191005233028.18566-1-cyphar@cyphar.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Guarantee zeroed memory buffers for cases where potential memory
leak to disk can occur. In these cases, kmem_alloc is used and
doesn't zero the buffer, opening the possibility of information
leakage to disk.
Use existing infrastucture (xfs_buf_allocate_memory) to obtain
the already zeroed buffer from kernel memory.
This solution avoids the performance issue that would occur if a
wholesale change to replace kmem_alloc with kmem_zalloc was done.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
[darrick: fix bitwise complaint about kmflag_mask]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The flags arg is always passed as zero, so remove it.
(xfs_buf_get_uncached takes flags to support XBF_NO_IOACCT for
the sb, but that should never be relevant for xfs_get_aghdr_buf)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
To ensure that all blocks touched by the range [offset, offset + count)
are allocated, we need to calculate the block count from the difference
of the range end (rounded up) and the range start (rounded down).
Before this patch, we just round up the byte count, which may lead to
unaligned ranges not being fully allocated:
$ touch test_file
$ block_size=$(stat -fc '%S' test_file)
$ fallocate -o $((block_size / 2)) -l $block_size test_file
$ xfs_bmap test_file
test_file:
0: [0..7]: 1396264..1396271
1: [8..15]: hole
There should not be a hole there. Instead, the first two blocks should
be fully allocated.
With this patch applied, the result is something like this:
$ touch test_file
$ block_size=$(stat -fc '%S' test_file)
$ fallocate -o $((block_size / 2)) -l $block_size test_file
$ xfs_bmap test_file
test_file:
0: [0..15]: 11024..11039
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Jose Abreu says:
====================
net: stmmac: Fixes for -net
Fixes for -net. More info in commit logs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current MAC addresses hard-coded in the test we can get some
false positives as we use the Hash Filtering method. Let's change the
MAC addresses in the tests to be unique when hashed.
Fixes: 091810dbde ("net: stmmac: Introduce selftests support")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some setups may not have all Unicast addresses filters available. Check
the number of available filters before trying to setup it.
Fixes: 477286b53f ("stmmac: add GMAC4 core support")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check if the number of available Hash Filters is enough to
run the test, otherwise we will get false failures.
Fixes: 091810dbde ("net: stmmac: Introduce selftests support")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
scale_up wakes up waiters after scaling up. But after scaling max, it
should not wake up more waiters as waiters will not have anything to
do. This patch fixes this by making scale_up (and also scale_down)
return when threshold is reached.
This bug causes increased fdatasync latency when fdatasync and dd
conv=sync are performed in parallel on 4.19 compared to 4.14. This
bug was introduced during refactoring of blk-wbt code.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 85fbd722ad.
The commit was added as a quick band-aid for a hang that happened when a
block device was removed during system suspend. Now that bdi_wq is not
freezable anymore the hang should not be possible and we can get rid of
this hack by reverting it.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A removable block device, such as NVMe or SSD connected over Thunderbolt
can be hot-removed any time including when the system is suspended. When
device is hot-removed during suspend and the system gets resumed, kernel
first resumes devices and then thaws the userspace including freezable
workqueues. What happens in that case is that the NVMe driver notices
that the device is unplugged and removes it from the system. This ends
up calling bdi_unregister() for the gendisk which then schedules
wb_workfn() to be run one more time.
However, since the bdi_wq is still frozen flush_delayed_work() call in
wb_shutdown() blocks forever halting system resume process. User sees
this as hang as nothing is happening anymore.
Triggering sysrq-w reveals this:
Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
Call Trace:
? __schedule+0x2c5/0x630
? wait_for_completion+0xa4/0x120
schedule+0x3e/0xc0
schedule_timeout+0x1c9/0x320
? resched_curr+0x1f/0xd0
? wait_for_completion+0xa4/0x120
wait_for_completion+0xc3/0x120
? wake_up_q+0x60/0x60
__flush_work+0x131/0x1e0
? flush_workqueue_prep_pwqs+0x130/0x130
bdi_unregister+0xb9/0x130
del_gendisk+0x2d2/0x2e0
nvme_ns_remove+0xed/0x110 [nvme_core]
nvme_remove_namespaces+0x96/0xd0 [nvme_core]
nvme_remove+0x5b/0x160 [nvme]
pci_device_remove+0x36/0x90
device_release_driver_internal+0xdf/0x1c0
nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
process_one_work+0x1c2/0x3f0
worker_thread+0x48/0x3e0
kthread+0x100/0x140
? current_work+0x30/0x30
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
This is not limited to NVMes so exactly same issue can be reproduced by
hot-removing SSD (over Thunderbolt) while the system is suspended.
Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
Reported-by: AceLan Kao <acelan.kao@canonical.com>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Clearing the existing bitmask of mirrored ports essentially prevents us
from capturing more than one port at any given time. This is clearly
wrong, do not clear the bitmask prior to setting up the new port.
Reported-by: Hubert Feurstein <h.feurstein@gmail.com>
Fixes: ed3af5fd08 ("net: dsa: b53: Add support for port mirroring")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPIO controlled regulator for the ARM power supply is supplying
the higher voltage when the GPIO is driven high. This is opposite to
the similar regulator setup on the EVK board and is impacting stability
of the board as the ARM domain has been supplied with a too low voltage
when to faster OPPs are in use.
Fixes: 4a13b3bec3 (arm64: dts: imx: add Zii Ultra board support)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The SCU firmware API for getting UID should have response,
otherwise, the message stored in function stack could be
released and then the response data received from SCU will be
stored into that released stack and cause kernel NULL pointer
dump.
Fixes: 73feb4d0f8 ("soc: imx-scu: Add SoC UID(unique identifier) support")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
dev_get_platdata(&pdev->dev) returns a pointer on struct stmfx_pinctrl,
not on struct stmfx (platform_set_drvdata(pdev, pctl); in probe).
Pointer on struct stmfx is stored in driver data of pdev parent (in probe:
struct stmfx *stmfx = dev_get_drvdata(pdev->dev.parent);).
Fixes: 1490d9f841 ("pinctrl: Add STMFX GPIO expander Pinctrl/GPIO driver")
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Link: https://lore.kernel.org/r/20191004122342.22018-1-amelie.delaunay@st.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Commit 7fd8930f26
"nvme: add a common helper to read Identify Controller data"
has re-introduced an issue that we have attempted to work around in the
past, in commit a310acd7a7 ("NVMe: use split lo_hi_{read,write}q").
The problem is that some PCIe NVMe controllers do not implement 64-bit
outbound accesses correctly, which is why the commit above switched
to using lo_hi_[read|write]q for all 64-bit BAR accesses occuring in
the code.
In the mean time, the NVMe subsystem has been refactored, and now calls
into the PCIe support layer for NVMe via a .reg_read64() method, which
fails to use lo_hi_readq(), and thus reintroduces the problem that the
workaround above aimed to address.
Given that, at the moment, .reg_read64() is only used to read the
capability register [which is known to tolerate split reads], let's
switch .reg_read64() to lo_hi_readq() as well.
This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
DesignWare PCIe host controller.
Fixes: 7fd8930f26 ("nvme: add a common helper to read Identify Controller data")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
nvme_update_formats may fail to revalidate the namespace and
attempt to remove the namespace. This may lead to a deadlock
as nvme_ns_remove will attempt to acquire the subsystem lock
which is already acquired by the passthru command with effects.
Move the invalid namepsace removal to after the passthru command
releases the subsystem lock.
Reported-by: Judy Brock <judy.brock@samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
The pinctrl->functions[] array has pinctrl->num_functions elements and
the pinctrl->groups[] array is the same way. These are set in
ns2_pinmux_probe(). So the > comparisons should be >= so that we don't
read one element beyond the end of the array.
Fixes: b5aa1006e4 ("pinctrl: ns2: add pinmux driver support for Broadcom NS2 SoC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20190926081426.GB2332@mwanda
Acked-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The 37xx configuration registers are only 32 bits long, so
pins 32-35 spill over into the next register. The calculation
for the register address was done, but the bitmask was not, so
any configuration to pin 32 or above resulted in a bitmask that
overflowed and performed no action.
Fix the register / offset calculation to also adjust the offset.
Fixes: 5715092a45 ("pinctrl: armada-37xx: Add gpio support")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001154634.96165-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The RockPro64 schematic [1] page 18 states a min voltage of 0.8V and a
max voltage of 1.4V for the VDD_LOG pwm regulator. However, there is an
additional note that the pwm parameter needs to be modified.
From the schematics a voltage range of 0.8V to 1.7V can be calculated.
Additional voltage measurements on the board show that this fix indeed
leads to the correct voltage, while without this fix the voltage was set
too high.
[1] http://files.pine64.org/doc/rockpro64/rockpro64_v21-SCH.pdf
Fixes: e4f3fb4909 ("arm64: dts: rockchip: add initial dts support for Rockpro64")
Signed-off-by: Soeren Moch <smoch@web.de>
Link: https://lore.kernel.org/r/20191003215036.15023-1-smoch@web.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:
CPU0 CPU1
mlx5_ib_alloc_implicit_mr()
implicit_mr_alloc()
live = 1
imr->umem = umem
num_pending_prefetch_inc()
if (live)
atomic_inc(num_pending_prefetch)
atomic_set(num_pending_prefetch,0) // Overwrites other thread's store
Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.
Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.
Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().
The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:
CPU0 CPU1
dereg_mr()
mlx5_ib_advise_mr_prefetch()
srcu_read_lock()
num_pending_prefetch_inc()
if (!live)
live = 0
atomic_read() == 0
// skip flush_workqueue()
atomic_inc()
queue_work();
srcu_read_unlock()
WARN_ON(atomic_read()) // Fails
Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f1 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This fixes a race of the form:
CPU0 CPU1
mlx5_ib_invalidate_range() mlx5_ib_invalidate_range()
// This one actually makes npages == 0
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
// This one does nothing
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
dying = 1;
dying = 1;
schedule_work(&umem_odp->work);
// Double schedule of the same work
schedule_work(&umem_odp->work); // BOOM
npages and dying must be read and written under the umem_mutex lock.
Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.
This avoids an expensive double critical section in
mlx5_ib_invalidate_range().
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:
CPU0 CPU1
mlx5_ib_update_xlt()
mlx5_odp_populate_klm()
odp_lookup() == NULL
pklm = ZAP
implicit_mr_get_data()
implicit_mr_alloc()
<update interval tree>
mlx5_ib_update_xlt
mlx5_odp_populate_klm()
odp_lookup() != NULL
pklm = VALID
mlx5_ib_post_send_wait()
mlx5_ib_post_send_wait() // Replaces VALID with ZAP
This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.
The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.
The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing. On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.
Resolve this by allocating the memory for a message off of the heap
instead of the stack. kmalloc() always will give us a proper memory
location that DMA will work correctly from.
Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.
This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().
Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().
Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.
Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().
Note: the kfree call in the first chunk below (in the cm_add_one error
flow) fixes an old, undetected memory leak.
Fixes: c87e65cfb9 ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
i40iw IB device registration fails with ENODEV.
ib_register_device
setup_device/setup_port_data
i40iw_port_immutable
ib_query_port
iw_query_port
ib_device_get_netdev(ENODEV)
ib_device_get_netdev() does not have a netdev associated
with the ibdev and thus fails.
Use ib_device_set_netdev() to associate netdev to ibdev
in i40iw before IB device registration.
Fixes: 4929116bdf ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The TWL4030 used on the Logit PD Torpedo SOM does not have the
keypad pins routed. This patch disables the twl_keypad driver
to remove some splat during boot:
twl4030_keypad 48070000.i2c:twl@48:keypad: missing or malformed property linux,keymap: -22
twl4030_keypad 48070000.i2c:twl@48:keypad: Failed to build keymap
twl4030_keypad: probe of 48070000.i2c:twl@48:keypad failed with error -22
Signed-off-by: Adam Ford <aford173@gmail.com>
[tony@atomide.com: removed error time stamps]
Signed-off-by: Tony Lindgren <tony@atomide.com>
The fixed MKHI client on PCH 6 gen platforms
does not support fw version retrieval.
The error is not fatal, but it fills up the kernel logs and
slows down the driver start.
This patch disables requesting FW version on GEN6 and earlier platforms.
Fixes warning:
[ 15.964298] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: Could not read FW version
[ 15.964301] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: version command failed -5
Cc: <stable@vger.kernel.org> +v4.18
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20191004181722.31374-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan writes:
USB-serial fixes for 5.4-rc2
Here's a fix for a long-standing issue in the keyspan driver which could
lead to NULL-pointer dereferences when a device had unexpected endpoint
descriptors.
Included are also some new device IDs.
All but the last two commits have been in linux-next with no reported
issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.4-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: keyspan: fix NULL-derefs on open() and write()
USB: serial: option: add support for Cinterion CLS8 devices
USB: serial: option: add Telit FN980 compositions
USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
Fix tty driver build on SPARC by not using __exitdata.
It appears that SPARC does not support section .exit.data.
Fixes these build errors:
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 063246641d ("format-security: move static strings to const")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/675e7bd9-955b-3ff3-1101-a973b58b5b75@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Call uart_unregister_driver() conditionally instead of
unconditionally, only if it has been previously registered.
This uses driver.state, just as the sh-sci.c driver does.
Fixes this null pointer dereference in tty_unregister_driver(),
since the 'driver' argument is null:
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
RIP: 0010:tty_unregister_driver+0x25/0x1d0
Fixes: 238b8721a5 ("[PATCH] serial uartlite driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Peter Korsgaard <jacmet@sunsite.dk>
Link: https://lore.kernel.org/r/9c8e6581-6fcc-a595-0897-4d90f5d710df@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As platform_get_irq() now prints an error when the interrupt does not
exist, scary warnings may be printed for optional interrupts:
sh-sci e6550000.serial: IRQ index 1 not found
sh-sci e6550000.serial: IRQ index 2 not found
sh-sci e6550000.serial: IRQ index 3 not found
sh-sci e6550000.serial: IRQ index 4 not found
sh-sci e6550000.serial: IRQ index 5 not found
Fix this by calling platform_get_irq_optional() instead for all but the
first interrupts, which are optional.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191001180743.1041-1-geert+renesas@glider.be
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Note that runtime PM has never actually been enabled for this driver
since the support_autosuspend flag in its usb_driver struct is not set.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-5-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the original bindings that were superseeded by the YAML schemas
didn't mention that phy-names was needed, it turns out that phy-names is
required if phys is set according to phy/phy-bindings.txt.
Let's add back those properties.
Fixes: 14ec072a19 ("dt-bindings: usb: Convert USB HCD generic binding to YAML")
Fixes: c93bcace10 ("dt-bindings: usb: Convert the generic OHCI binding to YAML")
Fixes: c3e2485d5f ("dt-bindings: usb: Convert the generic EHCI binding to YAML")
Reported-by: Emmanuel Vadot <manu@bidouilliste.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20191002112651.100504-2-mripard@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commits 3d109bdca9 ("ARM: dts: sunxi: Remove useless
phy-names from EHCI and OHCI"), 0a3df8bb6d ("ARM: dts: sunxi: h3/h5:
Remove useless phy-names from EHCI and OHCI") and 3c7ab90aaa ("arm64:
dts: allwinner: Remove useless phy-names from EHCI and OHCI").
It turns out that while the USB bindings were not mentionning it, the PHY
client bindings were mandating that phy-names is set when phys is. Let's
add it back.
Fixes: 3d109bdca9 ("ARM: dts: sunxi: Remove useless phy-names from EHCI and OHCI")
Fixes: 0a3df8bb6d ("ARM: dts: sunxi: h3/h5: Remove useless phy-names from EHCI and OHCI")
Fixes: 3c7ab90aaa ("arm64: dts: allwinner: Remove useless phy-names from EHCI and OHCI")
Reported-by: Emmanuel Vadot <manu@bidouilliste.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20191002112651.100504-1-mripard@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc points out a suspicious cast from a pointer to an 'int' when
compile-testing on 64-bit architectures.
drivers/usb/gadget/udc/lpc32xx_udc.c: In function ‘udc_pop_fifo’:
drivers/usb/gadget/udc/lpc32xx_udc.c:1156:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
drivers/usb/gadget/udc/lpc32xx_udc.c: In function ‘udc_stuff_fifo’:
drivers/usb/gadget/udc/lpc32xx_udc.c:1257:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
The code works find, but it's easy enough to change the cast to
a uintptr_t to shut up that warning.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190918200201.2292008-1-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The system can hit a deadlock if an xhci adapter breaks while initializing.
The deadlock is between two threads: thread 1 is tearing down the
adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
hcd->handwidth_mutex. Thread 2 is holding this mutex (while still trying
to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
stop or config command to complete. A reboot is required to resolve.
It turns out when calling xhci_queue_stop_endpoint and
xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
not checked for errors. If the timing is right and the adapter dies just
before either of these commands get issued, we hang indefinitely waiting
for a completion on a command that didn't get issued.
This wasn't a problem before the following fix because we didn't send
commands in xhci_endpoint_reset:
commit f5249461b5 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
With the patch I am submitting, a duration test which breaks adapters
during initialization (and which deadlocks with the standard kernel) runs
without issue.
Fixes: f5249461b5 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
Cc: <stable@vger.kernel.org> # v4.17+
Cc: Torez Smith <torez@redhat.com>
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Signed-off-by: Torez Smith <torez@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-7-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The check printing out the "WARN Wrong bounce buffer write length:"
uses incorrect values when comparing bytes written from scatterlist
to bounce buffer. Actual copied lengths are fine.
The used seg->bounce_len will be set to equal new_buf_len a few lines later
in the code, but is incorrect when doing the comparison.
The patch which added this false warning was backported to 4.8+ kernels
so this should be backported as far as well.
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 597c56e372 ("xhci: update bounce buffer with correct sg num")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-2-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver is using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg and dev_err statements in the
completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 9d974b2a06 ("USB: legousbtower.c: remove err() usage")
Fixes: fef526cae7 ("USB: legousbtower: remove custom debug macro")
Fixes: 4dae996380 ("USB: legotower: remove custom debug macro and module parameter")
Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190919083039.30898-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "run_isr" flag is used for preventing the driver from
calling the interrupt service routine in its runtime resume
callback when the driver is expecting completion to a
command, but what that basically does is that it hides the
real problem. The real problem is that the controller is
allowed to suspend in the middle of command execution.
As a more appropriate fix for the problem, using autosuspend
delay time that matches UCSI_TIMEOUT_MS (5s). That prevents
the controller from suspending while still in the middle of
executing a command.
This fixes a potential deadlock. Both ccg_read() and
ccg_write() are called with the mutex already taken at least
from ccg_send_command(). In ccg_read() and ccg_write, the
mutex is only acquired so that run_isr flag can be set.
Fixes: f0e4cd948b ("usb: typec: ucsi: ccg: add runtime pm workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20191004100219.71152-2-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CPUs affected by Neoverse-N1 #1542419 may execute a stale instruction if
it was recently modified. The affected sequence requires freshly written
instructions to be executable before a branch to them is updated.
There are very few places in the kernel that modify executable text,
all but one come with sufficient synchronisation:
* The module loader's flush_module_icache() calls flush_icache_range(),
which does a kick_all_cpus_sync()
* bpf_int_jit_compile() calls flush_icache_range().
* Kprobes calls aarch64_insn_patch_text(), which does its work in
stop_machine().
* static keys and ftrace both patch between nops and branches to
existing kernel code (not generated code).
The affected sequence is the interaction between ftrace and modules.
The module PLT is cleaned using __flush_icache_range() as the trampoline
shouldn't be executable until we update the branch to it.
Drop the double-underscore so that this path runs kick_all_cpus_sync()
too.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority
masking") added a macro to the entry.S call paths that leave the
PSTATE.I bit set. This tells the pPNMI masking logic that interrupts
are masked by the CPU, not by the PMR. This value is read back by
local_daif_save().
Commit bd82d4bd21 added this call to el0_svc, as el0_svc_handler
is called with interrupts masked. el0_svc_compat was missed, but should
be covered in the same way as both of these paths end up in
el0_svc_common(), which expects to unmask interrupts.
Fixes: bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority masking")
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
If we take an unhandled fault in the kernel, we call show_pte() to dump
the {PGDP,PGD,PUD,PMD,PTE} values for the corresponding page table walk,
where the PGDP value is virt_to_phys(mm->pgd).
The boot-time and runtime kernel page tables, init_pg_dir and
swapper_pg_dir respectively, are kernel symbols. Thus, it is not valid
to call virt_to_phys() on either of these, though we'll do so if we take
a fault on a TTBR1 address.
When CONFIG_DEBUG_VIRTUAL is not selected, virt_to_phys() will silently
fix this up. However, when CONFIG_DEBUG_VIRTUAL is selected, this
results in splats as below. Depending on when these occur, they can
happen to suppress information needed to debug the original unhandled
fault, such as the backtrace:
| Unable to handle kernel paging request at virtual address ffff7fffec73cf0f
| Mem abort info:
| ESR = 0x96000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 00000000102c9dbe (swapper_pg_dir+0x0/0x1000)
| WARNING: CPU: 1 PID: 7558 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0xe0/0x170 arch/arm64/mm/physaddr.c:12
| Kernel panic - not syncing: panic_on_warn set ...
| SMP: stopping secondary CPUs
| Dumping ftrace buffer:
| (ftrace buffer empty)
| Kernel Offset: disabled
| CPU features: 0x0002,23000438
| Memory Limit: none
| Rebooting in 1 seconds..
We can avoid this by ensuring that we call __pa_symbol() for
init_mm.pgd, as this will always be a kernel symbol. As the dumped
{PGD,PUD,PMD,PTE} values are the raw values from the relevant entries we
don't need to handle these specially.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The HWCAP framework will detect a new capability based on the sanitized
version of the ID registers.
Sanitization is based on a whitelist, so any field not described will end
up to be zeroed.
At the moment, ID_AA64ISAR1_EL1.FRINTTS is not described in
ftr_id_aa64isar1. This means the field will be zeroed and therefore the
userspace will not be able to see the HWCAP even if the hardware
supports the feature.
This can be fixed by describing the field in ftr_id_aa64isar1.
Fixes: ca9503fc9e ("arm64: Expose FRINT capabilities to userspace")
Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: mark.brown@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
As of ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"),
inline functions are no longer annotated with '__always_inline', which
allows the compiler to decide whether inlining is really a good idea or
not. Although this is a great idea on paper, the reality is that AArch64
GCC prior to 9.1 has been shown to get confused when creating an
out-of-line copy of a function passing explicit 'register' variables
into an inline assembly block:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111
It's not clear whether this is specific to arm64 or not but, for now,
ensure that all of our functions using 'register' variables are marked
as '__always_inline' so that the old behaviour is effectively preserved.
Hopefully other architectures are luckier with their compilers.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Drop the redundant disconnect mutex which was introduced after the
open-disconnect race had been addressed generally in USB core by commit
d4ead16f50 ("USB: prevent char device open/deregister race").
Specifically, the rw-semaphore in core guarantees that all calls to
open() will have completed and that no new calls to open() will occur
after usb_deregister_dev() returns. Hence there is no need use the
driver data as an inverted disconnected flag.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190926091228.24634-8-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to Greg KH, it has been generally agreed that when a USB
driver encounters an unknown error (or one it can't handle directly),
it should just give up instead of going into a potentially infinite
retry loop.
The three codes -EPROTO, -EILSEQ, and -ETIME fall into this category.
They can be caused by bus errors such as packet loss or corruption,
attempting to communicate with a disconnected device, or by malicious
firmware. Nowadays the extent of packet loss or corruption is
negligible, so it should be safe for a driver to give up whenever one
of these errors occurs.
Although the yurex driver handles -EILSEQ errors in this way, it
doesn't do the same for -EPROTO (as discovered by the syzbot fuzzer)
or other unrecognized errors. This patch adjusts the driver so that
it doesn't log an error message for -EPROTO or -ETIME, and it doesn't
retry after any errors.
Reported-and-tested-by: syzbot+b24d736f18a1541ad550@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1909171245410.1590-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg statements in the completion handlers
which relies on said pointer.
The pointer was also dereferenced unconditionally in a dev_dbg statement
release() something which would lead to a NULL-deref whenever a device
was disconnected before the final character-device close if debugging
was enabled.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 1ef37c6047 ("USB: adutux: remove custom debug macro and module parameter")
Fixes: 66d4bc30d1 ("USB: adutux: remove custom debug macro")
Cc: stable <stable@vger.kernel.org> # 3.12
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190925092913.8608-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix NULL-pointer dereferences on open() and write() which can be
triggered by a malicious USB device.
The current URB allocation helper would fail to initialise the newly
allocated URB if the device has unexpected endpoint descriptors,
something which could lead NULL-pointer dereferences in a number of
open() and write() paths when accessing the URB. For example:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:usb_clear_halt+0x11/0xc0
...
Call Trace:
? tty_port_open+0x4d/0xd0
keyspan_open+0x70/0x160 [keyspan]
serial_port_activate+0x5b/0x80 [usbserial]
tty_port_open+0x7b/0xd0
? check_tty_count+0x43/0xa0
tty_open+0xf1/0x490
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:keyspan_write+0x14e/0x1f3 [keyspan]
...
Call Trace:
serial_write+0x43/0xa0 [usbserial]
n_tty_write+0x1af/0x4f0
? do_wait_intr_irq+0x80/0x80
? process_echoes+0x60/0x60
tty_write+0x13f/0x2f0
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:keyspan_usa26_send_setup+0x298/0x305 [keyspan]
...
Call Trace:
keyspan_open+0x10f/0x160 [keyspan]
serial_port_activate+0x5b/0x80 [usbserial]
tty_port_open+0x7b/0xd0
? check_tty_count+0x43/0xa0
tty_open+0xf1/0x490
Fixes: fdcba53e2d ("fix for bugzilla #7544 (keyspan USB-to-serial converter)")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
On excessive bit errors for the FCP channel ingress fibre path, the channel
notifies us. Previously, we only emitted a kernel message and a trace
record. Since performance can become suboptimal with I/O timeouts due to
bit errors, we now stop using an FCP device by default on channel
notification so multipath on top can timely failover to other paths. A new
module parameter zfcp.ber_stop can be used to get zfcp old behavior.
User explanation of new kernel message:
* Description:
* The FCP channel reported that its bit error threshold has been exceeded.
* These errors might result from a problem with the physical components
* of the local fibre link into the FCP channel.
* The problem might be damage or malfunction of the cable or
* cable connection between the FCP channel and
* the adjacent fabric switch port or the point-to-point peer.
* Find details about the errors in the HBA trace for the FCP device.
* The zfcp device driver closed down the FCP device
* to limit the performance impact from possible I/O command timeouts.
* User action:
* Check for problems on the local fibre link, ensure that fibre optics are
* clean and functional, and all cables are properly plugged.
* After the repair action, you can manually recover the FCP device by
* writing "0" into its "failed" sysfs attribute.
* If recovery through sysfs is not possible, set the CHPID of the device
* offline and back online on the service element.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #2.6.30+
Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a non-passthrough command is terminated with CHECK CONDITION, request
sense is executed by hijacking the command descriptor. Since
scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() do not save/restore the
original command resid, the value returned on failure of the original
command is lost and replaced with the value set by the execution of the
request sense command. This value may in many instances be unaligned to the
device sector size, causing sd_done() to print a warning message about the
incorrect unaligned resid before the command is retried.
Fix this problem by saving the original command residual in struct
scsi_eh_save using scsi_eh_prep_cmnd() and restoring it in
scsi_eh_restore_cmnd(). In addition, to make sure that the request sense
command is executed with a correctly initialized command structure, also
reset the residual to 0 in scsi_eh_prep_cmnd() after saving the original
command value in struct scsi_eh_save.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191001074839.1994-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The naming convention for the existing Theobroma boards is
soc-q7module-baseboard, so rk3399-puma-haikou and the in-kernel
devicetrees also follow that scheme.
For some reason in the binding a wrong or outdated naming slipped
in which does not match the used devicetrees and makes the dt-schema
complain now.
Fix this by using the names used in the wild by actual boards.
Fixes: a323a513c7 ("dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema")
[although the issue was also present in the old txt file]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20190917083453.25744-1-heiko@sntech.de
Fix the pinctrl and interrupt specifier for RK808 to use GPIO3_B2. On the
Rockpro64 schematic [1] page 16, it shows GPIO3_B2 used for the interrupt
line PMIC_INT_L from the RK808, and there's a note which translates as:
"PMU termination GPIO1_C5 changed to this".
Tested by setting an RTC wakealarm and checking /proc/interrupts counters.
Without this patch, neither the rockchip_gpio_irq counter for the RK808,
nor the RTC alarm counter increment when the alarm time is reached.
With this patch, both interrupt counters increment by 1 as expected.
[1] http://files.pine64.org/doc/rockpro64/rockpro64_v21-SCH.pdf
Fixes: e4f3fb4909 ("arm64: dts: rockchip: add initial dts support for Rockpro64")
Signed-off-by: Hugh Cole-Baker <sigmaris@gmail.com>
Link: https://lore.kernel.org/r/20190921131457.36258-1-sigmaris@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The syzbot fuzzer found a slab-out-of-bounds write bug in the hid-gaff
driver. The problem is caused by the driver's assumption that the
device must have an input report. While this will be true for all
normal HID input devices, a suitably malicious device can violate the
assumption.
The same assumption is present in over a dozen other HID drivers.
This patch fixes them by checking that the list of hid_inputs for the
hid_device is nonempty before allowing it to be used.
Reported-and-tested-by: syzbot+403741a091bf41d4ae79@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
string_to_context_struct() may garble the context string, so we need to
copy back the contents again from the old context struct to avoid
storing the corrupted context.
Since string_to_context_struct() tokenizes (and therefore truncates) the
context string and we are later potentially copying it with kstrdup(),
this may eventually cause pieces of uninitialized kernel memory to be
disclosed to userspace (when copying to userspace based on the stored
length and not the null character).
How to reproduce on Fedora and similar:
# dnf install -y memcached
# systemctl start memcached
# semodule -d memcached
# load_policy
# load_policy
# systemctl stop memcached
# ausearch -m AVC
type=AVC msg=audit(1570090572.648:313): avc: denied { signal } for pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=process permissive=0 trawcon=73797374656D5F75007400000000000070BE6E847296FFFF726F6D000096FFFF76
Cc: stable@vger.kernel.org
Reported-by: Milos Malik <mmalik@redhat.com>
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The old omapdrm panels got removed for v5.4 in favor of generic panels,
and the Kconfig options changed. Let's update omap2plus_defconfig
accordingly so the same panels are still enabled.
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fixes a crash in poll() when an AF_XDP socket is opened in copy mode
and the bound device does not have ndo_xsk_wakeup defined. Avoid
trying to call the non-existing ndo and instead call the internal xsk
sendmsg function to send packets in the same way (from the
application's point of view) as calling sendmsg() in any mode or
poll() in zero-copy mode would have done. The application should
behave in the same way independent on if zero-copy mode or copy mode
is used.
Fixes: 77cd0d7b3f ("xsk: add support for need_wakeup flag in AF_XDP rings")
Reported-by: syzbot+a5765ed8cdb1cca4d249@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1569997919-11541-1-git-send-email-magnus.karlsson@intel.com
Coverity caught a case where we could return with a uninitialized value
in ret in process_leaf. This is actually pretty likely because we could
very easily run into a block group item key and have a garbage value in
ret and think there was an errror. Fix this by initializing ret to 0.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: fd708b81d9 ("Btrfs: add a extent ref verify tool")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As platform_get_irq() now prints an error when the interrupt does not
exist, a scary warning may be printed for an optional interrupt:
sh_mmcif ee200000.mmc: IRQ index 1 not found
Fix this by calling platform_get_irq_optional() instead for the second
interrupt, which is optional.
Remove the now superfluous error printing for the first interrupt, which
is mandatory.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
As platform_get_irq() now prints an error when the interrupt does not
exist, counting interrupts by looping until failure causes the printing
of scary messages like:
renesas_sdhi_internal_dmac ee140000.sd: IRQ index 1 not found
Fix this by using the platform_irq_count() helper to avoid touching
non-existent interrupts.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
ACPI-6.3 corresponds to when HMAT revision was bumped
from 1 to 2. In this version ACPI_HMAT_MEMORY_PD_VALID
was deprecated and made reserved.
As such in revision 2+ we shouldn't be testing this flag.
This is as per ACPI-6.3, 5.2.27.3, Table 5-145
"Memory Proximity Domain Attributes Structure"
for Flags.
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some variants of Goodix touchscreen firmwares use 9-bytes finger
report format instead of common 8-bytes format.
This report format may be present as:
struct goodix_contact_data {
uint8_t unknown1;
uint8_t track_id;
uint8_t unknown2;
uint16_t x;
uint16_t y;
uint16_t w;
}__attribute__((packed));
Add support for such format and use it for Lenovo Yoga Book notebook
(which uses a Goodix touchpad as a touch keyboard).
Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Since commit f889beaaab ("Input: da9063 - report KEY_POWER instead of
KEY_SLEEP during power key-press") KEY_SLEEP isn't supported anymore. This
caused input device to not generate any events if "dlg,disable-key-power"
is set.
Fix this by unconditionally setting KEY_POWER capability, and not
declaring KEY_SLEEP.
Fixes: f889beaaab ("Input: da9063 - report KEY_POWER instead of KEY_SLEEP during power key-press")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The newly added optional file argument does not validate if the
file is indeed a watchdog, e.g.:
./watchdog-test -f /dev/zero
Watchdog Ticking Away!
Fix it by confirming that the WDIOC_GETSUPPORT ioctl succeeds.
Fixes: a4864a33f5 ("selftests: watchdog: Add optional file argument")
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
gpio: fixes for v5.4-rc2
- fix a bug with emulated open-drain/source where lines' values can no longer
be changed
- fix getting nonexclusive gpiods from DT
- fix an incorrect offset for the level trigger in gpio-eic-sprd
SMI# interrupt for fan and voltage is Two-Times Interrupt Mode.
Fan or voltage exceeds high limit or going below low limit,
it will causes an interrupt if the previous interrupt has been
reset by reading all the interrupt Status Register. Thus, add the
array fan_alarm and vsen_alarm to store the alarms for all of the
fan and voltage sensors.
Signed-off-by: amy.shih <amy.shih@advantech.com.tw>
Link: https://lore.kernel.org/r/20190919030205.11440-1-Amy.Shih@advantech.com.tw
Fixes: 486842db3b ("hwmon: (nct7904) Add extra sysfs support for fan, voltage and temperature.")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
It's been a while since the k10temp documentation has been updated.
There are new CPU families supported as well as Tdie temp was added.
This patch adds all missing families which I was able to find from git
history and provides more info about Tctl vs Tdie exported temps.
Signed-off-by: Lukas Zapletal <lzap+git@redhat.com>
Link: https://lore.kernel.org/r/20190923105931.27881-1-lzap+git@redhat.com
[groeck: Formatting]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Initialize last_reset variable to INITIAL_JIFFIES, otherwise it is not
possible to test H/W reset for first 5 minutes of system run.
Fixes: e403fa31ed ("rt2x00: add restart hw")
Reported-and-tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Commit a745f7af3c ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.
To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".
Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Let the user specify an optional TARGETS skiplist through the new optional
SKIP_TARGETS Makefile variable.
It is easier to skip at will using a reduced and well defined list of
possibly problematic targets with SKIP_TARGETS than to provide a partially
stripped down list of good targets using the usual TARGETS variable.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
the below warnings:
drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]
Fixes: 271b2224d4 ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e328 ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Simplify the ring buffer handling with the in-place API.
Also avoid the dynamic allocation and the memory leak in the channel
callback function.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the command:
btrfs balance start -dconvert=single,soft .
on a Raspberry Pi produces the following kernel message:
BTRFS error (device mmcblk0p2): balance: invalid convert data profile single
This fails because we use is_power_of_2(unsigned long) to validate
the new data profile, the constant for 'single' profile uses bit 48,
and there are only 32 bits in a long on ARM.
Fix by open-coding the check using u64 variables.
Tested by completing the original balance command on several Raspberry
Pis.
Fixes: 818255feec ("btrfs: use common helper instead of open coding a bit test")
CC: stable@vger.kernel.org # 4.20+
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We've historically had reports of being unable to mount file systems
because the tree log root couldn't be read. Usually this is the "parent
transid failure", but could be any of the related errors, including
"fsid mismatch" or "bad tree block", depending on which block got
allocated.
The modification of the individual log root items are serialized on the
per-log root root_mutex. This means that any modification to the
per-subvol log root_item is completely protected.
However we update the root item in the log root tree outside of the log
root tree log_mutex. We do this in order to allow multiple subvolumes
to be updated in each log transaction.
This is problematic however because when we are writing the log root
tree out we update the super block with the _current_ log root node
information. Since these two operations happen independently of each
other, you can end up updating the log root tree in between writing out
the dirty blocks and setting the super block to point at the current
root.
This means we'll point at the new root node that hasn't been written
out, instead of the one we should be pointing at. Thus whatever garbage
or old block we end up pointing at complains when we mount the file
system later and try to replay the log.
Fix this by copying the log's root item into a local root item copy.
Then once we're safely under the log_root_tree->log_mutex we update the
root item in the log_root_tree. This way we do not modify the
log_root_tree while we're committing it, fixing the problem.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.
Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.
CPU 1 CPU 2
extent_fiemap()
--> it's a full ranged fiemap
range from 0 to LLONG_MAX - 1
(9223372036854775807)
--> locks range in the inode's
io tree
--> after this we have 2 extent
states in the io tree:
--> 1 for range [0, 1Mb[ with
the bits EXTENT_LOCKED and
EXTENT_DELALLOC_BITS set
--> 1 for the range
[1Mb, LLONG_MAX[ with
the EXTENT_LOCKED bit set
--> start buffered write at offset
1Mb with a length of 4Kb
btrfs_file_write_iter()
btrfs_buffered_write()
--> cached_state is NULL
lock_and_cleanup_extent_if_need()
--> returns 0 and does not lock
range because it starts
at current i_size / eof
--> cached_state remains NULL
btrfs_dirty_pages()
btrfs_set_extent_delalloc()
(...)
__set_extent_bit()
--> splits extent state for range
[1Mb, LLONG_MAX[ and now we
have 2 extent states:
--> one for the range
[1Mb, 1Mb + 4Kb[ with
EXTENT_LOCKED set
--> another one for the range
[1Mb + 4Kb, LLONG_MAX[ with
EXTENT_LOCKED set as well
--> sets EXTENT_DELALLOC on the
extent state for the range
[1Mb, 1Mb + 4Kb[
--> caches extent state
[1Mb, 1Mb + 4Kb[ into
@cached_state because it has
the bit EXTENT_LOCKED set
--> btrfs_buffered_write() ends up
with a non-NULL cached_state and
never calls anything to release its
reference on it, resulting in a
memory leak
Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().
The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.
This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:
[77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1
Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.
CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add kselftest-all target to build tests from the top level
Makefile. This is to simplify kselftest use-cases for CI and
distributions where build and test systems are different.
Current kselftest target builds and runs tests on a development
system which is a developer use-case.
Add kselftest-install target to install tests from the top level
Makefile. This is to simplify kselftest use-cases for CI and
distributions where build and test systems are different.
This change addresses requests from developers and testers to add
support for installing kselftest from the main Makefile.
In addition, make the install directory the same when install is
run using "make kselftest-install" or by running kselftest_install.sh.
Also fix the INSTALL_PATH variable conflict between main Makefile and
selftests Makefile.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This patch fixes the lock inversion complaint:
============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u16:6/171:
#0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
#1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
#2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
dump_stack+0x8a/0xd6
__lock_acquire.cold+0xe1/0x24d
lock_acquire+0x106/0x240
__mutex_lock+0x12e/0xcb0
mutex_lock_nested+0x1f/0x30
rdma_destroy_id+0x78/0x4a0 [rdma_cm]
iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
cm_work_handler+0xe62/0x1100 [iw_cm]
process_one_work+0x56d/0xac0
worker_thread+0x7a/0x5d0
kthread+0x1bc/0x210
ret_from_fork+0x24/0x30
This is not a bug as there are actually two lock classes here.
Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd921 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is essentially a revert of:
e3f72b749d pinctrl: cherryview: fix Strago DMI workaround
86c5dd6860 pinctrl: cherryview: limit Strago DMI workarounds to version 1.0
because even with 1.1 versions of BIOS there are some pins that are
configured as interrupts but not claimed by any driver, and they
sometimes fire up and result in interrupt storms that cause touchpad
stop functioning and other issues.
Given that we are unlikely to qualify another firmware version for a
while it is better to keep the workaround active on all Strago boards.
Reported-by: Alex Levin <levinale@chromium.org>
Fixes: 86c5dd6860 ("pinctrl: cherryview: limit Strago DMI workarounds to version 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Keeping the IRQ chip definition static shares it with multiple instances of
the GPIO chip in the system. This is bad and now we get this warning from
GPIO library:
"detected irqchip that is shared with multiple gpiochips: please fix the driver."
Hence, move the IRQ chip definition from being driver static into the struct
intel_pinctrl. So a unique IRQ chip is used for each GPIO chip instance.
Fixes: ee1a6ca43d ("pinctrl: intel: Add Intel Broxton pin controller support")
Depends-on: 5ff56b015e ("pinctrl: intel: Disable GPIO pin interrupts in suspend")
Reported-by: Federico Ricchiuto <fed.ricchiuto@gmail.com>
Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Don't populate the array keys on the stack but instead make it
static const. Makes the object code smaller by 166 bytes.
Before:
text data bss dec hex filename
18931 5872 480 25283 62c3 drivers/hid/hid-prodikeys.o
After:
text data bss dec hex filename
18669 5968 480 25117 621d drivers/hid/hid-prodikeys.o
(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
On HID report descriptor parsing error the code displays bogus
pointer instead of error offset (subtracts start=NULL from end).
Make the message more useful by displaying correct error offset
and include total buffer size for reference.
This was carried over from ancient times - "Fixed" commit just
promoted the message from DEBUG to ERROR.
Cc: stable@vger.kernel.org
Fixes: 8c3d52fc39 ("HID: make parser more verbose about parsing errors by default")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The ARM accelerated AES driver depends on the new AES library for
its non-SIMD fallback so express this in its Kconfig declaration.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The NEON/Crypto Extensions based AES implementation for 32-bit ARM
can be built in a kernel that targets ARMv6 CPUs and higher, even
though the actual code will not be able to run on that generation,
but it allows for a portable image to be generated that can will
use the special instructions only when they are available.
Since those instructions are part of a FPU profile rather than a
CPU profile, we don't override the architecture in the assembler
code, and most of the scalar code is simple enough to be ARMv6
compatible. However, that changes with commit c61b1607ed,
which introduces calls to the movw/movt instructions, which are
v7+ only.
So override the architecture in the .S file to armv8-a, which
matches the architecture specification in the crypto-neon-fp-armv8
FPU specificier that we already using. Note that using armv7-a
here may trigger an issue with the upcoming Clang 10 release,
which no longer permits .arch/.fpu combinations it views as
incompatible.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: c61b1607ed ("crypto: arm/aes-ce - implement ciphertext stealing ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sphinx generates the following warnings for the arm64 doc
pages:
Documentation/arm64/memory.rst:158: WARNING: Unexpected indentation.
Documentation/arm64/memory.rst:162: WARNING: Unexpected indentation.
These indentations warnings can be resolved by utilising code
hightlighting instead.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
The system which has SVE feature crashed because of
the memory pointed by task->thread.sve_state was destroyed
by someone.
That is because sve_state is freed while the forking the
child process. The child process has the pointer of sve_state
which is same as the parent's because the child's task_struct
is copied from the parent's one. If the copy_process()
fails as an error on somewhere, for example, copy_creds(),
then the sve_state is freed even if the parent is alive.
The flow is as follows.
copy_process
p = dup_task_struct
=> arch_dup_task_struct
*dst = *src; // copy the entire region.
:
retval = copy_creds
if (retval < 0)
goto bad_fork_free;
:
bad_fork_free:
...
delayed_free_task(p);
=> free_task
=> arch_release_task_struct
=> fpsimd_release_task
=> __sve_free
=> kfree(task->thread.sve_state);
// free the parent's sve_state
Move child's sve_state = NULL and clearing TIF_SVE flag
to arch_dup_task_struct() so that the child doesn't free the
parent's one.
There is no need to wait until copy_process() to clear TIF_SVE for
dst, because the thread flags for dst are initialized already by
copying the src task_struct.
This change simplifies the code, so get rid of comments that are no
longer needed.
As a note, arm64 used to have thread_info on the stack. So it
would not be possible to clear TIF_SVE until the stack is initialized.
From commit c02433dd6d ("arm64: split thread_info from task stack"),
the thread_info is part of the task, so it should be valid to modify
the flag from arch_dup_task_struct().
Cc: stable@vger.kernel.org # 4.15.x-
Fixes: bc0ee47603 ("arm64/sve: Core task context handling")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reported-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Suggested-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit 73f3816609 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") renamed the caller of the install_bp_hardening_cb() function
but forgot to update a comment, which can be confusing when trying to
follow the code flow.
Fixes: 73f3816609 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
ti_abb_wait_txdone() may return -ETIMEDOUT when ti_abb_check_txdone()
returns true in the latest iteration of the while loop because the timeout
value is abb->settling_time + 1. Similarly, ti_abb_clear_all_txdone() may
return -ETIMEDOUT when ti_abb_check_txdone() returns false in the latest
iteration of the while loop. Fix it.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20190929095848.21960-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In principle, Midgard GPUs supporting smaller VA sizes should only
require 3-level pagetables, since level 0 only resolves bits 48:40 of
the address. However, the kbase driver does not appear to have any
notion of a variable start level, and empirically T720 and T820 rapidly
blow up with translation faults unless given a full 4-level table,
despite only supporting a 33-bit VA size.
The 'real' IAS value is still valuable in terms of validating addresses
on map/unmap, so tweak the allocator to allow smaller values while still
forcing the resultant tables to the full 4 levels. As far as I can test,
this should make all known Midgard variants happy.
Fixes: d08d42de64 ("iommu: io-pgtable: Add ARM Mali midgard MMU page table format")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Whilst Midgard's MEMATTR follows a similar principle to the VMSA MAIR,
the actual attribute values differ, so although it currently appears to
work to some degree, we probably shouldn't be using our standard stage 1
MAIR for that. Instead, generate a reasonable MEMATTR with attribute
values borrowed from the kbase driver; at this point we'll be overriding
or ignoring pretty much all of the LPAE config, so just implement these
Mali details in a dedicated allocator instead of pretending to subclass
the standard VMSA format.
Fixes: d08d42de64 ("iommu: io-pgtable: Add ARM Mali midgard MMU page table format")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When alloc_io_pgtable_ops is failed, context bitmap which is just allocated
by __arm_smmu_alloc_bitmap should be freed to release the resource.
Signed-off-by: Liu Xiang <liuxiang_1999@126.com>
Signed-off-by: Will Deacon <will@kernel.org>
When toggling the level trigger to emulate the edge trigger, the
EIC offset is incorrect without adding the corresponding bank index,
thus fix it.
Fixes: 7bf0d7f622 ("gpio: eic: Add edge trigger emulation for EIC")
Cc: stable@vger.kernel.org
Signed-off-by: Bruce Chen <bruce.chen@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Since commit ec757001c8 ("gpio: Enable nonexclusive gpiods from DT
nodes") we are able to get GPIOD_FLAGS_BIT_NONEXCLUSIVE marked gpios.
Currently the gpiolib uses the wrong flags variable for the check. We
need to check the gpiod_flags instead of the of_gpio_flags else we
return -EBUSY for GPIOD_FLAGS_BIT_NONEXCLUSIVE marked and requested
gpiod's.
Fixes: ec757001c8 gpio: Enable nonexclusive gpiods from DT nodes
Cc: stable@vger.kernel.org
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
[Bartosz: the function was moved to gpiolib-of.c so updated the patch]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
When emulating open-drain/open-source by not actively driving the output
lines - we're simply changing their mode to input. This is wrong as it
will then make it impossible to change the value of such line - it's now
considered to actually be in input mode. If we want to still use the
direction_input() callback for simplicity then we need to set FLAG_IS_OUT
manually in gpiod_direction_output() and not clear it in
gpio_set_open_drain_value_commit() and
gpio_set_open_source_value_commit().
Fixes: c663e5f567 ("gpio: support native single-ended hardware drivers")
Cc: stable@vger.kernel.org
Reported-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
If kzalloc() returns NULL, the error path doesn't stop the flow of
control from entering rtw_hal_read_chip_version() which dereferences the
null pointer. Fix this by adding a 'goto' to the error path to more
gracefully handle the issue and avoid proceeding with initialization
steps that we're no longer prepared to handle.
Also update the debug message to be more consistent with the other debug
messages in this function.
Addresses-Coverity: ("Dereference after null check")
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Link: https://lore.kernel.org/r/20190927214415.899-1-connor.kuehl@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCM draining behavior got broken since the recent refactoring, and
this turned out to be the incorrect expectation of the firmware
behavior regarding "draining". While I expected the "drain" flag at
the stop operation would do processing the queued samples, it seems
rather dropping the samples.
As a quick fix, just drop the SNDRV_PCM_INFO_DRAIN_TRIGGER flag, so
that the driver uses the normal PCM draining procedure. Also, put
some caution comment to the function for future readers not to fall
into the same pitfall.
Fixes: d7ca3a7154 ("staging: bcm2835-audio: Operate non-atomic PCM ops")
BugLink: https://github.com/raspberrypi/linux/issues/2983
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20190914152405.7416-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor
interface") removed the gpio code from fbtft_device rendering it useless.
fbtft_device is a module that was used on the Raspberry Pi to dynamically
add fbtft devices when the Pi didn't have Device Tree support.
Just remove the module since it's the responsibility of Device Tree, ACPI
or platform code to add devices.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20190917171843.10334-2-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor
interface") removed setting gpios via platform data. This means that
fbtft will now only work with Device Tree so set the dependency.
This also prevents a NULL pointer deref on non-DT platform because
fbtftops.request_gpios is not set in that case anymore.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20190917171843.10334-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 32-bit:
In file included from drivers/staging/octeon/octeon-ethernet.h:41,
from drivers/staging/octeon/ethernet-tx.c:25:
drivers/staging/octeon/octeon-stubs.h: In function ‘cvmx_phys_to_ptr’:
drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
return (void *)(physical_address);
^
drivers/staging/octeon/ethernet-tx.c: In function ‘cvm_oct_xmit’:
drivers/staging/octeon/ethernet-tx.c:264:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:268:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:276:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
XKPHYS_TO_PHYS((u64)skb_frag_address(fs));
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:280:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)CVM_OCT_SKB_CB(skb));
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
Fix this by replacing casts to "u64" by casts to "uintptr_t", which is
either 32-bit or 64-bit, and adding an intermediate cast to "uintptr_t"
where needed.
Exposed by commit 171a9bae68 ("staging/octeon: Allow test build on
!MIPS").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20190919095022.29099-1-geert@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called
from the completion path") introduced the WARN_ON_ONCE in
qla2x00_status_cont_entry(). The assumption was that there is only one
status continuations element. According to the firmware documentation it is
possible that multiple status continuations are emitted by the firmware.
Fixes: 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called from the completion path")
Link: https://lore.kernel.org/r/20190927073031.62296-1-dwagner@suse.de
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I've got a report about a UAS drive enclosure reporting back Sense: Logical
unit access not authorized if the drive it holds is password protected.
While the drive is obviously unusable in that state as a mass storage
device, it still exists as a sd device and when the system is asked to
perform a suspend of the drive, it will be sent a SYNCHRONIZE CACHE. If
that fails due to password protection, the error must be ignored.
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190903101840.16483-1-oneukum@suse.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are total of 151 non-secure gpio (0-150) and four
pins of pinmux (91, 92, 93 and 94) are not mapped to any
gpio pin, hence update same in DT.
Fixes: 8aa428cc1e ("arm64: dts: Add pinctrl DT nodes for Stingray SOC")
Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
bcc uses libbpf repo as a submodule. It brings in libbpf source
code and builds everything together to produce shared libraries.
With latest libbpf, I got the following errors:
/bin/ld: libbcc_bpf.so.0.10.0: version node not found for symbol xsk_umem__create@LIBBPF_0.0.2
/bin/ld: failed to set dynamic section sizes: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [src/cc/libbcc_bpf.so.0.10.0] Error 1
In xsk.c, we have
asm(".symver xsk_umem__create_v0_0_2, xsk_umem__create@LIBBPF_0.0.2");
asm(".symver xsk_umem__create_v0_0_4, xsk_umem__create@@LIBBPF_0.0.4");
The linker thinks the built is for LIBBPF but cannot find proper version
LIBBPF_0.0.2/4, so emit errors.
I also confirmed that using libbpf.a to produce a shared library also
has issues:
-bash-4.4$ cat t.c
extern void *xsk_umem__create;
void * test() { return xsk_umem__create; }
-bash-4.4$ gcc -c -fPIC t.c
-bash-4.4$ gcc -shared t.o libbpf.a -o t.so
/bin/ld: t.so: version node not found for symbol xsk_umem__create@LIBBPF_0.0.2
/bin/ld: failed to set dynamic section sizes: Bad value
collect2: error: ld returned 1 exit status
-bash-4.4$
Symbol versioning does happens in commonly used libraries, e.g., elfutils
and glibc. For static libraries, for a versioned symbol, the old definitions
will be ignored, and the symbol will be an alias to the latest definition.
For example, glibc sched_setaffinity is versioned.
-bash-4.4$ readelf -s /usr/lib64/libc.so.6 | grep sched_setaffinity
756: 000000000013d3d0 13 FUNC GLOBAL DEFAULT 13 sched_setaffinity@GLIBC_2.3.3
757: 00000000000e2e70 455 FUNC GLOBAL DEFAULT 13 sched_setaffinity@@GLIBC_2.3.4
1800: 0000000000000000 0 FILE LOCAL DEFAULT ABS sched_setaffinity.c
4228: 00000000000e2e70 455 FUNC LOCAL DEFAULT 13 __sched_setaffinity_new
4648: 000000000013d3d0 13 FUNC LOCAL DEFAULT 13 __sched_setaffinity_old
7338: 000000000013d3d0 13 FUNC GLOBAL DEFAULT 13 sched_setaffinity@GLIBC_2
7380: 00000000000e2e70 455 FUNC GLOBAL DEFAULT 13 sched_setaffinity@@GLIBC_
-bash-4.4$
For static library, the definition of sched_setaffinity aliases to the new definition.
-bash-4.4$ readelf -s /usr/lib64/libc.a | grep sched_setaffinity
File: /usr/lib64/libc.a(sched_setaffinity.o)
8: 0000000000000000 455 FUNC GLOBAL DEFAULT 1 __sched_setaffinity_new
12: 0000000000000000 455 FUNC WEAK DEFAULT 1 sched_setaffinity
For both elfutils and glibc, additional macros are used to control different handling
of symbol versioning w.r.t static and shared libraries.
For elfutils, the macro is SYMBOL_VERSIONING
(https://sourceware.org/git/?p=elfutils.git;a=blob;f=lib/eu-config.h).
For glibc, the macro is SHARED
(https://sourceware.org/git/?p=glibc.git;a=blob;f=include/shlib-compat.h;hb=refs/heads/master)
This patch used SHARED as the macro name. After this patch, the libbpf.a has
-bash-4.4$ readelf -s libbpf.a | grep xsk_umem__create
372: 0000000000017145 1190 FUNC GLOBAL DEFAULT 1 xsk_umem__create_v0_0_4
405: 0000000000017145 1190 FUNC GLOBAL DEFAULT 1 xsk_umem__create
499: 00000000000175eb 103 FUNC GLOBAL DEFAULT 1 xsk_umem__create_v0_0_2
-bash-4.4$
No versioned symbols for xsk_umem__create.
The libbpf.a can be used to build a shared library succesfully.
-bash-4.4$ cat t.c
extern void *xsk_umem__create;
void * test() { return xsk_umem__create; }
-bash-4.4$ gcc -c -fPIC t.c
-bash-4.4$ gcc -shared t.o libbpf.a -o t.so
-bash-4.4$
Fixes: 10d30e3017 ("libbpf: add flags to umem config")
Cc: Kevin Laatz <kevin.laatz@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The Intel fixed counters use a special table to override the JSON
information.
During this override the period information from the JSON file got
dropped, which results in inst_retired.any and similar running with
frequency mode instead of a period.
Just specify the expected period in the table.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the LBR data and the instructions in a binary do not match the loop
printing instructions could get confused and print a long stream of
bogus <bad> instructions.
The problem was that if the instruction decoder cannot decode an
instruction it ilen wasn't initialized, so the loop going through the
basic block would continue with the previous value.
Harden the code to avoid such problems:
- Make sure ilen is always freshly initialized and is 0 for bad
instructions.
- Do not overrun the code buffer while printing instructions
- Print a warning message if the final jump is not on an instruction
boundary.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Whenever an mmap/mmap2 event occurs, the map tree must be updated to add a new
entry. If a new map overlaps a previous map, the overlapped section of the
previous map is effectively unmapped, but the non-overlapping sections are
still valid.
maps__fixup_overlappings() is responsible for creating any new map entries from
the previously overlapped map. It optionally creates a before and an after map.
When creating the after map the existing code failed to adjust the map.pgoff.
This meant the new after map would incorrectly calculate the file offset
for the ip. This results in incorrect symbol name resolution for any ip in the
after region.
Make maps__fixup_overlappings() correctly populate map.pgoff.
Add an assert that new mapping matches old mapping at the beginning of
the after map.
Committer-testing:
Validated correct parsing of libcoreclr.so symbols from .NET Core 3.0 preview9
(which didn't strip symbols).
Preparation:
~/dotnet3.0-preview9/dotnet new webapi -o perfSymbol
cd perfSymbol
~/dotnet3.0-preview9/dotnet publish
perf record ~/dotnet3.0-preview9/dotnet \
bin/Debug/netcoreapp3.0/publish/perfSymbol.dll
^C
Before:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.705249: 250000 cpu-clock: \
7fe6159a1f99 [unknown] \
(.../3.0.0-preview9-19423-09/libcoreclr.so)
After:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
All the [unknown] symbols were resolved.
Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
Tested-by: Brian Robbins <brianrob@microsoft.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: John Keeping <john@metanate.com>
Cc: John Salem <josalem@microsoft.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom McDonald <thomas.mcdonald@microsoft.com>
Link: http://lore.kernel.org/lkml/BN8PR21MB136270949F22A6A02335C238F7800@BN8PR21MB1362.namprd21.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
200824f55e ("KVM: s390: Disallow invalid bits in kvm_valid_regs and kvm_dirty_regs")
4a53d99dd0 ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
7396d337cf ("KVM: x86: Return to userspace with internal error on unexpected exit reason")
92f35b751c ("KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE")
None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Link: https://lkml.kernel.org/n/tip-akuugvvjxte26kzv23zp5d2z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
78a1b96bcf ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl")
23c688b540 ("fscrypt: allow unprivileged users to add/remove keys for v2 policies")
5dae460c22 ("fscrypt: v2 encryption policy support")
5a7e29924d ("fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl")
b1c0ec3599 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
22d94f493b ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
3b6df59bc4 ("fscrypt: use FSCRYPT_* definitions, not FS_*")
2336d0deb2 ("fscrypt: use FSCRYPT_ prefix for uapi constants")
7af0ab0d3a ("fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>")
That don't trigger any changes in tooling, as it so far is used only
for:
$ grep -l 'fs\.h' tools/perf/trace/beauty/*.sh | xargs grep regex=
tools/perf/trace/beauty/rename_flags.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+RENAME_([[:alnum:]_]+)[[:space:]]+\(1[[:space:]]*<<[[:space:]]*([[:xdigit:]]+)[[:space:]]*\)[[:space:]]*.*'
tools/perf/trace/beauty/sync_file_range.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+SYNC_FILE_RANGE_([[:alnum:]_]+)[[:space:]]+([[:xdigit:]]+)[[:space:]]*.*'
tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)(\(\w+\))?[[:space:]]+_IO[CWR]{0,2}\([[:space:]]*(_IOC_\w+,[[:space:]]*)?'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)[[:space:]]+_IO[WR]{0,2}\([[:space:]]*'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
$
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-44g48exl9br9ba0t64chqb4i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
4ed3350539 ("USB: usbfs: Add a capability flag for runtime suspend")
7794f486ed ("usbfs: Add ioctls for runtime power management")
This triggers these changes in the kernel sources, automagically
supporting these new ioctls in the 'perf trace' beautifiers.
Soon this will be used in things like filter expressions for tracepoints
in 'perf record', 'perf trace', 'perf top', i.e. filter expressions will
do a lookup to turn things like USBDEVFS_WAIT_FOR_RESUME into _IO('U',
35) before associating the tracepoint expression to tracepoint perf
event.
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > before
$ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
$ git diff
diff --git a/tools/include/uapi/linux/usbdevice_fs.h b/tools/include/uapi/linux/usbdevice_fs.h
index 78efe870c2b7..cf525cddeb94 100644
--- a/tools/include/uapi/linux/usbdevice_fs.h
+++ b/tools/include/uapi/linux/usbdevice_fs.h
@@ -158,6 +158,7 @@ struct usbdevfs_hub_portinfo {
#define USBDEVFS_CAP_MMAP 0x20
#define USBDEVFS_CAP_DROP_PRIVILEGES 0x40
#define USBDEVFS_CAP_CONNINFO_EX 0x80
+#define USBDEVFS_CAP_SUSPEND 0x100
/* USBDEVFS_DISCONNECT_CLAIM flags & struct */
@@ -223,5 +224,8 @@ struct usbdevfs_streams {
* extending size of the data returned.
*/
#define USBDEVFS_CONNINFO_EX(len) _IOC(_IOC_READ, 'U', 32, len)
+#define USBDEVFS_FORBID_SUSPEND _IO('U', 33)
+#define USBDEVFS_ALLOW_SUSPEND _IO('U', 34)
+#define USBDEVFS_WAIT_FOR_RESUME _IO('U', 35)
#endif /* _UAPI_LINUX_USBDEVICE_FS_H */
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > after
$ diff -u before after
--- before 2019-09-27 11:41:50.634867620 -0300
+++ after 2019-09-27 11:42:07.453102978 -0300
@@ -24,6 +24,9 @@
[30] = "DROP_PRIVILEGES",
[31] = "GET_SPEED",
[32] = "CONNINFO_EX",
+ [33] = "FORBID_SUSPEND",
+ [34] = "ALLOW_SUSPEND",
+ [35] = "WAIT_FOR_RESUME",
[3] = "RESETEP",
[4] = "SETINTERFACE",
[5] = "SETCONFIGURATION",
$
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x1rb109b9nfi7pukota82xhj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
1a4e58cce8 ("mm: introduce MADV_PAGEOUT")
9c276cc65a ("mm: introduce MADV_COLD")
That result in these changes in the tools:
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ git diff
diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..c160a5354eb6 100644
--- a/tools/include/uapi/asm-generic/mman-common.h
+++ b/tools/include/uapi/asm-generic/mman-common.h
@@ -67,6 +67,9 @@
#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
+#define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2019-09-27 11:29:43.346320100 -0300
+++ after 2019-09-27 11:30:03.838570439 -0300
@@ -16,6 +16,8 @@
[17] = "DODUMP",
[18] = "WIPEONFORK",
[19] = "KEEPONFORK",
+ [20] = "COLD",
+ [21] = "PAGEOUT",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, it will be able to
translate from the number to a human readable string.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It turns out that sopine-baseboard needs same fix as pine64-plus
for ethernet PHY. Here too Realtek ethernet PHY chip needs additional
power on delay to properly initialize. Datasheet mentions that chip
needs 30 ms to be properly powered on and that it needs some more time
to be initialized.
Fix that by adding 100ms ramp delay to regulator responsible for
powering PHY.
Note that issue was found out and fix tested on pine64-lts, but it's
basically the same as sopine-baseboard, only layout and connectors
differ.
Fixes: bdfe4cebea ("arm64: allwinner: a64: add Ethernet PHY regulator for several boards")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Looks like PMU in A64 is broken, it generates no interrupts at all and
as result 'perf top' shows no events.
Tested on Pine64-LTS.
Fixes: 34a97fcc71 ("arm64: dts: allwinner: a64: Add PMU node")
Cc: Harald Geyer <harald@ccbib.org>
Cc: Jared D. McNeill <jmcneill@NetBSD.org>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Emmanuel Vadot <manu@FreeBSD.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Depending on kernel and bootloader configuration, it's possible that
Realtek ethernet PHY isn't powered on properly. According to the
datasheet, it needs 30ms to power up and then some more time before it
can be used.
Fix that by adding 100ms ramp delay to regulator responsible for
powering PHY.
Fixes: 94dcfdc77f ("arm64: allwinner: pine64-plus: Enable dwmac-sun8i")
Suggested-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
make TARGETS=bpf kselftest fails with:
Makefile:127: tools/build/Makefile.include: No such file or directory
When the bpf tool make is invoked from tools Makefile, srctree is
cleared and the current logic check for srctree equals to empty
string to determine srctree location from CURDIR.
When the build in invoked from selftests/bpf Makefile, the srctree
is set to "." and the same logic used for srctree equals to empty is
needed to determine srctree.
Check building_out_of_srctree undefined as the condition for both
cases to fix "make TARGETS=bpf kselftest" build failure.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20190927011344.4695-1-skhan@linuxfoundation.org
An optimized build such as:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-O3
will turn the dereference operation into a ud2 instruction, raising a
SIGILL rather than a SIGSEGV. Use raise(..) for correctness and clarity.
Similar issues were addressed in Numfor Mbiziwo-Tiapo's patch:
https://lkml.org/lkml/2019/7/8/1234
Committer testing:
Before:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17092
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
After:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17909
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
Fixes: a074865e60 ("perf tools: Introduce perf hooks")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This was missing from scsi_device_from_queue() due to the introduction of
another new scsi_mq_ops_no_commit of linux-next commit 8930a6c207 ("scsi:
core: add support for request batching") from Martin's scsi/5.4/scsi-queue
or James' scsi/misc.
Only devicehandler code seems to call scsi_device_from_queue():
*** drivers/scsi/scsi_dh.c:
scsi_dh_activate[255] sdev = scsi_device_from_queue(q);
scsi_dh_set_params[302] sdev = scsi_device_from_queue(q);
scsi_dh_attach[325] sdev = scsi_device_from_queue(q);
scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q);
Fixes multipath tools follow-on errors:
$ multipath -v6
...
libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha failed: No such device
...
mpatha: failed to load map, error 19
...
showing also as kernel messages:
device-mapper: table: 252:0: multipath: error attaching hardware handler
device-mapper: ioctl: error adding target to table
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was missing from scsi_mq_ops_no_commit of linux-next commit
8930a6c207 ("scsi: core: add support for request batching") from Martin's
scsi/5.4/scsi-queue or James' scsi/misc.
See also linux-next commit b7e9e1fb7a ("scsi: implement .cleanup_rq
callback") from block/for-next.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the suspend reg_field maps to the pmic voltage selection bits
and is used during suspend_enabe/disable() and during get_mode(). This
seems to be wrong for both use cases.
Use case one (suspend_enabe/disable):
Those callbacks are used to mark a regulator device as enabled/disabled
during suspend. Marking the regulator enabled during suspend is done by
the LDOx_CONF/BUCKx_CONF bit within the LDOx_CONT/BUCKx_CONT registers.
Setting this bit tells the DA9062 PMIC state machine to keep the
regulator on in POWERDOWN mode and switch to suspend voltage.
Use case two (get_mode):
The get_mode callback is used to retrieve the active mode state. Since
the regulator-setting-A is used for the active state and
regulator-setting-B for the suspend state there is no need to check
which regulator setting is active.
Fixes: 4068e5182a ("regulator: da9062: DA9062 regulator driver")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Link: https://lore.kernel.org/r/20190917124246.11732-2-m.felsch@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
In case of WM1811 device there are currently being registered controls
referring to registers not existing on that device.
It has been noticed when getting values of "AIF1ADC2 Volume", "AIF1DAC2
Volume" controls was failing during ALSA state restoring at boot time:
"amixer: Mixer hw:0 load error: Device or resource busy"
Reading some registers through I2C was failing with EBUSY error and
indeed these registers were not available according to the datasheet.
To fix this controls not available on WM1811 are moved to a separate
array and registered only for WM8994 and WM8958.
There are some further differences between WM8994 and WM1811,
e.g. registers 603h, 604h, 605h, which are not covered in this patch.
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Link: https://lore.kernel.org/r/20190920130218.32690-2-s.nawrocki@samsung.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There are two problems in dcache_readdir() - one is that lockless traversal
of the list needs non-trivial cooperation of d_alloc() (at least a switch
to list_add_rcu(), and probably more than just that) and another is that
it assumes that no removal will happen without the directory locked exclusive.
Said assumption had always been there, never had been stated explicitly and
is violated by several places in the kernel (devpts and selinuxfs).
* replacement of next_positive() with different calling conventions:
it returns struct list_head * instead of struct dentry *; the latter is
passed in and out by reference, grabbing the result and dropping the original
value.
* scan is under ->d_lock. If we run out of timeslice, cursor is moved
after the last position we'd reached and we reschedule; then the scan continues
from that place. To avoid livelocks between multiple lseek() (with cursors
getting moved past each other, never reaching the real entries) we always
skip the cursors, need_resched() or not.
* returned list_head * is either ->d_child of dentry we'd found or
->d_subdirs of parent (if we got to the end of the list).
* dcache_readdir() and dcache_dir_lseek() switched to new helper.
dcache_readdir() always holds a reference to dentry passed to dir_emit() now.
Cursor is moved to just before the entry where dir_emit() has failed or into
the very end of the list, if we'd run out.
* move_cursor() eliminated - it had sucky calling conventions and
after fixing that it became simply list_move() (in lseek and scan_positives)
or list_move_tail() (in readdir).
All operations with the list are under ->d_lock now, and we do not
depend upon having all file removals done with parent locked exclusive
anymore.
Cc: stable@vger.kernel.org
Reported-by: "zhengbin (A)" <zhengbin13@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As per GIC spec, ITLinesNumber indicates the maximum SPI INTID that
the GIC implementation supports. And the maximum SPI INTID an
implementation might support is 1019 (field value 11111).
max(GICD_TYPER_SPIS(...), 1020) is not what we actually want for
GIC_LINE_NR. Fix it to min(GICD_TYPER_SPIS(...), 1020).
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1568789850-14080-1-git-send-email-yuzenghui@huawei.com
Currently the regulator-suspend-min/max-microvolt must be within the
root regulator node but the dt-bindings specifies it as subnode
properties for the regulator-state-[mem/disk/standby] node. The only DT
using this bindings currently is the at91-sama5d2_xplained.dts and this
DT uses it correctly. I don't know if it isn't tested but it can't work
without this fix.
Fixes: f7efad10b5 ("regulator: add PM suspend and resume hooks")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Link: https://lore.kernel.org/r/20190917154021.14693-3-m.felsch@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The center temperature of the supported devices stored in the constant
BMC150_ACCEL_TEMP_CENTER_VAL is not 24 degrees but 23 degrees.
It seems that some datasheets were inconsistent on this value leading
to the error. For most usecases will only make minor difference so
not queued for stable.
Signed-off-by: Pascal Bouwmann <bouwmann@tau-tec.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Do not allow configuring null sensor gain since it will force to 0
device outputs
Fixes: c8d4066c7246 ("iio: imu: st_lsm6dsx: remove invalid gain value for LSM9DS1")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
meson_saradc's irq handler uses priv->regmap so make sure that it is
allocated before the irq get enabled.
This also fixes crash when CONFIG_DEBUG_SHIRQ is enabled, as device
managed resources are freed in the inverted order they had been
allocated, priv->regmap was freed before the spurious fake irq that
CONFIG_DEBUG_SHIRQ adds called the handler.
Fixes: 3af109131b ("iio: adc: meson-saradc: switch from polling to interrupt mode")
Reported-by: Elie Roudninski <xademax@gmail.com>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Elie ROUDNINSKI <xademax@gmail.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-09-08 12:30:32 +01:00
1303 changed files with 12660 additions and 9944 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.