Pull btrfs fixes from David Sterba:
"Two fixes.
The first is a regression: when dropping some incompat bits the
conditions were reversed. The other is a fix for rename whiteout
potentially leaving stack memory linked to a list"
* tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
btrfs: fix log context list corruption after rename whiteout error
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
x86/mm: split vmalloc_sync_all()
mm, slub: prevent kmalloc_node crashes and memory leaks
mm/mmu_notifier: silence PROVE_RCU_LIST warnings
epoll: fix possible lost wakeup on epoll_ctl() path
mm: do not allow MADV_PAGEOUT for CoW pages
mm, memcg: throttle allocators based on ancestral memory.high
mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
page-flags: fix a crash at SetPageError(THP_SWAP)
mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
Sachin reports [1] a crash in SLUB __slab_alloc():
BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP ___slab_alloc+0x1f4/0x760
LR __slab_alloc+0x34/0x60
Call Trace:
___slab_alloc+0x334/0x760 (unreliable)
__slab_alloc+0x34/0x60
__kmalloc_node+0x110/0x490
kvmalloc_node+0x58/0x110
mem_cgroup_css_online+0x108/0x270
online_css+0x48/0xd0
cgroup_apply_control_enable+0x2ec/0x4d0
cgroup_mkdir+0x228/0x5f0
kernfs_iop_mkdir+0x90/0xf0
vfs_mkdir+0x110/0x230
do_mkdirat+0xb0/0x1a0
system_call+0x5c/0x68
This is a PowerPC platform with following NUMA topology:
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node 0 1
0: 10 40
1: 40 10
possible numa nodes: 0-31
This only happens with a mmotm patch "mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node" [2] which effectively calls
kmalloc_node for each possible node. SLUB however only allocates
kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
node_to_mem_node to return such valid node for other nodes since commit
a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating
on memoryless node"). This is however not true in this configuration
where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
thus it contains zeroes and get_partial() ends up accessing
non-allocated kmem_cache_node.
A related issue was reported by Bharata (originally by Ramachandran) [3]
where a similar PowerPC configuration, but with mainline kernel without
patch [2] ends up allocating large amounts of pages by kmalloc-1k
kmalloc-512. This seems to have the same underlying issue with
node_to_mem_node() not behaving as expected, and might probably also
lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
This patch should fix both issues by not relying on node_to_mem_node()
anymore and instead simply falling back to NUMA_NO_NODE, when
kmalloc_node(node) is attempted for a node that's not online, or has no
usable memory. The "usable memory" condition is also changed from
node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
the condition that SLUB uses to allocate kmem_cache_node structures.
The check in get_partial() is removed completely, as the checks in
___slab_alloc() are now sufficient to prevent get_partial() being
reached with an invalid node.
[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
Fixes: a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is safe to traverse mm->notifier_subscriptions->list either under
SRCU read lock or mm->notifier_subscriptions->lock using
hlist_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false positives,
for example,
WARNING: suspicious RCU usage
-----------------------------
mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by libvirtd/802:
#0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
#1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
#2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460
stack backtrace:
CPU: 7 PID: 802 Comm: libvirtd Tainted: G I 5.6.0-rc6-next-20200317+ #2
Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
Call Trace:
dump_stack+0xa4/0xfe
lockdep_rcu_suspicious+0xeb/0xf5
__mmu_notifier_invalidate_range_start+0x3ff/0x460
change_p4d_range+0x746/0x800
change_protection+0x1df/0x300
mprotect_fixup+0x245/0x3e0
do_mprotect_pkey+0x23b/0x3e0
__x64_sys_mprotect+0x51/0x70
do_syscall_64+0x91/0xae8
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes possible lost wakeup introduced by commit a218cc4914.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc4914 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.
After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.
The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock internally,
but actual waitqueue_active() check prior wake_up() call can be
reordered with modifications of ep ready list, thus wake up can be lost.
And yes, can be healed by explicit smp_mb():
list_add_tail(&epi->rdlink, &ep->rdllist);
smp_mb();
if (waitqueue_active(&ep->wq))
wake_up(&ep->wp);
But let's make it simple, thus current patch replaces ep->wq.lock with
the ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correcty.
Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org> [5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In section_deactivate(), pfn_to_page() doesn't work any more after
ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case. It
causes a hot remove failure:
kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
__remove_pages+0x99/0xc0
arch_remove_memory+0x23/0x4d
try_remove_memory+0xc8/0x130
__remove_memory+0xa/0x11
acpi_memory_device_remove+0x72/0x100
acpi_bus_trim+0x55/0x90
acpi_device_hotplug+0x2eb/0x3d0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x1a7/0x370
worker_thread+0x30/0x380
kthread+0x112/0x130
ret_from_fork+0x35/0x40
Let's move the ->section_mem_map resetting after
depopulate_section_memmap() to fix it.
[akpm@linux-foundation.org: remove unneeded initialization, per David]
Fixes: ba72b4c8cf ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An eventfd monitors multiple memory thresholds of the cgroup, closes them,
the kernel deletes all events related to this eventfd. Before all events
are deleted, another eventfd monitors the memory threshold of this cgroup,
leading to a crash:
BUG: kernel NULL pointer dereference, address: 0000000000000004
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
Workqueue: events memcg_event_remove
RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
Call Trace:
memcg_event_remove+0x32/0x90
process_one_work+0x172/0x380
worker_thread+0x49/0x3f0
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
CR2: 0000000000000004
We can reproduce this problem in the following ways:
1. We create a new cgroup subdirectory and a new eventfd, and then we
monitor multiple memory thresholds of the cgroup through this eventfd.
2. closing this eventfd, and __mem_cgroup_usage_unregister_event ()
will be called multiple times to delete all events related to this
eventfd.
The first time __mem_cgroup_usage_unregister_event() is called, the
kernel will clear all items related to this eventfd in thresholds->
primary.
Since there is currently only one eventfd, thresholds-> primary becomes
empty, so the kernel will set thresholds-> primary and hresholds-> spare
to NULL. If at this time, the user creates a new eventfd and monitor
the memory threshold of this cgroup, kernel will re-initialize
thresholds-> primary.
Then when __mem_cgroup_usage_unregister_event () is called for the
second time, because thresholds-> primary is not empty, the system will
access thresholds-> spare, but thresholds-> spare is NULL, which will
trigger a crash.
In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.
The solution is to check whether the thresholds associated with the
eventfd has been cleared when deleting the event. If so, we do nothing.
[akpm@linux-foundation.org: fix comment, per Kirill]
Fixes: 907860ed38 ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
"Just two NVMe fabrics fixes that should go into 5.6"
* tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block:
nvmet-tcp: set MSG_MORE only if we actually have more to send
nvme-rdma: Avoid double freeing of async event data
Pull io_uring fixes from Jens Axboe:
"Two different fixes in here:
- Fix for a potential NULL pointer deref for links with async or
drain marked (Pavel)
- Fix for not properly checking RLIMIT_NOFILE for async punted
operations.
This affects openat/openat2, which were added this cycle, and
accept4. I did a full audit of other cases where we might check
current->signal->rlim[] and found only RLIMIT_FSIZE for buffered
writes and fallocate. That one is fixed and queued for 5.7 and
marked stable"
* tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block:
io_uring: make sure accept honor rlimit nofile
io_uring: make sure openat/openat2 honor rlimit nofile
io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
Pull turbostat updates from Len Brown:
"Update to turbostat v20.03.20.
These patches unlock the full turbostat features for some new
machines, plus a couple other minor tweaks"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version
tools/power turbostat: Print cpuidle information
tools/power turbostat: Fix 32-bit capabilities warning
tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks
tools/power turbostat: Support Elkhart Lake
tools/power turbostat: Support Jasper Lake
tools/power turbostat: Support Ice Lake server
tools/power turbostat: Support Tiger Lake
tools/power turbostat: Fix gcc build warnings
tools/power turbostat: Support Cometlake
Pull powerpc fixes from Michael Ellerman:
"Two fixes for bugs introduced this cycle:
- fix a crash when shutting down a KVM PR guest (our original style
of KVM which doesn't use hypervisor mode)
- fix for the recently added 32-bit KASAN_VMALLOC support
Thanks to: Christophe Leroy, Greg Kurz, Sean Christopherson"
* tag 'powerpc-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Fix kernel crash with PR KVM
powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC
Pull NVMe fixes from Keith:
"Two late nvme fabrics fixes for 5.6: a double free with the rdma
transport, and a regression fix for tcp; please pull."
* 'nvme-5.6-rc6' of git://git.infradead.org/nvme:
nvmet-tcp: set MSG_MORE only if we actually have more to send
nvme-rdma: Avoid double freeing of async event data
We are incorrectly dropping the raid56 and raid1c34 incompat flags when
there are still raid56 and raid1c34 block groups, not when we do not any
of those anymore. The logic just got unintentionally broken after adding
the support for the raid1c34 modes.
Fix this by clear the flags only if we do not have block groups with the
respective profiles.
Fixes: 9c907446dc ("btrfs: drop incompat bit for raid1c34 after last block group is gone")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we send PDU data, we want to optimize the tcp stack
operation if we have more data to send. So when we set MSG_MORE
when:
- We have more fragments coming in the batch, or
- We have a more data to send in this PDU
- We don't have a data digest trailer
- We optimize with the SUCCESS flag and omit the NVMe completion
(used if sq_head pointer update is disabled)
This addresses a regression in QD=1 with SUCCESS flag optimization
as we unconditionally set MSG_MORE when we didn't actually have
more data to send.
Fixes: 7058329538 ("nvmet-tcp: implement C2HData SUCCESS optimization")
Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull arm64 fixes from Will Deacon:
- Fix panic() when it occurs during secondary CPU startup
- Fix "kpti=off" when KASLR is enabled
- Fix howler in compat syscall table for vDSO clock_getres() fallback
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Fix syscall number of compat_clock_getres
arm64: kpti: Fix "kpti=off" when KASLR is enabled
arm64: smp: fix crash_smp_send_stop() behaviour
arm64: smp: fix smp_send_stop() behaviour
Pull char/misc driver fixes from Greg KH:
"Here are some small different driver fixes for 5.6-rc7:
- binderfs fix, yet again
- slimbus new device id added
- hwtracing bugfixes for reported issues and a new device id
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
intel_th: pci: Add Elkhart Lake CPU support
intel_th: Fix user-visible error codes
intel_th: msu: Fix the unexpected state warning
stm class: sys-t: Fix the use of time_after()
slimbus: ngd: add v2.1.0 compatible
binderfs: use refcount for binder control devices too
Pull staging/IIO fixes from Greg KH:
"Here are a number of small staging and IIO driver fixes for 5.6-rc7
Nothing major here, just resolutions for some reported problems:
- iio bugfixes for a number of different drivers
- greybus loopback_test fixes
- wfx driver fixes
All of these have been in linux-next with no reported issues"
* tag 'staging-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
staging: greybus: loopback_test: fix potential path truncations
staging: greybus: loopback_test: fix potential path truncation
staging: greybus: loopback_test: fix poll-mask build breakage
staging: wfx: fix RCU usage between hif_join() and ieee80211_bss_get_ie()
staging: wfx: fix RCU usage in wfx_join_finalize()
staging: wfx: make warning about pending frame less scary
staging: wfx: fix lines ending with a comma instead of a semicolon
staging: wfx: fix warning about freeing in-use mutex during device unregister
staging/speakup: fix get_word non-space look-ahead
iio: ping: set pa_laser_ping_cfg in of_ping_match
iio: chemical: sps30: fix missing triggered buffer dependency
iio: st_sensors: remap SMO8840 to LIS2DH12
iio: light: vcnl4000: update sampling periods for vcnl4040
iio: light: vcnl4000: update sampling periods for vcnl4200
iio: accel: adxl372: Set iio_chan BE
iio: magnetometer: ak8974: Fix negative raw values in sysfs
iio: trigger: stm32-timer: disable master mode when stopping
iio: adc: stm32-dfsdm: fix sleep in atomic context
iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.6-rc7. And there's a thunderbolt
driver fix thrown in for good measure as well.
These fixes are:
- new device ids for usb-serial drivers
- thunderbolt error code fix
- xhci driver fixes
- typec fixes
- cdc-acm driver fixes
- chipidea driver fix
- more USB quirks added for devices that need them.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: cdc-acm: fix rounding error in TIOCSSERIAL
USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
usb: chipidea: udc: fix sleeping function called from invalid context
USB: serial: pl2303: add device-id for HP LD381
USB: serial: option: add ME910G1 ECM composition 0x110b
usb: host: xhci-plat: add a shutdown
usb: typec: ucsi: displayport: Fix a potential race during registration
usb: typec: ucsi: displayport: Fix NULL pointer dereference
USB: Disable LPM on WD19's Realtek Hub
usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
xhci: Do not open code __print_symbolic() in xhci trace events
thunderbolt: Fix error code in tb_port_is_width_supported()
Pull tty fixes from Greg KH:
"Here are three small tty_io bugfixes for reported issues that Eric has
resolved for 5.6-rc7
All of these have been in linux-next with no reported issues"
* tag 'tty-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix compat TIOCGSERIAL checking wrong function ptr
tty: fix compat TIOCGSERIAL leaking uninitialized memory
tty: drop outdated comments about release_tty() locking
Pull sound fixes from Takashi Iwai:
"A few fixes covering the issues reported by syzkaller, a couple of
fixes for the MIDI decoding bug, and a few usual HD-audio quirks.
Some of them are about ALSA core stuff, but they are small fixes just
for corner cases, and nothing thrilling"
* tag 'sound-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662
ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662
ALSA: seq: oss: Fix running status after receiving sysex
ALSA: seq: virmidi: Fix running status after receiving sysex
ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
ALSA: hda/realtek: Fix pop noise on ALC225
ALSA: line6: Fix endless MIDI read loop
ALSA: pcm: oss: Avoid plugin buffer overflow
Pull drm fixes from Dave Airlie:
"Hope you are well hiding out above the garage. A few amdgpu changes
but nothing too major. I've had a wisdom tooth out this week so
haven't been to on top of things, but all seems good.
core:
- fix lease warning
i915:
- Track active elements during dequeue
- Fix failure to handle all MCR ranges
- Revert unnecessary workaround
amdgpu:
- Pageflip fix
- VCN clockgating fixes
- GPR debugfs fix for umr
- GPU reset fix
- eDP fix for MBP
- DCN2.x fix
dw-hdmi:
- fix AVI frame colorimetry
komeda:
- fix compiler warning
bochs:
- downgrade a binding failure to a warning"
* tag 'drm-fixes-2020-03-20' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix pageflip event race condition for DCN.
drm/amdgpu: fix typo for vcn2.5/jpeg2.5 idle check
drm/amdgpu: fix typo for vcn2/jpeg2 idle check
drm/amdgpu: fix typo for vcn1 idle check
drm/lease: fix WARNING in idr_destroy
drm/i915: Handle all MCR ranges
Revert "drm/i915/tgl: Add extra hdc flush workaround"
drm/i915/execlists: Track active elements during dequeue
drm/bochs: downgrade pci_request_region failure from error to warning
drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017
drm/amdgpu: add fbdev suspend/resume on gpu reset
drm/amd/amdgpu: Fix GPR read from debugfs (v2)
drm/amd/display: fix typos for dcn20_funcs and dcn21_funcs struct
drm/komeda: mark PM functions as __maybe_unused
drm/bridge: dw-hdmi: fix AVI frame colorimetry
Just like commit 4022e7af86, this fixes the fact that
IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks
current->signal->rlim[] for limits.
Add an extra argument to __sys_accept4_file() that allows us to pass
in the proper nofile limit, and grab it at request prep time.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dmitry reports that a test case shows that io_uring isn't honoring a
modified rlimit nofile setting. get_unused_fd_flags() checks the task
signal->rlimi[] for the limits. As this isn't easily inheritable,
provide a __get_unused_fd_flags() that takes the value instead. Then we
can grab it when the request is prepared (from the original task), and
pass that in when we do the async part part of the open.
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Tested-by: Dmitry Kadashev <dkadashev@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some Chromebook BIOS' do not export an ACPI LPIT, which is how
Linux finds the residency counter for CPU and SYSTEM low power states,
that is exports in /sys/devices/system/cpu/cpuidle/*residency_us
When these sysfs attributes are missing, check the debugfs attrubte
from the pmc_core driver, which accesses the same counter value.
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view the Tremont-based Elkhart Lake
is very similar to Goldmont, reuse the code of Goldmont.
Elkhart Lake does not support 'group turbo limit counter'
nor C3, adjust the code accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Jasper Lake, like Elkhart Lake, uses a Tremont CPU.
So reuse the code.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view, Ice Lake server looks like Sky Lake server.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view, Tiger Lake looks like Ice Lake.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Warning: ‘__builtin_strncpy’ specified bound 20 equals destination size
[-Wstringop-truncation]
reduce param to strncpy, to guarantee that a null byte is always copied
into destination buffer.
Signed-off-by: Len Brown <len.brown@intel.com>
With PR KVM, shutting down a VM causes the host kernel to crash:
[ 314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638
[ 314.219299] Faulting instruction address: 0xc008000000d4ddb0
cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0]
pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr]
lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr]
sp: c00000036da07a30
msr: 900000010280b033
dar: c00800000176c638
dsisr: 40000000
current = 0xc00000036d4c0000
paca = 0xc000000001a00000 irqmask: 0x03 irq_happened: 0x01
pid = 1992, comm = qemu-system-ppc
Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020
enter ? for help
[c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr]
[c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm]
[c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm]
[c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm]
[c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm]
[c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm]
[c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm]
[c00000036da07c30] c000000000420be0 __fput+0xe0/0x310
[c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0
[c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00
[c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0
[c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30
[c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68
This is caused by a use-after-free in kvmppc_mmu_pte_flush_all()
which dereferences vcpu->arch.book3s which was previously freed by
kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy()
is called after kvmppc_core_vcpu_free() since commit ff030fdf55
("KVM: PPC: Move kvm_vcpu_init() invocation to common code").
The kvmppc_mmu_destroy() helper calls one of the following depending
on the KVM backend:
- kvmppc_mmu_destroy_hv() which does nothing (Book3s HV)
- kvmppc_mmu_destroy_pr() which undoes the effects of
kvmppc_mmu_init() (Book3s PR 32-bit)
- kvmppc_mmu_destroy_pr() which undoes the effects of
kvmppc_mmu_init() (Book3s PR 64-bit)
- kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc)
It turns out that this is only relevant to PR KVM actually. And both
32 and 64 backends need vcpu->arch.book3s to be valid when calling
kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy()
from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the
beginning of kvmppc_core_vcpu_free_pr(). This is consistent with
kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr().
For the same reason, if kvmppc_core_vcpu_create_pr() returns an
error then this means that kvmppc_mmu_init() was either not called
or failed, in which case kvmppc_mmu_destroy() should not be called.
Drop the line in the error path of kvm_arch_vcpu_create().
Fixes: ff030fdf55 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/158455341029.178873.15248663726399374882.stgit@bahia.lan
Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
events at vsartup for DCN")' introduces a new way of pageflip
completion handling for DCN, and some trouble.
The current implementation introduces a race condition, which
can cause pageflip completion events to be sent out one vblank
too early, thereby confusing userspace and causing flicker:
prepare_flip_isr():
1. Pageflip programming takes the ddev->event_lock.
2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
3. Releases ddev->event_lock.
--> Deadline for surface address regs double-buffering passes on
target pipe.
4. dc_commit_updates_for_stream() MMIO programs the new pageflip
into hw, but too late for current vblank.
=> pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
in current vblank due to missing the double-buffering deadline
by a tiny bit.
5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
dm_dcn_crtc_high_irq() gets called.
6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
pageflip has been completed/will complete in this vblank and
sends out pageflip completion event to userspace and resets
pflip_status = AMDGPU_FLIP_NONE.
=> Flip completion event sent out one vblank too early.
This behaviour has been observed during my testing with measurement
hardware a couple of time.
The commit message says that the extra flip event code was added to
dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events
in case the pflip irq doesn't fire, because the "DCH HUBP" component
is clock gated and doesn't fire pflip irqs in that state. Also that
this clock gating may happen if no planes are active. This suggests
that the problem addressed by that commit can't happen if planes
are active.
The proposed solution is therefore to only execute the extra pflip
completion code iff the count of active planes is zero and otherwise
leave pflip completion handling to the pflip irq handler, for a
more race-free experience.
Note that i don't know if this fixes the problem the original commit
tried to address, as i don't know what the test scenario was. It
does fix the observed too early pageflip events though and points
out the problem introduced.
Fixes: 16f17eda8b ("drm/amd/display: Send vblank and user events at vsartup for DCN")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull MMC fixes from Ulf Hansson:
- rtsx_pci: Fix support for some various speed modes
- sdhci-of-at91: Fix support for GPIO card detect on SAMA5D2
- sdhci-cadence: Fix support for DDR52 speed mode for eMMC on UniPhier
- sdhci-acpi: Fix broken WP support on Acer Aspire Switch 10
- sdhci-acpi: Workaround FW bug for suspend on Lenovo Miix 320
* tag 'mmc-v5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
mmc: sdhci-cadence: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN for UniPhier
mmc: sdhci-acpi: Disable write protect detection on Acer Aspire Switch 10 (SW5-012)
mmc: sdhci-acpi: Switch signal voltage back to 3.3V on suspend on external microSD on Lenovo Miix 320
The syscall number of compat_clock_getres was erroneously set to 247
(__NR_io_cancel!) instead of 264. This causes the vDSO fallback of
clock_getres() to land on the wrong syscall for compat tasks.
Fix the numbering.
Cc: <stable@vger.kernel.org>
Fixes: 53c489e1df ("arm64: compat: Add missing syscall numbers")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Pull cifs fixes from Steve French:
"Three small smb3 fixes, two for stable"
* tag '5.6-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: fiemap: do not return EINVAL if get nothing
CIFS: Increment num_remote_opens stats counter even in case of smb2_query_dir_first
cifs: potential unintitliazed error code in cifs_getattr()
Pull Kbuild fixes from Masahiro Yamada:
- fix __uint128_t capability test in Kconfig when GCC that defaults to
32-bit is used to build the 64-bit kernel
- suppress new noisy Clang warnings -Wpointer-to-enum-cast
- move the namespace field in Module.symvers for the backward
compatibility reason for the depmod tool
- use available compression for initramdisk when INTRAMFS_SOURCE is
defined, which was the original behavior
- fix modpost to handle correct large section numbers when it refers to
modversion CRCs and module namespaces
- fix comments and documents
* tag 'kbuild-fixes-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
scripts/kallsyms: fix wrong kallsyms_relative_base
modpost: Get proper section index by get_secindex() instead of st_shndx
initramfs: restore default compression behavior
modpost: move the namespace field in Module.symvers last
kbuild: Disable -Wpointer-to-enum-cast
kbuild: doc: fix references to other documents
int128: fix __uint128_t compiler test in Kconfig
kconfig: introduce m32-flag and m64-flag
kbuild: Fix inconsistent comment
I have hit the following build error:
armv7a-hardfloat-linux-gnueabi-ld: drivers/rtc/rtc-max8907.o: in function `max8907_rtc_probe':
rtc-max8907.c:(.text+0x400): undefined reference to `regmap_irq_get_virq'
max8907 should select REGMAP_IRQ
Fixes: 94c01ab6d7 ("rtc: add MAX8907 RTC driver")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is the code in the read_symbol function in 'scripts/kallsyms.c':
if (is_ignored_symbol(name, type))
return NULL;
/* Ignore most absolute/undefined (?) symbols. */
if (strcmp(name, "_text") == 0)
_text = addr;
But the is_ignored_symbol function returns true for name="_text" and
type='A'. So the next condition is not executed and the _text variable
is always zero.
It makes the wrong kallsyms_relative_base symbol as a result of the code
(CONFIG_KALLSYMS_BASE_RELATIVE is defined):
if (base_relative) {
output_label("kallsyms_relative_base");
output_address(relative_base);
printf("\n");
}
Because the output_address function uses the _text variable.
So the kallsyms_lookup function and all related functions in the kernel
do not work properly. For example, the stack trace in oops:
Call Trace:
[aa095e58] [809feab8] kobj_ns_ops_tbl+0x7ff09ac8/0x7ff1c1c4 (unreliable)
[aa095e98] [80002b64] kobj_ns_ops_tbl+0x7f50db74/0x80000010
[aa095ef8] [809c3d24] kobj_ns_ops_tbl+0x7feced34/0x7ff1c1c4
[aa095f28] [80002ed0] kobj_ns_ops_tbl+0x7f50dee0/0x80000010
[aa095f38] [8000f238] kobj_ns_ops_tbl+0x7f51a248/0x80000010
The right stack trace:
Call Trace:
[aa095e58] [809feab8] module_vdu_video_init+0x2fc/0x3bc (unreliable)
[aa095e98] [80002b64] do_one_initcall+0x40/0x1f0
[aa095ef8] [809c3d24] kernel_init_freeable+0x164/0x1d8
[aa095f28] [80002ed0] kernel_init+0x14/0x124
[aa095f38] [8000f238] ret_from_kernel_thread+0x14/0x1c
[masahiroy@kernel.org:
This issue happens on binutils <= 2.22
The following commit fixed it:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d2667025dd30611514810c28bee9709e4623012a
The symbol type of _text is 'T' on binutils >= 2.23
The minimal supported binutils version for the kernel build is 2.21
]
Signed-off-by: Mikhail Petrov <Mikhail.Petrov@mir.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Enabling KASLR forces the use of non-global page-table entries for kernel
mappings, as this is a decision that we have to make very early on before
mapping the kernel proper. When used in conjunction with the "kpti=off"
command-line option, it is possible to use non-global kernel mappings but
with the kpti trampoline disabled.
Since commit 09e3c22a86 ("arm64: Use a variable to store non-global
mappings decision"), arm64_kernel_unmapped_at_el0() reflects only the use of
non-global mappings and does not take into account whether the kpti
trampoline is enabled. This breaks context switching of the TPIDRRO_EL0
register for 64-bit tasks, where the clearing of the register is deferred to
the ret-to-user code, but it also breaks the ARM SPE PMU driver which
helpfully recommends passing "kpti=off" on the command line!
Report whether or not KPTI is actually enabled in
arm64_kernel_unmapped_at_el0() and check the 'arm64_use_ng_mappings' global
variable directly when determining the protection flags for kernel mappings.
Cc: Mark Brown <broonie@kernel.org>
Reported-by: Hongbo Yao <yaohongbo@huawei.com>
Tested-by: Hongbo Yao <yaohongbo@huawei.com>
Fixes: 09e3c22a86 ("arm64: Use a variable to store non-global mappings decision")
Signed-off-by: Will Deacon <will@kernel.org>
There is measurable performance impact in some synthetic tests due to
commit 6d390e4b5d (locks: fix a potential use-after-free problem when
wakeup a waiter). Fix the race condition instead by clearing the
fl_blocker pointer after the wake_up, using explicit acquire/release
semantics.
This does mean that we can no longer use the clearing of fl_blocker as
the wait condition, so switch the waiters over to checking whether the
fl_blocked_member list_head is empty.
Reviewed-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Fixes: 6d390e4b5d (locks: fix a potential use-after-free problem when wakeup a waiter)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(uint16_t) st_shndx is limited to 65535(i.e. SHN_XINDEX) so sym_get_data() gets
wrong section index by st_shndx if requested symbol contains extended section
index that is more than 65535. In this case, we need to get proper section index
by .symtab_shndx section.
Module.symvers generated by building kernel with "-ffunction-sections -fdata-sections"
shows the issue.
Fixes: 56067812d5 ("kbuild: modversions: add infrastructure for emitting relative CRCs")
Fixes: e84f9fbbec ("modpost: refactor namespace_from_kstrtabns() to not hard-code section name")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
This is just a cleanup addition to Jann's fix to properly update the
transaction ID for the slub slowpath in commit fd4d9c7d0c ("mm: slub:
add missing TID bump..").
The transaction ID is what protects us against any concurrent accesses,
but we should really also make sure to make the 'freelist' comparison
itself always use the same freelist value that we then used as the new
next free pointer.
Jann points out that if we do all of this carefully, we could skip the
transaction ID update for all the paths that only remove entries from
the lists, and only update the TID when adding entries (to avoid the ABA
issue with cmpxchg and list handling re-adding a previously seen value).
But this patch just does the "make sure to cmpxchg the same value we
used" rather than then try to be clever.
Acked-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu
freelist of length M, and N > M > 0, it will first remove the M elements
from the percpu freelist, then call ___slab_alloc() to allocate the next
element and repopulate the percpu freelist. ___slab_alloc() can re-enable
IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc()
to properly commit the freelist head change.
Fix it by unconditionally bumping c->tid when entering the slowpath.
Cc: stable@vger.kernel.org
Fixes: ebe909e0fd ("slub: improve bulk alloc strategy")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7765435030 ("take compat TIOC[SG]SERIAL treatment into
tty_compat_ioctl()") changed the compat version of TIOCGSERIAL to start
checking for the presence of the ->set_serial function pointer rather
than ->get_serial. This appears to be a copy-and-paste error, since
->get_serial is the function pointer that is called as well as the
pointer that is checked by the non-compat version of TIOCGSERIAL.
Fix this by checking the correct function pointer.
Fixes: 7765435030 ("take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl()")
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20200224182044.234553-3-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
First set of IIO fixes in the 5.6 cycle.
* adxl372
- Fix marking of buffered values as big endian.
* ak8974
- Fix wrong handling of negative values when read from sysfs.
* at91-sama5d2
- Fix differential mode by ensuring configuration set correctly.
* ping
- Use the write sensor type for of_ping_match table.
* sps30
- Kconfig build dependency fix.
* st-sensors
- Fix a wrong identification of which part the SMO8840 ACPI ID indicates.
* stm32-dsfdm
- Fix a sleep in atomic issue by not using a trigger when it makes no sense.
* stm32-timer
- Make sure master mode is disabled when stopping.
* vcnl400
- Update some sampling periods based on new docs.
* tag 'iio-fixes-for-5.6a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: ping: set pa_laser_ping_cfg in of_ping_match
iio: chemical: sps30: fix missing triggered buffer dependency
iio: st_sensors: remap SMO8840 to LIS2DH12
iio: light: vcnl4000: update sampling periods for vcnl4040
iio: light: vcnl4000: update sampling periods for vcnl4200
iio: accel: adxl372: Set iio_chan BE
iio: magnetometer: ak8974: Fix negative raw values in sysfs
iio: trigger: stm32-timer: disable master mode when stopping
iio: adc: stm32-dfsdm: fix sleep in atomic context
iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
Johan writes:
USB-serial fixes for 5.6-rc7
Here are a couple of new device ids for 5.6-rc.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.6-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: pl2303: add device-id for HP LD381
USB: serial: option: add ME910G1 ECM composition 0x110b
On a system configured to trigger a crash_kexec() reboot, when only one CPU
is online and another CPU panics while starting-up, crash_smp_send_stop()
will fail to send any STOP message to the other already online core,
resulting in fail to freeze and registers not properly saved.
Moreover even if the proper messages are sent (case CPUs > 2)
it will similarly fail to account for the booting CPU when executing
the final stop wait-loop, so potentially resulting in some CPU not
been waited for shutdown before rebooting.
A tangible effect of this behaviour can be observed when, after a panic
with kexec enabled and loaded, on the following reboot triggered by kexec,
the cpu that could not be successfully stopped fails to come back online:
[ 362.291022] ------------[ cut here ]------------
[ 362.291525] kernel BUG at arch/arm64/kernel/cpufeature.c:886!
[ 362.292023] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 362.292400] Modules linked in:
[ 362.292970] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.6.0-rc4-00003-gc780b890948a #105
[ 362.293136] Hardware name: Foundation-v8A (DT)
[ 362.293382] pstate: 200001c5 (nzCv dAIF -PAN -UAO)
[ 362.294063] pc : has_cpuid_feature+0xf0/0x348
[ 362.294177] lr : verify_local_elf_hwcaps+0x84/0xe8
[ 362.294280] sp : ffff800011b1bf60
[ 362.294362] x29: ffff800011b1bf60 x28: 0000000000000000
[ 362.294534] x27: 0000000000000000 x26: 0000000000000000
[ 362.294631] x25: 0000000000000000 x24: ffff80001189a25c
[ 362.294718] x23: 0000000000000000 x22: 0000000000000000
[ 362.294803] x21: ffff8000114aa018 x20: ffff800011156a00
[ 362.294897] x19: ffff800010c944a0 x18: 0000000000000004
[ 362.294987] x17: 0000000000000000 x16: 0000000000000000
[ 362.295073] x15: 00004e53b831ae3c x14: 00004e53b831ae3c
[ 362.295165] x13: 0000000000000384 x12: 0000000000000000
[ 362.295251] x11: 0000000000000000 x10: 00400032b5503510
[ 362.295334] x9 : 0000000000000000 x8 : ffff800010c7e204
[ 362.295426] x7 : 00000000410fd0f0 x6 : 0000000000000001
[ 362.295508] x5 : 00000000410fd0f0 x4 : 0000000000000000
[ 362.295592] x3 : 0000000000000000 x2 : ffff8000100939d8
[ 362.295683] x1 : 0000000000180420 x0 : 0000000000180480
[ 362.296011] Call trace:
[ 362.296257] has_cpuid_feature+0xf0/0x348
[ 362.296350] verify_local_elf_hwcaps+0x84/0xe8
[ 362.296424] check_local_cpu_capabilities+0x44/0x128
[ 362.296497] secondary_start_kernel+0xf4/0x188
[ 362.296998] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000)
[ 362.298652] SMP: stopping secondary CPUs
[ 362.300615] Starting crashdump kernel...
[ 362.301168] Bye!
[ 0.000000] Booting Linux on physical CPU 0x0000000003 [0x410fd0f0]
[ 0.000000] Linux version 5.6.0-rc4-00003-gc780b890948a (crimar01@e120937-lin) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #105 SMP PREEMPT Fri Mar 6 17:00:42 GMT 2020
[ 0.000000] Machine model: Foundation-v8A
[ 0.000000] earlycon: pl11 at MMIO 0x000000001c090000 (options '')
[ 0.000000] printk: bootconsole [pl11] enabled
.....
[ 0.138024] rcu: Hierarchical SRCU implementation.
[ 0.153472] its@2f020000: unable to locate ITS domain
[ 0.154078] its@2f020000: Unable to locate ITS domain
[ 0.157541] EFI services will not be available.
[ 0.175395] smp: Bringing up secondary CPUs ...
[ 0.209182] psci: failed to boot CPU1 (-22)
[ 0.209377] CPU1: failed to boot: -22
[ 0.274598] Detected PIPT I-cache on CPU2
[ 0.278707] GICv3: CPU2: found redistributor 1 region 0:0x000000002f120000
[ 0.285212] CPU2: Booted secondary processor 0x0000000001 [0x410fd0f0]
[ 0.369053] Detected PIPT I-cache on CPU3
[ 0.372947] GICv3: CPU3: found redistributor 2 region 0:0x000000002f140000
[ 0.378664] CPU3: Booted secondary processor 0x0000000002 [0x410fd0f0]
[ 0.401707] smp: Brought up 1 node, 3 CPUs
[ 0.404057] SMP: Total of 3 processors activated.
Make crash_smp_send_stop() account also for the online status of the
calling CPU while evaluating how many CPUs are effectively online: this way
the right number of STOPs is sent and all other stopped-cores's registers
are properly saved.
Fixes: 78fd584cde ("arm64: kdump: implement machine_crash_shutdown()")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
We have been receiving bug reports that ethernet connections over
RTL8153 based ethernet adapters stops working after a while with
errors like these showing up in dmesg when the ethernet stops working:
[12696.189484] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12702.333456] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12707.965422] r8152 6-1:1.0 enp10s0u1: Tx timeout
This has been reported on Dell WD15 docks, Belkin USB-C Express Dock 3.1
docks and with generic USB to ethernet dongles using the RTL8153
chipsets. Some users have tried adding usbcore.quirks=0bda:8153:k to
the kernel commandline and all users who have tried this report that
this fixes this.
Also note that we already have an existing NO_LPM quirk for the RTL8153
used in the Microsoft Surface Dock (where it uses a different usb-id).
This commit adds a NO_LPM quirk for the generic Realtek RTL8153
0bda:8153 usb-id, fixing the Tx timeout errors on these devices.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198931
Cc: stable@vger.kernel.org
Cc: russianneuromancer@ya.ru
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200313120708.100339-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we call fiemap on a truncated file with none blocks allocated,
it makes sense we get nothing from this call. No output means
no blocks have been counted, but the call succeeded. It's a valid
response.
Simple example reproducer:
xfs_io -f 'truncate 2M' -c 'fiemap -v' /cifssch/testfile
xfs_io: ioctl(FS_IOC_FIEMAP) ["/cifssch/testfile"]: Invalid argument
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
The num_remote_opens counter keeps track of the number of open files which must be
maintained by the server at any point. This is a per-tree-connect counter, and the value
of this counter gets displayed in the /proc/fs/cifs/Stats output as a following...
Open files: 0 total (local), 1 open on server
^^^^^^^^^^^^^^^^
As a thumb-rule, we want to increment this counter for each open/create that we
successfully execute on the server. Similarly, we should decrement the counter when
we successfully execute a close.
In this case, an increment was being missed in case of smb2_query_dir_first,
in case of successful open. As a result, we would underflow the counter and we
could even see the counter go to negative after sufficient smb2_query_dir_first calls.
I tested the stats counter for a bunch of filesystem operations with the fix.
And it looks like the counter looks correct to me.
I also check if we missed the increments and decrements elsewhere. It does not
seem so. Few other cases where an open is done and we don't increment the counter are
the compound calls where the corresponding close is also sent in the request.
Signed-off-by: Shyam Prasad N <nspmangalore@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Smatch complains that "rc" could be uninitialized.
fs/cifs/inode.c:2206 cifs_getattr() error: uninitialized symbol 'rc'.
Changing it to "return 0;" improves readability as well.
Fixes: cc1baf98c8f6 ("cifs: do not ignore the SYNC flags in getattr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Pull HID fixes from Jiri Kosina:
- string buffer formatting fixes in picolcd and sensor drivers, from
Takashi Iwai
- two new device IDs from Chen-Tsung Hsieh and Tony Fischetti
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: add ALWAYS_POLL quirk to lenovo pixart mouse
HID: google: add moonball USB id
HID: hid-sensor-custom: Use scnprintf() for avoiding potential buffer overflow
HID: hid-picolcd_fb: Use scnprintf() for avoiding potential buffer overflow
Newer GCC warns about possible truncations of two generated path names as
we're concatenating the configurable sysfs and debugfs path prefixes
with a filename and placing the results in buffers of the same size as
the maximum length of the prefixes.
snprintf(d->name, MAX_STR_LEN, "gb_loopback%u", dev_id);
snprintf(d->sysfs_entry, MAX_SYSFS_PATH, "%s%s/",
t->sysfs_prefix, d->name);
snprintf(d->debugfs_entry, MAX_SYSFS_PATH, "%sraw_latency_%s",
t->debugfs_prefix, d->name);
Fix this by separating the maximum path length from the maximum prefix
length and reducing the latter enough to fit the generated strings.
Note that we also need to reduce the device-name buffer size as GCC
isn't smart enough to figure out that we ever only used MAX_STR_LEN
bytes of it.
Fixes: 6b0658f687 ("greybus: tools: Add tools directory to greybus repo and add loopback")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20200312110151.22028-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Newer GCC warns about a possible truncation of a generated sysfs path
name as we're concatenating a directory path with a file name and
placing the result in a buffer that is half the size of the maximum
length of the directory path (which is user controlled).
loopback_test.c: In function 'open_poll_files':
loopback_test.c:651:31: warning: '%s' directive output may be truncated writing up to 511 bytes into a region of size 255 [-Wformat-truncation=]
651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
| ^~
loopback_test.c:651:3: note: 'snprintf' output between 16 and 527 bytes into a destination of size 255
651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by making sure the buffer is large enough the concatenated
strings.
Fixes: 6b0658f687 ("greybus: tools: Add tools directory to greybus repo and add loopback")
Fixes: 9250c0ee26 ("greybus: Loopback_test: use poll instead of inotify")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20200312110151.22028-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SDHCI_PRESET_FOR_* registers are not set for the UniPhier platform
integration. (They are all read as zeros).
Set the SDHCI_QUIRK2_PRESET_VALUE_BROKEN quirk flag. Otherwise, the
High Speed DDR mode on the eMMC controller (MMC_TIMING_MMC_DDR52)
would not work.
I split the platform data to give no impact to other platforms,
although the UniPhier platform is currently only the upstream user
of this IP.
The SDHCI_QUIRK2_PRESET_VALUE_BROKEN flag is set if the compatible
string matches to "socionext,uniphier-sd4hc".
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200312104257.21017-1-yamada.masahiro@socionext.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
On the Acer Aspire Switch 10 (SW5-012) microSD slot always reports the card
being write-protected even though microSD cards do not have a write-protect
switch at all.
Add a new DMI_QUIRK_SD_NO_WRITE_PROTECT quirk which when set sets
the MMC_CAP2_NO_WRITE_PROTECT flag on the controller for the external SD
slot; and add a DMI quirk table entry which selects this quirk for the
Acer SW5-012.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200316184753.393458-2-hdegoede@redhat.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Based on a sample of 7 DSDTs from Cherry Trail devices using an AXP288
PMIC depending on the design one of 2 possible LDOs on the PMIC is used
for the MMC signalling voltage, either DLDO3 or GPIO1LDO (GPIO1 pin in
low noise LDO mode).
The Lenovo Miix 320-10ICR uses GPIO1LDO in the SHC1 ACPI device's DSM
methods to set 3.3 or 1.8 signalling voltage and this appears to work
as advertised, so presumably the device is actually using GPIO1LDO for
the external microSD signalling voltage.
But this device has a bug in the _PS0 method of the SHC1 ACPI device,
the DSM remembers the last set signalling voltage and the _PS0 restores
this after a (runtime) suspend-resume cycle, but it "restores" the voltage
on DLDO3 instead of setting it on GPIO1LDO as the DSM method does. DLDO3
is used for the LCD and setting it to 1.8V causes the LCD to go black.
This commit works around this issue by calling the Intel DSM to reset the
signal voltage to 3.3V after the host has been runtime suspended.
This will make the _PS0 method reprogram the DLDO3 voltage to 3.3V, which
leaves it at its original setting fixing the LCD going black.
This commit adds and uses a DMI quirk mechanism to only trigger this
workaround on the Lenovo Miix 320 while leaving the behavior of the
driver unchanged on other devices.
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=111294
BugLink: https://gitlab.freedesktop.org/drm/intel/issues/355
Reported-by: russianneuromancer <russianneuromancer@ya.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200316184753.393458-1-hdegoede@redhat.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Even though INITRAMFS_SOURCE kconfig option isn't set in most of
defconfigs it is used (set) extensively by various build systems.
Commit f26661e127 ("initramfs: make initramfs compression choice
non-optional") has changed default compression mode. Previously we
compress initramfs using available compression algorithm. Now
we don't use any compression at all by default.
It significantly increases the image size in case of build system
chooses embedded initramfs. Initially I faced with this issue while
using buildroot.
As of today it's not possible to set preferred compression mode
in target defconfig as this option depends on INITRAMFS_SOURCE
being set. Modification of all build systems either doesn't look
like good option.
Let's instead rewrite initramfs compression mode choices list
the way that "INITRAMFS_COMPRESSION_NONE" will be the last option
in the list. In that case it will be chosen only if all other
options (which implements any compression) are not available.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
In order to preserve backwards compatability with kmod tools, we have to
move the namespace field in Module.symvers last, as the depmod -e -E
option looks at the first three fields in Module.symvers to check symbol
versions (and it's expected they stay in the original order of crc,
symbol, module).
In addition, update an ancient comment above read_dump() in modpost that
suggested that the export type field in Module.symvers was optional. I
suspect that there were historical reasons behind that comment that are
no longer accurate. We have been unconditionally printing the export
type since 2.6.18 (commit bd5cbcedf4), which is over a decade ago now.
Fix up read_dump() to treat each field as non-optional. I suspect the
original read_dump() code treated the export field as optional in order
to support pre <= 2.6.18 Module.symvers (which did not have the export
type field). Note that although symbol namespaces are optional, the
field will not be omitted from Module.symvers if a symbol does not have
a namespace. In this case, the field will simply be empty and the next
delimiter or end of line will follow.
Cc: stable@vger.kernel.org
Fixes: cb9b55d21f ("modpost: add support for symbol namespaces")
Tested-by: Matthias Maennich <maennich@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
A lenovo pixart mouse (17ef:608d) is afflicted common the the malfunction
where it disconnects and reconnects every minute--each time incrementing
the device number. This patch adds the device id of the device and
specifies that it needs the HID_QUIRK_ALWAYS_POLL quirk in order to
work properly.
Signed-off-by: Tony Fischetti <tony.fischetti@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull irq fix from Thomas Gleixner:
"A single commit to handle an erratum in Cavium ThunderX to prevent
access to GIC registers which are broken in the implementation"
* tag 'irq-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2
Pull futex fix from Thomas Gleixner:
"Fix for yet another subtle futex issue.
The futex code used ihold() to prevent inodes from vanishing, but
ihold() does not guarantee inode persistence. Replace the inode
pointer with a per boot, machine wide, unique inode identifier.
The second commit fixes the breakage of the hash mechanism which
causes a 100% performance regression"
* tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Unbreak futex hashing
futex: Fix inode life-time issue
Pull x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Map EFI runtime service data as encrypted when SEV is enabled.
Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.
- Remove the warning in the vector management code which triggered
when a managed interrupt affinity changed outside of a CPU hotplug
operation.
The warning was correct until the recent core code change that
introduced a CPU isolation feature which needs to migrate managed
interrupts away from online CPUs under certain conditions to
achieve the isolation"
* tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vector: Remove warning on managed interrupt migration
x86/ioremap: Map EFI runtime services data as encrypted for SEV
Pull perf fixes from Thomas Gleixner:
"A pile of perf fixes:
Kernel side:
- AMD uncore driver: Replace the open coded sanity check with the
core variant, which provides the correct error code and also leaves
a hint in dmesg
Tooling:
- Fix the stdio input handling with glibc versions >= 2.28
- Unbreak the futex-wake benchmark which was reduced to 0 test
threads due to the conversion to cpumaps
- Initialize sigaction structs before invoking sys_sigactio()
- Plug the mapfile memory leak in perf jevents
- Fix off by one relative directory includes
- Fix an undefined string comparison in perf diff"
* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
tools: Fix off-by 1 relative directory includes
perf jevents: Fix leak of mapfile memory
perf bench: Clear struct sigaction before sigaction() syscall
perf bench futex-wake: Restore thread count default to online CPU count
perf top: Fix stdio interface input handling with glibc 2.28+
perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
perf symbols: Don't try to find a vmlinux file when looking for kernel modules
perf bench: Share some global variables to fix build with gcc 10
perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
perf env: Do not return pointers to local variables
perf tests bp_account: Make global variable static
Pull timer fix from Thomas Gleixner:
"A single fix adding the missing time namespace adjustment in
sys/sysinfo which caused sys/sysinfo to be inconsistent with
/proc/uptime when read from a task inside a time namespace"
* tag 'timers-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sys/sysinfo: Respect boottime inside time namespace
Pull RAS fixes from Thomas Gleixner:
"Two RAS related fixes:
- Shut down the per CPU thermal throttling poll work properly when a
CPU goes offline.
The missing shutdown caused the poll work to be migrated to a
unbound worker which triggered warnings about the usage of
smp_processor_id() in preemptible context
- Fix the PPIN feature initialization which missed to enable the
functionality when PPIN_CTL was enabled but the MSR locked against
updates"
* tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix logic and comments around MSR_PPIN_CTL
x86/mce/therm_throt: Undo thermal polling properly on CPU offline
Pull EFI fixes from Thomas Gleixner:
"Two EFI fixes:
- Prevent a race and buffer overflow in the sysfs efivars interface
which causes kernel memory corruption.
- Add the missing NULL pointer checks in efivar_store_raw()"
* tag 'efi-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Add a sanity check to efivar_store_raw()
efi: Fix a race and a buffer overflow while reading efivars via sysfs
Pull IOMMU fixes from Joerg Roedel:
- Intel VT-d fixes:
- RCU list handling fixes
- Replace WARN_TAINT with pr_warn + add_taint for reporting firmware
issues
- DebugFS fixes
- Fix for hugepage handling in iova_to_phys implementation
- Fix for handling VMD devices, which have a domain number which
doesn't fit into 16 bits
- Warning message fix
- MSI allocation fix for iommu-dma code
- Sign-extension fix for io page-table code
- Fix for AMD-Vi to properly update the is-running bit when AVIC is
used
* tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Populate debugfs if IOMMUs are detected
iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE
iommu/vt-d: Ignore devices with out-of-spec domain number
iommu/vt-d: Fix the wrong printing in RHSA parsing
iommu/vt-d: Fix debugfs register reads
iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint
iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
iommu/vt-d: Silence RCU-list debugging warnings
iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()
iommu/dma: Fix MSI reservation allocation
iommu/io-pgtable-arm: Fix IOVA validation for 32-bit
iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
iommu/vt-d: Fix RCU list debugging warnings
Processing links, io_submit_sqe() prepares requests, drops sqes, and
passes them with sqe=NULL to io_queue_sqe(). There IOSQE_DRAIN and/or
IOSQE_ASYNC requests will go through the same prep, which doesn't expect
sqe=NULL and fail with NULL pointer deference.
Always do full prepare including io_alloc_async_ctx() for linked
requests, and then it can skip the second preparation.
Cc: stable@vger.kernel.org # 5.5
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull i2c fixes from Wolfram Sang:
"I2C has quite some regression fixes this time.
One is also related to watchdogs, we have proper acks from Guenter for
them"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: acpi: put device when verifying client fails
misc: eeprom: at24: fix regulator underflow
i2c: gpio: suppress error on probe defer
macintosh: windfarm: fix MODINFO regression
i2c: designware-pci: Fix BUG_ON during device removal
i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device
watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional
watchdog: iTCO_wdt: Export vendorsupport
Pull ARC fixes from Vineet Gupta:
- Fix __ALIGN_STR and __ALIGN to not use default junk padding
- Misc Kconfig cleanups, header updates
* tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: define __ALIGN_STR and __ALIGN symbols for ARC
ARC: show_regs: reduce lines of output
ARC: Replace <linux/clk-provider.h> by <linux/of_clk.h>
ARC: fpu: fix randconfig build error reported by 0-day test service
ARC: fix some Kconfig typos
ARC: Cleanup old Kconfig IO scheduler options
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for x86 and s390"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
KVM: s390: Also reset registers in sync regs for initial cpu reset
KVM: fix Kconfig menu text for -Werror
KVM: x86: remove stale comment from struct x86_emulate_ctxt
KVM: x86: clear stale x86_emulate_ctxt->intercept value
KVM: SVM: Fix the svm vmexit code for WRMSR
KVM: X86: Fix dereference null cpufreq policy
Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel)
gets populated only when DMA remapping is enabled (dmar_disabled = 0)
irrespective of whether interrupt remapping is enabled or not.
Instead, populate the intel iommu debugfs directory if any IOMMUs are
detected.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ee2636b867 ("iommu/vt-d: Enable base Intel IOMMU debugfs support")
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull clk fixes from Stephen Boyd:
"A small collection of fixes. I'll make another sweep soon to look for
more fixes for this -rc series.
- Mark device node const in of_clk_get_parent APIs to ease landing
changes in users later
- Fix flag for Qualcomm SC7180 video clocks where we thought it would
never turn off but actually hardware takes care of it
- Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because
this clk is always on anyway
- Correct some bad dt-binding numbers for i.MX8MN SoCs"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx8mn: Fix incorrect clock defines
clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk
clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk
of: clk: Make of_clk_get_parent_{count,name}() parameter const
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter
with EVMCS GPA = 0 the host crashes because the
evmcs_gpa != vmx->nested.hv_evmcs_vmptr
condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
happen on vmx->nested.hv_evmcs pointer dereference.
Another problematic EVMCS ptr value is '-1' but it only causes host crash
after nested_release_evmcs() invocation. The problem is exactly the same as
with '0', we mistakenly think that the EVMCS pointer hasn't changed and
thus nested.hv_evmcs_vmptr is valid.
Resolve the issue by adding an additional !vmx->nested.hv_evmcs
check to nested_vmx_handle_enlightened_vmptrld(), this way we will
always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL
and this is supposed to catch all invalid EVMCS GPAs.
Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
to be consistent with initialization where we don't currently
set hv_evmcs_vmptr to '-1'.
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: s390: Fully do the CPU resets as intended
With 7de3f1423f ("KVM: s390: Add new reset vcpu API") we clarified
the meaning of the reset ioctl to fully reset the CPU and not only the
parts that can not be handled by userspace. Turns out that we missed
some parts.
Previously all fields of structure kvm_lapic_irq were not initialized
before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause
an issue when any of those fields are used for processing a request.
For example not initializing the msi_redir_hint field before passing
to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of
kvm_apic_map_get_dest_lapic(). This will specifically happen when the
kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
value of msi_redir_hint, which should not happen as the request belongs
to APIC fixed delivery mode and we do not want to deliver the
interrupt only to the lowest priority candidate.
This patch initializes all the fields of kvm_lapic_irq based on the
values of ioapic redirect_entry object before passing it on to
kvm_bitmap_or_dest_vcpus().
Fixes: 7ee30bc132 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Set level to false since the value doesn't really matter. Suggested
by Vitaly Kuznetsov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1
is not supported[*], i.e. intercepting ENCLS to inject a #UD is
unnecessary.
Avoiding ENCLS-exiting even when it is reported as supported by the CPU
works around a reported issue where SGX is "hard" disabled after an S3
suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
are enumerated as unsupported. While the root cause of the S3 issue is
unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
workaround for what is most likely a hardware or firmware issue. As a
bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
vmcs02.
Note, SGX must be disabled in BIOS to take advantage of this workaround
[*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
cleared. Soft disabled meaning disabling SGX without clearing the
primary CPUID bit (in leaf 0x7) and without poking into non-SGX
CPU paths, e.g. for the VMCS controls.
Fixes: 0b665d3040 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Reported-by: Toni Spets <toni.spets@iki.fi>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code") accidentally left out the ir_data pointer when
calling modity_irte_ga(), which causes the function amd_iommu_update_ga()
to return prematurely due to struct amd_ir_data.ref is NULL and
the "is_run" bit of IRTE does not get updated properly.
This results in bad I/O performance since IOMMU AVIC always generate GA Log
entry and notify IOMMU driver and KVM when it receives interrupt from the
PCI pass-through device instead of directly inject interrupt to the vCPU.
Fixes by passing ir_data when calling modify_irte_ga() as done previously.
Fixes: b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VMD subdevices are created with a PCI domain ID of 0x10000 or
higher.
These subdevices are also handled like all other PCI devices by
dmar_pci_bus_notifier().
However, when dmar_alloc_pci_notify_info() take records of such devices,
it will truncate the domain ID to a u16 value (in info->seg).
The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if
it is 0000:00:02.0.
In the unlucky event that a real device also exists at 0000:00:02.0 and
also has a device-specific entry in the DMAR table,
dmar_insert_dev_scope() will crash on:
BUG_ON(i >= devices_cnt);
That's basically a sanity check that only one PCI device matches a
single DMAR entry; in this case we seem to have two matching devices.
Fix this by ignoring devices that have a domain number higher than
what can be looked up in the DMAR table.
This problem was carefully diagnosed by Jian-Hong Pan.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: 59ce0515cd ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When base address in RHSA structure doesn't match base address in
each DRHD structure, the base address in last DRHD is printed out.
This doesn't make sense when there are multiple DRHD units, fix it
by printing the buggy RHSA's base address.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Fixes: fd0c889489 ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in drivers: ipr and ufs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ipr: Fix softlockup when rescanning devices in petitboot
scsi: ufs: Fix possible unclocked access to auto hibern8 timer register
Pull NFS client bugfixes from Anna Schumaker:
"These are mostly fscontext fixes, but there is also one that fixes
collisions seen in fscache:
- Ensure the fs_context has the correct fs_type when mounting and
submounting
- Fix leaking of ctx->nfs_server.hostname
- Add minor version to fscache key to prevent collisions"
* tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: add minor version to nfs_server_key for fscache
NFS: Fix leak of ctx->nfs_server.hostname
NFS: Don't hard-code the fs_type when submounting
NFS: Ensure the fs_context has the correct fs_type before mounting
Pull fuse fix from Miklos Szeredi:
"Fix an Oops introduced in v5.4"
* tag 'fuse-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix stack use after return
Pull overlayfs fixes from Miklos Szeredi:
"Fix three bugs introduced in this cycle"
* tag 'ovl-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix lockdep warning for async write
ovl: fix some xino configurations
ovl: fix lock in ovl_llseek()
During a rename whiteout, if btrfs_whiteout_for_rename() returns an error
we can end up returning from btrfs_rename() with the log context object
still in the root's log context list - this happens if 'sync_log' was
set to true before we called btrfs_whiteout_for_rename() and it is
dangerous because we end up with a corrupt linked list (root->log_ctxs)
as the log context object was allocated on the stack.
After btrfs_rename() returns, any task that is running btrfs_sync_log()
concurrently can end up crashing because that linked list is traversed by
btrfs_sync_log() (through btrfs_remove_all_log_ctxs()). That results in
the same issue that commit e6c617102c ("Btrfs: fix log context list
corruption after rename exchange operation") fixed.
Fixes: d4682ba03e ("Btrfs: sync log after logging new name")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull power management fix from Rafael Wysocki:
"Fix cpupower utility build failures with -fno-common enabled (Mike
Gilbert)"
* tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpupower: avoid multiple definition with gcc -fno-common
Pull io_uring fix from Jens Axboe:
"Just a single fix here, improving the RCU callback ordering from last
week. After a bit more perusing by Paul, he poked a hole in the
original"
* tag 'io_uring-5.6-2020-03-13' of git://git.kernel.dk/linux-block:
io_uring: ensure RCU callback ordering with rcu_barrier()
Pull block fixes from Jens Axboe:
"A few fixes that should go into this release. This contains:
- Fix for a corruption issue with the s390 dasd driver (Stefan)
- Fixup/improvement for the flush insertion change that we had in
this series (Ming)
- Fix for the partition suppor for host aware zoned devices
(Shin'ichiro)
- Fix incorrect blk-iocost comparison (Tejun)
The diffstat looks large, but that's a) mostly dasd, and b) the flush
fix from Ming adds a big comment"
* tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-block:
block: Fix partition support for host aware zoned block devices
blk-mq: insert flush request to the front of dispatch queue
s390/dasd: fix data corruption for thin provisioned devices
blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix HW busy detection support for host controllers requiring the
MMC_RSP_BUSY response flag (R1B) to be set for the command. In
particular for CMD6 (eMMC), erase/trim/discard (SD/eMMC) and CMD5
(eMMC sleep).
MMC host:
- sdhci-omap|tegra: Fix support for HW busy detection"
* tag 'mmc-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command
mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard
mmc: core: Allow host controllers to require R1B for CMD6
afs_put_addrlist() casts kfree() to rcu_callback_t. Apart from being wrong
in theory, this might also blow up when people start enforcing function
types via compiler instrumentation, and it means the rcu_head has to be
first in struct afs_addr_list.
Use kfree_rcu() instead, it's simpler and more correct.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lockdep reports "WARNING: lock held when returning to user space!" due to
async write holding freeze lock over the write. Apparently aio.c already
deals with this by lying to lockdep about the state of the lock.
Do the same here. No need to check for S_IFREG() here since these file ops
are regular-only.
Reported-by: syzbot+9331a354f4f624a52a55@syzkaller.appspotmail.com
Fixes: 2406a307ac ("ovl: implement async IO routines")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fix up two bugs in the coversion to xino_mode:
1. xino=off does not always end up in disabled mode
2. xino=auto on 32bit arch should end up in disabled mode
Take a proactive approach to disabling xino on 32bit kernel:
1. Disable XINO_AUTO config during build time
2. Disable xino with a warning on mount time
As a by product, xino=on on 32bit arch also ends up in disabled mode.
We never intended to enable xino on 32bit arch and this will make the
rest of the logic simpler.
Fixes: 0f831ec85e ("ovl: simplify ovl_same_sb() helper")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.
This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.
There are two reasons why this can happen:
1) When the interrupt is activated the affinity mask which was
established in irq_create_affinity_masks() is handed in to
the vector allocation code. This mask contains all CPUs to which
the interrupt can be made affine to, but this does not take the
CPU isolation 'managed_irq' mask into account.
When the interrupt is finally requested by the device driver then the
affinity is checked again and the CPU isolation 'managed_irq' mask is
taken into account, which moves the interrupt to a non-isolated CPU if
possible.
2) The interrupt can be affine to an isolated CPU because the
non-isolated CPUs in the calculated affinity mask are not online.
Once a non-isolated CPU which is in the mask comes online the
interrupt is migrated to this non-isolated CPU
In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().
Case #1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if #1 would be
addressed, #2 would still trigger it.
Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.
So keep it simple and remove the warning.
[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]
Fixes: 11ea68f553 ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:
* WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
* significant kernel issues that need prompt attention if they should ever
* appear at runtime.
*
* Do not use these macros when checking for invalid external inputs
The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.
Fixes: 556ab45f9a ("ioat2: catch and recover from broken vtd configurations v6")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200309182510.373875-1-hdegoede@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=701847
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:
* WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
* significant kernel issues that need prompt attention if they should ever
* appear at runtime.
*
* Do not use these macros when checking for invalid external inputs
The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.
Some distros, e.g. Fedora, have tools watching for the kernel backtraces
logged by the WARN macros and offer the user an option to file a bug for
this when these are encountered. The WARN_TAINT in dmar_parse_one_rmrr
+ another iommu WARN_TAINT, addressed in another patch, have lead to over
a 100 bugs being filed this way.
This commit replaces the WARN_TAINT("...") call, with a
pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) call
avoiding the backtrace and thus also avoiding bug-reports being filed
about this against the kernel.
Fixes: f5a68bb075 ("iommu/vt-d: Mark firmware tainted if RMRR fails sanity check")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/r/20200309140138.3753-3-hdegoede@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1808874
Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:
* WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
* significant kernel issues that need prompt attention if they should ever
* appear at runtime.
*
* Do not use these macros when checking for invalid external inputs
The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.
Some distros, e.g. Fedora, have tools watching for the kernel backtraces
logged by the WARN macros and offer the user an option to file a bug for
this when these are encountered. The WARN_TAINT in warn_invalid_dmar()
+ another iommu WARN_TAINT, addressed in another patch, have lead to over
a 100 bugs being filed this way.
This commit replaces the WARN_TAINT("...") calls, with
pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) calls
avoiding the backtrace and thus also avoiding bug-reports being filed
about this against the kernel.
Fixes: fd0c889489 ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables")
Fixes: e625b4a95d ("iommu/vt-d: Parse ANDD records")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200309140138.3753-2-hdegoede@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1564895
This fixes a problem found on the MacBookPro 2017 Retina panel:
The panel reports 10 bpc color depth in its EDID, and the
firmware chooses link settings at boot which support enough
bandwidth for 10 bpc (324000 kbit/sec aka LINK_RATE_RBR2
aka 0xc), but the DP_MAX_LINK_RATE dpcd register only reports
2.7 Gbps (multiplier value 0xa) as possible, in direct
contradiction of what the firmware successfully set up.
This restricts the panel to 8 bpc, not providing the full
color depth of the panel on Linux <= 5.5. Additionally, commit
'4a8ca46bae8a ("drm/amd/display: Default max bpc to 16 for eDP")'
introduced into Linux 5.6-rc1 will unclamp panel depth to
its full 10 bpc, thereby requiring a eDP bandwidth for all
modes that exceeds the bandwidth available and causes all modes
to fail validation -> No modes for the laptop panel -> failure
to set any mode -> Panel goes dark.
This patch adds a quirk specific to the MBP 2017 15" Retina
panel to override reported max link rate to the correct maximum
of 0xc = LINK_RATE_RBR2 to fix the darkness and reduced display
precision.
Please apply for Linux 5.6+ to avoid regressing Apple MBP panel
support.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In dcn20_funcs and dcn21_funcs struct, the member ".dsc_pg_control = NULL"
should be removed due to .dsc_pg_control be assigned to dcn20_dsc_pg_control.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With CONFIG_KASAN_VMALLOC, new page tables are created at the time
shadow memory for vmalloc area is unmapped. If some parts of the
page table still have entries to the zero page shadow memory, the
entries are wrongly marked RW.
With CONFIG_KASAN_VMALLOC, almost the entire kernel address space
is managed by KASAN. To make it simple, just create KASAN page tables
for the entire kernel space at kasan_init(). That doesn't use much
more space, and that's anyway already done for hash platforms.
Fixes: 3d4247fcc9 ("powerpc/32: Add support of KASAN_VMALLOC")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ef5248fc1f496c6b0dfdb59380f24968f25f75c5.1583513368.git.christophe.leroy@c-s.fr
Pull drm fixes from Dave Airlie:
"It's a bit quieter, probably not as much as it could be.
There is on large regression fix in here from Lyude for displayport
bandwidth calculations, there've been reports of multi-monitor in
docks not working since -rc1 and this has been tested to fix those.
Otherwise it's a bunch of i915 (with some GVT fixes), a set of amdgpu
watermark + bios fixes, and an exynos iommu cleanup fix.
core:
- DP MST bandwidth regression fix.
i915:
- hard lockup fix
- GVT fixes
- 32-bit alignment issue fix
- timeline wait fixes
- cacheline_retire and free
amdgpu:
- Update the display watermark bounding box for navi14
- Fix fetching vbios directly from rom on vega20/arcturus
- Navi and renoir watermark fixes
exynos:
- iommu object cleanup fix"
`
* tag 'drm-fixes-2020-03-13' of git://anongit.freedesktop.org/drm/drm:
drm/dp_mst: Rewrite and fix bandwidth limit checks
drm/dp_mst: Reprobe path resources in CSN handler
drm/dp_mst: Use full_pbn instead of available_pbn for bandwidth checks
drm/dp_mst: Rename drm_dp_mst_is_dp_mst_end_device() to be less redundant
drm/i915: Defer semaphore priority bumping to a workqueue
drm/i915/gt: Close race between cacheline_retire and free
drm/i915/execlists: Enable timeslice on partial virtual engine dequeue
drm/i915: be more solid in checking the alignment
drm/i915/gvt: Fix dma-buf display blur issue on CFL
drm/i915: Return early for await_start on same timeline
drm/i915: Actually emit the await_start
drm/amdgpu/powerplay: nv1x, renior copy dcn clock settings of watermark to smu during boot up
drm/exynos: Fix cleanup of IOMMU related objects
drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
drm/amd/display: update soc bb for nv14
drm/i915/gvt: Fix emulated vbt size issue
drm/i915/gvt: Fix unnecessary schedule timer when no vGPU exits
UAPI Changes: None
Cross-subsystem Changes: None
Core Changes: Fixed regressions introduced by commit cd82d82cbc
("drm/dp_mst: Add branch bandwidth validation to MST atomic check"),
which would cause us to:
* Calculate the available bandwidth on an MST topology incorrectly, and
as a result reject most display configurations that would try to enable
more then one sink on a topology
* Occasionally expose MST connectors to userspace before finishing
probing their PBN capabilities, resulting in us rejecting display
configurations because we assumed briefly that no bandwidth was
available
Driver Changes: None
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/bf16ee577567beed91c86b7d9cda3ec2e8c50a71.camel@redhat.com
Pull networking fixes from David Miller:
"It looks like a decent sized set of fixes, but a lot of these are one
liner off-by-one and similar type changes:
1) Fix netlink header pointer to calcular bad attribute offset
reported to user. From Pablo Neira Ayuso.
2) Don't double clear PHY interrupts when ->did_interrupt is set,
from Heiner Kallweit.
3) Add missing validation of various (devlink, nl802154, fib, etc.)
attributes, from Jakub Kicinski.
4) Missing *pos increments in various netfilter seq_next ops, from
Vasily Averin.
5) Missing break in of_mdiobus_register() loop, from Dajun Jin.
6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.
7) Work around FMAN erratum A050385, from Madalin Bucur.
8) Make sure ARP header is pulled early enough in bonding driver,
from Eric Dumazet.
9) Do a cond_resched() during multicast processing of ipvlan and
macvlan, from Mahesh Bandewar.
10) Don't attach cgroups to unrelated sockets when in interrupt
context, from Shakeel Butt.
11) Fix tpacket ring state management when encountering unknown GSO
types. From Willem de Bruijn.
12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
only in the suspend context. From Heiner Kallweit"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
net: systemport: fix index check to avoid an array out of bounds access
tc-testing: add ETS scheduler to tdc build configuration
net: phy: fix MDIO bus PM PHY resuming
net: hns3: clear port base VLAN when unload PF
net: hns3: fix RMW issue for VLAN filter switch
net: hns3: fix VF VLAN table entries inconsistent issue
net: hns3: fix "tc qdisc del" failed issue
taprio: Fix sending packets without dequeueing them
net: mvmdio: avoid error message for optional IRQ
net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
net: memcg: fix lockdep splat in inet_csk_accept()
s390/qeth: implement smarter resizing of the RX buffer pool
s390/qeth: refactor buffer pool code
s390/qeth: use page pointers to manage RX buffer pool
seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
net/packet: tpacket_rcv: do not increment ring index on drop
sxgbe: Fix off by one in samsung driver strncpy size arg
net: caif: Add lockdep expression to RCU traversal primitive
MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
...
Sigh, this is mostly my fault for not giving commit cd82d82cbc
("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
enough scrutiny during review. The way we're checking bandwidth
limitations here is mostly wrong:
For starters, drm_dp_mst_atomic_check_bw_limit() determines the
pbn_limit of a branch by simply scanning each port on the current branch
device, then uses the last non-zero full_pbn value that it finds. It
then counts the sum of the PBN used on each branch device for that
level, and compares against the full_pbn value it found before.
This is wrong because ports can and will have different PBN limitations
on many hubs, especially since a number of DisplayPort hubs out there
will be clever and only use the smallest link rate required for each
downstream sink - potentially giving every port a different full_pbn
value depending on what link rate it's trained at. This means with our
current code, which max PBN value we end up with is not well defined.
Additionally, we also need to remember when checking bandwidth
limitations that the top-most device in any MST topology is a branch
device, not a port. This means that the first level of a topology
doesn't technically have a full_pbn value that needs to be checked.
Instead, we should assume that so long as our VCPI allocations fit we're
within the bandwidth limitations of the primary MSTB.
We do however, want to check full_pbn on every port including those of
the primary MSTB. However, it's important to keep in mind that this
value represents the minimum link rate /between a port's sink or mstb,
and the mstb itself/. A quick diagram to explain:
MSTB #1
/ \
/ \
Port #1 Port #2
full_pbn for Port #1 → | | ← full_pbn for Port #2
Sink #1 MSTB #2
|
etc...
Note that in the above diagram, the combined PBN from all VCPI
allocations on said hub should not exceed the full_pbn value of port #2,
and the display configuration on sink #1 should not exceed the full_pbn
value of port #1. However, port #1 and port #2 can otherwise consume as
much bandwidth as they want so long as their VCPI allocations still fit.
And finally - our current bandwidth checking code also makes the mistake
of not checking whether something is an end device or not before trying
to traverse down it.
So, let's fix it by rewriting our bandwidth checking helpers. We split
the function into one part for handling branches which simply adds up
the total PBN on each branch and returns it, and one for checking each
port to ensure we're not going over its PBN limit. Phew.
This should fix regressions seen, where we erroneously reject display
configurations due to thinking they're going over our bandwidth limits
when they're not.
Changes since v1:
* Took an even closer look at how PBN limitations are supposed to be
handled, and did some experimenting with Sean Paul. Ended up rewriting
these helpers again, but this time they should actually be correct!
Changes since v2:
* Small indenting fix
* Fix pbn_used check in drm_dp_mst_atomic_check_port_bw_limit()
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: cd82d82cbc ("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
Cc: Sean Paul <seanpaul@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mikita Lipski <mikita.lipski@amd.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309210131.1497545-1-lyude@redhat.com
We used to punt off reprobing path resources to the link address probe
work, but now that we handle CSNs asynchronously from the driver's HPD
handling we can do whatever the heck we want from the CSN!
So, reprobe the path resources from drm_dp_mst_handle_conn_stat(). Also,
get rid of the path resource reprobing code in
drm_dp_check_and_send_link_address() since it's needlessly complicated
when we already reprobe path resources from
drm_dp_handle_link_address_port(). And finally, teach
drm_dp_send_enum_path_resources() to return 1 on PBN changes so we know
if we need to send another hotplug or not.
This fixes issues where we've indicated to userspace that a port has
just been connected, before we actually probed it's available PBN -
something that results in unexpected atomic check failures.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: cd82d82cbc ("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
Cc: Mikita Lipski <mikita.lipski@amd.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306234623.547525-4-lyude@redhat.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
DisplayPort specifications are fun. For a while, it's been really
unclear to us what available_pbn actually does. There's a somewhat vague
explanation in the DisplayPort spec (starting from 1.2) that partially
explains it:
The minimum payload bandwidth number supported by the path. Each node
updates this number with its available payload bandwidth number if its
payload bandwidth number is less than that in the Message Transaction
reply.
So, it sounds like available_pbn represents the smallest link rate in
use between the source and the branch device. Cool, so full_pbn is just
the highest possible PBN that the branch device supports right?
Well, we assumed that for quite a while until Sean Paul noticed that on
some MST hubs, available_pbn will actually get set to 0 whenever there's
any active payloads on the respective branch device. This caused quite a
bit of confusion since clearing the payload ID table would end up fixing
the available_pbn value.
So, we just went with that until commit cd82d82cbc ("drm/dp_mst: Add
branch bandwidth validation to MST atomic check") started breaking
people's setups due to us getting erroneous available_pbn values. So, we
did some more digging and got confused until we finally looked at the
definition for full_pbn:
The bandwidth of the link at the trained link rate and lane count
between the DP Source device and the DP Sink device with no time slots
allocated to VC Payloads, represented as a Payload Bandwidth Number. As
with the Available_Payload_Bandwidth_Number, this number is determined
by the link with the lowest lane count and link rate.
That's what we get for not reading specs closely enough, hehe. So, since
full_pbn is definitely what we want for doing bandwidth restriction
checks - let's start using that instead and ignore available_pbn
entirely.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: cd82d82cbc ("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
Cc: Mikita Lipski <mikita.lipski@amd.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Sean Paul <sean@poorly.run>
Reviewed-by: Mikita Lipski <mikita.lipski@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306234623.547525-3-lyude@redhat.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Pull vfs fixes from Al Viro:
"A couple of fixes for old crap in ->atomic_open() instances"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs_atomic_open(): fix double-put on late allocation failure
gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
Currently the bounds check on index is off by one and can lead to
an out of bounds access on array priv->filters_loc when index is
RXCHK_BRCM_TAG_MAX.
Fixes: bb9051a2b2 ("net: systemport: Add support for WAKE_FILTER")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file
to perform a full tdc run will encounter the following warning:
ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully
Fixes: 82c664b69c ("selftests: qdiscs: Add test coverage for ETS Qdisc")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far we have the unfortunate situation that mdio_bus_phy_may_suspend()
is called in suspend AND resume path, assuming that function result is
the same. After the original change this is no longer the case,
resulting in broken resume as reported by Geert.
To fix this call mdio_bus_phy_may_suspend() in the suspend path only,
and let the phy_device store the info whether it was suspended by
MDIO bus PM.
Fixes: 503ba7c696 ("net: phy: Avoid multiple suspends")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
several iterations of ->atomic_open() calling conventions ago, we
used to need fput() if ->atomic_open() failed at some point after
successful finish_open(). Now (since 2016) it's not needed -
struct file carries enough state to make fput() work regardless
of the point in struct file lifecycle and discarding it on
failure exits in open() got unified. Unfortunately, I'd missed
the fact that we had an instance of ->atomic_open() (cifs one)
that used to need that fput(), as well as the stale comment in
finish_open() demanding such late failure handling. Trivially
fixed...
Fixes: fe9ec8291f "do_last(): take fput() on error after opening to out:"
Cc: stable@kernel.org # v4.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
with the way fs/namei.c:do_last() had been done, ->atomic_open()
instances needed to recognize the case when existing file got
found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
or failing themselves. gfs2 one didn't.
Fixes: 6d4ade986f (GFS2: Add atomic_open support)
Cc: stable@kernel.org # v3.11
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Huazhong Tan says:
====================
net: hns3: fixes for -net
This series includes several bugfixes for the HNS3 ethernet driver.
[patch 1] fixes an "tc qdisc del" failure.
[patch 2] fixes SW & HW VLAN table not consistent issue.
[patch 3] fixes a RMW issue related to VLAN filter switch.
[patch 4] clears port based VLAN when uploading PF.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, PF missed to clear the port base VLAN for VF when
unload. In this case, the VLAN id will remain in the VLAN
table. This patch fixes it.
Fixes: 92f11ea177 ("net: hns3: fix set port based VLAN issue for VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the user manual, the ingress and egress VLAN filter
are configured at the same time. Currently, hclge_init_vlan_config()
and hclge_set_vlan_spoofchk() will both change the VLAN filter
switch. So it's necessary to read the old configuration before
modifying it.
Fixes: 22044f95fa ("net: hns3: add support for spoof check setting")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if VF is loaded on the host side, the host doesn't
clear the VF's VLAN table entries when VF removing. In this
case, when doing reset and disabling sriov at the same time the
VLAN device over VF will be removed, but the VLAN table entries
in hardware are remained.
This patch fixes it by asking PF to clear the VLAN table entries for
VF when VF is removing. It also clears the VLAN table full bit
after VF VLAN table entries being cleared.
Fixes: c6075b1934 ("net: hns3: Record VF vlan tables")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HNS3 driver supports to configure TC numbers and TC to priority
map via "tc" tool. But when delete the rule, will fail, because
the HNS3 driver needs at least one TC, but the "tc" tool sets TC
number to zero when delete.
This patch makes sure that the TC number is at least one.
Fixes: 30d240dfa2 ("net: hns3: Add mqprio hardware offload support in hns3 driver")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There was a bug that was causing packets to be sent to the driver
without first calling dequeue() on the "child" qdisc. And the KASAN
report below shows that sending a packet without calling dequeue()
leads to bad results.
The problem is that when checking the last qdisc "child" we do not set
the returned skb to NULL, which can cause it to be sent to the driver,
and so after the skb is sent, it may be freed, and in some situations a
reference to it may still be in the child qdisc, because it was never
dequeued.
The crash log looks like this:
[ 19.937538] ==================================================================
[ 19.938300] BUG: KASAN: use-after-free in taprio_dequeue_soft+0x620/0x780
[ 19.938968] Read of size 4 at addr ffff8881128628cc by task swapper/1/0
[ 19.939612]
[ 19.939772] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3+ #97
[ 19.940397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qe4
[ 19.941523] Call Trace:
[ 19.941774] <IRQ>
[ 19.941985] dump_stack+0x97/0xe0
[ 19.942323] print_address_description.constprop.0+0x3b/0x60
[ 19.942884] ? taprio_dequeue_soft+0x620/0x780
[ 19.943325] ? taprio_dequeue_soft+0x620/0x780
[ 19.943767] __kasan_report.cold+0x1a/0x32
[ 19.944173] ? taprio_dequeue_soft+0x620/0x780
[ 19.944612] kasan_report+0xe/0x20
[ 19.944954] taprio_dequeue_soft+0x620/0x780
[ 19.945380] __qdisc_run+0x164/0x18d0
[ 19.945749] net_tx_action+0x2c4/0x730
[ 19.946124] __do_softirq+0x268/0x7bc
[ 19.946491] irq_exit+0x17d/0x1b0
[ 19.946824] smp_apic_timer_interrupt+0xeb/0x380
[ 19.947280] apic_timer_interrupt+0xf/0x20
[ 19.947687] </IRQ>
[ 19.947912] RIP: 0010:default_idle+0x2d/0x2d0
[ 19.948345] Code: 00 00 41 56 41 55 65 44 8b 2d 3f 8d 7c 7c 41 54 55 53 0f 1f 44 00 00 e8 b1 b2 c5 fd e9 07 00 3
[ 19.950166] RSP: 0018:ffff88811a3efda0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[ 19.950909] RAX: 0000000080000000 RBX: ffff88811a3a9600 RCX: ffffffff8385327e
[ 19.951608] RDX: 1ffff110234752c0 RSI: 0000000000000000 RDI: ffffffff8385262f
[ 19.952309] RBP: ffffed10234752c0 R08: 0000000000000001 R09: ffffed10234752c1
[ 19.953009] R10: ffffed10234752c0 R11: ffff88811a3a9607 R12: 0000000000000001
[ 19.953709] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 19.954408] ? default_idle_call+0x2e/0x70
[ 19.954816] ? default_idle+0x1f/0x2d0
[ 19.955192] default_idle_call+0x5e/0x70
[ 19.955584] do_idle+0x3d4/0x500
[ 19.955909] ? arch_cpu_idle_exit+0x40/0x40
[ 19.956325] ? _raw_spin_unlock_irqrestore+0x23/0x30
[ 19.956829] ? trace_hardirqs_on+0x30/0x160
[ 19.957242] cpu_startup_entry+0x19/0x20
[ 19.957633] start_secondary+0x2a6/0x380
[ 19.958026] ? set_cpu_sibling_map+0x18b0/0x18b0
[ 19.958486] secondary_startup_64+0xa4/0xb0
[ 19.958921]
[ 19.959078] Allocated by task 33:
[ 19.959412] save_stack+0x1b/0x80
[ 19.959747] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 19.960222] kmem_cache_alloc+0xe4/0x230
[ 19.960617] __alloc_skb+0x91/0x510
[ 19.960967] ndisc_alloc_skb+0x133/0x330
[ 19.961358] ndisc_send_ns+0x134/0x810
[ 19.961735] addrconf_dad_work+0xad5/0xf80
[ 19.962144] process_one_work+0x78e/0x13a0
[ 19.962551] worker_thread+0x8f/0xfa0
[ 19.962919] kthread+0x2ba/0x3b0
[ 19.963242] ret_from_fork+0x3a/0x50
[ 19.963596]
[ 19.963753] Freed by task 33:
[ 19.964055] save_stack+0x1b/0x80
[ 19.964386] __kasan_slab_free+0x12f/0x180
[ 19.964830] kmem_cache_free+0x80/0x290
[ 19.965231] ip6_mc_input+0x38a/0x4d0
[ 19.965617] ipv6_rcv+0x1a4/0x1d0
[ 19.965948] __netif_receive_skb_one_core+0xf2/0x180
[ 19.966437] netif_receive_skb+0x8c/0x3c0
[ 19.966846] br_handle_frame_finish+0x779/0x1310
[ 19.967302] br_handle_frame+0x42a/0x830
[ 19.967694] __netif_receive_skb_core+0xf0e/0x2a90
[ 19.968167] __netif_receive_skb_one_core+0x96/0x180
[ 19.968658] process_backlog+0x198/0x650
[ 19.969047] net_rx_action+0x2fa/0xaa0
[ 19.969420] __do_softirq+0x268/0x7bc
[ 19.969785]
[ 19.969940] The buggy address belongs to the object at ffff888112862840
[ 19.969940] which belongs to the cache skbuff_head_cache of size 224
[ 19.971202] The buggy address is located 140 bytes inside of
[ 19.971202] 224-byte region [ffff888112862840, ffff888112862920)
[ 19.972344] The buggy address belongs to the page:
[ 19.972820] page:ffffea00044a1800 refcount:1 mapcount:0 mapping:ffff88811a2bd1c0 index:0xffff8881128625c0 compo0
[ 19.973930] flags: 0x8000000000010200(slab|head)
[ 19.974388] raw: 8000000000010200 ffff88811a2ed650 ffff88811a2ed650 ffff88811a2bd1c0
[ 19.975151] raw: ffff8881128625c0 0000000000190013 00000001ffffffff 0000000000000000
[ 19.975915] page dumped because: kasan: bad access detected
[ 19.976461] page_owner tracks the page as allocated
[ 19.976946] page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NO)
[ 19.978332] prep_new_page+0x24b/0x330
[ 19.978707] get_page_from_freelist+0x2057/0x2c90
[ 19.979170] __alloc_pages_nodemask+0x218/0x590
[ 19.979619] new_slab+0x9d/0x300
[ 19.979948] ___slab_alloc.constprop.0+0x2f9/0x6f0
[ 19.980421] __slab_alloc.constprop.0+0x30/0x60
[ 19.980870] kmem_cache_alloc+0x201/0x230
[ 19.981269] __alloc_skb+0x91/0x510
[ 19.981620] alloc_skb_with_frags+0x78/0x4a0
[ 19.982043] sock_alloc_send_pskb+0x5eb/0x750
[ 19.982476] unix_stream_sendmsg+0x399/0x7f0
[ 19.982904] sock_sendmsg+0xe2/0x110
[ 19.983262] ____sys_sendmsg+0x4de/0x6d0
[ 19.983660] ___sys_sendmsg+0xe4/0x160
[ 19.984032] __sys_sendmsg+0xab/0x130
[ 19.984396] do_syscall_64+0xe7/0xae0
[ 19.984761] page last free stack trace:
[ 19.985142] __free_pages_ok+0x432/0xbc0
[ 19.985533] qlist_free_all+0x56/0xc0
[ 19.985907] quarantine_reduce+0x149/0x170
[ 19.986315] __kasan_kmalloc.constprop.0+0x9e/0xd0
[ 19.986791] kmem_cache_alloc+0xe4/0x230
[ 19.987182] prepare_creds+0x24/0x440
[ 19.987548] do_faccessat+0x80/0x590
[ 19.987906] do_syscall_64+0xe7/0xae0
[ 19.988276] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 19.988775]
[ 19.988930] Memory state around the buggy address:
[ 19.989402] ffff888112862780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.990111] ffff888112862800: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 19.990822] >ffff888112862880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 19.991529] ^
[ 19.992081] ffff888112862900: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.992796] ffff888112862980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Reported-by: Michael Schmidt <michael.schmidt@eti.uni-siegen.de>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IPMI fix from Corey Minyard:
"Fix a message spew on some system
The call to platform_get_irq() was changed to print a log if the
interrupt was not available, and that was causing bogus messages to
spew out for the IPMI driver. People have requested that this get in
to 5.6 so I'm sending it along"
* tag 'for-linus-5.6-2' of git://github.com/cminyard/linux-ipmi:
ipmi_si: Avoid spurious errors for optional IRQs
Pull crypto fix from Herbert Xu:
"Fix a build problem with x86/curve25519"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/curve25519 - support assemblers with no adx support
ovl_inode_lock() is interruptible. When inode_lock() in ovl_llseek()
was replaced with ovl_inode_lock(), we did not add a check for error.
Fix this by making ovl_inode_lock() uninterruptible and change the
existing call sites to use an _interruptible variant.
Reported-by: syzbot+66a9752fa927f745385e@syzkaller.appspotmail.com
Fixes: b1f9d3858f ("ovl: use ovl_inode_lock in ovl_llseek()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Commit b72053072c ("block: allow partitions on host aware zone
devices") introduced the helper function disk_has_partitions() to check
if a given disk has valid partitions. However, since this function result
directly depends on the disk partition table length rather than the
actual existence of valid partitions in the table, it returns true even
after all partitions are removed from the disk. For host aware zoned
block devices, this results in zone management support to be kept
disabled even after removing all partitions.
Fix this by changing disk_has_partitions() to walk through the partition
table entries and return true if and only if a valid non-zero size
partition is found.
Fixes: b72053072c ("block: allow partitions on host aware zone devices")
Cc: stable@vger.kernel.org # 5.5
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit 01e99aeca3 ("blk-mq: insert passthrough request into
hctx->dispatch directly") may change to add flush request to the tail
of dispatch by applying the 'add_head' parameter of
blk_mq_sched_insert_request.
Turns out this way causes performance regression on NCQ controller because
flush is non-NCQ command, which can't be queued when there is any in-flight
NCQ command. When adding flush rq to the front of hctx->dispatch, it is
easier to introduce extra time to flush rq's latency compared with adding
to the tail of dispatch queue because of S_SCHED_RESTART, then chance of
flush merge is increased, and less flush requests may be issued to
controller.
So always insert flush request to the front of dispatch queue just like
before applying commit 01e99aeca3 ("blk-mq: insert passthrough request
into hctx->dispatch directly").
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 01e99aeca3 ("blk-mq: insert passthrough request into hctx->dispatch directly")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Devices are formatted in multiple of tracks.
For an Extent Space Efficient (ESE) volume we get errors when accessing
unformatted tracks. In this case the driver either formats the track on
the flight for write requests or returns zero data for read requests.
In case a request spans multiple tracks, the indication of an unformatted
track presented for the first track is incorrectly applied to all tracks
covered by the request. As a result, tracks containing data will be handled
as empty, resulting in zero data being returned on read, or overwriting
existing data with zero on write.
Fix by determining the track that gets the NRF error.
For write requests only format the track that is surely not formatted.
For Read requests all tracks before have returned valid data and should not
be touched.
All tracks after the unformatted track might be formatted or not. Those are
returned to the blocklayer to build a new request.
When using alias devices there is a chance that multiple write requests
trigger a format of the same track which might lead to data loss. Ensure
that a track is formatted only once by maintaining a list of currently
processed tracks.
Fixes: 5e2b17e712 ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.
BEFORE:
$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
/bin/dmesg | grep -i perf may provide additional information.
With nothing relevant in dmesg.
AFTER:
$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Fixes: c43ca5091a ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com
The busy timeout for the CMD5 to put the eMMC into sleep state, is specific
to the card. Potentially the timeout may exceed the host->max_busy_timeout.
If that becomes the case, mmc_sleep() converts from using an R1B response
to an R1 response, as to prevent the host from doing HW busy detection.
However, it has turned out that some hosts requires an R1B response no
matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
that, if the R1B gets enforced, the host becomes fully responsible of
managing the needed busy timeout, in one way or the other.
Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200311092036.16084-1-ulf.hansson@linaro.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The at24 driver attempts to read a byte from the device to validate that
it's actually present, and if not, disables the vcc regulator and
returns -ENODEV. However, between the read and the error handling path,
pm_runtime_idle() is called and invokes the driver's suspend callback,
which also disables the vcc regulator. This leads to an underflow of the
regulator enable count if the EEPROM is not present.
Move the pm_runtime_suspend() call to be after the error handling path
to resolve this.
Fixes: cd5676db05 ("misc: eeprom: at24: support pm_runtime control")
Signed-off-by: Michael Auchter <michael.auchter@ni.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Per the dt-binding the interrupt is optional so use
platform_get_irq_optional() instead of platform_get_irq(). Since
commit 7723f4c5ec ("driver core: platform: Add an error message to
platform_get_irq*()") platform_get_irq() produces an error message
orion-mdio f1072004.mdio: IRQ index 0 not found
which is perfectly normal if one hasn't specified the optional property
in the device tree.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the bottom 12 bits contain the ATU bin occupancy statistics. The
upper bits need masking off.
Fixes: e0c69ca7df ("net: dsa: mv88e6xxx: Add ATU occupancy via devlink resources")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: fixes 2020-03-11
please apply the following patch series for qeth to netdev's net tree.
Just one fix to get the RX buffer pool resizing right, with two
preparatory cleanups.
This is on the larger side given where we are in the -rc cycle, but a
big chunk of the delta is just refactoring to make the fix look nice.
I intentionally split these off from yesterday's series. No objections
if you'd rather punt them to net-next, the series should apply cleanly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX buffer pool is allocated in qeth_alloc_qdio_queues().
A subsequent pool resizing is then handled in a very simple way:
first free the current pool, then allocate a new pool of the requested
size.
There's two ways where this can go wrong:
1. if the resize action happens _before_ the initial pool was allocated,
then a subsequent initialization will call qeth_alloc_qdio_queues()
and fill the pool with a second(!) set of pages. We consume twice the
planned amount of memory.
This is easy to fix - just skip the resizing if the queues haven't
been allocated yet.
2. if the initial pool was created by qeth_alloc_qdio_queues() but a
subsequent resizing fails, then the device has no(!) RX buffer pool.
The next initialization will _not_ call qeth_alloc_qdio_queues(), and
attempting to back the RX buffers with pages in
qeth_init_qdio_queues() will fail.
Not very difficult to fix either - instead of re-allocating the whole
pool, just allocate/free as many entries to match the desired size.
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for a subsequent fix, split out helpers to allocate/free
individual pool entries.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX buffer elements are always backed with full pages, reflect this
in the pointer type.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Internet Assigned Numbers Authority (IANA) has recently assigned
a protocol number value of 143 for Ethernet [1].
Before this assignment, encapsulation mechanisms such as Segment Routing
used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
payload is an Ethernet frame.
In this patch, we add the definition of the Ethernet protocol number to the
kernel headers and update the SRv6 L2 tunnels to use it.
[1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, DSA drivers should configure CPU and DSA ports to their
maximum speed. In many configurations this is sufficient to make the
link work.
In some cases it is necessary to configure the link to run slower,
e.g. because of limitations of the SoC it is connected to. Or back to
back PHYs are used and the PHY needs to be driven in order to
establish link. In this case, phylink is used.
Only instantiate phylink if it is required. If there is no PHY, or no
fixed link properties, phylink can upset a link which works in the
default configuration.
Fixes: 0e27921816 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In one error case, tpacket_rcv drops packets after incrementing the
ring producer index.
If this happens, it does not update tp_status to TP_STATUS_USER and
thus the reader is stalled for an iteration of the ring, causing out
of order arrival.
The only such error path is when virtio_net_hdr_from_skb fails due
to encountering an unknown GSO type.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an off-by-one error in strncpy size argument in
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c. The issue is that in:
strncmp(opt, "eee_timer:", 6)
the passed string literal: "eee_timer:" has 10 bytes (without the NULL
byte) and the passed size argument is 6. As a result, the logic will
also accept other, malformed strings, e.g. "eee_tiXXX:".
This bug doesn't seem to have any security impact since its present in
module's cmdline parsing code.
Signed-off-by: Dominik Czarnota <dominik.b.czarnota@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
caifdevs->list is traversed using list_for_each_entry_rcu()
outside an RCU read-side critical section but under the
protection of rtnl_mutex. Hence, add the corresponding lockdep
expression to silence the following false-positive warning:
[ 10.868467] =============================
[ 10.869082] WARNING: suspicious RCU usage
[ 10.869817] 5.6.0-rc1-00177-g06ec0a154aae4 #1 Not tainted
[ 10.870804] -----------------------------
[ 10.871557] net/caif/caif_dev.c:115 RCU-list traversed in non-reader section!!
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fec_enet_set_coalesce() validates the previously set params
and if they are within range proceeds to apply the new ones.
The new ones, however, are not validated. This seems backwards,
probably a copy-paste error?
Compile tested only.
Fixes: d851b47b22 ("net: fec: add interrupt coalescence feature support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the IRQ assignment in ipmi_si driver is optional,
platform_get_irq() spews error messages unnecessarily:
ipmi_si dmi-ipmi-si.0: IRQ index 0 not found
Fix this by switching to platform_get_irq_optional().
Cc: stable@vger.kernel.org # 5.4.x
Cc: John Donnelly <john.p.donnelly@oracle.com>
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Reported-and-tested-by: Patrick Vo <patrick.vo@hpe.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-Id: <20200205093146.1352-1-tiwai@suse.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The support for __uint128_t is dependent on the target bit size.
GCC that defaults to the 32-bit can still build the 64-bit kernel
with -m64 flag passed.
However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the
default machine bit, which may not match to the kernel it is building.
Theoretically, this could be evaluated separately for 64BIT/32BIT.
config CC_HAS_INT128
bool
default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT
default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0)
I simplified it more because the 32-bit compiler is unlikely to support
__uint128_t.
Fixes: c12d3362a7 ("int128: move __uint128_t compiler test to Kconfig")
Reported-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: George Spelvin <lkml@sdf.org>
When a compiler supports multiple architectures, some compiler features
can be dependent on the target architecture.
This is typical for Clang, which supports multiple LLVM backends.
Even for GCC, we need to take care of biarch compiler cases.
It is not a problem when we evaluate cc-option in Makefiles because
cc-option is tested against the flag in question + $(KBUILD_CFLAGS).
The cc-option in Kconfig, on the other hand, does not accumulate
tested flags. Due to this simplification, it could potentially test
cc-option against a different target.
At first, Kconfig always evaluated cc-option against the host
architecture.
Since commit e8de12fb7c ("kbuild: Check for unknown options with
cc-option usage in Kconfig and clang"), in case of cross-compiling
with Clang, the target triple is correctly passed to Kconfig.
The case with biarch GCC (and native build with Clang) is still not
handled properly. We need to pass some flags to specify the target
machine bit.
Due to the design, all the macros in Kconfig are expanded in the
parse stage, where we do not know the target bit size yet.
For example, arch/x86/Kconfig allows a user to toggle CONFIG_64BIT.
If a compiler flag -foo depends on the machine bit, it must be tested
twice, one with -m32 and the other with -m64.
However, -m32/-m64 are not always recognized. So, this commits adds
m64-flag and m32-flag macros. They expand to -m32, -m64, respectively
if supported. Or, they expand to an empty string if unsupported.
The typical usage is like this:
config FOO
bool
default $(cc-option,$(m64-flag) -foo) if 64BIT
default $(cc-option,$(m32-flag) -foo)
This is clumsy, but there is no elegant way to handle this in the
current static macro expansion.
There was discussion for static functions vs dynamic functions.
The consensus was to go as far as possible with the static functions.
(https://lkml.org/lkml/2018/3/2/22)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: George Spelvin <lkml@sdf.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
The commit 2042b5486b ("kbuild: unset variables in top Makefile
instead of setting 0") renamed the variable from "config-targets"
to "config-build", the comment should be consistent accordingly.
Signed-off-by: Kaiden PK Yu (余泊鎧) <KaidenPK.Yu@moxa.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Clang warns:
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2860:9: warning:
converting the result of '?:' with integer constants to a boolean always
evaluates to 'true' [-Wtautological-constant-compare]
return DPAA_FD_DATA_ALIGNMENT ? ALIGN(headroom,
^
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:131:34: note: expanded
from macro 'DPAA_FD_DATA_ALIGNMENT'
\#define DPAA_FD_DATA_ALIGNMENT (fman_has_errata_a050385() ? 64 : 16)
^
1 warning generated.
This was exposed by commit 3c68b8fffb ("dpaa_eth: FMan erratum A050385
workaround") even though it appears to have been an issue since the
introductory commit 9ad1a37493 ("dpaa_eth: add support for DPAA
Ethernet") since DPAA_FD_DATA_ALIGNMENT has never been able to be zero.
Just replace the whole boolean expression with the true branch, as it is
always been true.
Link: https://github.com/ClangBuiltLinux/linux/issues/928
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull fscrypt fix from Eric Biggers:
"Fix a bug where if userspace is writing to encrypted files while the
FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (introduced in v5.4) is running,
dirty inodes could be evicted, causing writes could be lost or the
filesystem to hang due to a use-after-free. This was encountered
during real-world use, not just theoretical.
Tested with the existing fscrypt xfstests, and with a new xfstest I
wrote to reproduce this bug. This fix does expose an existing bug with
'-o lazytime' that Ted is working on fixing, but this fix is more
critical and needed anyway regardless of the lazytime fix"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: don't evict dirty inodes after removing key
Johannes Berg says:
====================
A couple of fixes:
* three netlink validation fixes
* a mesh path selection fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Binderfs binder-control devices are cleaned up via binderfs_evict_inode
too() which will use refcount_dec_and_test(). However, we missed to set
the refcount for binderfs binder-control devices and so we underflowed
when the binderfs instance got unmounted. Pretty obvious oversight and
should have been part of the more general UAF fix. The good news is that
having test cases (suprisingly) helps.
Technically, we could detect that we're about to cleanup the
binder-control dentry in binderfs_evict_inode() and then simply clean it
up. But that makes the assumption that the binder driver itself will
never make use of a binderfs binder-control device after the binderfs
instance it belongs to has been unmounted and the superblock for it been
destroyed. While it is unlikely to ever come to this let's be on the
safe side. Performance-wise this also really doesn't matter since the
binder-control device is only every really when creating the binderfs
filesystem or creating additional binder devices. Both operations are
pretty rare.
Fixes: f0fe2c0f05 ("binder: prevent UAF for binderfs devices II")
Link: https://lore.kernel.org/r/CA+G9fYusdfg7PMfC9Xce-xLT7NiyKSbgojpK35GOm=Pf9jXXrA@mail.gmail.com
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20200311105309.1742827-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default defintions use fill pattern 0x90 for padding which for ARC
generates unintended "ldh_s r12,[r0,0x20]" corresponding to opcode 0x9090
So use ".align 4" which insert a "nop_s" instruction instead.
Cc: stable@vger.kernel.org
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull thread fix from Christian Brauner:
"This contains a single fix for a regression which was introduced when
we introduced the ability to select a specific pid at process creation
time.
When this feature is requested, the error value will be set to -EPERM
after exiting the pid allocation loop. This caused EPERM to be
returned when e.g. the init process/child subreaper of the pid
namespace has already died where we used to return ENOMEM before.
The first patch here simply fixes the regression by unconditionally
setting the return value back to ENOMEM again once we've successfully
allocated the requested pid number. This should be easy to backport to
v5.5.
The second patch adds a comment explaining that we must keep returning
ENOMEM since we've been doing it for a long time and have explicitly
documented this behavior for userspace. This seemed worthwhile because
we now have at least two separate example where people tried to change
the return value to something other than ENOMEM (The first version of
the regression fix did that too and the commit message links to an
earlier patch that tried to do the same.).
I have a simple regression test to make sure we catch this regression
in the future but since that introduces a whole new selftest subdir
and test files I'll keep this for v5.7"
* tag 'for-linus-2020-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
pid: make ENOMEM return value more obvious
pid: Fix error return value in some cases
Pull ftrace fix from Steven Rostedt:
"Have ftrace lookup_rec() return a consistent record otherwise it can
break live patching"
* tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Return the first found result in lookup_rec()
Pull pin control fixes from Linus Walleij:
"Some pin control fixes for the v5.6 series.
It comes down to memory leaks in the core and driver fixes. Some
should have been sent earlier but they kept piling up and the world is
just so full of distractions these days.
- Fix some inverted pins in the Meson GLX driver.
- Align the i.MX SC message structs causing warnings from KASan.
- Balance the kref in pinctrl hogs so they are actually free:d when
removing a pin control module. We haven't seen it before as people
don't use modules for pin control that much, I think.
- Add a missing call to pinctrl_unregister_mappings() another memory
leak when using modules.
- Fix the fwspec parsing in the Qualcomm driver.
- Fix a syntax error in the Falcon driver.
- Assign .irq_eoi conditionally in the Qualcomm driver, fixing a bug
affecting elder Qualcomm platforms"
* tag 'pinctrl-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Assign irq_eoi conditionally
pinctrl: falcon: fix syntax error
pinctrl: qcom: ssbi-gpio: Fix fwspec parsing bug
pinctrl: madera: Add missing call to pinctrl_unregister_mappings
pinctrl: core: Remove extra kref_get which blocks hogs being freed
pinctrl: imx: scu: Align imx sc msg structs to 4
pinctrl: meson-gxl: fix GPIOX sdio pins
This does three inter-related things to clarify the usage of the
platform device dma_mask field. In the process, fix the bug introduced
by cdfee56232 ("driver core: initialize a default DMA mask for
platform device") that caused Artem Tashkinov's laptop to not boot with
newer Fedora kernels.
This does:
- First off, rename the field to "platform_dma_mask" to make it
greppable.
We have way too many different random fields called "dma_mask" in
various data structures, where some of them are actual masks, and
some of them are just pointers to the mask. And the structures all
have pointers to each other, or embed each other inside themselves,
and "pdev" sometimes means "platform device" and sometimes it means
"PCI device".
So to make it clear in the code when you actually use this new field,
give it a unique name (it really should be something even more unique
like "platform_device_dma_mask", since it's per platform device, not
per platform, but that gets old really fast, and this is unique
enough in context).
To further clarify when the field gets used, initialize it when we
actually start using it with the default value.
- Then, use this field instead of the random one-off allocation in
platform_device_register_full() that is now unnecessary since we now
already have a perfectly fine allocation for it in the platform
device structure.
- The above then allows us to fix the actual bug, where the error path
of platform_device_register_full() would unconditionally free the
platform device DMA allocation with 'kfree()'.
That kfree() was dont regardless of whether the allocation had been
done earlier with the (now removed) kmalloc, or whether
setup_pdev_dma_masks() had already been used and the dma_mask pointer
pointed to the mask that was part of the platform device.
It seems most people never triggered the error path, or only triggered
it from a call chain that set an explicit pdevinfo->dma_mask value (and
thus caused the unnecessary allocation that was "cleaned up" in the
error path) before calling platform_device_register_full().
Robin Murphy points out that in Artem's case the wdat_wdt driver failed
in platform_device_add(), and that was the one that had called
platform_device_register_full() with pdevinfo.dma_mask = 0, and would
have caused that kfree() of pdev.dma_mask corrupting the heap.
A later unrelated kmalloc() then oopsed due to the heap corruption.
Fixes: cdfee56232 ("driver core: initialize a default DMA mask for platform device")
Reported-bisected-and-tested-by: Artem S. Tashkinov <aros@gmx.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It has turned out that the sdhci-tegra controller requires the R1B response,
for commands that has this response associated with them. So, converting
from an R1B to an R1 response for a CMD6 for example, leads to problems
with the HW busy detection support.
Fix this by informing the mmc core about the requirement, via setting the
host cap, MMC_CAP_NEED_RSP_BUSY.
Reported-by: Bitan Biswas <bbiswas@nvidia.com>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Tested-By: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
It has turned out that the sdhci-omap controller requires the R1B response,
for commands that has this response associated with them. So, converting
from an R1B to an R1 response for a CMD6 for example, leads to problems
with the HW busy detection support.
Fix this by informing the mmc core about the requirement, via setting the
host cap, MMC_CAP_NEED_RSP_BUSY.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Faiz Abbas <faiz_abbas@ti.com>
Cc: <stable@vger.kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The busy timeout that is computed for each erase/trim/discard operation,
can become quite long and may thus exceed the host->max_busy_timeout. If
that becomes the case, mmc_do_erase() converts from using an R1B response
to an R1 response, as to prevent the host from doing HW busy detection.
However, it has turned out that some hosts requires an R1B response no
matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
that, if the R1B gets enforced, the host becomes fully responsible of
managing the needed busy timeout, in one way or the other.
Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: <stable@vger.kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Tested-by: Faiz Abbas <faiz_abbas@ti.com>
Tested-By: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
It has turned out that some host controllers can't use R1B for CMD6 and
other commands that have R1B associated with them. Therefore invent a new
host cap, MMC_CAP_NEED_RSP_BUSY to let them specify this.
In __mmc_switch(), let's check the flag and use it to prevent R1B responses
from being converted into R1. Note that, this also means that the host are
on its own, when it comes to manage the busy timeout.
Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: <stable@vger.kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Tested-by: Faiz Abbas <faiz_abbas@ti.com>
Tested-By: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The dmidecode program fails to properly decode the SMBIOS data supplied
by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is
encrypted and resides in reserved memory that is marked as EFI runtime
services data.
As a result, when memremap() is attempted for the SMBIOS data, it
can't be mapped as regular RAM (through try_ram_remap()) and, since
the address isn't part of the iomem resources list, it isn't mapped
encrypted through the fallback ioremap().
Add a new __ioremap_check_other() to deal with memory types like
EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges.
This allows any runtime services data which has been created encrypted,
to be mapped encrypted too.
[ bp: Move functionality to a separate function. ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: <stable@vger.kernel.org> # 5.3
Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
It appears that ip ranges can overlap so. In that case lookup_rec()
returns whatever results it got last even if it found nothing in last
searched page.
This breaks an obscure livepatch late module patching usecase:
- load livepatch
- load the patched module
- unload livepatch
- try to load livepatch again
To fix this return from lookup_rec() as soon as it found the record
containing searched-for ip. This used to be this way prior lookup_rec()
introduction.
Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com
Cc: stable@vger.kernel.org
Fixes: 7e16f581a8 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When we do the initial CPU reset we must not only clear the registers
in the internal data structures but also in kvm_run sync_regs. For
modern userspace sync_regs is the only place that it looks at.
Fixes: 7de3f1423f ("KVM: s390: Add new reset vcpu API")
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
After hif_shutdown(), communication with the chip is no more possible.
It the only request that never reply. Therefore, hif_cmd.lock is never
unlocked. hif_shutdown() unlock itself hif_cmd.lock to avoid a potential
warning during disposal of device. hif_cmd.key_renew_lock should also
been unlocked for the same reason.
Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Link: https://lore.kernel.org/r/20200310101356.182818-2-Jerome.Pouiller@silabs.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When trying to rescan disks in petitboot shell, we hit the following
softlockup stacktrace:
Kernel panic - not syncing: System is deadlocked on memory
[ 241.223394] CPU: 32 PID: 693 Comm: sh Not tainted 5.4.16-openpower1 #1
[ 241.223406] Call Trace:
[ 241.223415] [c0000003f07c3180] [c000000000493fc4] dump_stack+0xa4/0xd8 (unreliable)
[ 241.223432] [c0000003f07c31c0] [c00000000007d4ac] panic+0x148/0x3cc
[ 241.223446] [c0000003f07c3260] [c000000000114b10] out_of_memory+0x468/0x4c4
[ 241.223461] [c0000003f07c3300] [c0000000001472b0] __alloc_pages_slowpath+0x594/0x6d8
[ 241.223476] [c0000003f07c3420] [c00000000014757c] __alloc_pages_nodemask+0x188/0x1a4
[ 241.223492] [c0000003f07c34a0] [c000000000153e10] alloc_pages_current+0xcc/0xd8
[ 241.223508] [c0000003f07c34e0] [c0000000001577ac] alloc_slab_page+0x30/0x98
[ 241.223524] [c0000003f07c3520] [c0000000001597fc] new_slab+0x138/0x40c
[ 241.223538] [c0000003f07c35f0] [c00000000015b204] ___slab_alloc+0x1e4/0x404
[ 241.223552] [c0000003f07c36c0] [c00000000015b450] __slab_alloc+0x2c/0x48
[ 241.223566] [c0000003f07c36f0] [c00000000015b754] kmem_cache_alloc_node+0x9c/0x1b4
[ 241.223582] [c0000003f07c3760] [c000000000218c48] blk_alloc_queue_node+0x34/0x270
[ 241.223599] [c0000003f07c37b0] [c000000000226574] blk_mq_init_queue+0x2c/0x78
[ 241.223615] [c0000003f07c37e0] [c0000000002ff710] scsi_mq_alloc_queue+0x28/0x70
[ 241.223631] [c0000003f07c3810] [c0000000003005b8] scsi_alloc_sdev+0x184/0x264
[ 241.223647] [c0000003f07c38a0] [c000000000300ba0] scsi_probe_and_add_lun+0x288/0xa3c
[ 241.223663] [c0000003f07c3a00] [c000000000301768] __scsi_scan_target+0xcc/0x478
[ 241.223679] [c0000003f07c3b20] [c000000000301c64] scsi_scan_channel.part.9+0x74/0x7c
[ 241.223696] [c0000003f07c3b70] [c000000000301df4] scsi_scan_host_selected+0xe0/0x158
[ 241.223712] [c0000003f07c3bd0] [c000000000303f04] store_scan+0x104/0x114
[ 241.223727] [c0000003f07c3cb0] [c0000000002d5ac4] dev_attr_store+0x30/0x4c
[ 241.223741] [c0000003f07c3cd0] [c0000000001dbc34] sysfs_kf_write+0x64/0x78
[ 241.223756] [c0000003f07c3cf0] [c0000000001da858] kernfs_fop_write+0x170/0x1b8
[ 241.223773] [c0000003f07c3d40] [c0000000001621fc] __vfs_write+0x34/0x60
[ 241.223787] [c0000003f07c3d60] [c000000000163c2c] vfs_write+0xa8/0xcc
[ 241.223802] [c0000003f07c3db0] [c000000000163df4] ksys_write+0x70/0xbc
[ 241.223816] [c0000003f07c3e20] [c00000000000b40c] system_call+0x5c/0x68
As a part of the scan process Linux will allocate and configure a
scsi_device for each target to be scanned. If the device is not present,
then the scsi_device is torn down. As a part of scsi_device teardown a
workqueue item will be scheduled and the lockups we see are because there
are 250k workqueue items to be processed. Accoding to the specification of
SIS-64 sas controller, max_channel should be decreased on SIS-64 adapters
to 4.
The patch fixes softlockup issue.
Thanks for Oliver Halloran's help with debugging and explanation!
Link: https://lore.kernel.org/r/1583510248-23672-1-git-send-email-wenxiong@linux.vnet.ibm.com
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Julian Wiedmann says:
====================
s390/qeth: fixes 2020-03-10
This fixes three minor issues:
1) a setup parameter gets cleared unnecessarily when the HW config
changes,
2) insufficient error handling when initially filling the RX ring, and
3) a rarely used worker that needs to be cancelled during tear down.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When qeth's napi poll code fails to refill an entirely empty RX ring, it
kicks off buffer_reclaim_work to try again later.
Make sure that this worker is cancelled when setting the qeth device
offline. Otherwise a RX refill action can unexpectedly end up running
concurrently to bigger re-configurations (eg. resizing the buffer pool),
without any locking.
Fixes: b333293058 ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_init_qdio_queues() fills the RX ring with an initial set of
RX buffers. If qeth_init_input_buffer() fails to back one of the RX
buffers with memory, we need to bail out and report the error.
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an OSA device in prio-queue setup is reduced to 1 TX queue due to
HW restrictions, we reset its the default_out_queue to 0.
In the old code this was needed so that qeth_get_priority_queue() gets
the queue selection right. But with proper multiqueue support we already
reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic
on txq 0 without even calling .ndo_select_queue.
Thus we can preserve the user's configuration, and apply it if the OSA
device later re-gains support for multiple TX queues.
Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh says:
====================
MACSec bugfixes related to MAC address change
We found out that there's an issue in MACSec code when the MAC address
is changed.
Both s/w and offloaded implementations don't update SCI when the MAC
address changes at the moment, but they should do so, because SCI contains
MAC in its first 6 octets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ibmvnic driver does not check the device state when the device
is removed. If the device is removed while a device reset is being
processed, the remove may free structures needed by the reset,
causing an oops.
Fix this by checking the device state before processing device remove.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał found an issue that for non-Ethernet interface, if we down and up
frequently, the memory will be consumed slowly.
The reason is we add allnodes/allrouters addressed in multicast list in
ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
getting bigger and bigger. The call stack looks like:
addrconf_notify(NETDEV_REGISTER)
ipv6_add_dev
ipv6_dev_mc_inc(ff01::1)
ipv6_dev_mc_inc(ff02::1)
ipv6_dev_mc_inc(ff02::2)
addrconf_notify(NETDEV_UP)
addrconf_dev_config
/* Alas, we support only Ethernet autoconfiguration. */
return;
addrconf_notify(NETDEV_DOWN)
addrconf_ifdown
ipv6_mc_down
igmp6_group_dropped(ff02::2)
mld_add_delrec(ff02::2)
igmp6_group_dropped(ff02::1)
igmp6_group_dropped(ff01::1)
After investigating, I can't found a rule to disable multicast on
non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
in inetdev_event(). Even for IPv6, we don't check the dev type and call
ipv6_add_dev(), ipv6_dev_mc_inc() after register device.
So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
non-Ethernet interface.
v2: Also check IFF_MULTICAST flag to make sure the interface supports
multicast
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Tested-by: Rafał Miłecki <zajec5@gmail.com>
Fixes: 74235a25c6 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
Fixes: 1666d49e1d ("mld: do not remove mld souce list info when set link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull clang-format update from Miguel Ojeda:
"Another update for the .clang-format macro list
It has been a while since the last time I sent one!"
* tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
If a TCP socket is allocated in IRQ context or cloned from unassociated
(i.e. not associated to a memcg) in IRQ context then it will remain
unassociated for its whole life. Almost half of the TCPs created on the
system are created in IRQ context, so, memory used by such sockets will
not be accounted by the memcg.
This issue is more widespread in cgroup v1 where network memory
accounting is opt-in but it can happen in cgroup v2 if the source socket
for the cloning was created in root memcg.
To fix the issue, just do the association of the sockets at the accept()
time in the process context and then force charge the memory buffer
already used and reserved by the socket.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.
mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.
Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.
WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.
The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
<IRQ>
dump_stack+0x57/0x75
mem_cgroup_sk_alloc+0xe9/0xf0
sk_clone_lock+0x2a7/0x420
inet_csk_clone_lock+0x1b/0x110
tcp_create_openreq_child+0x23/0x3b0
tcp_v6_syn_recv_sock+0x88/0x730
tcp_check_req+0x429/0x560
tcp_v6_rcv+0x72d/0xa40
ip6_protocol_deliver_rcu+0xc9/0x400
ip6_input+0x44/0xd0
? ip6_protocol_deliver_rcu+0x400/0x400
ip6_rcv_finish+0x71/0x80
ipv6_rcv+0x5b/0xe0
? ip6_sublist_rcv+0x2e0/0x2e0
process_backlog+0x108/0x1e0
net_rx_action+0x26b/0x460
__do_softirq+0x104/0x2a6
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.19+0x40/0x50
__local_bh_enable_ip+0x51/0x60
ip6_finish_output2+0x23d/0x520
? ip6table_mangle_hook+0x55/0x160
__ip6_finish_output+0xa1/0x100
ip6_finish_output+0x30/0xd0
ip6_output+0x73/0x120
? __ip6_finish_output+0x100/0x100
ip6_xmit+0x2e3/0x600
? ipv6_anycast_cleanup+0x50/0x50
? inet6_csk_route_socket+0x136/0x1e0
? skb_free_head+0x1e/0x30
inet6_csk_xmit+0x95/0xf0
__tcp_transmit_skb+0x5b4/0xb20
__tcp_send_ack.part.60+0xa3/0x110
tcp_send_ack+0x1d/0x20
tcp_rcv_state_process+0xe64/0xe80
? tcp_v6_connect+0x5d1/0x5f0
tcp_v6_do_rcv+0x1b1/0x3f0
? tcp_v6_do_rcv+0x1b1/0x3f0
__release_sock+0x7f/0xd0
release_sock+0x30/0xa0
__inet_stream_connect+0x1c3/0x3b0
? prepare_to_wait+0xb0/0xb0
inet_stream_connect+0x3b/0x60
__sys_connect+0x101/0x120
? __sys_getsockopt+0x11b/0x140
__x64_sys_connect+0x1a/0x20
do_syscall_64+0x51/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull auxdisplay updates from Miguel Ojeda:
"A few minor auxdisplay improvements:
- charlcd: replace zero-length array with flexible-array member
(kernel-wide cleanup by Gustavo A. R. Silva)
- img-ascii-lcd: convert to devm_platform_ioremap_resource (Yangtao
Li)
- Fix Kconfig indentation (Krzysztof Kozlowski)
* tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux:
auxdisplay: charlcd: replace zero-length array with flexible-array member
auxdisplay: img-ascii-lcd: convert to devm_platform_ioremap_resource
auxdisplay: Fix Kconfig indentation
Pull cgroup fixes from Tejun Heo:
- cgroup.procs listing related fixes.
It didn't interlock properly with exiting tasks leaving a short
window where a cgroup has empty cgroup.procs but still can't be
removed and misbehaved on short reads.
- psi_show() crash fix on 32bit ino archs
- Empty release_agent handling fix
* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup1: don't call release_agent when it is ""
cgroup: fix psi_show() crash on 32bit ino archs
cgroup: Iterate tasks that did not finish do_exit()
cgroup: cgroup_procs_next should increase position index
cgroup-v1: cgroup_pidlist_next should update position index
Pull workqueue fixes from Tejun Heo:
"Workqueue has been incorrectly round-robining per-cpu work items.
Hillf's patch fixes that.
The other patch documents memory-ordering properties of workqueue
operations"
* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: don't use wq_select_unbound_cpu() for bound works
workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()
dc to pplib interface is changed for navi1x, renoir.
display_config_changed is not called by dc anymore.
smu_write_watermarks_table is not executed for navi1x, renoir
during boot up.
solution: call smu_write_watermarks_table just after dc pass
watermark clock settings to pplib
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The timeout of identify cmd, which is invoked as part of admin queue
creation, can result in freeing of async event data both in
nvme_rdma_timeout handler and error handling path of
nvme_rdma_configure_admin queue thus causing NULL pointer reference.
Call Trace:
? nvme_rdma_setup_ctrl+0x223/0x800 [nvme_rdma]
nvme_rdma_create_ctrl+0x2ba/0x3f7 [nvme_rdma]
nvmf_dev_write+0xa54/0xcc6 [nvme_fabrics]
__vfs_write+0x1b/0x40
vfs_write+0xb2/0x1b0
ksys_write+0x61/0xd0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x60/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reviewed-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
vtimes may wrap and time_before/after64() should be used to determine
whether a given vtime is before or after another. iocg_is_idle() was
incorrectly using plain "<" comparison do determine whether done_vtime
is before vtime. Here, the only thing we're interested in is whether
done_vtime matches vtime which indicates that there's nothing in
flight. Let's test for inequality instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
libtraceevent (used by perf and trace-cmd) failed to parse the
xhci_urb_dequeue trace event. This is because the user space trace
event format parsing is not a full C compiler. It can handle some basic
logic, but is not meant to be able to handle everything C can do.
In cases where a trace event field needs to be converted from a number
to a string, there's the __print_symbolic() macro that should be used:
See samples/trace_events/trace-events-sample.h
Some xhci trace events open coded the __print_symbolic() causing the
user spaces tools to fail to parse it. This has to be replaced with
__print_symbolic() instead.
CC: stable@vger.kernel.org
Reported-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206531
Fixes: 5abdc2e6e1 ("usb: host: xhci: add urb_enqueue/dequeue/giveback tracers")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20200306150858.21904-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika writes:
thunderbolt: Fix for v5.6-rc6
This includes a single commit that fixes incorrect return value from
tb_port_is_width_supported() if the read fails.
* tag 'thunderbolt-fix-for-v5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix error code in tb_port_is_width_supported()
wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.
Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.
Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement. So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.
Fixes: ef55718044 ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <hdanton@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
If a GPIO we are trying to use is not available and we are deferring
the probe, don't output an error message.
This seems to have been the intent of commit 05c7477885
("i2c: gpio: Add support for named gpios in DT") but the error was
still output due to not checking the updated 'retdesc'.
Fixes: 05c7477885 ("i2c: gpio: Add support for named gpios in DT")
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Function i2c_dw_pci_remove() -> pci_free_irq_vectors() ->
pci_disable_msi() -> free_msi_irqs() will throw a BUG_ON() for MSI
enabled device since the driver has not released the requested IRQ before
calling the pci_free_irq_vectors().
Here driver requests an IRQ using devm_request_irq() but automatic
release happens only after remove callback. Fix this by explicitly
freeing the IRQ before calling pci_free_irq_vectors().
Fixes: 21aa3983d6 ("i2c: designware-pci: Switch over to MSI interrupts")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Similar to the commit 02d715b4a8 ("iommu/vt-d: Fix RCU list debugging
warnings"), there are several other places that call
list_for_each_entry_rcu() outside of an RCU read side critical section
but with dmar_global_lock held. Silence those false positives as well.
drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!!
1 lock held by swapper/0/1:
#0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97
drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!!
1 lock held by swapper/0/1:
#0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97
drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!!
1 lock held by swapper/0/1:
#0: ffffffffa71892c8 (dmar_global_lock){++++}, at: intel_iommu_init+0x61a/0xb13
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are several places traverse RCU-list without holding any lock in
intel_iommu_init(). Fix them by acquiring dmar_global_lock.
WARNING: suspicious RCU usage
-----------------------------
drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/0/1.
Call Trace:
dump_stack+0xa0/0xea
lockdep_rcu_suspicious+0x102/0x10b
intel_iommu_init+0x947/0xb13
pci_iommu_init+0x26/0x62
do_one_initcall+0xfe/0x500
kernel_init_freeable+0x45a/0x4f8
kernel_init+0x11/0x139
ret_from_fork+0x3a/0x50
DMAR: Intel(R) Virtualization Technology for Directed I/O
Fixes: d8190dc638 ("iommu/vt-d: Enable DMA remapping after rmrr mapped")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Martin noticed that nct6775 driver does not load properly on his system
in v5.4+ kernels. The issue was bisected to commit b84398d6d7 ("i2c:
i801: Use iTCO version 6 in Cannon Lake PCH and beyond") but it is
likely not the culprit because the faulty code has been in the driver
already since commit 9424693035 ("i2c: i801: Create iTCO device on
newer Intel PCHs"). So more likely some commit that added PCI IDs of
recent chipsets made the driver to create the iTCO_wdt device on Martins
system.
The issue was debugged to be PCI configuration access to the PMC device
that is not present. This returns all 1's when read and this caused the
iTCO_wdt driver to accidentally request resourses used by nct6775.
It turns out that the SMI resource is only required for some ancient
systems, not the ones supported by this driver. For this reason do not
populate the SMI resource at all and drop all the related code. The
driver now always populates the main I/O resource and only in case of SPT
(Intel Sunrisepoint) compatible devices it adds another resource for the
NO_REBOOT bit. These two resources are of different types so
platform_get_resource() used by the iTCO_wdt driver continues to find
the both resources at index 0.
Link: https://lore.kernel.org/linux-hwmon/CAM1AHpQ4196tyD=HhBu-2donSsuogabkfP03v1YF26Q7_BgvgA@mail.gmail.com/
Fixes: 9424693035 ("i2c: i801: Create iTCO device on newer Intel PCHs")
[wsa: complete fix needs all of http://patchwork.ozlabs.org/project/linux-i2c/list/?series=160959&state=*]
Reported-by: Martin Volf <martin.volf.42@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The iTCO_wdt driver only needs ICH_RES_IO_SMI I/O resource when either
turn_SMI_watchdog_clear_off module parameter is set to match ->iTCO_version
(or higher), and when legacy iTCO_vendorsupport is set. Modify the driver
so that ICH_RES_IO_SMI is optional if the two conditions are not met.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
In preparation for making ->smi_res optional the iTCO_wdt driver needs
to know whether vendorsupport is being set to non-zero. For this reason
export the variable.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pull cpupower utility fix for v5.6 from Shuah Khan:
"This cpupower update for Linux 5.6-rc6 consists of a fix from
Mike Gilbert for build failures when -fno-common is enabled.
-fno-common will be default in gcc v10."
* tag 'linux-cpupower-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: avoid multiple definition with gcc -fno-common
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- Don't schedule OGM for disabled interface, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
What the driver writes into MAC_MAXLEN_CFG does not actually represent
VLAN_ETH_FRAME_LEN but instead ETH_FRAME_LEN + ETH_FCS_LEN. Yes they are
numerically equal, but the difference is important, as the switch treats
VLAN-tagged traffic specially and knows to increase the maximum accepted
frame size automatically. So it is always wrong to account for VLAN in
the MAC_MAXLEN_CFG register.
Unconditionally increase the maximum allowed frame size for
double-tagged traffic. Accounting for the additional length does not
mean that the other VLAN membership checks aren't performed, so there's
no harm done.
Also, stop abusing the MTU name for configuring the MRU. There is no
support for configuring the MRU on an interface at the moment.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Fixes: fa914e9c4d ("net: mscc: ocelot: create a helper for changing the port MTU")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e18b353f10 ("ipvlan: add cond_resched_rcu() while
processing muticast backlog") added a cond_resched_rcu() in a loop
using rcu protection to iterate over slaves.
This is breaking rcu rules, so lets instead use cond_resched()
at a point we can reschedule
Fixes: e18b353f10 ("ipvlan: add cond_resched_rcu() while processing muticast backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In our production environment we have faced with problem that updating
classid in cgroup with heavy tasks cause long freeze of the file tables
in this tasks. By heavy tasks we understand tasks with many threads and
opened sockets (e.g. balancers). This freeze leads to an increase number
of client timeouts.
This patch implements following logic to fix this issue:
аfter iterating 1000 file descriptors file table lock will be released
thus providing a time gap for socket creation/deletion.
Now update is non atomic and socket may be skipped using calls:
dup2(oldfd, newfd);
close(oldfd);
But this case is not typical. Moreover before this patch skip is possible
too by hiding socket fd in unix socket buffer.
New sockets will be allocated with updated classid because cgroup state
is updated before start of the file descriptors iteration.
So in common cases this patch has no side effects.
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Rx bound multicast packets are deferred to a workqueue and
macvlan can also suffer from the same attack that was discovered
by Syzbot for IPvlan. This solution is not as effective as in
IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing
to a workqueue while macvlan does this way only for the Rx. This
fix should address the Rx codition to certain extent.
Tx is still suseptible. Tx multicast processing happens when
.ndo_start_xmit is called, hence we cannot add cond_resched().
However, it's not that severe since the user which is generating
/ flooding will be affected the most.
Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPvlan in L3 mode discards outbound multicast packets but performs
the check before ensuring the ether-header is set or not. This is
an error that Eric found through code browsing.
Fixes: 2ad7bf3638 (“ipvlan: Initial check-in of the IPVLAN driver.”)
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's a resource, not a parameter, so we can't copy it into the new
channel's TX queues, otherwise aliasing will lead to resource-
management bugs if the channel is subsequently torn down without
being initialised.
Before the Fixes:-tagged commit there was a similar bug with
tsoh_page, but I'm not sure it's worth doing another fix for such
old kernels.
Fixes: e9117e5099 ("sfc: Firmware-Assisted TSO version 2")
Suggested-by: Derek Shute <Derek.Shute@stratus.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ktest fixes and clean ups from Steven Rostedt:
- Make the default option oldconfig instead of randconfig (one too many
times I lost my config because I left the build type out)
- Add timeout to ssh sync to sync before reboot (prevents test hangs)
- A couple of spelling fix patches
* tag 'ktest-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix typos in ktest.pl
ktest: Add timeout for ssh sync testing
ktest: Make default build option oldconfig not randconfig
ktest: Fix some typos in sample.conf
Pull MMC host fixes from Ulf Hansson:
- sdhci-msm: Silence warning about turning function into static
- sdhci-pci-gli: Fix support for GL975x by enabling MSI interrupt
* tag 'mmc-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci-gli: Enable MSI interrupt for GL975x
mmc: sdhci-msm: Mark sdhci_msm_cqe_disable static
Pull virtio fixes from Michael Tsirkin:
"Some bug fixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: Adjust label in virtballoon_probe
virtio-blk: improve virtqueue error to BLK_STS
virtio-blk: fix hw_queue stopped on arbitrary error
virtio_ring: Fix mem leak with vring_new_virtqueue()
The alloc_pid() codepath used to be simpler. With the introducation of the
ability to choose specific pids in 49cb2fc42c ("fork: extend clone3() to
support setting a PID") it got more complex. It hasn't been super obvious
that ENOMEM is returned when the pid namespace init process/child subreaper
of the pid namespace has died. As can be seen from multiple attempts to
improve this see e.g. [1] and most recently [2].
We regressed returning ENOMEM in [3] and [2] restored it. Let's add a
comment on top explaining that this is historic and documented behavior and
cannot easily be changed.
[1]: 35f71bc0a0 ("fork: report pid reservation failure properly")
[2]: b26ebfe12f ("pid: Fix error return value in some cases")
[3]: 49cb2fc42c ("fork: extend clone3() to support setting a PID")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The recent futex inode life time fix changed the ordering of the futex key
union struct members, but forgot to adjust the hash function accordingly,
As a result the hashing omits the leading 64bit and even hashes beyond the
futex key causing a bad hash distribution which led to a ~100% performance
regression.
Hand in the futex key pointer instead of a random struct member and make
the size calculation based of the struct offset.
Fixes: 8019ad13ef ("futex: Fix inode life-time issue")
Reported-by: Rong Chen <rong.a.chen@intel.com>
Decoded-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rong Chen <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
Before rebooting the box, a "ssh sync" is called to the test machine to see
if it is alive or not. But if the test machine is in a partial state, that
ssh may never actually finish, and the ktest test hangs.
Add a 10 second timeout to the sync test, which will fail after 10 seconds
and then cause the test to reboot the test machine.
Cc: stable@vger.kernel.org
Fixes: 6474ace999 ("ktest.pl: Powercycle the box on reboot if no connection can be made")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
For the last time, I screwed up my ktest config file, and the build went
into the default "randconfig", blowing away the .config that I had set up.
The reason for the default randconfig was because when this was first
written, I wanted to do a bunch of randconfigs. But as time progressed,
ktest isn't about randconfig anymore, and because randconfig destroys the
config in the build directory, it's a dangerous default to have. Use
oldconfig as the default.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[why]
nv14 previously inherited soc bb from generic dcn 2, did not match
watermark values according to memory team
[how]
add nv14 specific soc bb: copy nv2 generic that it was
using from before, but changed num channels to 8
Signed-off-by: Martin Leung <martin.leung@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The hierarchical parts of MSM pinctrl/GPIO is only
used when the device tree has a "wakeup-parent" as
a phandle, but the .irq_eoi is anyway assigned leading
to semantic problems on elder Qualcomm chipsets.
When the drivers/mfd/qcom-pm8xxx.c driver calls
chained_irq_exit() that call will in turn call chip->irq_eoi()
which is set to irq_chip_eoi_parent() by default on a
hierachical IRQ chip, and the parent is pinctrl-msm.c
so that will in turn unconditionally call
irq_chip_eoi_parent() again, but its parent is invalid
so we get the following crash:
Unnable to handle kernel NULL pointer dereference at
virtual address 00000010
pgd = (ptrval)
[00000010] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
(...)
PC is at irq_chip_eoi_parent+0x4/0x10
LR is at pm8xxx_irq_handler+0x1b4/0x2d8
If we solve this crash by avoiding to call up to
irq_chip_eoi_parent(), the machine will hang and get
reset by the watchdog, because of semantic issues,
probably inside irq_chip.
As a solution, just assign the .irq_eoi conditionally if
we are actually using a wakeup parent.
Cc: David Heidelberg <david@ixit.cz>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: stable@vger.kernel.org
Fixes: e35a6ae0eb ("pinctrl/msm: Setup GPIO chip in hierarchy")
Link: https://lore.kernel.org/r/20200306121221.1231296-1-linus.walleij@linaro.org
Link: https://lore.kernel.org/r/20200309125207.571840-1-linus.walleij@linaro.org
Link: https://lore.kernel.org/r/20200309152604.585112-1-linus.walleij@linaro.org
Tested-by: David Heidelberg <david@ixit.cz>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Each OSS PCM plugins allocate its internal buffer per pre-calculation
of the max buffer size through the chain of plugins (calling
src_frames and dst_frames callbacks). This works for most plugins,
but the rate plugin might behave incorrectly. The calculation in the
rate plugin involves with the fractional position, i.e. it may vary
depending on the input position. Since the buffer size
pre-calculation is always done with the offset zero, it may return a
shorter size than it might be; this may result in the out-of-bound
access as spotted by fuzzer.
This patch addresses those possible buffer overflow accesses by simply
setting the upper limit per the given buffer size for each plugin
before src_frames() and after dst_frames() calls.
Reported-by: syzbot+e1fe9f44fb8ecf4fb5dd@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/000000000000b25ea005a02bcf21@google.com
Link: https://lore.kernel.org/r/20200309082148.19855-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In commit 1ec17dbd90 ("inet_diag: fix reporting cgroup classid and
fallback to priority") croup classid reporting was fixed. But this works
only for TCP sockets because for other socket types icsk parameter can
be NULL and classid code path is skipped. This change moves classid
handling to inet_diag_msg_attrs_fill() function.
Also inet_diag_msg_attrs_size() helper was added and addends in
nlmsg_new() were reordered to save order from inet_sk_diag_fill().
Fixes: 1ec17dbd90 ("inet_diag: fix reporting cgroup classid and fallback to priority")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACS (auto PAD/FCS stripping) removes FCS off 802.3 packets (LLC) so that
there is no need to manually strip it for such packets. The enhanced DMA
descriptors allow to flag LLC packets so that the receiving callback can
use that to strip FCS manually or not. On the other hand, normal
descriptors do not support that.
Thus in order to not truncate LLC packet ACS should be disabled when
using normal DMA descriptors.
Fixes: 47dd7a540b ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a problem when ipvlan slaves are created on a master device that
is a vmxnet3 device (ipvlan in VMware guests). The vmxnet3 driver does not
support unicast address filtering. When an ipvlan device is brought up in
ipvlan_open(), the ipvlan driver calls dev_uc_add() to add the hardware
address of the vmxnet3 master device to the unicast address list of the
master device, phy_dev->uc. This inevitably leads to the vmxnet3 master
device being forced into promiscuous mode by __dev_set_rx_mode().
Promiscuous mode is switched on the master despite the fact that there is
still only one hardware address that the master device should use for
filtering in order for the ipvlan device to be able to receive packets.
The comment above struct net_device describes the uc_promisc member as a
"counter, that indicates, that promiscuous mode has been enabled due to
the need to listen to additional unicast addresses in a device that does
not implement ndo_set_rx_mode()". Moreover, the design of ipvlan
guarantees that only the hardware address of a master device,
phy_dev->dev_addr, will be used to transmit and receive all packets from
its ipvlan slaves. Thus, the unicast address list of the master device
should not be modified by ipvlan_open() and ipvlan_stop() in order to make
ipvlan a workable option on masters that do not support unicast address
filtering.
Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver")
Reported-by: Per Sundstrom <per.sundstrom@redqube.se>
Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After more careful studying, Paul informs me that we cannot rely on
ordering of RCU callbacks in the way that the the tagged commit did.
The current construct looks like this:
void C(struct rcu_head *rhp)
{
do_something(rhp);
call_rcu(&p->rh, B);
}
call_rcu(&p->rh, A);
call_rcu(&p->rh, C);
and we're relying on ordering between A and B, which isn't guaranteed.
Make this explicit instead, and have a work item issue the rcu_barrier()
to ensure that A has run before we manually execute B.
While thorough testing never showed this issue, it's dependent on the
per-cpu load in terms of RCU callbacks. The updated method simplifies
the code as well, and eliminates the need to maintain an rcu_head in
the fileset data.
Fixes: c1e2148f8e ("io_uring: free fixed_file_data after RCU grace period")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SPS30 uses triggered buffer, but the dependency is not specified in the
Kconfig file. Fix this by selecting IIO_BUFFER and IIO_TRIGGERED_BUFFER
config symbols.
Cc: stable@vger.kernel.org
Fixes: 232e0f6dde ("iio: chemical: add support for Sensirion SPS30 sensor")
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Vishay has published a new version of "Designing the VCNL4200 Into an
Application" application note in October 2019. The new version specifies
that there is +-20% of part to part tolerance. Although the application
note is related to vcnl4200, according to support the vcnl4040's "ASIC
is quite similar to that one for the VCNL4200".
So update the sampling periods (and comment), including the correct
sampling period for proximity. Both sampling periods are lower. Users
relying on the blocking behaviour of reading will get proximity
measurements much earlier.
Fixes: 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040 proximity and light sensor")
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Tested-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Tomas Novotny <tomas@novotny.cz>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Vishay has published a new version of "Designing the VCNL4200 Into an
Application" application note in October 2019. The new version specifies
that there is +-20% of part to part tolerance. This explains the drift
seen during experiments. The proximity pulse width is also changed from
32us to 30us. According to the support, the tolerance also applies to
ambient light.
So update the sampling periods. As the reading is blocking, current
users may notice slightly longer response time.
Fixes: be38866fbb ("iio: vcnl4000: add support for VCNL4200")
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Tomas Novotny <tomas@novotny.cz>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Recent changes to alloc_pid() allow the pid number to be specified on
the command line. If set_tid_size is set, then the code scanning the
levels will hard-set retval to -EPERM, overriding it's previous -ENOMEM
value.
After the code scanning the levels, there are error returns that do not
set retval, assuming it is still set to -ENOMEM.
So set retval back to -ENOMEM after scanning the levels.
Fixes: 49cb2fc42c ("fork: extend clone3() to support setting a PID")
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: <stable@vger.kernel.org> # 5.5
Link: https://lore.kernel.org/r/20200306172314.12232-1-minyard@acm.org
[christian.brauner@ubuntu.com: fixup commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Let's change the mapping between virtqueue_add errors to BLK_STS
statuses, so that -ENOSPC, which indicates virtqueue full is still
mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM which indicates non-device
specific resource outage is mapped to BLK_STS_RESOURCE.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient
necessarily sufficient to ensure that the queue will get started again.
In case of global resource outage (-ENOMEM because mapping failure,
because of swiotlb full) our virtqueue may be empty and we can get
stuck with a stopped hw_queue.
Let us not stop the queue on arbitrary errors, but only on -EONSPC which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() before when it makes sense to carry on
submitting requests. Let us also remove a stale comment.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: f7728002c1 ("virtio_ring: fix return code on DMA mapping fails")
Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The functions vring_new_virtqueue() and __vring_new_virtqueue() are used
with split rings, and any allocations within these functions are managed
outside of the .we_own_ring flag. The commit cbeedb72b9 ("virtio_ring:
allocate desc state for split ring separately") allocates the desc state
within the __vring_new_virtqueue() but frees it only when the .we_own_ring
flag is set. This leads to a memory leak when freeing such allocated
virtqueues with the vring_del_virtqueue() function.
Fix this by moving the desc_state free code outside the flag and only
for split rings. Issue was discovered during testing with remoteproc
and virtio_rpmsg.
Fixes: cbeedb72b9 ("virtio_ring: allocate desc state for split ring separately")
Signed-off-by: Suman Anna <s-anna@ti.com>
Link: https://lore.kernel.org/r/20200224212643.30672-1-s-anna@ti.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
There is a race and a buffer overflow corrupting a kernel memory while
reading an EFI variable with a size more than 1024 bytes via the older
sysfs method. This happens because accessing struct efi_variable in
efivar_{attr,size,data}_read() and friends is not protected from
a concurrent access leading to a kernel memory corruption and, at best,
to a crash. The race scenario is the following:
CPU0: CPU1:
efivar_attr_read()
var->DataSize = 1024;
efivar_entry_get(... &var->DataSize)
down_interruptible(&efivars_lock)
efivar_attr_read() // same EFI var
var->DataSize = 1024;
efivar_entry_get(... &var->DataSize)
down_interruptible(&efivars_lock)
virt_efi_get_variable()
// returns EFI_BUFFER_TOO_SMALL but
// var->DataSize is set to a real
// var size more than 1024 bytes
up(&efivars_lock)
virt_efi_get_variable()
// called with var->DataSize set
// to a real var size, returns
// successfully and overwrites
// a 1024-bytes kernel buffer
up(&efivars_lock)
This can be reproduced by concurrent reading of an EFI variable which size
is more than 1024 bytes:
ts# for cpu in $(seq 0 $(nproc --ignore=1)); do ( taskset -c $cpu \
cat /sys/firmware/efi/vars/KEKDefault*/size & ) ; done
Fix this by using a local variable for a var's data buffer size so it
does not get overwritten.
Fixes: e14ab23dde ("efivars: efivar_entry API")
Reported-by: Bob Sanders <bob.sanders@hpe.com> and the LTP testsuite
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200305084041.24053-2-vdronov@redhat.com
Link: https://lore.kernel.org/r/20200308080859.21568-24-ardb@kernel.org
After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the
filesystem and tries to get and put all inodes that were unlocked by the
key so that unused inodes get evicted via fscrypt_drop_inode().
Normally, the inodes are all clean due to the sync.
However, after the filesystem is sync'ed, userspace can modify and close
one of the files. (Userspace is *supposed* to close the files before
removing the key. But it doesn't always happen, and the kernel can't
assume it.) This causes the inode to be dirtied and have i_count == 0.
Then, fscrypt_drop_inode() failed to consider this case and indicated
that the inode can be dropped, causing the write to be lost.
On f2fs, other problems such as a filesystem freeze could occur due to
the inode being freed while still on f2fs's dirty inode list.
Fix this bug by making fscrypt_drop_inode() only drop clean inodes.
I've written an xfstest which detects this bug on ext4, f2fs, and ubifs.
Fixes: b1c0ec3599 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
There is a ACT8600 on the CI20 board and the bindings of the
ACT8865 driver have changed without updating the CI20 device
tree. Therefore the PMU can not be probed successfully and
is running in power-on reset state.
Fix DT to match the latest act8865-regulator bindings.
Fixes: 73f2b94047 ("MIPS: CI20: DTS: Add I2C nodes")
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
rhashtable_lookup_get_insert_key doesn't have a parameter `data`. It
does have a parameter `key`, however.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are a couple of read locks that should be write locks.
Fixes: fbb39807e9 ("ionic: support sr-iov operations")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur says:
====================
QorIQ DPAA FMan erratum A050385 workaround
Changes in v2:
- added CONFIG_DPAA_ERRATUM_A050385
- removed unnecessary parenthesis
- changed alignment defines to use only decimal values
The patch set implements the workaround for FMan erratum A050385:
FMAN DMA read or writes under heavy traffic load may cause FMAN
internal resource leak; thus stopping further packet processing.
To reproduce this issue when the workaround is not applied, one
needs to ensure the FMan DMA transaction queue is already full
when a transaction split occurs so the system must be under high
traffic load (i.e. multiple ports at line rate). After the errata
occurs, the traffic stops. The only SoC impacted by this is the
LS1043A, the other ARM DPAA 1 SoC or the PPC DPAA 1 SoCs do not
have this erratum.
The FMAN internal queue can overflow when FMAN splits single
read or write transactions into multiple smaller transactions
such that more than 17 AXI transactions are in flight from FMAN
to interconnect. When the FMAN internal queue overflows, it can
stall further packet processing. The issue can occur with any one
of the following three conditions:
1. FMAN AXI transaction crosses 4K address boundary (Errata
A010022)
2. FMAN DMA address for an AXI transaction is not 16 byte
aligned, i.e. the last 4 bits of an address are non-zero
3. Scatter Gather (SG) frames have more than one SG buffer in
the SG list and any one of the buffers, except the last
buffer in the SG list has data size that is not a multiple
of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
With any one of the above three conditions present, there is
likelihood of stalled FMAN packet processing, especially under
stress with multiple ports injecting line-rate traffic.
To avoid situations that stall FMAN packet processing, all of the
above three conditions must be avoided; therefore, configure the
system with the following rules:
1. Frame buffers must not span a 4KB address boundary, unless
the frame start address is 256 byte aligned
2. All FMAN DMA start addresses (for example, BMAN buffer
address, FD[address] + FD[offset]) are 16B aligned
3. SG table and buffer addresses are 16B aligned and the size
of SG buffers are multiple of 16 bytes, except for the last
SG buffer that can be of any size.
Additional workaround notes:
- Address alignment of 64 bytes is recommended for maximally
efficient system bus transactions (although 16 byte alignment is
sufficient to avoid the stall condition)
- To support frame sizes that are larger than 4K bytes, there are
two options:
1. Large single buffer frames that span a 4KB page boundary can
be converted into SG frames to avoid transaction splits at
the 4KB boundary,
2. Align the large single buffer to 256B address boundaries,
ensure that the frame address plus offset is 256B aligned.
- If software generated SG frames have buffers that are unaligned
and with random non-multiple of 16 byte lengths, before
transmitting such frames via FMAN, frames will need to be copied
into a new single buffer or multiple buffer SG frame that is
compliant with the three rules listed above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Align buffers, data start, SG fragment length to avoid DMA splits.
These changes prevent the A050385 erratum to manifest itself:
FMAN DMA read or writes under heavy traffic load may cause FMAN
internal resource leak; thus stopping further packet processing.
The FMAN internal queue can overflow when FMAN splits single
read or write transactions into multiple smaller transactions
such that more than 17 AXI transactions are in flight from FMAN
to interconnect. When the FMAN internal queue overflows, it can
stall further packet processing. The issue can occur with any one
of the following three conditions:
1. FMAN AXI transaction crosses 4K address boundary (Errata
A010022)
2. FMAN DMA address for an AXI transaction is not 16 byte
aligned, i.e. the last 4 bits of an address are non-zero
3. Scatter Gather (SG) frames have more than one SG buffer in
the SG list and any one of the buffers, except the last
buffer in the SG list has data size that is not a multiple
of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
With any one of the above three conditions present, there is
likelihood of stalled FMAN packet processing, especially under
stress with multiple ports injecting line-rate traffic.
To avoid situations that stall FMAN packet processing, all of the
above three conditions must be avoided; therefore, configure the
system with the following rules:
1. Frame buffers must not span a 4KB address boundary, unless
the frame start address is 256 byte aligned
2. All FMAN DMA start addresses (for example, BMAN buffer
address, FD[address] + FD[offset]) are 16B aligned
3. SG table and buffer addresses are 16B aligned and the size
of SG buffers are multiple of 16 bytes, except for the last
SG buffer that can be of any size.
Additional workaround notes:
- Address alignment of 64 bytes is recommended for maximally
efficient system bus transactions (although 16 byte alignment is
sufficient to avoid the stall condition)
- To support frame sizes that are larger than 4K bytes, there are
two options:
1. Large single buffer frames that span a 4KB page boundary can
be converted into SG frames to avoid transaction splits at
the 4KB boundary,
2. Align the large single buffer to 256B address boundaries,
ensure that the frame address plus offset is 256B aligned.
- If software generated SG frames have buffers that are unaligned
and with random non-multiple of 16 byte lengths, before
transmitting such frames via FMAN, frames will need to be copied
into a new single buffer or multiple buffer SG frame that is
compliant with the three rules listed above.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LS1043A SoC is affected by the A050385 erratum stating that
FMAN DMA read or writes under heavy traffic load may cause FMAN
internal resource leak thus stopping further packet processing.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FMAN DMA read or writes under heavy traffic load may cause FMAN
internal resource leak; thus stopping further packet processing.
The FMAN internal queue can overflow when FMAN splits single
read or write transactions into multiple smaller transactions
such that more than 17 AXI transactions are in flight from FMAN
to interconnect. When the FMAN internal queue overflows, it can
stall further packet processing. The issue can occur with any one
of the following three conditions:
1. FMAN AXI transaction crosses 4K address boundary (Errata
A010022)
2. FMAN DMA address for an AXI transaction is not 16 byte
aligned, i.e. the last 4 bits of an address are non-zero
3. Scatter Gather (SG) frames have more than one SG buffer in
the SG list and any one of the buffers, except the last
buffer in the SG list has data size that is not a multiple
of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
With any one of the above three conditions present, there is
likelihood of stalled FMAN packet processing, especially under
stress with multiple ports injecting line-rate traffic.
To avoid situations that stall FMAN packet processing, all of the
above three conditions must be avoided; therefore, configure the
system with the following rules:
1. Frame buffers must not span a 4KB address boundary, unless
the frame start address is 256 byte aligned
2. All FMAN DMA start addresses (for example, BMAN buffer
address, FD[address] + FD[offset]) are 16B aligned
3. SG table and buffer addresses are 16B aligned and the size
of SG buffers are multiple of 16 bytes, except for the last
SG buffer that can be of any size.
Additional workaround notes:
- Address alignment of 64 bytes is recommended for maximally
efficient system bus transactions (although 16 byte alignment is
sufficient to avoid the stall condition)
- To support frame sizes that are larger than 4K bytes, there are
two options:
1. Large single buffer frames that span a 4KB page boundary can
be converted into SG frames to avoid transaction splits at
the 4KB boundary,
2. Align the large single buffer to 256B address boundaries,
ensure that the frame address plus offset is 256B aligned.
- If software generated SG frames have buffers that are unaligned
and with random non-multiple of 16 byte lengths, before
transmitting such frames via FMAN, frames will need to be copied
into a new single buffer or multiple buffer SG frame that is
compliant with the three rules listed above.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Patches to bump position index from sysctl seq_next,
from Vasilin Averin.
2) Release flowtable hook from error path, from Florian Westphal.
3) Patches to add missing netlink attribute validation,
from Jakub Kicinski.
4) Missing NFTA_CHAIN_FLAGS in nf_tables_fill_chain_info().
5) Infinite loop in module autoload if extension is not available,
from Florian Westphal.
6) Missing module ownership in inet/nat chain type definition.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Since commit 3b2323c2c1 ("perf bench futex: Use cpumaps") the default
number of threads the benchmark uses got changed from number of online
CPUs to zero:
$ perf bench futex wake
# Running 'futex/wake' benchmark:
Run summary [PID 15930]: blocking on 0 threads (at [private] futex 0x558b8ee4bfac), waking up 1 at a time.
[Run 1]: Wokeup 0 of 0 threads in 0.0000 ms
[...]
[Run 10]: Wokeup 0 of 0 threads in 0.0000 ms
Wokeup 0 of 0 threads in 0.0004 ms (+-40.82%)
Restore the old behavior by grabbing the number of online CPUs via
cpu->nr:
$ perf bench futex wake
# Running 'futex/wake' benchmark:
Run summary [PID 18356]: blocking on 8 threads (at [private] futex 0xb3e62c), waking up 1 at a time.
[Run 1]: Wokeup 8 of 8 threads in 0.0260 ms
[...]
[Run 10]: Wokeup 8 of 8 threads in 0.0270 ms
Wokeup 8 of 8 threads in 0.0419 ms (+-24.35%)
Fixes: 3b2323c2c1 ("perf bench futex: Use cpumaps")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200305083714.9381-3-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since glibc 2.28 when running 'perf top --stdio', input handling no
longer works, but hitting any key always just prints the "Mapped keys"
help text.
To fix it, call clearerr() in the display_thread() loop to clear any EOF
sticky errors, as instructed in the glibc NEWS file
(https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS):
* All stdio functions now treat end-of-file as a sticky condition. If you
read from a file until EOF, and then the file is enlarged by another
process, you must call clearerr or another function with the same effect
(e.g. fseek, rewind) before you can read the additional data. This
corrects a longstanding C99 conformance bug. It is most likely to affect
programs that use stdio to read interactive input from a terminal.
(Bug #1190.)
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200305083714.9381-2-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
clang warns:
util/block-info.c:298:18: error: result of comparison against a string
literal is unspecified (use an explicit string comparison function
instead) [-Werror,-Wstring-compare]
if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
^ ~~~~~~~~~~~~~~~
util/block-info.c:298:51: error: result of comparison against a string
literal is unspecified (use an explicit string comparison function
instead) [-Werror,-Wstring-compare]
if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
^ ~~~~~~~~~~~~~~~
util/block-info.c:298:18: error: result of comparison against a string
literal is unspecified (use an explicit string
comparison function instead) [-Werror,-Wstring-compare]
if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
^ ~~~~~~~~~~~~~~~
util/block-info.c:298:51: error: result of comparison against a string
literal is unspecified (use an explicit string comparison function
instead) [-Werror,-Wstring-compare]
if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
^ ~~~~~~~~~~~~~~~
util/map.c:434:15: error: result of comparison against a string literal
is unspecified (use an explicit string comparison function instead)
[-Werror,-Wstring-compare]
if (srcline != SRCLINE_UNKNOWN)
^ ~~~~~~~~~~~~~~~
Reviewer Notes:
Looks good to me. Some more context:
https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare
The spec says:
J.1 Unspecified behavior
The following are unspecified:
.. Whether two string literals result in distinct arrays (6.4.5).
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Keeping <john@metanate.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: clang-built-linux@googlegroups.com
Link: https://github.com/ClangBuiltLinux/linux/issues/900
Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.
This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Currently passive MPTCP socket can skip including the DACK
option - if the peer sends data before accept() completes.
The above happens because the msk 'can_ack' flag is set
only after the accept() call.
Such missing DACK option may cause - as per RFC spec -
unwanted fallback to TCP.
This change addresses the issue using the key material
available in the current subflow, if any, to create a suitable
dack option when msk ack seq is not yet available.
v1 -> v2:
- adavance the generated ack after the initial MPC packet
Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is similar to commit 674d9de02a ("NFC: Fix possible memory
corruption when handling SHDLC I-Frame commands") and commit d7ee81ad09
("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
added range checks on "pipe".
The "pipe" variable comes skb->data[0] in nfc_hci_msg_rx_work().
It's in the 0-255 range. We're using it as the array index into the
hdev->pipes[] array which has NFC_HCI_MAX_PIPES (128) members.
Fixes: 118278f20a ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this, we get a couple of warnings when CONFIG_PM
is disabled:
drivers/gpu/drm/arm/display/komeda/komeda_drv.c:156:12: error: 'komeda_rt_pm_resume' defined but not used [-Werror=unused-function]
static int komeda_rt_pm_resume(struct device *dev)
^~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/arm/display/komeda/komeda_drv.c:149:12: error: 'komeda_rt_pm_suspend' defined but not used [-Werror=unused-function]
static int komeda_rt_pm_suspend(struct device *dev)
^~~~~~~~~~~~~~~~~~~~
Fixes: efb4650885 ("drm/komeda: Add runtime_pm support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: james qian wang (Arm Technology China) <james.qian.wang@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107215327.1579195-1-arnd@arndb.de
The emulated vbt doesn't tell its size correctly. According to the
intel_vbt_defs.h, vbt_header.vbt_size should the size of VBT (VBT Header,
BDB Header and data blocks), and bdb_header.bdb_size should be the size
of BDB (BDB Header and data blocks).
This patch fixes the issue and lets vbt provided by GVT-g pass the guest
i915's sanity test.
v2: refine the commit message. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200305131600.29640-1-tina.zhang@intel.com
When local NET_RX backlog is full due to traffic overrun,
peer veth tx_dropped counter increases. At that time, list
local veth stats, rx_dropped has double value of peer
tx_dropped, even bigger than transmit packets by peer.
In NET_RX softirq process, if any packet drop case happens,
it increases dev's rx_dropped counter and returns NET_RX_DROP.
At veth tx side, it records any error returned from peer netif_rx
into local dev tx_dropped counter.
In veth get stats process, it puts local dev rx_dropped and
peer dev tx_dropped into together as local rx_drpped value.
So that it shows double value of real dropped packets number in
this case.
This patch ignores peer tx_dropped when counting local rx_dropped,
since peer tx_dropped is duplicated to local rx_dropped at most cases.
Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for v5.6
Second set of fixes for v5.6. Only two small fixes this time.
iwlwifi
* fix another initialisation regression with 3168 devices
mt76
* fix memory corruption with too many rx fragments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We now ignore the "completion" event when using tx queue timestamping,
and only pay attention to the two (high and low) timestamp events. The
NIC will send a pair of timestamp events for every packet transmitted.
The current firmware may merge the completion events, and it is possible
that future versions may reorder the completion and timestamp events.
As such the completion event is not useful.
Without this patch in place a merged completion event on a queue with
timestamping will cause a "spurious TX completion" error. This affects
SFN8000-series adapters.
Signed-off-by: Tom Zhao <tzhao@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fibre port supports auto-negotiation, the IMP(Intelligent
Management Process) processes the speed of auto-negotiation
and the user's speed separately.
For below case, the port will get a not link up problem.
step 1: disables auto-negotiation and sets speed to A, then
the driver's MAC speed will be updated to A.
step 2: enables auto-negotiation and MAC gets negotiated
speed B, then the driver's MAC speed will be updated to B
through querying in periodical task.
step 3: MAC gets new negotiated speed A.
step 4: disables auto-negotiation and sets speed to B before
periodical task query new MAC speed A, the driver will ignore
the speed configuration.
This patch fixes it by skipping speed and duplex checking when
fibre port supports auto-negotiation.
Fixes: 22f48e24a2 ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was evidently copy and pasted from the i915 driver, but the text
wasn't updated.
Fixes: 4f337faf1c ("KVM: allow disabling -Werror")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nft will loop forever if the kernel doesn't support an expression:
1. nft_expr_type_get() appends the family specific name to the module list.
2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path.
3. abort path sets ->done to true and calls request_module for the
expression.
4. nfnetlink replays the batch, we end up in nft_expr_type_get() again.
5. nft_expr_type_get attempts to append family-specific name. This
one already exists on the list, so we continue
6. nft_expr_type_get adds the generic expression name to the module
list. -EAGAIN is returned, nfnetlink calls abort path.
7. abort path encounters the family-specific expression which
has 'done' set, so it gets removed.
8. abort path requests the generic expression name, sets done to true.
9. batch is replayed.
If the expression could not be loaded, then we will end up back at 1),
because the family-specific name got removed and the cycle starts again.
Note that userspace can SIGKILL the nft process to stop the cycle, but
the desired behaviour is to return an error after the generic expr name
fails to load the expression.
Fixes: eb014de4fd ("netfilter: nf_tables: autoload modules from the abort path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some older version of GAS do not support the ADX instructions, similarly
to how they also don't support AVX and such. This commit adds the same
build-time detection mechanisms we use for AVX and others for ADX, and
then makes sure that the curve25519 library dispatcher calls the right
functions.
Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Older (and maybe current) versions of systemd set release_agent to "" when
shutting down, but do not set notify_on_release to 0.
Since 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()"), we filter out such calls when the user mode helper
path is "". However, when used in conjunction with an actual (i.e. non "")
STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
will be called with argv[0] == "".
Let's avoid this by not invoking the release_agent when it is "".
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Tejun Heo <tj@kernel.org>
Similar to the commit d749534322 ("cgroup: fix incorrect
WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) does not
equal to 1 on 32bit ino archs which triggers all sorts of issues with
psi_show() on s390x. For example,
BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
Read of size 4 at addr 000000001e0ce000 by task read_all/3667
collect_percpu_times+0x2d0/0x798
psi_show+0x7c/0x2a8
seq_read+0x2ac/0x830
vfs_read+0x92/0x150
ksys_read+0xe2/0x188
system_call+0xd8/0x2b4
Fix it by using cgroup_ino().
Fixes: 743210386c ("cgroup: use cgrp->kn->id as the cgroup ID")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.5
The way cookie_init_hw_msi_region() allocates the iommu_dma_msi_page
structures doesn't match the way iommu_put_dma_cookie() frees them.
The former performs a single allocation of all the required structures,
while the latter tries to free them one at a time. It doesn't quite
work for the main use case (the GICv3 ITS where the range is 64kB)
when the base granule size is 4kB.
This leads to a nice slab corruption on teardown, which is easily
observable by simply creating a VF on a SRIOV-capable device, and
tearing it down immediately (no need to even make use of it).
Fortunately, this only affects systems where the ITS isn't translated
by the SMMU, which are both rare and non-standard.
Fix it by allocating iommu_dma_msi_page structures one at a time.
Fixes: 7c1b058c8b ("iommu/dma: Handle IOMMU API reserved regions")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Enable MSI interrupt for GL9750/GL9755. Some platforms
do not support PCI INTx and devices can not work without
interrupt. Like messages below:
[ 4.487132] sdhci-pci 0000:01:00.0: SDHCI controller found [17a0:9755] (rev 0)
[ 4.487198] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.PBR2._PRT.APS2], AE_NOT_FOUND (20190816/psargs-330)
[ 4.487397] ACPI Error: Aborting method \_SB.PCI0.PBR2._PRT due to previous error (AE_NOT_FOUND) (20190816/psparse-529)
[ 4.487707] pcieport 0000:00:01.3: can't derive routing for PCI INT A
[ 4.487709] sdhci-pci 0000:01:00.0: PCI INT A: no GSI
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Tested-by: Raul E Rangel <rrangel@chromium.org>
Fixes: e51df6ce66 ("mmc: host: sdhci-pci: Add Genesys Logic GL975x support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200219092900.9151-1-benchuanggli@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf symbols:
Arnaldo Carvalho de Melo:
- Don't try to find a vmlinux file when looking for kernel modules,
fixing symbol resolution in systems with compressed kernel modules.
perf env:
Arnaldo Carvalho de Melo:
- Do not return pointers to local variables, fixing valid warning from
gcc 10 for corner case that stops the build due to -Werror.
perf tests:
Arnaldo Carvalho de Melo:
- Make global variable static in the bp_account entry to fix build
with gcc 10.
perf parse-events:
Arnaldo Carvalho de Melo:
- Use asprintf() instead of strncpy() to read tracepoint files, addressing
compiler warning that stops the build as we use -Werror.
perf bench:
Arnaldo Carvalho de Melo:
- Share some global variables to fix build with gcc 10.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This function is type bool, and it's supposed to return true on success.
Unfortunately, this path takes negative error codes and casts them to
bool (true) so it's treated as success instead of failure.
Fixes: 91c0c12080 ("thunderbolt: Add support for lane bonding")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
When registers a phy_device successful, should terminate the loop
or the phy_device would be registered in other addr. If there are
multiple PHYs without reg properties, it will go wrong.
Signed-off-by: Dajun Jin <adajunjin@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware can support more than 8 queues currently limited by
netif_get_num_default_rss_queues(). So, rework and fix checks for max
number of queues to allocate. The checks should be based on how many are
actually supported by hardware, OR the number of online cpus; whichever
is lower.
Fixes: 5952dde723 ("cxgb4: set maximal number of default RSS queues")
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>"
Signed-off-by: David S. Miller <davem@davemloft.net>
This should improve the error message when the PHY validate in the MAC
driver failed. I ran into this problem multiple times that I put wrong
interface values into the device tree and was searching why it is
failing with -22 (-EINVAL). This should make it easier to spot the
problem.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink trigger command does not exist. While rewriting the
documentation for devlink into the reStructuredText format,
documentation for the trigger command was accidentally merged in. This
occurred because the author was also working on a potential extension to
devlink regions which included this trigger command, and accidentally
squashed the documentation incorrectly.
Further review eventually settled on using the previously unused "new"
command instead of creating a new trigger command.
Fix this by removing mention of the trigger command from the
documentation.
Fixes: 0b0f945f54 ("devlink: add a file documenting devlink regions", 2020-01-10)
Noticed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for tunnel source and
destination ports to the netlink policy.
Fixes: af308b94a2 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add missing attribute validation for NFTA_PAYLOAD_CSUM_FLAGS
to the netlink policy.
Fixes: 1814096980 ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If hook registration fails, the hooks allocated via nft_netdev_hook_alloc
need to be freed.
We can't change the goto label to 'goto 5' -- while it does fix the memleak
it does cause a warning splat from the netfilter core (the hooks were not
registered).
Fixes: 3f0465a9ef ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
Reported-by: syzbot+a2ff6fa45162a5ed4dd3@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If .next function does not change position index,
following .show function will repeat output related
to current position index.
Without patch:
# dd if=/proc/net/ip_tables_matches # original file output
conntrack
conntrack
conntrack
recent
recent
icmp
udplite
udp
tcp
0+1 records in
0+1 records out
65 bytes copied, 5.4074e-05 s, 1.2 MB/s
# dd if=/proc/net/ip_tables_matches bs=62 skip=1
dd: /proc/net/ip_tables_matches: cannot skip to specified offset
cp <<< end of last line
tcp <<< and then unexpected whole last line once again
0+1 records in
0+1 records out
7 bytes copied, 0.000102447 s, 68.3 kB/s
Cc: stable@vger.kernel.org
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If .next function does not change position index,
following .show function will repeat output related
to current position index.
Without the patch:
# dd if=/proc/net/xt_recent/SSH # original file outpt
src=127.0.0.4 ttl: 0 last_seen: 6275444819 oldest_pkt: 1 6275444819
src=127.0.0.2 ttl: 0 last_seen: 6275438906 oldest_pkt: 1 6275438906
src=127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
0+1 records in
0+1 records out
204 bytes copied, 6.1332e-05 s, 3.3 MB/s
Read after lseek into middle of last line (offset 140 in example below)
generates expected end of last line and then unexpected whole last line
once again
# dd if=/proc/net/xt_recent/SSH bs=140 skip=1
dd: /proc/net/xt_recent/SSH: cannot skip to specified offset
127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
src=127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
0+1 records in
0+1 records out
132 bytes copied, 6.2487e-05 s, 2.1 MB/s
Cc: stable@vger.kernel.org
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Place phylink_start()/phylink_stop() inside dsa_port_enable() and
dsa_port_disable(), which ensures that we call phylink_stop() before
tearing down phylink - which is a documented requirement. Failure
to do so can cause use-after-free bugs.
Fixes: 0e27921816 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu says:
====================
Fix IPv6 peer route update
Currently we have two issues for peer route update on IPv6.
1. When update peer route metric, we only updated the local one.
2. If peer address changed, we didn't remove the old one and add new one.
The first two patches fixed these issues and the third patch add new
tests to cover it.
With the fixes and updated test:
]# ./fib_tests.sh
IPv6 prefix route tests
TEST: Default metric [ OK ]
TEST: User specified metric on first device [ OK ]
TEST: User specified metric on second device [ OK ]
TEST: Delete of address on first device [ OK ]
TEST: Modify metric of address [ OK ]
TEST: Prefix route removed on link down [ OK ]
TEST: Prefix route with metric on link up [ OK ]
TEST: Set metric with peer route on local side [ OK ]
TEST: User specified metric on local address [ OK ]
TEST: Set metric with peer route on peer side [ OK ]
TEST: Modify metric with peer route on local side [ OK ]
TEST: Modify metric with peer route on peer side [ OK ]
IPv4 prefix route tests
TEST: Default metric [ OK ]
TEST: User specified metric on first device [ OK ]
TEST: User specified metric on second device [ OK ]
TEST: Delete of address on first device [ OK ]
TEST: Modify metric of address [ OK ]
TEST: Prefix route removed on link down [ OK ]
TEST: Prefix route with metric on link up [ OK ]
TEST: Modify metric of .0/24 address [ OK ]
TEST: Set metric of address with peer route [ OK ]
TEST: Modify metric of address with peer route [ OK ]
Tests passed: 22
Tests failed: 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch update {ipv4, ipv6}_addr_metric_test with
1. Set metric of address with peer route and see if the route added
correctly.
2. Modify metric and peer address for peer route and see if the route
changed correctly.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we modify the peer route and changed it to a new one, we should
remove the old route first. Before the fix:
+ ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::2 proto kernel metric 256 pref medium
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::2 proto kernel metric 256 pref medium
After the fix:
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::3 proto kernel metric 256 pref medium
This patch depend on the previous patch "net/ipv6: need update peer route
when modify metric" to update new peer route after delete old one.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we modify the route metric, the peer address's route need also
be updated. Before the fix:
+ ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 60 pref medium
2001:db8::2 proto kernel metric 60 pref medium
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 61 pref medium
2001:db8::2 proto kernel metric 60 pref medium
After the fix:
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 61 pref medium
2001:db8::2 proto kernel metric 61 pref medium
Fixes: 8308f3ff17 ("net/ipv6: Add support for specifying metric of connected routes")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
net: add missing netlink policies
Recent one-off fixes motivated me to do some grepping for
more missing netlink attribute policies. I didn't manage
to even produce a KASAN splat with these, but it should
be possible with sufficient luck. All the missing policies
are pretty trivial (NLA_Uxx).
I've only tested the devlink patches, the rest compiles.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for vendor subcommand attributes
to the netlink policy.
Fixes: 9e58095f96 ("NFC: netlink: Implement vendor command support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for NFC_ATTR_TARGET_INDEX
to the netlink policy.
Fixes: 4d63adfe12 ("NFC: Add NFC_CMD_DEACTIVATE_TARGET support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for NFC_ATTR_SE_INDEX
to the netlink policy.
Fixes: 5ce3f32b52 ("NFC: netlink: SE API implementation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for TIPC_NLA_PROP_MTU
to the netlink policy.
Fixes: 901271e040 ("tipc: implement configuration of UDP media MTU")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for TEAM_ATTR_OPTION_ARRAY_INDEX
to the netlink policy.
Fixes: b13033262d ("team: introduce array options")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for TEAM_ATTR_OPTION_PORT_IFINDEX
to the netlink policy.
Fixes: 80f7c6683f ("team: add support for per-port options")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for TCA_TAPRIO_ATTR_TXTIME_DELAY
to the netlink policy.
Fixes: 4cfd5779bd ("taprio: Add support for txtime-assist mode")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for TCA_FQ_ORPHAN_MASK
to the netlink policy.
Fixes: 06eb395fa9 ("pkt_sched: fq: better control of DDOS traffic")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.
Fixes: bd1903b7c4 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for IFLA_MACSEC_PORT
to the netlink policy.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for IFLA_CAN_TERMINATION
to the netlink policy.
Fixes: 12a6075cab ("can: dev: add CAN interface termination API")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute type validation for IEEE802154_ATTR_DEV_TYPE
to the netlink policy.
Fixes: 90c049b2c6 ("ieee802154: interface type to be added")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing attribute validation for several u8 types.
Fixes: 2c21d11518 ("net: add NL802154 interface for configuration of 802.15.4 devices")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing netlink policy entry for FRA_TUN_ID.
Fixes: e7030878fc ("fib: Add fib rule match on tunnel id")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DEVLINK_ATTR_REGION_CHUNK_ADDR and DEVLINK_ATTR_REGION_CHUNK_LEN
lack entries in the netlink policy. Corresponding nla_get_u64()s
may read beyond the end of the message.
Fixes: 4e54795a27 ("devlink: Add support for region snapshot read command")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DEVLINK_ATTR_PARAM_VALUE_DATA may have different types
so it's not checked by the normal netlink policy. Make
sure the attribute length is what we expect.
Fixes: e3b7ca18ad ("devlink: Add param set command")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dso->kernel value is now set to everything that is in
machine->kmaps, but that was being used to decide if vmlinux lookup is
needed, which ended up making that lookup be made for kernel modules,
that now have dso->kernel set, leading to these kinds of warnings when
running on a machine with compressed kernel modules, like fedora:31:
[root@five ~]# perf record -F 10000 -a sleep 2
[ perf record: Woken up 1 times to write data ]
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
[ perf record: Captured and wrote 1.024 MB perf.data (1366 samples) ]
[root@five ~]#
This happens when collecting the buildid, when we find samples for
kernel modules, fix it by checking if the looked up DSO is a kernel
module by other means.
Fixes: 02213cec64 ("perf maps: Mark module DSOs with kernel type")
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200302191007.GD10335@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Noticed with gcc 10 (fedora rawhide) that those variables were not being
declared as static, so end up with:
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1
Prefix those with bench__ and add them to bench/bench.h, so that we can
share those on the tools needing to access those variables from signal
handlers.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.
Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After commit 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest
mode") Hyper-V guests on KVM stopped booting with:
kvm_nested_vmexit: rip fffff802987d6169 reason EPT_VIOLATION info1 181
info2 0 int_info 0 int_info_err 0
kvm_page_fault: address febd0000 error_code 181
kvm_emulate_insn: 0:fffff802987d6169: f3 a5
kvm_emulate_insn: 0:fffff802987d6169: f3 a5 FAIL
kvm_inj_exception: #UD (0x0)
"f3 a5" is a "rep movsw" instruction, which should not be intercepted
at all. Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") reduced the number of fields cleared by
init_decode_cache() claiming that they are being cleared elsewhere,
'intercept', however, is left uncleared if the instruction does not have
any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm,
NearBranch, No16 and of course Intercept itself).
Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Fixes: 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest mode")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the hardware receives an oversized packet with too many rx fragments,
skb_shinfo(skb)->frags can overflow and corrupt memory of adjacent pages.
This becomes especially visible if it corrupts the freelist pointer of
a slab page.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Since we ony support the TTB1 quirk for AArch64 contexts, and
consequently only for 64-bit builds, the sign-extension aspect of the
"are all bits above IAS consistent?" check should implicitly only apply
to 64-bit IOVAs. Change the type of the cast to ensure that 32-bit longs
don't inadvertently get sign-extended, and thus considered invalid, if
they happen to be above 2GB in the TTB0 region.
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Acked-by: Will Deacon <will@kernel.org>
Fixes: db6903010a ("iommu/io-pgtable-arm: Prepare for TTBR1 usage")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
page onto its corresponding physical address. This commit fixes the bug by
accomodating the level of page entry for the IOVA and adds IOVA's lower
address to the physical address.
Cc: <stable@vger.kernel.org>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Yonghyun Hwang <yonghyun@google.com>
Fixes: 3871794642 ("VT-d: Changes to support KVM")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In svm, exit_code for MSR writes is not EXIT_REASON_MSR_WRITE which
belongs to vmx.
According to amd manual, SVM_EXIT_MSR(7ch) is the exit_code of VMEXIT_MSR
due to RDMSR or WRMSR access to protected MSR. Additionally, the processor
indicates in the VMCB's EXITINFO1 whether a RDMSR(EXITINFO1=0) or
WRMSR(EXITINFO1=1) was intercepted.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Fixes: 1e9e2622a1 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Building cpupower with -fno-common in CFLAGS results in errors due to
multiple definitions of the 'cpu_count' and 'start_time' variables.
./utils/idle_monitor/snb_idle.o:./utils/idle_monitor/cpupower-monitor.h:28:
multiple definition of `cpu_count';
./utils/idle_monitor/nhm_idle.o:./utils/idle_monitor/cpupower-monitor.h:28:
first defined here
...
./utils/idle_monitor/cpuidle_sysfs.o:./utils/idle_monitor/cpuidle_sysfs.c:22:
multiple definition of `start_time';
./utils/idle_monitor/amd_fam14h_idle.o:./utils/idle_monitor/amd_fam14h_idle.c:85:
first defined here
The -fno-common option will be enabled by default in GCC 10.
Bug: https://bugs.gentoo.org/707462
Signed-off-by: Mike Gilbert <floppym@gentoo.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Make the code more compact by using asprintf() instead of malloc()+strncpy() which also uses
less memory and avoids these warnings with gcc 10:
CC /tmp/build/perf/util/cloexec.o
In file included from /usr/include/string.h:495,
from util/parse-events.h:12,
from util/parse-events.c:18:
In function ‘strncpy’,
inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:271:5:
/usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘sys_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/dirent.h:61,
from util/parse-events.c:5:
util/parse-events.c: In function ‘tracepoint_id_to_path’:
/usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here
33 | char d_name[256]; /* We must not include limits.h! */
| ^~~~~~
In file included from /usr/include/string.h:495,
from util/parse-events.h:12,
from util/parse-events.c:18:
In function ‘strncpy’,
inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:273:5:
/usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘evt_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/dirent.h:61,
from util/parse-events.c:5:
util/parse-events.c: In function ‘tracepoint_id_to_path’:
/usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here
33 | char d_name[256]; /* We must not include limits.h! */
| ^~~~~~
CC /tmp/build/perf/util/call-path.o
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20200302145535.GA28183@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is possible to return a pointer to a local variable when looking up
the architecture name for the running system and no normalization is
done on that value, i.e. we may end up returning the uts.machine local
variable.
While this doesn't happen on most arches, as normalization takes place,
lets fix this by making that a static variable and optimize it a bit by
not always running uname(), only the first time.
Noticed in fedora rawhide running with:
[perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To fix the build with newer gccs, that without this patch exit with:
LD /tmp/build/perf/tests/perf-in.o
ld: /tmp/build/perf/tests/bp_account.o:/git/perf/tools/perf/tests/bp_account.c:22: multiple definition of `the_var'; /tmp/build/perf/tests/bp_signal.o:/git/perf/tools/perf/tests/bp_signal.c:38: first defined here
make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/tests/perf-in.o] Error 1
First noticed in fedora:rawhide/32 with:
[perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Michael Chan says:
====================
bnxt_en: 2 bug fixes.
This first patch fixes a rare but possible crash in pci_disable_msix()
when the MTU is changed. The 2nd patch fixes a regression in error
code handling when flashing a file to NVRAM.
Please also queue these for -stable. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After bnxt_hwrm_do_send_message() was updated to return standard error
codes in a recent commit, a regression in bnxt_flash_package_from_file()
was introduced. The return value does not properly reflect all
possible firmware errors when calling firmware to flash the package.
Fix it by consolidating all errors in one local variable rc instead
of having 2 variables for different errors.
Fixes: d4f1420d36 ("bnxt_en: Convert error code in firmware message response to standard code.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTU changes may affect the number of IRQs so we must call
bnxt_close_nic()/bnxt_open_nic() with the irq_re_init parameter
set to true. The reason is that a larger MTU may require
aggregation rings not needed with smaller MTU. We may not be
able to allocate the required number of aggregation rings and
so we reduce the number of channels which will change the number
of IRQs. Without this patch, it may crash eventually in
pci_disable_msix() when the IRQs are not properly unwound.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On all PHY drivers that implement did_interrupt() reading the interrupt
status bits clears them. This means we may loose an interrupt that
is triggered between calling did_interrupt() and phy_clear_interrupt().
As part of the fix make it a requirement that did_interrupt() clears
the interrupt.
The Fixes tag refers to the first commit where the patch applies
cleanly.
Fixes: 49644e68f4 ("net: phy: add callback for custom interrupt handler to struct phy_driver")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we add peer address with metric configured, IPv4 could set the dest
metric correctly, but IPv6 do not. e.g.
]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20
]# ip route show dev eth1
192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20
]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20
]# ip -6 route show dev eth1
2001:db8::1 proto kernel metric 20 pref medium
2001:db8::2 proto kernel metric 256 pref medium
Fix this by using configured metric instead of default one.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 8308f3ff17 ("net/ipv6: Add support for specifying metric of connected routes")
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the switch is not hardware reset on a warm boot, interrupts can be
left enabled, and possibly pending. This will cause us to enter an
infinite loop trying to service an interrupt we are unable to handle,
thereby preventing the kernel from booting.
Ensure that the global 2 interrupt sources are disabled before we claim
the parent interrupt.
Observed on the ZII development revision B and C platforms with
reworked serdes support, and using reboot -f to reboot the platform.
Fixes: dc30c35be7 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When debugging via PRINTK() is not enabled, make the PRINTK()
macro be an empty do-while block.
Thix fixes a gcc warning when -Wextra is set:
../drivers/atm/nicstar.c:1819:23: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
I have verified that there is no object code change (with gcc 7.5.0).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Chas Williams <3chas3@gmail.com>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userspace might send a batch that is composed of several netlink
messages. The netlink_ack() function must use the pointer to the netlink
header as base to calculate the bad attribute offset.
Fixes: 2d4bc93368 ("netlink: extended ACK reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dell USB Type C docking WD19/WD19DC attaches additional peripherals as:
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
|__ Port 1: Dev 11, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 3: Dev 12, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 4: Dev 13, If 0, Class=Vendor Specific Class,
Driver=r8152, 5000M
where usb 2-1-3 is a hub connecting all USB Type-A/C ports on the dock.
When hotplugging such dock with additional usb devices already attached on
it, the probing process may reset usb 2.1 port, therefore r8152 ethernet
device is also reset. However, during r8152 device init there are several
for-loops that, when it's unable to retrieve hardware registers due to
being disconnected from USB, may take up to 14 seconds each in practice,
and that has to be completed before USB may re-enumerate devices on the
bus. As a result, devices attached to the dock will only be available
after nearly 1 minute after the dock was plugged in:
[ 216.388290] [250] r8152 2-1.4:1.0: usb_probe_interface
[ 216.388292] [250] r8152 2-1.4:1.0: usb_probe_interface - got id
[ 258.830410] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): PHY not ready
[ 258.830460] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Invalid header when reading pass-thru MAC addr
[ 258.830464] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Get ether addr fail
This happens in, for example, r8153_init:
static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
void *data, u16 type)
{
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return -ENODEV;
...
}
static u16 ocp_read_word(struct r8152 *tp, u16 type, u16 index)
{
u32 data;
...
generic_ocp_read(tp, index, sizeof(tmp), &tmp, type | byen);
data = __le32_to_cpu(tmp);
...
return (u16)data;
}
static void r8153_init(struct r8152 *tp)
{
...
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;
for (i = 0; i < 500; i++) {
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
msleep(20);
}
...
}
Since ocp_read_word() doesn't check the return status of
generic_ocp_read(), and the only exit condition for the loop is to have
a match in the returned value, such loops will only ends after exceeding
its maximum runs when the device has been marked as disconnected, which
takes 500 * 20ms = 10 seconds in theory, 14 in practice.
To solve this long latency another test to RTL8152_UNPLUG flag should be
added after those 20ms sleep to skip unnecessary loops, so that the device
probe can complete early and proceed to parent port reset/reprobe process.
This can be reproduced on all kernel versions up to latest v5.6-rc2, but
after v5.5-rc7 the reproduce rate is dramatically lowered to 1/30 or less
while it was around 1/2.
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two implemented bits in the PPIN_CTL MSR:
Bit 0: LockOut (R/WO)
Set 1 to prevent further writes to MSR_PPIN_CTL.
Bit 1: Enable_PPIN (R/W)
If 1, enables MSR_PPIN to be accessible using RDMSR.
If 0, an attempt to read MSR_PPIN will cause #GP.
So there are four defined values:
0: PPIN is disabled, PPIN_CTL may be updated
1: PPIN is disabled. PPIN_CTL is locked against updates
2: PPIN is enabled. PPIN_CTL may be updated
3: PPIN is enabled. PPIN_CTL is locked against updates
Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2".
When it should have done so for both case "2" and case "3".
Fix the final test to just check for the enable bit. Also fix some of
the other comments in this function.
Fixes: 3f5a7896a5 ("x86/mce: Include the PPIN in MCE records when available")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200226011737.9958-1-tony.luck@intel.com
The CONFIG_MIPS_CMDLINE_DTB_EXTEND option is used so that the kernel
arguments provided in the 'bootargs' property in devicetree are extended
with the kernel arguments provided by the bootloader.
The code was broken, as it didn't actually take any of the kernel
arguments provided in devicetree when that option was set.
Fixes: 7784cac697 ("MIPS: cmdline: Clean up boot_command_line initialization")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Chris Wilson reported splats from running the thermal throttling
workqueue callback on offlined CPUs. The problem is that that callback
should not even run on offlined CPUs but it happens nevertheless because
the offlining callback thermal_throttle_offline() does not symmetrically
undo the setup work done in its onlining counterpart. IOW,
1. The thermal interrupt vector should be masked out before ...
2. ... cancelling any pending work synchronously so that no new work is
enqueued anymore.
Do those things and fix the issue properly.
[ bp: Write commit message. ]
Fixes: f6656208f0 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
An NFS client that mounts multiple exports from the same NFS
server with higher NFSv4 versions disabled (i.e. 4.2) and without
forcing a specific NFS version results in fscache index cookie
collisions and the following messages:
[ 570.004348] FS-Cache: Duplicate cookie detected
Each nfs_client structure should have its own fscache index cookie,
so add the minorversion to nfs_server_key.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If userspace passes an nfs_mount_data struct in the data argument of
mount(2), then nfs23_parse_monolithic() or nfs4_parse_monolithic()
will allocate memory for ctx->nfs_server.hostname. This needs to be
freed in nfs_parse_source(), which also allocates memory for
ctx->nfs_server.hostname, otherwise a leak will occur.
Reported-by: syzbot+193c375dcddb4f345091@syzkaller.appspotmail.com
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Hard-coding the fstype causes "nfs4" mounts to appear as "nfs",
which breaks scripts that do "umount -at nfs4".
Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull an i.MX clk fix from Shawn Guo:
A fix on i.MX8MN I2C4 and UART1 clock IDs, which clash with PWM2 and
PWM3 ID. This bug makes I2C4 and UART1 device in i.MX8MN DT point to
PWM2 and PWM3 clock.
* tag 'imx-clk-fixes-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
clk: imx8mn: Fix incorrect clock defines
This is necessary because unless userspace explicitly requests fstype
"nfs4" (either via "mount -t nfs4" or by calling the "mount.nfs4" helper
directly), the fstype will default to "nfs".
This was fine on older kernels because the super_block->s_type was set
via mount_info->nfs_mod->nfs_fs, which was set when parsing the mount
options and subsequently passed in the "type" argument of sget().
After commit f2aedb713c ("NFS: Add fs_context support."), sget_fc(),
which has no "type" argument, is called instead. In sget_fc(), the
super_block->s_type is set via fs_context->fs_type, which was set when
the filesystem context was initially created.
Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When using plugins, GCC requires that the -fplugin= options precedes
any of its plugin arguments appearing on the command line as well.
This is usually not a concern, but as it turns out, this requirement
is causing some issues with ARM's per-task stack protector plugin
and Kbuild's implementation of $(cc-option).
When the per-task stack protector plugin is enabled, and we tweak
the implementation of cc-option not to pipe the stderr output of
GCC to /dev/null, the following output is generated when GCC is
executed in the context of cc-option:
cc1: error: plugin arm_ssp_per_task_plugin should be specified before \
-fplugin-arg-arm_ssp_per_task_plugin-tso=1 in the command line
cc1: error: plugin arm_ssp_per_task_plugin should be specified before \
-fplugin-arg-arm_ssp_per_task_plugin-offset=24 in the command line
These errors will cause any option passed to cc-option to be treated
as unsupported, which is obviously incorrect.
The cause of this issue is the fact that the -fplugin= argument is
added to GCC_PLUGINS_CFLAGS, whereas the arguments above are added
to KBUILD_CFLAGS, and the contents of the former get filtered out of
the latter before being passed to the GCC running the cc-option test,
and so the -fplugin= option does not appear at all on the GCC command
line.
Adding the arguments to GCC_PLUGINS_CFLAGS instead of KBUILD_CFLAGS
would be the correct approach here, if it weren't for the fact that we
are using $(eval) to defer the moment that they are added until after
asm-offsets.h is generated, which is after the point where the contents
of GCC_PLUGINS_CFLAGS are added to KBUILD_CFLAGS. So instead, we have
to add our plugin arguments to both.
For similar reasons, we cannot append DISABLE_ARM_SSP_PER_TASK_PLUGIN
to KBUILD_CFLAGS, as it will be passed to GCC when executing in the
context of cc-option, whereas the other plugin arguments will have
been filtered out, resulting in a similar error and false negative
result as above. So add it to ccflags-y instead.
Fixes: 189af46571 ("ARM: smp: add support for per-task stack canaries")
Reported-by: Merlijn Wajer <merlijn@wizzup.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
It is possible for a system with an ARMv8 timer to run a 32-bit kernel.
When this happens we will unconditionally have the vDSO code remove the
__vdso_gettimeofday and __vdso_clock_gettime symbols because
cntvct_functional() returns false since it does not match that
compatibility string.
Fixes: ecf99a4391 ("ARM: 8331/1: VDSO initialization, mapping, and synchronization")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
At the moment, reading from in_magn_*_raw in sysfs tends to return
large values around 65000, even though the output of ak8974 is actually
limited to ±32768. This happens because the value is never converted
to the signed 16-bit integer variant.
Add an explicit cast to s16 to fix this.
Fixes: 7c94a8b2ee ("iio: magn: add a driver for AK8974")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Linus Waleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
A transmission scheduling for an interface which is currently dropped by
batadv_iv_ogm_iface_disable could still be in progress. The B.A.T.M.A.N. V
is simply cancelling the workqueue item in an synchronous way but this is
not possible with B.A.T.M.A.N. IV because the OGM submissions are
intertwined.
Instead it has to stop submitting the OGM when it detect that the buffer
pointer is set to NULL.
Reported-by: syzbot+a98f2016f40b9cd3818a@syzkaller.appspotmail.com
Reported-by: syzbot+ac36b6a33c28a491e929@syzkaller.appspotmail.com
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Master mode should be disabled when stopping. This mainly impacts
possible other use-case after timer has been stopped. Currently,
master mode remains set (from start routine).
Fixes: 6fb34812c2 ("iio: stm32 trigger: Add support for TRGO2 triggers")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The logic for checking required NVM sections was recently fixed in
commit b3f20e0982 ("iwlwifi: mvm: fix NVM check for 3168
devices"). However, with that fixed the else is now taken for 3168
devices and within the else clause there is a mandatory check for the
PHY_SKU section. This causes the parsing to fail for 3168 devices.
The PHY_SKU section is really only mandatory for the IWL_NVM_EXT
layout (the phy_sku parameter of iwl_parse_nvm_data is only used when
the NVM type is IWL_NVM_EXT). So this changes the PHY_SKU section
check so that it's only mandatory for IWL_NVM_EXT.
Fixes: b3f20e0982 ("iwlwifi: mvm: fix NVM check for 3168 devices")
Signed-off-by: Dan Moulding <dmoulding@me.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Normal, synchronous requests will have their args allocated on the stack.
After the FR_FINISHED bit is set by receiving the reply from the userspace
fuse server, the originating task may return and reuse the stack frame,
resulting in an Oops if the args structure is dereferenced.
Fix by setting a flag in the request itself upon initializing, indicating
whether it has an asynchronous ->end() callback.
Reported-by: Kyle Sanderson <kyle.leet@gmail.com>
Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Fixes: 2b319d1f6f ("fuse: don't dereference req->args on finished request")
Cc: <stable@vger.kernel.org> # v5.4
Tested-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The clock disable signal for video_cc_vcodec0_core_clk is tied to
vcodec0_gdsc which is supported in the HW control mode. Thus turning off
the clock would be taken care automatically when the GDSC turns OFF by
hardware and clock driver does not require to poll on the CLK_OFF bit.
Signed-off-by: Taniya Das <tdas@codeaurora.org>
Link: https://lkml.kernel.org/r/1581423235-21341-1-git-send-email-tdas@codeaurora.org
Fixes: 253dc75a0b ("clk: qcom: Add video clock controller driver for SC7180")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.
Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.
Reported-by: Suren Baghdasaryan <surenb@google.com>
Fixes: c03cd7738a ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output:
1) dd bs=1 skip output of each 2nd elements
$ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
2
3
4
5
1+0 records in
1+0 records out
8 bytes copied, 0,000267297 s, 29,9 kB/s
[test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
2
4 <<< NB! 3 was skipped
6 <<< ... and 5 too
8 <<< ... and 7
8+0 records in
8+0 records out
8 bytes copied, 5,2123e-05 s, 153 kB/s
This happen because __cgroup_procs_start() makes an extra
extra cgroup_procs_next() call
2) read after lseek beyond end of file generates whole last line.
3) read after lseek into middle of last line generates
expected rest of last line and unexpected whole line once again.
Additionally patch removes an extra position index changes in
__cgroup_procs_start()
Cc: stable@vger.kernel.orghttps://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
# mount | grep cgroup
# dd if=/mnt/cgroup.procs bs=1 # normal output
...
1294
1295
1296
1304
1382
584+0 records in
584+0 records out
584 bytes copied
dd: /mnt/cgroup.procs: cannot skip to specified offset
83 <<< generates end of last line
1383 <<< ... and whole last line once again
0+1 records in
0+1 records out
8 bytes copied
dd: /mnt/cgroup.procs: cannot skip to specified offset
1386 <<< generates last line anyway
0+1 records in
0+1 records out
5 bytes copied
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
It's desirable to be able to rely on the following property: All stores
preceding (in program order) a call to a successful queue_work() will be
visible from the CPU which will execute the queued work by the time such
work executes, e.g.,
{ x is initially 0 }
CPU0 CPU1
WRITE_ONCE(x, 1); [ "work" is being executed ]
r0 = queue_work(wq, work); r1 = READ_ONCE(x);
Forbids: r0 == true && r1 == 0
The current implementation of queue_work() provides such memory-ordering
property:
- In __queue_work(), the ->lock spinlock is acquired.
- On the other side, in worker_thread(), this same ->lock is held
when dequeueing work.
So the locking ordering makes things work out.
Add this property to the DocBook headers of {queue,schedule}_work().
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The ARC platform code is not a clock provider, and just needs to call
of_clk_init().
Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
CONFIG_IOSCHED_DEADLINE and CONFIG_IOSCHED_CFQ are gone since
commit f382fb0bce ("block: remove legacy IO schedulers").
The IOSCHED_DEADLINE was replaced by MQ_IOSCHED_DEADLINE and it will be
now enabled by default (along with MQ_IOSCHED_KYBER).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This commit fixes the error message:
"BUG: sleeping function called from invalid context at kernel/irq/chip.c"
Suppress the trigger irq handler. Make the buffer transfers directly
in DMA callback, instead.
Push buffers without timestamps, as timestamps are not supported
in DFSDM driver.
Fixes: 11646e81d7 ("iio: adc: stm32-dfsdm: add support for buffer modes")
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The differential channels require writing the channel offset register (COR).
Otherwise they do not work in differential mode.
The configuration of COR is missing in triggered mode.
Fixes: 5e1a1da0f8 ("iio: adc: at91-sama5d2_adc: add hw trigger and buffer support")
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-02-02 11:00:23 +00:00
414 changed files with 3494 additions and 1510 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.