Compare commits

..

351 Commits

Author SHA1 Message Date
Linus Torvalds
2ef96a5bb1 Linux 5.7-rc5 2020-05-10 15:16:58 -07:00
Linus Torvalds
c14cab2688 Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Ensure that direct mapping alias is always flushed when changing
     page attributes. The optimization for small ranges failed to do so
     when the virtual address was in the vmalloc or module space.

   - Unbreak the trace event registration for syscalls without arguments
     caused by the refactoring of the SYSCALL_DEFINE0() macro.

   - Move the printk in the TSC deadline timer code to a place where it
     is guaranteed to only be called once during boot and cannot be
     rearmed by clearing warn_once after boot. If it's invoked post boot
     then lockdep rightfully complains about a potential deadlock as the
     calling context is different.

   - A series of fixes for objtool and the ORC unwinder addressing
     variety of small issues:

       - Stack offset tracking for indirect CFAs in objtool ignored
         subsequent pushs and pops

       - Repair the unwind hints in the register clearing entry ASM code

       - Make the unwinding in the low level exit to usermode code stop
         after switching to the trampoline stack. The unwind hint is no
         longer valid and the ORC unwinder emits a warning as it can't
         find the registers anymore.

       - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
         which caused objtool to generate bogus ORC data.

       - Prevent unwinder warnings when dumping the stack of a
         non-current task as there is no way to be sure about the
         validity because the dumped stack can be a moving target.

       - Make the ORC unwinder behave the same way as the frame pointer
         unwinder when dumping an inactive tasks stack and do not skip
         the first frame.

       - Prevent ORC unwinding before ORC data has been initialized

       - Immediately terminate unwinding when a unknown ORC entry type
         is found.

       - Prevent premature stop of the unwinder caused by IRET frames.

       - Fix another infinite loop in objtool caused by a negative
         offset which was not catched.

       - Address a few build warnings in the ORC unwinder and add
         missing static/ro_after_init annotations"

* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
  x86/apic: Move TSC deadline timer debug printk
  ftrace/x86: Fix trace event registration for syscalls without arguments
  x86/mm/cpa: Flush direct map alias during cpa
  objtool: Fix infinite loop in for_offset_range()
  x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
  x86/unwind/orc: Fix error path for bad ORC entry type
  x86/unwind/orc: Prevent unwinding before ORC initialization
  x86/unwind/orc: Don't skip the first frame for inactive tasks
  x86/unwind: Prevent false warnings for non-current tasks
  x86/unwind/orc: Convert global variables to static
  x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
  x86/entry/64: Fix unwind hints in __switch_to_asm()
  x86/entry/64: Fix unwind hints in kernel exit path
  x86/entry/64: Fix unwind hints in register clearing code
  objtool: Fix stack offset tracking for indirect CFAs
2020-05-10 11:59:53 -07:00
Linus Torvalds
8b00083219 Merge tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
 "A single fix for objtool to prevent an infinite loop in the
  jump table search which can be triggered when building the
  kernel with '-ffunction-sections'"

* tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix infinite loop in find_jump_table()
2020-05-10 11:42:14 -07:00
Linus Torvalds
bd2049f871 Merge tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
 "A single fix for the fallout of the recent futex uacess rework.

  With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser()
  correctly and emits a 'maybe unitialized' warning. While we usually
  ignore compiler stupidity the conditional store is pointless anyway
  because the correct case has to store. For the fault case the extra
  store does no harm"

* tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ARM: futex: Address build warning
2020-05-10 11:39:31 -07:00
Linus Torvalds
27d2dcb1b9 Merge tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:

 - Race condition fixes for the AMD IOMMU driver.

   These are five patches fixing two race conditions around
   increase_address_space(). The first race condition was around the
   non-atomic update of the domain page-table root pointer and the
   variable containing the page-table depth (called mode). This is fixed
   now be merging page-table root and mode into one 64-bit field which
   is read/written atomically.

   The second race condition was around updating the page-table root
   pointer and making it public before the hardware caches were flushed.
   This could cause addresses to be mapped and returned to drivers which
   are not reachable by IOMMU hardware yet, causing IO page-faults. This
   is fixed too by adding the necessary flushes before a new page-table
   root is published.

   Related to the race condition fixes these patches also add a missing
   domain_flush_complete() barrier to update_domain() and a fix to bail
   out of the loop which tries to increase the address space when the
   call to increase_address_space() fails.

   Qian was able to trigger the race conditions under high load and
   memory pressure within a few days of testing. He confirmed that he
   has seen no issues anymore with the fixes included here.

 - Fix for a list-handling bug in the VirtIO IOMMU driver.

* tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/virtio: Reverse arguments to list_add
  iommu/amd: Do not flush Device Table in iommu_map_page()
  iommu/amd: Update Device Table in increase_address_space()
  iommu/amd: Call domain_flush_complete() in update_domain()
  iommu/amd: Do not loop forever when trying to increase address space
  iommu/amd: Fix race in increase_address_space()/fetch_pte()
2020-05-10 11:26:23 -07:00
Linus Torvalds
0a85ed6e7f Merge tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - a small series fixing a use-after-free of bdi name (Christoph,Yufen)

 - NVMe fix for a regression with the smaller CQ update (Alexey)

 - NVMe fix for a hang at namespace scanning error recovery (Sagi)

 - fix race with blk-iocost iocg->abs_vdebt updates (Tejun)

* tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block:
  nvme: fix possible hang when ns scanning fails during error recovery
  nvme-pci: fix "slimmer CQ head update"
  bdi: add a ->dev_name field to struct backing_dev_info
  bdi: use bdi_dev_name() to get device name
  bdi: move bdi_dev_name out of line
  vboxsf: don't use the source name in the bdi name
  iocost: protect iocg->abs_vdebt with iocg->waitq.lock
2020-05-10 11:16:07 -07:00
Linus Torvalds
e99332e7b4 gcc-10: mark more functions __init to avoid section mismatch warnings
It seems that for whatever reason, gcc-10 ends up not inlining a couple
of functions that used to be inlined before.  Even if they only have one
single callsite - it looks like gcc may have decided that the code was
unlikely, and not worth inlining.

The code generation difference is harmless, but caused a few new section
mismatch errors, since the (now no longer inlined) function wasn't in
the __init section, but called other init functions:

   Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem()
   Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap()
   Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap()

So add the appropriate __init annotation to make modpost not complain.
In both cases there were trivially just a single callsite from another
__init function.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 17:50:03 -07:00
Linus Torvalds
2e28f3b13a Merge tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
 "A smattering of fixes and cleanups:

   - Dead code removal.

   - Exporting riscv_cpuid_to_hartid_mask for modules.

   - Per-CPU tracking of ISA features.

   - Setting max_pfn correctly when probing memory.

   - Adding a note to the VDSO so glibc can check the kernel's version
     without a uname().

   - A fix to force the bootloader to initialize the boot spin tables,
     which still get used as a fallback when SBI-0.1 is enabled"

* tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Remove unused code from STRICT_KERNEL_RWX
  riscv: force __cpu_up_ variables to put in data section
  riscv: add Linux note to vdso
  riscv: set max_pfn to the PFN of the last page
  RISC-V: Remove N-extension related defines
  RISC-V: Add bitmap reprensenting ISA features common across CPUs
  RISC-V: Export riscv_cpuid_to_hartid_mask() API
2020-05-09 16:24:16 -07:00
Linus Torvalds
1a263ae60b gcc-10: avoid shadowing standard library 'free()' in crypto
gcc-10 has started warning about conflicting types for a few new
built-in functions, particularly 'free()'.

This results in warnings like:

   crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch]

because the crypto layer had its local freeing functions called
'free()'.

Gcc-10 is in the wrong here, since that function is marked 'static', and
thus there is no chance of confusion with any standard library function
namespace.

But the simplest thing to do is to just use a different name here, and
avoid this gcc mis-feature.

[ Side note: gcc knowing about 'free()' is in itself not the
  mis-feature: the semantics of 'free()' are special enough that a
  compiler can validly do special things when seeing it.

  So the mis-feature here is that gcc thinks that 'free()' is some
  restricted name, and you can't shadow it as a local static function.

  Making the special 'free()' semantics be a function attribute rather
  than tied to the name would be the much better model ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:58:04 -07:00
Linus Torvalds
adc7192096 gcc-10: disable 'restrict' warning for now
gcc-10 now warns about passing aliasing pointers to functions that take
restricted pointers.

That's actually a great warning, and if we ever start using 'restrict'
in the kernel, it might be quite useful.  But right now we don't, and it
turns out that the only thing this warns about is an idiom where we have
declared a few functions to be "printf-like" (which seems to make gcc
pick up the restricted pointer thing), and then we print to the same
buffer that we also use as an input.

And people do that as an odd concatenation pattern, with code like this:

    #define sysfs_show_gen_prop(buffer, fmt, ...) \
        snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)

where we have 'buffer' as both the destination of the final result, and
as the initial argument.

Yes, it's a bit questionable.  And outside of the kernel, people do have
standard declarations like

    int snprintf( char *restrict buffer, size_t bufsz,
                  const char *restrict format, ... );

where that output buffer is marked as a restrict pointer that cannot
alias with any other arguments.

But in the context of the kernel, that 'use snprintf() to concatenate to
the end result' does work, and the pattern shows up in multiple places.
And we have not marked our own version of snprintf() as taking restrict
pointers, so the warning is incorrect for now, and gcc picks it up on
its own.

If we do start using 'restrict' in the kernel (and it might be a good
idea if people find places where it matters), we'll need to figure out
how to avoid this issue for snprintf and friends.  But in the meantime,
this warning is not useful.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:45:21 -07:00
Linus Torvalds
5a76021c2e gcc-10: disable 'stringop-overflow' warning for now
This is the final array bounds warning removal for gcc-10 for now.

Again, the warning is good, and we should re-enable all these warnings
when we have converted all the legacy array declaration cases to
flexible arrays. But in the meantime, it's just noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:40:52 -07:00
Sagi Grimberg
59c7c3caaa nvme: fix possible hang when ns scanning fails during error recovery
When the controller is reconnecting, the host fails I/O and admin
commands as the host cannot reach the controller. ns scanning may
revalidate namespaces during that period and it is wrong to remove
namespaces due to these failures as we may hang (see 205da24343).

One command that may fail is nvme_identify_ns_descs. Since we return
success due to having ns identify descriptor list optional, we continue
to compare ns identifiers in nvme_revalidate_disk, obviously fail and
return -ENODEV to nvme_validate_ns, which will remove the namespace.

Exactly what we don't want to happen.

Fixes: 22802bf742 ("nvme: Namepace identification descriptor list is optional")
Tested-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:07:58 -06:00
Alexey Dobriyan
a8de663916 nvme-pci: fix "slimmer CQ head update"
Pre-incrementing ->cq_head can't be done in memory because OOB value
can be observed by another context.

This devalues space savings compared to original code :-\

	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
	add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
	Function                                     old     new   delta
	nvme_poll_irqdisable                         464     456      -8
	nvme_poll                                    455     447      -8
	nvme_irq                                     388     380      -8
	nvme_dev_disable                             955     947      -8

But the code is minimal now: one read for head, one read for q_depth,
one increment, one comparison, single instruction phase bit update and
one write for new head.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: e2a366a4b0 ("nvme-pci: slimmer CQ head update")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:07:58 -06:00
Christoph Hellwig
6bd87eec23 bdi: add a ->dev_name field to struct backing_dev_info
Cache a copy of the name for the life time of the backing_dev_info
structure so that we can reference it even after unregistering.

Fixes: 68f23b8906 ("memcg: fix a crash in wb_workfn when a device disappears")
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:07:57 -06:00
Yufen Yu
d51cfc53ad bdi: use bdi_dev_name() to get device name
Use the common interface bdi_dev_name() to get device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Add missing <linux/backing-dev.h> include BFQ

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:07:39 -06:00
Linus Torvalds
44720996e2 gcc-10: disable 'array-bounds' warning for now
This is another fine warning, related to the 'zero-length-bounds' one,
but hitting the same historical code in the kernel.

Because C didn't historically support flexible array members, we have
code that instead uses a one-sized array, the same way we have cases of
zero-sized arrays.

The one-sized arrays come from either not wanting to use the gcc
zero-sized array extension, or from a slight convenience-feature, where
particularly for strings, the size of the structure now includes the
allocation for the final NUL character.

So with a "char name[1];" at the end of a structure, you can do things
like

       v = my_malloc(sizeof(struct vendor) + strlen(name));

and avoid the "+1" for the terminator.

Yes, the modern way to do that is with a flexible array, and using
'offsetof()' instead of 'sizeof()', and adding the "+1" by hand.  That
also technically gets the size "more correct" in that it avoids any
alignment (and thus padding) issues, but this is another long-term
cleanup thing that will not happen for 5.7.

So disable the warning for now, even though it's potentially quite
useful.  Having a slew of warnings that then hide more urgent new issues
is not an improvement.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 14:52:44 -07:00
Linus Torvalds
5c45de21a2 gcc-10: disable 'zero-length-bounds' warning for now
This is a fine warning, but we still have a number of zero-length arrays
in the kernel that come from the traditional gcc extension.  Yes, they
are getting converted to flexible arrays, but in the meantime the gcc-10
warning about zero-length bounds is very verbose, and is hiding other
issues.

I missed one actual build failure because it was hidden among hundreds
of lines of warning.  Thankfully I caught it on the second go before
pushing things out, but it convinced me that I really need to disable
the new warnings for now.

We'll hopefully be all done with our conversion to flexible arrays in
the not too distant future, and we can then re-enable this warning.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 14:30:29 -07:00
Linus Torvalds
78a5255ffb Stop the ad-hoc games with -Wno-maybe-initialized
We have some rather random rules about when we accept the
"maybe-initialized" warnings, and when we don't.

For example, we consider it unreliable for gcc versions < 4.9, but also
if -O3 is enabled, or if optimizing for size.  And then various kernel
config options disabled it, because they know that they trigger that
warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).

And now gcc-10 seems to be introducing a lot of those warnings too, so
it falls under the same heading as 4.9 did.

At the same time, we have a very straightforward way to _enable_ that
warning when wanted: use "W=2" to enable more warnings.

So stop playing these ad-hoc games, and just disable that warning by
default, with the known and straight-forward "if you want to work on the
extra compiler warnings, use W=123".

Would it be great to have code that is always so obvious that it never
confuses the compiler whether a variable is used initialized or not?
Yes, it would.  In a perfect world, the compilers would be smarter, and
our source code would be simpler.

That's currently not the world we live in, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 13:57:10 -07:00
Linus Torvalds
1d3962ae3b Merge tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - Fix finish_wait() balancing in file cancelation (Xiaoguang)

 - Ensure early cleanup of resources in ring map failure (Xiaoguang)

 - Ensure IORING_OP_SLICE does the right file mode checks (Pavel)

 - Remove file opening from openat/openat2/statx, it's not needed and
   messes with O_PATH

* tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block:
  io_uring: don't use 'fd' for openat/openat2/statx
  splice: move f_mode checks to do_{splice,tee}()
  io_uring: handle -EFAULT properly in io_uring_setup()
  io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files()
2020-05-09 12:02:09 -07:00
Linus Torvalds
d5eeab8d7e Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Four minor fixes, all in drivers (qla2xxx, ibmvfc, ibmvscsi)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ibmvscsi: Fix WARN_ON during event pool release
  scsi: ibmvfc: Don't send implicit logouts prior to NPIV login
  scsi: qla2xxx: Delete all sessions before unregister local nvme port
  scsi: qla2xxx: Fix hang when issuing nvme disconnect-all in NPIV
2020-05-08 10:36:56 -07:00
Linus Torvalds
eb24fdd8e6 Merge tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "Fixes for an endianness handling bug that prevented mounts on
  big-endian arches, a spammy log message and a couple error paths.

  Also included a MAINTAINERS update"

* tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client:
  ceph: demote quotarealm lookup warning to a debug message
  MAINTAINERS: remove myself as ceph co-maintainer
  ceph: fix double unlock in handle_cap_export()
  ceph: fix special error code in ceph_try_get_caps()
  ceph: fix endianness bug when handling MDS session feature bits
2020-05-08 10:27:00 -07:00
Luis Henriques
12ae44a40a ceph: demote quotarealm lookup warning to a debug message
A misconfigured cephx can easily result in having the kernel client
flooding the logs with:

  ceph: Can't lookup inode 1 (err: -13)

Change this message to debug level.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/44546
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-08 18:44:40 +02:00
Linus Torvalds
4334f30ebf Merge tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are some small driver fixes for 5.7-rc5 that resolve a number of
  minor reported issues:

   - mhi bus driver fixes found as people actually use the code

   - phy driver fixes and compat string additions

   - most driver fix due to link order changing when the core moved out
     of staging

   - mei driver fix

   - interconnect build warning fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  bus: mhi: core: Fix channel device name conflict
  bus: mhi: core: Fix typo in comment
  bus: mhi: core: Offload register accesses to the controller
  bus: mhi: core: Remove link_status() callback
  bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails
  bus: mhi: Fix parsing of mhi_flags
  mei: me: disable mei interface on LBG servers.
  phy: qualcomm: usb-hs-28nm: Prepare clocks in init
  MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer
  interconnect: qcom: Move the static keyword to the front of declaration
  most: core: use function subsys_initcall()
  bus: mhi: core: Fix a NULL vs IS_ERR check in mhi_create_devices()
  phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string
  phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
2020-05-08 09:11:53 -07:00
Linus Torvalds
c61529f6f5 Merge tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are a number of small driver core fixes for 5.7-rc5 to resolve a
  bunch of reported issues with the current tree.

  Biggest here are the reverts and patches from John Stultz to resolve a
  bunch of deferred probe regressions we have been seeing in 5.7-rc
  right now.

  Along with those are some other smaller fixes:

   - coredump crash fix

   - devlink fix for when permissive mode was enabled

   - amba and platform device dma_parms fixes

   - component error silenced for when deferred probe happens

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work"
  driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
  driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
  driver core: Revert default driver_deferred_probe_timeout value to 0
  component: Silence bind error on -EPROBE_DEFER
  driver core: Fix handling of fw_devlink=permissive
  coredump: fix crash when umh is disabled
  amba: Initialize dma_parms for amba devices
  driver core: platform: Initialize dma_parms for platform devices
2020-05-08 09:06:34 -07:00
Linus Torvalds
e7a1c733fe Merge tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are three small driver fixes for 5.7-rc5.

  Two of these are documentation fixes:

   - MAINTAINERS update due to removed driver

   - removing Wolfram from the ks7010 driver TODO file

  The other patch is a real fix:

   - fix gasket driver to proper check the return value of a call

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: gasket: Check the return value of gasket_get_bar_index()
  staging: ks7010: remove me from CC list
  MAINTAINERS: remove entry after hp100 driver removal
2020-05-08 09:03:49 -07:00
Linus Torvalds
cbd0e48213 Merge tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are three small TTY/Serial/VT fixes for 5.7-rc5:

   - revert for the bcm63xx driver "fix" that was incorrect

   - vt unicode console bugfix

   - xilinx_uartps console driver fix

  All of these have been in linux next with no reported issues"

* tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: xilinx_uartps: Fix missing id assignment to the console
  vt: fix unicode console freeing with a common interface
  Revert "tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart"
2020-05-08 08:56:16 -07:00
Linus Torvalds
0a0b96b2e2 Merge tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 5.7-rc5 to resolve some reported
  issues:

   - syzbot found problems fixed

   - usbfs dma mapping fix

   - typec bugfixs

   - chipidea bugfix

   - usb4/thunderbolt fix

   - new device ids/quirks

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: msm: Ensure proper controller reset using role switch API
  usb: typec: mux: intel: Handle alt mode HPD_HIGH
  usb: usbfs: correct kernel->user page attribute mismatch
  usb: typec: intel_pmc_mux: Fix the property names
  USB: core: Fix misleading driver bug report
  USB: serial: qcserial: Add DW5816e support
  USB: uas: add quirk for LaCie 2Big Quadra
  thunderbolt: Check return value of tb_sw_read() in usb4_switch_op()
  USB: serial: garmin_gps: add sanity checking for data length
2020-05-08 08:54:00 -07:00
Linus Torvalds
775a8e0316 Merge tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Another pretty normal week. I didn't get any i915 fixes yet, so next
  week I'd expect double the usual i915, but otherwise a bunch of amdgpu
  and some scattered other fixes.

  hdcp:
   - fix HDCP regression

  amdgpu:
   - Runtime PM fixes
   - DC fix for PPC
   - Misc DC fixes

  virtio:
   - fix context ordering issue

  sun4i:
   - old gcc warning fix

  ingenic-drm:
   - missing module support"

* tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Prevent dpcd reads with passive dongles
  drm/amd/display: fix counter in wait_for_no_pipes_pending
  drm/amd/display: Update DCN2.1 DV Code Revision
  drm: Fix HDCP failures when SRM fw is missing
  sun6i: dsi: fix gcc-4.8
  drm: ingenic-drm: add MODULE_DEVICE_TABLE
  drm/virtio: create context before RESOURCE_CREATE_2D in 3D mode
  drm/amd/display: work around fp code being emitted outside of DC_FP_START/END
  drm/amdgpu/dc: Use WARN_ON_ONCE for ASSERT
  drm/amdgpu: drop redundant cg/pg ungate on runpm enter
  drm/amdgpu: move kfd suspend after ip_suspend_phase1
2020-05-08 08:49:34 -07:00
Linus Torvalds
af38553c66 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes and one selftest to verify the ipc fixes herein"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: limit boost_watermark on small zones
  ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
  mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
  epoll: atomically remove wait entry on wake up
  kselftests: introduce new epoll60 testcase for catching lost wakeups
  percpu: make pcpu_alloc() aware of current gfp context
  mm/slub: fix incorrect interpretation of s->offset
  scripts/gdb: repair rb_first() and rb_last()
  eventpoll: fix missing wakeup for ovflist in ep_poll_callback
  arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
  scripts/decodecode: fix trapping instruction formatting
  kernel/kcov.c: fix typos in kcov_remote_start documentation
  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  mm, memcg: fix error return value of mem_cgroup_css_alloc()
  ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08 08:41:09 -07:00
Julia Lawall
fb3637a113 iommu/virtio: Reverse arguments to list_add
Elsewhere in the file, there is a list_for_each_entry with
&vdev->resv_regions as the second argument, suggesting that
&vdev->resv_regions is the list head.  So exchange the
arguments on the list_add call to put the list head in the
second argument.

Fixes: 2a5a314874 ("iommu/virtio: Add probe request")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-08 17:31:18 +02:00
Dave Airlie
a9fe6f18cd Merge tag 'drm-misc-fixes-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A few minor fixes for an ordering issue in virtio, an (old) gcc warning
in sun4i, a probe issue in ingenic-drm and a regression in the HDCP
support.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507160130.id64niqgf5wsha4u@gilmour.lan
2020-05-08 15:04:25 +10:00
Dave Airlie
c61b0b97ef Merge tag 'amd-drm-fixes-5.7-2020-05-06' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.7-2020-05-06:

amdgpu:
- Runtime PM fixes
- DC fix for PPC
- Misc DC fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506212257.3893-1-alexander.deucher@amd.com
2020-05-08 13:31:39 +10:00
Linus Torvalds
79dede78c0 Merge branch 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fix from James Morris:
 "Fix the default value of fs_context_parse_param hook"

* 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: Fix the default value of fs_context_parse_param hook
2020-05-07 19:43:13 -07:00
Henry Willard
14f69140ff mm: limit boost_watermark on small zones
Commit 1c30844d2d ("mm: reclaim small amounts of memory when an
external fragmentation event occurs") adds a boost_watermark() function
which increases the min watermark in a zone by at least
pageblock_nr_pages or the number of pages in a page block.

On Arm64, with 64K pages and 512M huge pages, this is 8192 pages or
512M.  It does this regardless of the number of managed pages managed in
the zone or the likelihood of success.

This can put the zone immediately under water in terms of allocating
pages from the zone, and can cause a small machine to fail immediately
due to OoM.  Unlike set_recommended_min_free_kbytes(), which
substantially increases min_free_kbytes and is tied to THP,
boost_watermark() can be called even if THP is not active.

The problem is most likely to appear on architectures such as Arm64
where pageblock_nr_pages is very large.

It is desirable to run the kdump capture kernel in as small a space as
possible to avoid wasting memory.  In some architectures, such as Arm64,
there are restrictions on where the capture kernel can run, and
therefore, the space available.  A capture kernel running in 768M can
fail due to OoM immediately after boost_watermark() sets the min in zone
DMA32, where most of the memory is, to 512M.  It fails even though there
is over 500M of free memory.  With boost_watermark() suppressed, the
capture kernel can run successfully in 448M.

This patch limits boost_watermark() to boosting a zone's min watermark
only when there are enough pages that the boost will produce positive
results.  In this case that is estimated to be four times as many pages
as pageblock_nr_pages.

Mel said:

: There is no harm in marking it stable.  Clearly it does not happen very
: often but it's not impossible.  32-bit x86 is a lot less common now
: which would previously have been vulnerable to triggering this easily.
: ppc64 has a larger base page size but typically only has one zone.
: arm64 is likely the most vulnerable, particularly when CMA is
: configured with a small movable zone.

Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Kees Cook
8d58f222e8 ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
The documentation for UBSAN_ALIGNMENT already mentions that it should
not be used on all*config builds (and for efficient-unaligned-access
architectures), so just refactor the Kconfig to correctly implement this
so randconfigs will stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc).

Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc9 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
  Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Qiwu Chen
17e34526f0 mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
Since commit a9e7c39fa9 ("mm/vmscan.c: remove 7th argument of
isolate_lru_pages()"), the explanation of 'mode' argument has been
unnecessary.  Let's remove it.

Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200501090346.2894-1-chenqiwu@xiaomi.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Roman Penyaev
412895f03c epoll: atomically remove wait entry on wake up
This patch does two things:

 - fixes a lost wakeup introduced by commit 339ddb53d3 ("fs/epoll:
   remove unnecessary wakeups of nested epoll")

 - improves performance for events delivery.

The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events come, it is quite
likely that >1 wakeups hit the same wait queue entry, because there is
quite a big window between __add_wait_queue_exclusive() and the
following __remove_wait_queue() calls in ep_poll() function.

This can lead to lost wakeups, because thread, which was woken up, can
handle not all the events in ->rdllist.  (in better words the problem is
described here: https://lkml.org/lkml/2019/10/7/905)

The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry().

Internally init_wait() sets autoremove_wake_function as a callback,
which removes the wait entry atomically (under the wq locks) from the
list, thus the next coming wakeup hits the next wait entry in the wait
queue, thus preventing lost wakeups.

Problem is very well reproduced by the epoll60 test case [1].

Wait entry removal on wakeup has also performance benefits, because
there is no need to take a ep->lock and remove wait entry from the queue
after the successful wakeup.  Here is the timing output of the epoll60
test case:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior 339ddb53d3):

    real    0m6.970s
    user    0m49.786s
    sys     0m0.113s

 After this patch:

   real    0m5.220s
   user    0m36.879s
   sys     0m0.019s

The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior 339ddb53d3):

    threads  events/ms  run-time ms
          8       5427         1474
         16       6163         2596
         32       6824         4689
         64       7060         9064
        128       6991        18309

 After this patch:

    threads  events/ms  run-time ms
          8       5598         1429
         16       7073         2262
         32       7502         4265
         64       7640         8376
        128       7634        16767

 (number of "events/ms" represents event bandwidth, thus higher is
  better; number of "run-time ms" represents overall time spent
  doing the benchmark, thus lower is better)

[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Roman Penyaev
474328c06e kselftests: introduce new epoll60 testcase for catching lost wakeups
This test case catches lost wake up introduced by commit 339ddb53d3
("fs/epoll: remove unnecessary wakeups of nested epoll")

The test is simple: we have 10 threads and 10 event fds.  Each thread
can harvest only 1 event.  1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.

In case of lost wakeup epoll_wait() will timeout and 0 will be returned.

Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:

  5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")

the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups.  Problem is fixed in the
following patch.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Filipe Manana
28307d938f percpu: make pcpu_alloc() aware of current gfp context
Since 5.7-rc1, on btrfs we have a percpu counter initialization for
which we always pass a GFP_KERNEL gfp_t argument (this happens since
commit 2992df7326 ("btrfs: Implement DREW lock")).

That is safe in some contextes but not on others where allowing fs
reclaim could lead to a deadlock because we are either holding some
btrfs lock needed for a transaction commit or holding a btrfs
transaction handle open.  Because of that we surround the call to the
function that initializes the percpu counter with a NOFS context using
memalloc_nofs_save() (this is done at btrfs_init_fs_root()).

However it turns out that this is not enough to prevent a possible
deadlock because percpu_alloc() determines if it is in an atomic context
by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this
case) and it is not aware that a NOFS context is set.

Because percpu_alloc() thinks it is in a non atomic context it locks the
pcpu_alloc_mutex.  This can result in a btrfs deadlock when
pcpu_balance_workfn() is running, has acquired that mutex and is waiting
for reclaim, while the btrfs task that called percpu_counter_init() (and
therefore percpu_alloc()) is holding either the btrfs commit_root
semaphore or a transaction handle (done fs/btrfs/backref.c:
iterate_extent_inodes()), which prevents reclaim from finishing as an
attempt to commit the current btrfs transaction will deadlock.

Lockdep reports this issue with the following trace:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.6.0-rc7-btrfs-next-77 #1 Not tainted
  ------------------------------------------------------
  kswapd0/91 is trying to acquire lock:
  ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]

  but task is already holding lock:
  ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #4 (fs_reclaim){+.+.}:
         fs_reclaim_acquire.part.0+0x25/0x30
         __kmalloc+0x5f/0x3a0
         pcpu_create_chunk+0x19/0x230
         pcpu_balance_workfn+0x56a/0x680
         process_one_work+0x235/0x5f0
         worker_thread+0x50/0x3b0
         kthread+0x120/0x140
         ret_from_fork+0x3a/0x50

  -> #3 (pcpu_alloc_mutex){+.+.}:
         __mutex_lock+0xa9/0xaf0
         pcpu_alloc+0x480/0x7c0
         __percpu_counter_init+0x50/0xd0
         btrfs_drew_lock_init+0x22/0x70 [btrfs]
         btrfs_get_fs_root+0x29c/0x5c0 [btrfs]
         resolve_indirect_refs+0x120/0xa30 [btrfs]
         find_parent_nodes+0x50b/0xf30 [btrfs]
         btrfs_find_all_leafs+0x60/0xb0 [btrfs]
         iterate_extent_inodes+0x139/0x2f0 [btrfs]
         iterate_inodes_from_logical+0xa1/0xe0 [btrfs]
         btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs]
         btrfs_ioctl+0x165a/0x3130 [btrfs]
         ksys_ioctl+0x87/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x5c/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #2 (&fs_info->commit_root_sem){++++}:
         down_write+0x38/0x70
         btrfs_cache_block_group+0x2ec/0x500 [btrfs]
         find_free_extent+0xc6a/0x1600 [btrfs]
         btrfs_reserve_extent+0x9b/0x180 [btrfs]
         btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
         alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
         __btrfs_cow_block+0x122/0x5a0 [btrfs]
         btrfs_cow_block+0x106/0x240 [btrfs]
         commit_cowonly_roots+0x55/0x310 [btrfs]
         btrfs_commit_transaction+0x509/0xb20 [btrfs]
         sync_filesystem+0x74/0x90
         generic_shutdown_super+0x22/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x93/0xc0
         exit_to_usermode_loop+0xf9/0x100
         do_syscall_64+0x20d/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #1 (&space_info->groups_sem){++++}:
         down_read+0x3c/0x140
         find_free_extent+0xef6/0x1600 [btrfs]
         btrfs_reserve_extent+0x9b/0x180 [btrfs]
         btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
         alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
         __btrfs_cow_block+0x122/0x5a0 [btrfs]
         btrfs_cow_block+0x106/0x240 [btrfs]
         btrfs_search_slot+0x50c/0xd60 [btrfs]
         btrfs_lookup_inode+0x3a/0xc0 [btrfs]
         __btrfs_update_delayed_inode+0x90/0x280 [btrfs]
         __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs]
         __btrfs_run_delayed_items+0x8e/0x180 [btrfs]
         btrfs_commit_transaction+0x31b/0xb20 [btrfs]
         iterate_supers+0x87/0xf0
         ksys_sync+0x60/0xb0
         __ia32_sys_sync+0xa/0x10
         do_syscall_64+0x5c/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  -> #0 (&delayed_node->mutex){+.+.}:
         __lock_acquire+0xef0/0x1c80
         lock_acquire+0xa2/0x1d0
         __mutex_lock+0xa9/0xaf0
         __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
         btrfs_evict_inode+0x40d/0x560 [btrfs]
         evict+0xd9/0x1c0
         dispose_list+0x48/0x70
         prune_icache_sb+0x54/0x80
         super_cache_scan+0x124/0x1a0
         do_shrink_slab+0x176/0x440
         shrink_slab+0x23a/0x2c0
         shrink_node+0x188/0x6e0
         balance_pgdat+0x31d/0x7f0
         kswapd+0x238/0x550
         kthread+0x120/0x140
         ret_from_fork+0x3a/0x50

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(fs_reclaim);
                                 lock(pcpu_alloc_mutex);
                                 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/91:
   #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
   #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0
   #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50

  stack backtrace:
  CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8f/0xd0
   check_noncircular+0x170/0x190
   __lock_acquire+0xef0/0x1c80
   lock_acquire+0xa2/0x1d0
   __mutex_lock+0xa9/0xaf0
   __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
   btrfs_evict_inode+0x40d/0x560 [btrfs]
   evict+0xd9/0x1c0
   dispose_list+0x48/0x70
   prune_icache_sb+0x54/0x80
   super_cache_scan+0x124/0x1a0
   do_shrink_slab+0x176/0x440
   shrink_slab+0x23a/0x2c0
   shrink_node+0x188/0x6e0
   balance_pgdat+0x31d/0x7f0
   kswapd+0x238/0x550
   kthread+0x120/0x140
   ret_from_fork+0x3a/0x50

This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL
to percpu_counter_init() in contextes where it is not reclaim safe,
however that type of approach is discouraged since
memalloc_[nofs|noio]_save() were introduced.  Therefore this change
makes pcpu_alloc() look up into an existing nofs/noio context before
deciding whether it is in an atomic context or not.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Waiman Long
cbfc35a486 mm/slub: fix incorrect interpretation of s->offset
In a couple of places in the slub memory allocator, the code uses
"s->offset" as a check to see if the free pointer is put right after the
object.  That check is no longer true with commit 3202fa62fb ("slub:
relocate freelist pointer to middle of object").

As a result, echoing "1" into the validate sysfs file, e.g.  of dentry,
may cause a bunch of "Freepointer corrupt" error reports like the
following to appear with the system in panic afterwards.

  =============================================================================
  BUG dentry(666:pmcd.service) (Tainted: G    B): Freepointer corrupt
  -----------------------------------------------------------------------------

To fix it, use the check "s->offset == s->inuse" in the new helper
function freeptr_outside_object() instead.  Also add another helper
function get_info_end() to return the end of info block (inuse + free
pointer if not overlapping with object).

Fixes: 3202fa62fb ("slub: relocate freelist pointer to middle of object")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vitaly Nikolenko <vnik@duasynt.com>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Changbin Du <changbin.du@gmail.com>
Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Aymeric Agon-Rambosson
50e36be1fb scripts/gdb: repair rb_first() and rb_last()
The current implementations of the rb_first() and rb_last() gdb
functions have a variable that references itself in its instanciation,
which causes the function to throw an error if a specific condition on
the argument is met.  The original author rather intended to reference
the argument and made a typo.  Referring the argument instead makes the
function work as intended.

Signed-off-by: Aymeric Agon-Rambosson <aymeric.agon@yandex.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Cc: Jackie Liu <liuyun01@kylinos.cn>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20200427051029.354840-1-aymeric.agon@yandex.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Khazhismel Kumykov
0c54a6a44b eventpoll: fix missing wakeup for ovflist in ep_poll_callback
In the event that we add to ovflist, before commit 339ddb53d3
("fs/epoll: remove unnecessary wakeups of nested epoll") we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.

With that wakeup removed, if we add to ovflist here, we may never wake
up.  Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.

We noticed that one of our workloads was missing wakeups starting with
339ddb53d3 and upon manual inspection, this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups.  I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.

[khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman]
  Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com
Fixes: 339ddb53d3 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Janakarajan Natarajan
996ed22c7a arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().

Commit 73b0140bf0 ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory().  As the
original coding of this call was correct, revert the change made by that
commit.

Fixes: 73b0140bf0 ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Ivan Delalande
e08df079b2 scripts/decodecode: fix trapping instruction formatting
If the trapping instruction contains a ':', for a memory access through
segment registers for example, the sed substitution will insert the '*'
marker in the middle of the instruction instead of the line address:

	2b:   65 48 0f c7 0f          cmpxchg16b %gs:*(%rdi)          <-- trapping instruction

I started to think I had forgotten some quirk of the assembly syntax
before noticing that it was actually coming from the script.  Fix it to
add the address marker at the right place for these instructions:

	28:   49 8b 06                mov    (%r14),%rax
	2b:*  65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)           <-- trapping instruction
	30:   0f 94 c0                sete   %al

Fixes: 18ff44b189 ("scripts/decodecode: make faulting insn ptr more robust")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20200419223653.GA31248@visor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Maciej Grochowski
324cfb1956 kernel/kcov.c: fix typos in kcov_remote_start documentation
Signed-off-by: Maciej Grochowski <maciej.grochowski@pm.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Link: http://lkml.kernel.org/r/20200420030259.31674-1-maciek.grochowski@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
David Hildenbrand
e84fe99b68 mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
  Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
  RIP: __pageblock_pfn_to_page+0x134/0x1c0
  Call Trace:
   set_zone_contiguous+0x56/0x70
   page_alloc_init_late+0x166/0x176
   kernel_init_freeable+0xfa/0x255
   kernel_init+0xa/0x106
   ret_from_fork+0x35/0x40

The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU.  Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Yafang Shao
11d6761218 mm, memcg: fix error return value of mem_cgroup_css_alloc()
When I run my memcg testcase which creates lots of memcgs, I found
there're unexpected out of memory logs while there're still enough
available free memory.  The error log is

  mkdir: cannot create directory 'foo.65533': Cannot allocate memory

The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs,
an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right
errno should be -ENOSPC "No space left on device", which is an
appropriate errno for userspace's failed mkdir.

As the errno really misled me, we should make it right.  After this
patch, the error log will be

  mkdir: cannot create directory 'foo.65533': No space left on device

[akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal]
[akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal]
Fixes: 73f576c04b ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Oleg Nesterov
b5f2006144 ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
Commit cc731525f2 ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no
longer works if the sender doesn't have rights to send a signal.

Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().

This needs the additional notify.sigev_signo != 0 check, shouldn't we
change do_mq_notify() to deny sigev_signo == 0 ?

Test-case:

	#include <signal.h>
	#include <mqueue.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <assert.h>

	static int notified;

	static void sigh(int sig)
	{
		notified = 1;
	}

	int main(void)
	{
		signal(SIGIO, sigh);

		int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL);
		assert(fd >= 0);

		struct sigevent se = {
			.sigev_notify	= SIGEV_SIGNAL,
			.sigev_signo	= SIGIO,
		};
		assert(mq_notify(fd, &se) == 0);

		if (!fork()) {
			assert(setuid(1) == 0);
			mq_send(fd, "",1,0);
			return 0;
		}

		wait(NULL);
		mq_unlink("/mq");
		assert(notified);
		return 0;
	}

[manfred@colorfullife.com: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere]
Fixes: cc731525f2 ("signal: Remove kernel interal si_code magic")
Reported-by: Yoji <yoji.fujihar.min@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Markus Elfring <elfring@users.sourceforge.net>
Cc: <1vier1@web.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/e2a782e4-eab9-4f5c-c749-c07a8f7a4e66@colorfullife.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Linus Torvalds
192ffb7515 Merge tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM
   enabled

 - Fix allocation leaks in bootconfig tool

 - Fix a double initialization of a variable

 - Fix API bootconfig usage from kprobe boot time events

 - Reject NULL location for kprobes

 - Fix crash caused by preempt delay module not cleaning up kthread
   correctly

 - Add vmalloc_sync_mappings() to prevent x86_64 page faults from
   recursively faulting from tracing page faults

 - Fix comment in gpu/trace kerneldoc header

 - Fix documentation of how to create a trace event class

 - Make the local tracing_snapshot_instance_cond() function static

* tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tools/bootconfig: Fix resource leak in apply_xbc()
  tracing: Make tracing_snapshot_instance_cond() static
  tracing: Fix doc mistakes in trace sample
  gpu/trace: Minor comment updates for gpu_mem_total tracepoint
  tracing: Add a vmalloc_sync_mappings() for safe measure
  tracing: Wait for preempt irq delay thread to finish
  tracing/kprobes: Reject new event if loc is NULL
  tracing/boottime: Fix kprobe event API usage
  tracing/kprobes: Fix a double initialization typo
  bootconfig: Fix to remove bootconfig data from initrd while boot
2020-05-07 15:27:11 -07:00
Linus Torvalds
9ecc4d775f Merge tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
 "ftrace test fixes and a fix to kvm Makefile for relocatable
  native/cross builds and installs"

* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: fix kvm relocatable native/cross builds and installs
  selftests/ftrace: Make XFAIL green color
  ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
  ftrace/selftests: workaround cgroup RT scheduling issues
2020-05-07 15:22:08 -07:00
Jens Axboe
63ff822358 io_uring: don't use 'fd' for openat/openat2/statx
We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.

Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.

This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.

Cc: stable@vger.kernel.org: # v5.6
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07 14:56:15 -06:00
Yunfeng Ye
8842604446 tools/bootconfig: Fix resource leak in apply_xbc()
Fix the @data and @fd allocations that are leaked in the error path of
apply_xbc().

Link: http://lkml.kernel.org/r/583a49c9-c27a-931d-e6c2-6f63a4b18bea@huawei.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 14:18:27 -04:00
Zou Wei
192b7993b3 tracing: Make tracing_snapshot_instance_cond() static
Fix the following sparse warning:

kernel/trace/trace.c:950:6: warning: symbol 'tracing_snapshot_instance_cond'
was not declared. Should it be static?

Link: http://lkml.kernel.org/r/1587614905-48692-1-git-send-email-zou_wei@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:58 -04:00
Wei Yang
f094a233e1 tracing: Fix doc mistakes in trace sample
As the example below shows, DECLARE_EVENT_CLASS() is used instead of
DEFINE_EVENT_CLASS().

Link: http://lkml.kernel.org/r/20200428214959.11259-1-richard.weiyang@gmail.com

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:57 -04:00
Yiwei Zhang
386c82a703 gpu/trace: Minor comment updates for gpu_mem_total tracepoint
This change updates the improper comment for the 'size' attribute in the
tracepoint definition. Most gfx drivers pre-fault in physical pages
instead of making virtual allocations. So we drop the 'Virtual' keyword
here and leave this to the implementations.

Link: http://lkml.kernel.org/r/20200428220825.169606-1-zzyiwei@google.com

Signed-off-by: Yiwei Zhang <zzyiwei@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:57 -04:00
Steven Rostedt (VMware)
11f5efc3ab tracing: Add a vmalloc_sync_mappings() for safe measure
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu
areas can be complex, to say the least. Mappings may happen at boot up, and
if nothing synchronizes the page tables, those page mappings may not be
synced till they are used. This causes issues for anything that might touch
one of those mappings in the path of the page fault handler. When one of
those unmapped mappings is touched in the page fault handler, it will cause
another page fault, which in turn will cause a page fault, and leave us in
a loop of page faults.

Commit 763802b53a ("x86/mm: split vmalloc_sync_all()") split
vmalloc_sync_all() into vmalloc_sync_unmappings() and
vmalloc_sync_mappings(), as on system exit, it did not need to do a full
sync on x86_64 (although it still needed to be done on x86_32). By chance,
the vmalloc_sync_all() would synchronize the page mappings done at boot up
and prevent the per cpu area from being a problem for tracing in the page
fault handler. But when that synchronization in the exit of a task became a
nop, it caused the problem to appear.

Link: https://lore.kernel.org/r/20200429054857.66e8e333@oasis.local.home

Cc: stable@vger.kernel.org
Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:57 -04:00
Steven Rostedt (VMware)
d16a8c3107 tracing: Wait for preempt irq delay thread to finish
Running on a slower machine, it is possible that the preempt delay kernel
thread may still be executing if the module was immediately removed after
added, and this can cause the kernel to crash as the kernel thread might be
executing after its code has been removed.

There's no reason that the caller of the code shouldn't just wait for the
delay thread to finish, as the thread can also be created by a trigger in
the sysfs code, which also has the same issues.

Link: http://lore.kernel.org/r/5EA2B0C8.2080706@cn.fujitsu.com

Cc: stable@vger.kernel.org
Fixes: 793937236d ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Reported-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:40 -04:00
Linus Torvalds
6e7f2eacf0 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
 "Avoid potential NULL dereference in huge_pte_alloc() on pmd_alloc()
  failure"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hugetlb: avoid potential NULL dereference
2020-05-07 09:55:58 -07:00
Linus Torvalds
8c16ec94dc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation.

  Slightly bigger than usual because I couldn't send out what was
  pending for rc4, but there is nothing worrisome going on. I have more
  fixes pending for guest debugging support (gdbstub) but I will send
  them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07 09:50:59 -07:00
Linus Torvalds
de268ccb42 Merge tag 'configfs-for-5.7' of git://git.infradead.org/users/hch/configfs
Pull configfs fix from Christoph Hellwig:
 "Fix a refcount leak in configfs_rmdir (Xiyu Yang)"

* tag 'configfs-for-5.7' of git://git.infradead.org/users/hch/configfs:
  configfs: fix config_item refcnt leak in configfs_rmdir()
2020-05-07 09:48:37 -07:00
Pavel Begunkov
90da2e3f25 splice: move f_mode checks to do_{splice,tee}()
do_splice() is used by io_uring, as will be do_tee(). Move f_mode
checks from sys_{splice,tee}() to do_{splice,tee}(), so they're
enforced for io_uring as well.

Fixes: 7d67af2c01 ("io_uring: add splice(2) support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07 09:45:07 -06:00
Josh Poimboeuf
1119d265bc objtool: Fix infinite loop in find_jump_table()
Kristen found a hang in objtool when building with -ffunction-sections.

It was caused by evergreen_pcie_gen2_enable.cold() being laid out
immediately before evergreen_pcie_gen2_enable().  Since their "pfunc" is
always the same, find_jump_table() got into an infinite loop because it
didn't recognize the boundary between the two functions.

Fix that with a new prev_insn_same_sym() helper, which doesn't cross
subfunction boundaries.

Reported-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/378b51c9d9c894dc3294bc460b4b0869e950b7c5.1588110291.git.jpoimboe@redhat.com
2020-05-07 17:22:31 +02:00
Christoph Hellwig
eb7ae5e06b bdi: move bdi_dev_name out of line
bdi_dev_name is not a fast path function, move it out of line.  This
prepares for using it from modular callers without having to export
an implementation detail like bdi_unknown_name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07 08:45:47 -06:00
Christoph Hellwig
156c757372 vboxsf: don't use the source name in the bdi name
Simplify the bdi name to mirror what we are doing elsewhere, and
drop them name in favor of just using a number.  This avoids a
potentially very long bdi name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07 08:45:47 -06:00
Mark Rutland
027d0c7101 arm64: hugetlb: avoid potential NULL dereference
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may
pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:

|   CC      arch/arm64/mm/pageattr.o
|   CC      arch/arm64/mm/hugetlbpage.o
|                  from arch/arm64/mm/hugetlbpage.c:10:
| arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
| ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
|     |arch/arm64/mm/hugetlbpage.c:232:10:
|     |./arch/arm64/include/asm/pgtable-types.h:28:24:
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’

This can only occur when the kernel cannot allocate a page, and so is
unlikely to happen in practice before other systems start failing.

We can avoid this by bailing out if pmd_alloc() fails, as we do earlier
in the function if pud_alloc() fails.

Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: <stable@vger.kernel.org> # 4.5.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-05-07 09:24:15 +01:00
Bryan O'Donoghue
91edf63d50 usb: chipidea: msm: Ensure proper controller reset using role switch API
Currently we check to make sure there is no error state on the extcon
handle for VBUS when writing to the HS_PHY_GENCONFIG_2 register. When using
the USB role-switch API we still need to write to this register absent an
extcon handle.

This patch makes the appropriate update to ensure the write happens if
role-switching is true.

Fixes: 05559f10ed ("usb: chipidea: add role switch class support")
Cc: stable <stable@vger.kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: linux-usb@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20200507004918.25975-2-peter.chen@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-07 08:46:35 +02:00
Linus Torvalds
a811c1fa0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix reference count leaks in various parts of batman-adv, from Xiyu
    Yang.

 2) Update NAT checksum even when it is zero, from Guillaume Nault.

 3) sk_psock reference count leak in tls code, also from Xiyu Yang.

 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
    fq_codel, from Eric Dumazet.

 5) Fix panic in choke_reset(), also from Eric Dumazet.

 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.

 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.

 8) Fix crash in x25_disconnect(), from Yue Haibing.

 9) Don't pass pointer to local variable back to the caller in
    nf_osf_hdr_ctx_init(), from Arnd Bergmann.

10) Wireguard should use the ECN decap helper functions, from Toke
    Høiland-Jørgensen.

11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.

12) Fix uninitialized variable access in mptcp's
    subflow_syn_recv_sock(), from Paolo Abeni.

13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
    Scott Dial.

14) IPv6 needs to use a global serial number for dst validation just
    like ipv4, from David Ahern.

15) Fix up PTP_1588_CLOCK deps, from Clay McClure.

16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
    Yoshiyuki Kurauchi.

17) Fix a regression in that dsa user port errors should not be fatal,
    from Florian Fainelli.

18) Fix iomap leak in enetc driver, from Dejin Zheng.

19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.

20) Initialize protocol value earlier in neigh code paths when
    generating events, from Roman Mashak.

21) netdev_update_features() must be called with RTNL mutex in macsec
    driver, from Antoine Tenart.

22) Validate untrusted GSO packets even more strictly, from Willem de
    Bruijn.

23) Wireguard decrypt worker needs a cond_resched(), from Jason
    Donenfeld.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
  MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
  wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
  wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
  wireguard: send/receive: cond_resched() when processing worker ringbuffers
  wireguard: socket: remove errant restriction on looping to self
  wireguard: selftests: use normal kernel stack size on ppc64
  net: ethernet: ti: am65-cpsw-nuss: fix irqs type
  ionic: Use debugfs_create_bool() to export bool
  net: dsa: Do not leave DSA master with NULL netdev_ops
  net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
  net: stricter validation of untrusted gso packets
  seg6: fix SRH processing to comply with RFC8754
  net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
  net: dsa: ocelot: the MAC table on Felix is twice as large
  net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
  selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
  net: hsr: fix incorrect type usage for protocol variable
  net: macsec: fix rtnl locking issue
  net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
  ...
2020-05-06 20:53:22 -07:00
Pablo Neira Ayuso
16f8036086 net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
that the frontend does not need counters, this hw stats type request
never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
the driver to disable the stats, however, if the driver cannot disable
counters, it bails out.

TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
(this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.

Fixes: 319a1d1947 ("flow_offload: check for basic action hw stats type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:13:10 -07:00
Lukas Bulwahn
b0956956bb MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
Commit 9b038086f0 ("docs: networking: convert DIM to RST") added a new
file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following
alphabetical order.

So, ./scripts/checkpatch.pl -f MAINTAINERS complains:

  WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic
  order
  #5966: FILE: MAINTAINERS:5966:
  +F:      lib/dim/
  +F:      Documentation/networking/net_dim.rst

Reorder the file entries to keep MAINTAINERS nicely ordered.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:12:33 -07:00
David S. Miller
d3f3e6acb2 Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:

====================
wireguard fixes for 5.7-rc5

With Ubuntu and Debian having backported this into their kernels, we're
finally seeing testing from places we hadn't seen prior, which is nice.
With that comes more fixes:

1) The CI for PPC64 was running with extremely small stacks for 64-bit,
   causing spurious crashes in surprising places.

2) There's was an old leftover routing loop restriction, which no longer
   makes sense given the queueing architecture, and was causing problems
   for people who really did want nested routing.

3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused
   RCU stalls and other issues, reported by Wang Jian, with the fix
   suggested by Sultan Alsawaf.

4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by
   Arnd Bergmann.

5) A complicated if statement was simplified to an assignment while also
   making the likely/unlikely hinting more correct and simple, and
   increasing readability, suggested by Sultan.

Patches (2) and (3) have Fixes: lines and are probably good candidates
for stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:48 -07:00
Jason A. Donenfeld
243f214893 wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.

What happened before was that we had this complex boolean expression
with multiple likely and unlikely clauses nested. Since this is
evaluated left-to-right anyway, the whole thing got converted to
unlikely. So, we can clean this up to better represent what's going on.

The generated code is the same.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Jason A. Donenfeld
4fed818ef5 wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
Without setting these to NULL, clang complains in certain
configurations that have CONFIG_IPV6=n:

In file included from drivers/net/wireguard/ratelimiter.c:223:
drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
                ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                               ^~~~
drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
        struct sk_buff *skb4, *skb6;
                                   ^
                                    = NULL
drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
                ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                                     ^~~~
drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
        struct ipv6hdr *hdr6;
                            ^

We silence this warning by setting the variables to NULL as the warning
suggests.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Jason A. Donenfeld
4005f5c3c9 wireguard: send/receive: cond_resched() when processing worker ringbuffers
Users with pathological hardware reported CPU stalls on CONFIG_
PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning
these workers would never terminate. That turned out not to be okay on
systems without forced preemption, which Sultan observed. This commit
adds a cond_resched() to the bottom of each loop iteration, so that
these workers don't hog the core. Note that we don't need this on the
napi poll worker, since that terminates after its budget is expended.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reported-by: Wang Jian <larkwang@gmail.com>
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Jason A. Donenfeld
b673e24aad wireguard: socket: remove errant restriction on looping to self
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.

At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Jason A. Donenfeld
a0fd7cc87a wireguard: selftests: use normal kernel stack size on ppc64
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Grygorii Strashko
6f5c27f9c6 net: ethernet: ti: am65-cpsw-nuss: fix irqs type
The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs
triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX
IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs
triggering and NAPI scheduling for empty queues. It was discovered with
RT-kernel.

Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as
IRQF_TRIGGER_HIGH.

Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:53:00 -07:00
Geert Uytterhoeven
0735ccc9d9 ionic: Use debugfs_create_bool() to export bool
Currently bool ionic_cq.done_color is exported using
debugfs_create_u8(), which requires a cast, preventing further compiler
checks.

Fix this by switching to debugfs_create_bool(), and dropping the cast.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:46:33 -07:00
Florian Fainelli
050569fc83 net: dsa: Do not leave DSA master with NULL netdev_ops
When ndo_get_phys_port_name() for the CPU port was added we introduced
an early check for when the DSA master network device in
dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When
we perform the teardown operation in dsa_master_ndo_teardown() we would
not be checking that cpu_dp->orig_ndo_ops was successfully allocated and
non-NULL initialized.

With network device drivers such as virtio_net, this leads to a NPD as
soon as the DSA switch hanging off of it gets torn down because we are
now assigning the virtio_net device's netdev_ops a NULL pointer.

Fixes: da7b9e9b00 ("net: dsa: Add ndo_get_phys_port_name() for CPU port")
Reported-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:31:54 -07:00
Vladimir Oltean
657221598f net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
This was caused by a poor merge conflict resolution on my side. The
"act = &cls->rule->action.entries[0];" assignment was already present in
the code prior to the patch mentioned below.

Fixes: e13c207528 ("net: dsa: refactor matchall mirred action to separate function")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:30:35 -07:00
Willem de Bruijn
9274124f02 net: stricter validation of untrusted gso packets
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with transport header extending beyond skb_headlen(skb).

Tighten validation at kernel entry:

- Verify that the transport header lies within the linear section.

    To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
    tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.

- Match the gso_type against the ip_proto found by the flow dissector.

Fixes: bfd5f4a3d6 ("packet: Add GSO/csum offload support.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:23:06 -07:00
Ahmed Abdelsalam
0cb7498f23 seg6: fix SRH processing to comply with RFC8754
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined
in RFC8754.

RFC8754 (section 4.1) defines the SR source node behavior which encapsulates
packets into an outer IPv6 header and SRH. The SR source node encodes the
full list of Segments that defines the packet path in the SRH. Then, the
first segment from list of Segments is copied into the Destination address
of the outer IPv6 header and the packet is sent to the first hop in its path
towards the destination.

If the Segment list has only one segment, the SR source node can omit the SRH
as he only segment is added in the destination address.

RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not
require the entire SID list to be preserved in the SRH. A reduced SRH does
not contain the first segment of the related SR Policy (the first segment is
the one already in the DA of the IPv6 header), and the Last Entry field is
set to n-2, where n is the number of elements in the SR Policy.

RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to
validate the SRH (S09, S10, S11) which works for both reduced and
non-reduced behaviors.

This patch updates seg6_validate_srh() to validate the SRH as per RFC8754.

Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:21:35 -07:00
David S. Miller
6e0ddb6530 Merge branch 'FDB-fixes-for-Felix-and-Ocelot-switches'
Vladimir Oltean says:

====================
FDB fixes for Felix and Ocelot switches

This series fixes the following problems:
- Dynamically learnt addresses never expiring (neither for Ocelot nor
  for Felix)
- Half of the FDB not visible in 'bridge fdb show' (for Felix only)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:15:38 -07:00
Vladimir Oltean
c0d7eccbc7 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
One may notice that automatically-learnt entries 'never' expire, even
though the bridge configures the address age period at 300 seconds.

Actually the value written to hardware corresponds to a time interval
1000 times higher than intended, i.e. 83 hours.

Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Faineli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:15:25 -07:00
Vladimir Oltean
21ce7f3e16 net: dsa: ocelot: the MAC table on Felix is twice as large
When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC
addresses would appear, sometimes they wouldn't.

Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192
entries on VSC9959 (Felix), so the existing code from the Ocelot common
library only dumped half of Felix's MAC table. They are both organized
as a 4-way set-associative TCAM, so we just need a single variable
indicating the correct number of rows.

Fixes: 5605194877 ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:15:24 -07:00
Linus Torvalds
b9388959ba Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
 "Fix a resource allocation issue in cros_ec_sensorhub.c"

* tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
2020-05-06 16:40:14 -07:00
Thomas Gleixner
8101b5a153 ARM: futex: Address build warning
Stephen reported the following build warning on a ARM multi_v7_defconfig
build with GCC 9.2.1:

kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
 1676 |   return oldval == cmparg;
      |          ~~~~~~~^~~~~~~~~
kernel/futex.c:1652:6: note: 'oldval' was declared here
 1652 |  int oldval, ret;
      |      ^~~~~~

introduced by commit a08971e948 ("futex: arch_futex_atomic_op_inuser()
calling conventions change").

While that change should not make any difference it confuses GCC which
fails to work out that oldval is not referenced when the return value is
not zero.

GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the
early return, the issue is with the assembly macros. GCC fails to detect
that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT
which makes oldval uninteresting. The store to the callsite supplied oldval
pointer is conditional on ret == 0.

The straight forward way to solve this is to make the store unconditional.

Aside of addressing the build warning this makes sense anyway because it
removes the conditional from the fastpath. In the error case the stored
value is uninteresting and the extra store does not matter at all.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de
2020-05-07 00:41:47 +02:00
Vladimir Oltean
0ba83aa037 net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
It looks like the sja1105 external timestamping input is not as generic
as we thought. When fed a signal with 50% duty cycle, it will timestamp
both the rising and the falling edge. When fed a short pulse signal,
only the timestamp of the falling edge will be seen in the PTPSYNCTS
register, because that of the rising edge had been overwritten. So the
moral is: don't feed it short pulse inputs.

Luckily this is not a complete deal breaker, as we can still work with
1 Hz square waves. But the problem is that the extts polling period was
not dimensioned enough for this input signal. If we leave the period at
half a second, we risk losing timestamps due to jitter in the measuring
process. So we need to increase it to 4 times per second.

Also, the very least we can do to inform the user is to deny any other
flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both
set.

Fixes: 747e5eb31d ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 15:03:14 -07:00
Eric Dumazet
a84724178b selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
Since chunk_size is no longer an integer, we can not
use it directly as an argument of setsockopt().

This patch should fix tcp_mmap for Big Endian kernels.

Fixes: 597b01edaf ("selftests: net: avoid ptl lock contention in tcp_mmap")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 15:01:30 -07:00
Murali Karicheri
f5dda315b6 net: hsr: fix incorrect type usage for protocol variable
Fix following sparse checker warning:-

net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types)
net/hsr/hsr_slave.c:38:18:    expected unsigned short [unsigned] [usertype] protocol
net/hsr/hsr_slave.c:38:18:    got restricted __be16 [usertype] h_proto
net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer
net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 15:00:20 -07:00
Antoine Tenart
29ca3cdfe1 net: macsec: fix rtnl locking issue
netdev_update_features() must be called with the rtnl lock taken. Not
doing so triggers a warning, as ASSERT_RTNL() is used in
__netdev_update_features(), the first function called by
netdev_update_features(). Fix this.

Fixes: c850240b6c ("net: macsec: report real_dev features when HW offloading is enabled")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 14:34:38 -07:00
Dan Carpenter
722c0f00d4 net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
The "info->fs.location" is a u32 that comes from the user via the
ethtool_set_rxnfc() function.  We need to check for invalid values to
prevent a buffer overflow.

I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins()
function.

Fixes: 90b509b39a ("net: mvpp2: cls: Add Classification offload support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 14:15:46 -07:00
Dan Carpenter
39bd16df7c net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()
The "rss_context" variable comes from the user via  ethtool_get_rxfh().
It can be any u32 value except zero.  Eventually it gets passed to
mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it
results in an array overflow.

Fixes: 895586d5dc ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 14:15:46 -07:00
Eric Dumazet
bf5525f3a8 selftests: net: tcp_mmap: clear whole tcp_zerocopy_receive struct
We added fields in tcp_zerocopy_receive structure,
so make sure to clear all fields to not pass garbage to the kernel.

We were lucky because recent additions added 'out' parameters,
still we need to clean our reference implementation, before folks
copy/paste it.

Fixes: c8856c0514 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Fixes: 33946518d4 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 13:45:09 -07:00
Linus Torvalds
3c40cdb0e9 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a potential scheduling latency problem for the algorithms
  used by WireGuard"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arch/nhpoly1305 - process in explicit 4k chunks
  crypto: arch/lib - limit simd usage to 4k chunks
2020-05-06 10:20:00 -07:00
Greg Kroah-Hartman
084d7e7846 Merge tag 'usb-serial-5.7-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:

USB-serial fixes for 5.7-rc5

Here's a fix adding a missing input sanity check and a new modem device
id.

Both have been in linux-next with no reported issues.

* tag 'usb-serial-5.7-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: qcserial: Add DW5816e support
  USB: serial: garmin_gps: add sanity checking for data length
2020-05-06 17:26:35 +02:00
Masami Hiramatsu
5b4dcd2d20 tracing/kprobes: Reject new event if loc is NULL
Reject the new event which has NULL location for kprobes.
For kprobes, user must specify at least the location.

Link: http://lkml.kernel.org/r/158779376597.6082.1411212055469099461.stgit@devnote2

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 2a588dd1d5 ("tracing: Add kprobe event command generation functions")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Masami Hiramatsu
da0f1f4167 tracing/boottime: Fix kprobe event API usage
Fix boottime kprobe events to use API correctly for
multiple events.

For example, when we set a multiprobe kprobe events in
bootconfig like below,

  ftrace.event.kprobes.myevent {
  	probes = "vfs_read $arg1 $arg2", "vfs_write $arg1 $arg2"
  }

This cause an error;

  trace_boot: Failed to add probe: p:kprobes/myevent (null)  vfs_read $arg1 $arg2  vfs_write $arg1 $arg2

This shows the 1st argument becomes NULL and multiprobes
are merged to 1 probe.

Link: http://lkml.kernel.org/r/158779375766.6082.201939936008972838.stgit@devnote2

Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 29a1548105 ("tracing: Change trace_boot to use kprobe_event interface")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Masami Hiramatsu
dcbd21c9fc tracing/kprobes: Fix a double initialization typo
Fix a typo that resulted in an unnecessary double
initialization to addr.

Link: http://lkml.kernel.org/r/158779374968.6082.2337484008464939919.stgit@devnote2

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c7411a1a12 ("tracing/kprobe: Check whether the non-suffixed symbol is notrace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Masami Hiramatsu
de462e5f10 bootconfig: Fix to remove bootconfig data from initrd while boot
If there is a bootconfig data in the tail of initrd/initramfs,
initrd image sanity check caused an error while decompression
stage as follows.

[    0.883882] Unpacking initramfs...
[    2.696429] Initramfs unpacking failed: invalid magic at start of compressed archive

This error will be ignored if CONFIG_BLK_DEV_RAM=n,
but CONFIG_BLK_DEV_RAM=y the kernel failed to mount rootfs
and causes a panic.

To fix this issue, shrink down the initrd_end for removing
tailing bootconfig data while boot the kernel.

Link: http://lkml.kernel.org/r/158788401014.24243.17424755854115077915.stgit@devnote2

Cc: Borislav Petkov <bp@alien8.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 7684b8582c ("bootconfig: Load boot config from the tail of initrd")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Paolo Bonzini
2673cb6849 Merge tag 'kvm-s390-master-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix for running nested uner z/VM

There are circumstances when running nested under z/VM that would trigger a
WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this
code more robust and flexible, but just returning instead of WARNING makes
guest bootable again.
2020-05-06 08:09:17 -04:00
Peter Xu
495907ec36 KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared
as supported.  My wild guess is that userspaces like QEMU are using "#ifdef
KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be
wrong because the compilation host may not be the runtime host.

The userspace might still want to keep the old "#ifdef" though to not break the
guest debug on old kernels.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505154750.126300-1-peterx@redhat.com>
[Do the same for PPC and s390. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 06:51:38 -04:00
Peter Xu
8ffdaf9155 KVM: selftests: Fix build for evmcs.h
I got this error when building kvm selftests:

/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here

I think it's because evmcs.h is included both in a test file and a lib file so
the structs have multiple declarations when linking.  After all it's not a good
habit to declare structs in the header files.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200504220607.99627-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 06:51:36 -04:00
Paolo Bonzini
139f7425fd kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
Using CPUID data can be useful for the processor compatibility
check, but that's it.  Using it to compute guest-reserved bits
can have both false positives (such as LA57 and UMIP which we
are already handling) and false negatives: in particular, with
this patch we don't allow anymore a KVM guest to set CR4.PKE
when CR4.PKE is clear on the host.

Fixes: b9dd21e104 ("KVM: x86: simplify handling of PKRU")
Reported-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 06:51:36 -04:00
Sean Christopherson
c7cb2d650c KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so
that KVM doesn't interpret clobbered RFLAGS as a VM-Fail.  Filling the
RSB has always clobbered RFLAGS, its current incarnation just happens
clear CF and ZF in the processs.  Relying on the macro to clear CF and
ZF is extremely fragile, e.g. commit 089dd8e531 ("x86/speculation:
Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such
that the ZF flag is always set.

Reported-by: Qian Cai <cai@lca.pw>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Fixes: f2fde6a5bc ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 06:51:35 -04:00
Kashyap Chamarthy
27abe57770 docs/virt/kvm: Document configuring and running nested guests
This is a rewrite of this[1] Wiki page with further enhancements.  The
doc also includes a section on debugging problems in nested
environments, among other improvements.

[1] https://www.linux-kvm.org/page/Nested_Guests

Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Message-Id: <20200505112839.30534-1-kchamart@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 05:45:47 -04:00
Atish Patra
73cb8e2a58 RISC-V: Remove unused code from STRICT_KERNEL_RWX
This patch removes the unused functions set_kernel_text_rw/ro.
Currently, it is not being invoked from anywhere and no other architecture
(except arm) uses this code. Even in ARM, these functions are not invoked
from anywhere currently.

Fixes: d27c3c9081 ("riscv: add STRICT_KERNEL_RWX support")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-05 17:02:14 -07:00
Linus Torvalds
dc56c5acd8 Merge tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:

 - Avoid loading asus-nb-wmi module on selected laptop models

 - Fix S0ix debug support for Jasper Lake PMC

 - Few fixes which have been reported by Hulk bot and others

* tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
  platform/x86: intel_pmc_core: avoid unused-function warnings
  platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
  platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
  platform/x86/intel-uncore-freq: make uncore_root_kobj static
  platform/x86: wmi: Make two functions static
  platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe
2020-05-05 16:29:03 -07:00
Roman Mashak
38212bb31f neigh: send protocol value in neighbor create notification
When a new neighbor entry has been added, event is generated but it does not
include protocol, because its value is assigned after the event notification
routine has run, so move protocol assignment code earlier.

Fixes: df9b0e30d4 ("neighbor: Add protocol attribute")
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 13:38:59 -07:00
Aurabindo Pillai
e6142dd511 drm/amd/display: Prevent dpcd reads with passive dongles
[why]
During hotplug, a DP port may be connected to the sink through
passive adapter which does not support DPCD reads. Issuing reads
without checking for this condition will result in errors

[how]
Ensure the link is in aux_mode before initiating operation that result
in a DPCD read.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 16:13:57 -04:00
Roman Li
80797dd6f1 drm/amd/display: fix counter in wait_for_no_pipes_pending
[Why]
Wait counter is not being reset for each pipe.

[How]
Move counter reset into pipe loop scope.

Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 16:13:15 -04:00
Sung Lee
b95e51eb9f drm/amd/display: Update DCN2.1 DV Code Revision
[WHY & HOW]
There is a problem in hscale_pixel_rate, the bug
causes DCN to be more optimistic (more likely to underflow)
in upscale cases during prefetch.
This commit ports the fix from DV code to address these issues.

Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 16:07:02 -04:00
Xiaoguang Wang
7f13657d14 io_uring: handle -EFAULT properly in io_uring_setup()
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-05 13:18:11 -06:00
Dejin Zheng
755f5738ff net: broadcom: fix a mistake about ioremap resource
Commit d7a5502b0b ("net: broadcom: convert to
devm_platform_ioremap_resource_byname()") will broke this driver.
idm_base and nicpm_base were optional, after this change, they are
mandatory. it will probe fails with -22 when the dtb doesn't have them
defined. so revert part of this commit and make idm_base and nicpm_base
as optional.

Fixes: d7a5502b0b ("net: broadcom: convert to devm_platform_ioremap_resource_byname()")
Reported-by: Jonathan Richardson <jonathan.richardson@broadcom.com>
Cc: Scott Branden <scott.branden@broadcom.com>
Cc: Ray Jui <ray.jui@broadcom.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 11:11:12 -07:00
Sean Paul
5fe89a6acd drm: Fix HDCP failures when SRM fw is missing
The SRM cleanup in 79643fddd6 ("drm/hdcp: optimizing the srm
handling") inadvertently altered the behavior of HDCP auth when
the SRM firmware is missing. Before that patch, missing SRM was
interpreted as the device having no revoked keys. With that patch,
if the SRM fw file is missing we reject _all_ keys.

This patch fixes that regression by returning success if the file
cannot be found. It also checks the return value from request_srm such
that we won't end up trying to parse the ksv list if there is an error
fetching it.

Fixes: 79643fddd6 ("drm/hdcp: optimizing the srm handling")
Cc: stable@vger.kernel.org
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200414190258.38873-1-sean@poorly.run

Changes in v2:
-Noticed a couple other things to clean up
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
2020-05-05 14:01:53 -04:00
Tejun Heo
0b80f9866e iocost: protect iocg->abs_vdebt with iocg->waitq.lock
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.

However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.

1. The iocg is in the process of being deactivated by the periodic timer.

2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
   without anything because it still sees that the iocg is already active.

3. The iocg is deactivated.

4. The bio from #2 is over budget but needs to be forced. It increases
   abs_vdebt and goes over the threshold and enables use_delay.

5. IO control is enabled for the iocg's subtree and now IOs are attributed
   to the descendant cgroups and the iocg itself no longer issues IOs.

This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.

The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.

This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f2 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-05 09:23:18 -06:00
Jeffrey Hugo
f0e1d3ac2d bus: mhi: core: Fix channel device name conflict
When multiple instances of the same MHI product are present in a system,
we can see a splat from mhi_create_devices() - "sysfs: cannot create
duplicate filename".

This is because the device names assigned to the MHI channel devices are
non-unique.  They consist of the channel's name, and the channel's pipe
id.  For identical products, each instance is going to have the same
set of channel (both in name and pipe id).

To fix this, we prepend the device name of the parent device that the
MHI channels belong to.  Since different instances of the same product
should have unique device names, this makes the MHI channel devices for
each product also unique.

Additionally, remove the pipe id from the MHI channel device name.  This
is an internal detail to the MHI product that provides little value, and
imposes too much device specific internal details to userspace.  It is
expected that channel with a specific name (ie "SAHARA") has a specific
client, and it does not matter what pipe id that channel is enumerated on.
The pipe id is an internal detail between the MHI bus, and the hardware.
The client is not expected to make decisions based on the pipe id, and to
do so would require the client to have intimate knowledge of the hardware,
which is inappropiate as it may violate the layering provided by the MHI
bus.  The limitation of doing this is that each product may only have one
instance of a channel by a unique name.  This limitation is appropriate
given the usecases of MHI channels.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-7-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:59:54 +02:00
Jeffrey Hugo
af2e588180 bus: mhi: core: Fix typo in comment
There is a typo - "runtimet" should be "runtime".  Fix it.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-6-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:58:36 +02:00
Jeffrey Hugo
45723a4484 bus: mhi: core: Offload register accesses to the controller
When reading or writing MHI registers, the core assumes that the physical
link is a memory mapped PCI link.  This assumption may not hold for all
MHI devices.  The controller knows what is the physical link (ie PCI, I2C,
SPI, etc), and therefore knows the proper methods to access that link.
The controller can also handle link specific error scenarios, such as
reading -1 when the PCI link went down.

Therefore, it is appropriate that the MHI core requests the controller to
make register accesses on behalf of the core, which abstracts the core
from link specifics, and end up removing an unnecessary assumption.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-5-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:58:35 +02:00
Jeffrey Hugo
85a087df4a bus: mhi: core: Remove link_status() callback
If the MHI core detects invalid data due to a PCI read, it calls into
the controller via link_status() to double check that the link is infact
down.  All in all, this is pretty pointless, and racy.  There are no good
reasons for this, and only drawbacks.

Its pointless because chances are, the controller is going to do the same
thing to determine if the link is down - attempt a PCI access and compare
the result.  This does not make the link status decision any smarter.

Its racy because its possible that the link was down at the time of the
MHI core access, but then recovered before the controller access.  In this
case, the controller will indicate the link is not down, and the MHI core
will precede to use a bad value as the MHI core does not attempt to retry
the access.

Retrying the access in the MHI core is a bad idea because again, it is
racy - what if the link is down again?  Furthermore, there may be some
higher level state associated with the link status, that is now invalid
because the link went down.

The only reason why the MHI core could see "invalid" data when doing a PCI
access, that is actually valid, is if the register actually contained the
PCI spec defined sentinel for an invalid access.  In this case, it is
arguable that the MHI implementation broken, and should be fixed, not
worked around.

Therefore, remove the link_status() callback before anyone attempts to
implement it.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-4-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:58:35 +02:00
Jeffrey Hugo
ce31225808 bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails
Powerdown is necessary if mhi_sync_power_up fails due to a timeout, to
clean up the resources.  Otherwise a BUG could be triggered when
attempting to clean up MSIs because the IRQ is still active from a
request_irq().

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-3-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:58:35 +02:00
Manivannan Sadhasivam
115f32512f bus: mhi: Fix parsing of mhi_flags
With the current parsing of mhi_flags, the following statement always
return false:

eob = !!(flags & MHI_EOB);

This is due to the fact that 'enum mhi_flags' starts with index 0 and we
are using direct AND operation to extract each bit. Fix this by using
BIT() macros for defining the flags so that the reset of the code need not
be touched.

Fixes: 189ff97cca ("bus: mhi: core: Add support for data transfer")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200430190555.32741-2-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:56:41 +02:00
Tomas Winkler
d76bc8200f mei: me: disable mei interface on LBG servers.
Disable the MEI driver on LBG SPS (server) platforms, some corner
flows such as recovery mode does not work, and the driver
doesn't have working use cases.

Cc: <stable@vger.kernel.org>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20200428211200.12200-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 16:56:41 +02:00
Joerg Roedel
119b2b2c3e iommu/amd: Do not flush Device Table in iommu_map_page()
The flush of the Device Table Entries for the domain has already
happened in increase_address_space(), if necessary. Do no flush them
again in iommu_map_page().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200504125413.16798-6-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-05 14:38:38 +02:00
Joerg Roedel
19c6978fba iommu/amd: Update Device Table in increase_address_space()
The Device Table needs to be updated before the new page-table root
can be published in domain->pt_root. Otherwise a concurrent call to
fetch_pte might fetch a PTE which is not reachable through the Device
Table Entry.

Fixes: 92d420ec02 ("iommu/amd: Relax locking in dma_ops path")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200504125413.16798-5-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-05 14:38:38 +02:00
Joerg Roedel
f44a4d7e4f iommu/amd: Call domain_flush_complete() in update_domain()
The update_domain() function is expected to also inform the hardware
about domain changes. This needs a COMPLETION_WAIT command to be sent
to all IOMMUs which use the domain.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200504125413.16798-4-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-05 14:38:38 +02:00
Joerg Roedel
5b8a9a047b iommu/amd: Do not loop forever when trying to increase address space
When increase_address_space() fails to allocate memory, alloc_pte()
will call it again until it succeeds. Do not loop forever while trying
to increase the address space and just return an error instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200504125413.16798-3-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-05 14:38:38 +02:00
Joerg Roedel
eb791aa70b iommu/amd: Fix race in increase_address_space()/fetch_pte()
The 'pt_root' and 'mode' struct members of 'struct protection_domain'
need to be get/set atomically, otherwise the page-table of the domain
can get corrupted.

Merge the fields into one atomic64_t struct member which can be
get/set atomically.

Fixes: 92d420ec02 ("iommu/amd: Relax locking in dma_ops path")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200504125413.16798-2-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-05 14:38:38 +02:00
Xiongfeng Wang
f8a31eca47 platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
Since 'value' is declared as unsigned long, the following statement is
always false.
	value < 0

So let's remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-05-05 15:33:24 +03:00
Arnd Bergmann
01f259f372 platform/x86: intel_pmc_core: avoid unused-function warnings
When both CONFIG_DEBUG_FS and CONFIG_PM_SLEEP are disabled, the
functions that got moved out of the #ifdef section now cause
a warning:

drivers/platform/x86/intel_pmc_core.c:654:13: error: 'pmc_core_lpm_display' defined but not used [-Werror=unused-function]
  654 | static void pmc_core_lpm_display(struct pmc_dev *pmcdev, struct device *dev,
      |             ^~~~~~~~~~~~~~~~~~~~
drivers/platform/x86/intel_pmc_core.c:617:13: error: 'pmc_core_slps0_display' defined but not used [-Werror=unused-function]
  617 | static void pmc_core_slps0_display(struct pmc_dev *pmcdev, struct device *dev,
      |             ^~~~~~~~~~~~~~~~~~~~~~

Rather than add even more #ifdefs here, remove them entirely and
let the compiler work it out, it can actually get rid of all the
debugfs calls without problems as long as the struct member is
there.

The two PM functions just need a __maybe_unused annotations to avoid
another warning instead of the #ifdef.

Fixes: aae43c2bcd ("platform/x86: intel_pmc_core: Relocate pmc_core_*_display() to outside of CONFIG_DEBUG_FS")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-05-05 15:33:24 +03:00
Hans de Goede
3bd12da7f5 platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
asus-nb-wmi does not add any extra functionality on these Asus
Transformer books. They have detachable keyboards, so the hotkeys are
send through a HID device (and handled by the hid-asus driver) and also
the rfkill functionality is not used on these devices.

Besides not adding any extra functionality, initializing the WMI interface
on these devices actually has a negative side-effect. For some reason
the \_SB.ATKD.INIT() function which asus_wmi_platform_init() calls drives
GPO2 (INT33FC:02) pin 8, which is connected to the front facing webcam LED,
high and there is no (WMI or other) interface to drive this low again
causing the LED to be permanently on, even during suspend.

This commit adds a blacklist of DMI system_ids on which not to load the
asus-nb-wmi and adds these Transformer books to this list. This fixes
the webcam LED being permanently on under Linux.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-05-05 15:32:51 +03:00
Archana Patni
e87fa339d4 platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
Jasper Lake uses Icelake PCH IPs and the S0ix debug interfaces are same as
Icelake. It uses SLP_S0_DBG register latch/read interface from Icelake
generation. It doesn't use Tiger Lake LPM debug registers. Change the
Jasper Lake S0ix debug interface to use the ICL reg map.

Fixes: 16292bed9c ("platform/x86: intel_pmc_core: Add Atom based Jasper Lake (JSL) platform support")
Signed-off-by: Archana Patni <archana.patni@intel.com>
Acked-by: David E. Box <david.e.box@intel.com>
Tested-by: Divagar Mohandass <divagar.mohandass@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-05-05 15:32:41 +03:00
Prashant Malani
7990be48ef usb: typec: mux: intel: Handle alt mode HPD_HIGH
According to the PMC Type C Subsystem (TCSS) Mux programming guide rev
0.6, when a device is transitioning to DP Alternate Mode state, if the
HPD_STATE (bit 7) field in the status update command VDO is set to
HPD_HIGH, the HPD_HIGH field in the Alternate Mode request “mode_data”
field (bit 14) should also be set. Ensure the bit is correctly handled
while issuing the Alternate Mode request.

Signed-off-by: Prashant Malani <pmalani@chromium.org>
Fixes: 6701adfa96 ("usb: typec: driver for Intel PMC mux control")
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20200429054432.134178-1-pmalani@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 13:06:46 +02:00
Jeremy Linton
2bef9aed6f usb: usbfs: correct kernel->user page attribute mismatch
On some architectures (e.g. arm64) requests for
IO coherent memory may use non-cachable attributes if
the relevant device isn't cache coherent. If these
pages are then remapped into userspace as cacheable,
they may not be coherent with the non-cacheable mappings.

In particular this happens with libusb, when it attempts
to create zero-copy buffers for use by rtl-sdr
(https://github.com/osmocom/rtl-sdr/). On low end arm
devices with non-coherent USB ports, the application will
be unexpectedly killed, while continuing to work fine on
arm machines with coherent USB controllers.

This bug has been discovered/reported a few times over
the last few years. In the case of rtl-sdr a compile time
option to enable/disable zero copy was implemented to
work around it.

Rather than relaying on application specific workarounds,
dma_mmap_coherent() can be used instead of remap_pfn_range().
The page cache/etc attributes will then be correctly set in
userspace to match the kernel mapping.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200504201348.1183246-1-jeremy.linton@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 13:06:46 +02:00
Heikki Krogerus
e283f5e89f usb: typec: intel_pmc_mux: Fix the property names
The device property names for the port index number are
"usb2-port-number" and "usb3-port-number", not "usb2-port"
and "usb3-port".

Fixes: 6701adfa96 ("usb: typec: driver for Intel PMC mux control")
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20200430135657.45169-1-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 13:06:46 +02:00
Alan Stern
ac854131d9 USB: core: Fix misleading driver bug report
The syzbot fuzzer found a race between URB submission to endpoint 0
and device reset.  Namely, during the reset we call usb_ep0_reinit()
because the characteristics of ep0 may have changed (if the reset
follows a firmware update, for example).  While usb_ep0_reinit() is
running there is a brief period during which the pointers stored in
udev->ep_in[0] and udev->ep_out[0] are set to NULL, and if an URB is
submitted to ep0 during that period, usb_urb_ep_type_check() will
report it as a driver bug.  In the absence of those pointers, the
routine thinks that the endpoint doesn't exist.  The log message looks
like this:

------------[ cut here ]------------
usb 2-1: BOGUS urb xfer, pipe 2 != type 2
WARNING: CPU: 0 PID: 9241 at drivers/usb/core/urb.c:478
usb_submit_urb+0x1188/0x1460 drivers/usb/core/urb.c:478

Now, although submitting an URB while the device is being reset is a
questionable thing to do, it shouldn't count as a driver bug as severe
as submitting an URB for an endpoint that doesn't exist.  Indeed,
endpoint 0 always exists, even while the device is in its unconfigured
state.

To prevent these misleading driver bug reports, this patch updates
usb_disable_endpoint() to avoid clearing the ep_in[] and ep_out[]
pointers when the endpoint being disabled is ep0.  There's no danger
of leaving a stale pointer in place, because the usb_host_endpoint
structure being pointed to is stored permanently in udev->ep0; it
doesn't get deallocated until the entire usb_device structure does.

Reported-and-tested-by: syzbot+db339689b2101f6f6071@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2005011558590.903-100000@netrider.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 13:06:45 +02:00
Oscar Carter
769acc3656 staging: gasket: Check the return value of gasket_get_bar_index()
Check the return value of gasket_get_bar_index function as it can return
a negative one (-EINVAL). If this happens, a negative index is used in
the "gasket_dev->bar_data" array.

Addresses-Coverity-ID: 1438542 ("Negative array index read")
Fixes: 9a69f5087c ("drivers/staging: Gasket driver framework + Apex driver")
Signed-off-by: Oscar Carter <oscar.carter@gmx.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Richard Yeh <rcy@google.com>
Link: https://lore.kernel.org/r/20200501155118.13380-1-oscar.carter@gmx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 12:36:05 +02:00
Wolfram Sang
3c5c0805ac staging: ks7010: remove me from CC list
I lost interest in this driver years ago because I could't keep up with
testing the incoming janitorial patches. So, drop me from CC.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Link: https://lore.kernel.org/r/20200502112648.6942-1-wsa@the-dreams.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-05 12:36:04 +02:00
Christian Borntraeger
5615e74f48 KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
In LPAR we will only get an intercept for FC==3 for the PQAP
instruction. Running nested under z/VM can result in other intercepts as
well as ECA_APIE is an effective bit: If one hypervisor layer has
turned this bit off, the end result will be that we will get intercepts for
all function codes. Usually the first one will be a query like PQAP(QCI).
So the WARN_ON_ONCE is not right. Let us simply remove it.

Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.3+
Fixes: e5282de931 ("s390: ap: kvm: add PQAP interception for AQIC")
Link: https://lore.kernel.org/kvm/20200505083515.2720-1-borntraeger@de.ibm.com
Reported-by: Qian Cai <cailca@icloud.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-05-05 11:15:05 +02:00
Linus Torvalds
47cf1b422e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - Wacom driver functional and regression fixes from Jason Gerecke

 - race condition fix in usbhid, found by syzbot and fixed by Alan Stern

 - a few device-specific quirks and ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
  HID: mcp2221: add gpiolib dependency
  HID: i2c-hid: reset Synaptics SYNA2393 on resume
  HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
  HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
  Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
  HID: alps: ALPS_1657 is too specific; use U1_UNICORN_LEGACY instead
  HID: alps: Add AUI1657 device ID
  HID: logitech: Add support for Logitech G11 extra keys
  HID: multitouch: add eGalaxTouch P80H84 support
  HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
2020-05-04 18:55:20 -07:00
Zong Li
d6d5161280 riscv: force __cpu_up_ variables to put in data section
Put __cpu_up_stack_pointer and __cpu_up_task_pointer in data section.
Currently, these two variables are put in bss section, there is a
potential risk that secondary harts get the uninitialized value before
main hart finishing the bss clearing. In this case, all secondary
harts would pass the waiting loop and enable the MMU before main hart
set up the page table.

This issue happens on random booting of multiple harts, which means
it will manifest for BBL and OpenSBI v0.6 (or older version). In OpenSBI
v0.7 (or higher version), we have HSM extension so all the secondary harts
are brought-up by Linux kernel in an orderly fashion. This means we don't
need this change for OpenSBI v0.7 (or higher version).

Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 15:03:25 -07:00
Andreas Schwab
0a9f2a6161 riscv: add Linux note to vdso
The Linux note in the vdso allows glibc to check the running kernel
version without having to issue the uname syscall.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 14:22:34 -07:00
Vincent Chen
c749bb2d55 riscv: set max_pfn to the PFN of the last page
The current max_pfn equals to zero. In this case, I found it caused users
cannot get some page information through /proc such as kpagecount in v5.6
kernel because of new sanity checks. The following message is displayed by
stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t
1" on HiFive unleashed board.

 # stress-ng --verbose --physpage 1 -t 1
 stress-ng: debug: [109] 4 processors online, 4 processors configured
 stress-ng: info: [109] dispatching hogs: 1 physpage
 stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0
 stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0
 stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found
 stress-ng: debug: [109] cache allocate: default cache size: 2048K
 stress-ng: debug: [109] starting stressors
 stress-ng: debug: [109] 1 stressor spawned
 stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0)
 stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success)
 stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
 ...
 stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
 stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0)
 stress-ng: debug: [109] process [110] terminated
 stress-ng: info: [109] successful run completed in 1.00s
 #

After applying this patch, the kernel can pass the test.

 # stress-ng --verbose --physpage 1 -t 1
 stress-ng: debug: [104] 4 processors online, 4 processors configured stress-ng: info: [104] dispatching hogs: 1 physpage
 stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs
 stress-ng: debug: [104] cache allocate: default cache size: 2048K
 stress-ng: debug: [104] starting stressors
 stress-ng: debug: [104] 1 stressor spawned
 stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0) stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0) stress-ng: debug: [104] process [105] terminated
 stress-ng: info: [104] successful run completed in 1.01s
 #

Cc: stable@vger.kernel.org
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Yash Shah <yash.shah@sifive.com>
Tested-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 14:12:32 -07:00
Anup Patel
a2da5b181f RISC-V: Remove N-extension related defines
The RISC-V N-extension is still in draft state hence remove
N-extension related defines from asm/csr.h.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 14:09:00 -07:00
Anup Patel
6bcff51539 RISC-V: Add bitmap reprensenting ISA features common across CPUs
This patch adds riscv_isa bitmap which represents Host ISA features
common across all Host CPUs. The riscv_isa is not same as elf_hwcap
because elf_hwcap will only have ISA features relevant for user-space
apps whereas riscv_isa will have ISA features relevant to both kernel
and user-space apps.

One of the use-case for riscv_isa bitmap is in KVM hypervisor where
we will use it to do following operations:

1. Check whether hypervisor extension is available
2. Find ISA features that need to be virtualized (e.g. floating
   point support, vector extension, etc.)

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 14:08:59 -07:00
Anup Patel
7391efa48d RISC-V: Export riscv_cpuid_to_hartid_mask() API
The riscv_cpuid_to_hartid_mask() API should be exported to allow
building KVM RISC-V as loadable module.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04 14:08:58 -07:00
Qiushi Wu
bd4af432cc nfp: abm: fix a memory leak bug
In function nfp_abm_vnic_set_mac, pointer nsp is allocated by nfp_nsp_open.
But when nfp_nsp_has_hwinfo_lookup fail, the pointer is not released,
which can lead to a memory leak bug. Fix this issue by adding
nfp_nsp_close(nsp) in the error path.

Fixes: f6e71efdf9 ("nfp: abm: look up MAC addresses via management FW")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 12:04:12 -07:00
Cong Wang
8d9f73c0ad atm: fix a memory leak of vcc->user_back
In lec_arp_clear_vccs() only entry->vcc is freed, but vcc
could be installed on entry->recv_vcc too in lec_vcc_added().

This fixes the following memory leak:

unreferenced object 0xffff8880d9266b90 (size 16):
  comm "atm2", pid 425, jiffies 4294907980 (age 23.488s)
  hex dump (first 16 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5  ............kkk.
  backtrace:
    [<(____ptrval____)>] kmem_cache_alloc_trace+0x10e/0x151
    [<(____ptrval____)>] lane_ioctl+0x4b3/0x569
    [<(____ptrval____)>] do_vcc_ioctl+0x1ea/0x236
    [<(____ptrval____)>] svc_ioctl+0x17d/0x198
    [<(____ptrval____)>] sock_do_ioctl+0x47/0x12f
    [<(____ptrval____)>] sock_ioctl+0x2f9/0x322
    [<(____ptrval____)>] vfs_ioctl+0x1e/0x2b
    [<(____ptrval____)>] ksys_ioctl+0x61/0x80
    [<(____ptrval____)>] __x64_sys_ioctl+0x16/0x19
    [<(____ptrval____)>] do_syscall_64+0x57/0x65
    [<(____ptrval____)>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

Cc: Gengming Liu <l.dmxcsnsbh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:59:38 -07:00
Cong Wang
93a2014afb atm: fix a UAF in lec_arp_clear_vccs()
Gengming reported a UAF in lec_arp_clear_vccs(),
where we add a vcc socket to an entry in a per-device
list but free the socket without removing it from the
list when vcc->dev is NULL.

We need to call lec_vcc_close() to search and remove
those entries contain the vcc being destroyed. This can
be done by calling vcc->push(vcc, NULL) unconditionally
in vcc_destroy_socket().

Another issue discovered by Gengming's reproducer is
the vcc->dev may point to the static device lecatm_dev,
for which we don't need to register/unregister device,
so we can just check for vcc->dev->ops->owner.

Reported-by: Gengming Liu <l.dmxcsnsbh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:59:20 -07:00
Colin Ian King
44d95cc6b1 net: stmmac: gmac5+: fix potential integer overflow on 32 bit multiply
The multiplication of cfg->ctr[1] by 1000000000 is performed using a
32 bit multiplication (since cfg->ctr[1] is a u32) and this can lead
to a potential overflow. Fix this by making the constant a ULL to
ensure a 64 bit multiply occurs.

Fixes: 504723af0d ("net: stmmac: Add basic EST support for GMAC5+")
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:57:21 -07:00
Cong Wang
a7df4870d7 net_sched: fix tcm_parent in tc filter dump
When we tell kernel to dump filters from root (ffff:ffff),
those filters on ingress (ffff:0000) are matched, but their
true parents must be dumped as they are. However, kernel
dumps just whatever we tell it, that is either ffff:ffff
or ffff:0000:

 $ nl-cls-list --dev=dummy0 --parent=root
 cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
 cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
 $ nl-cls-list --dev=dummy0 --parent=ffff:
 cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
 cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all

This is confusing and misleading, more importantly this is
a regression since 4.15, so the old behavior must be restored.

And, when tc filters are installed on a tc class, the parent
should be the classid, rather than the qdisc handle. Commit
edf6711c98 ("net: sched: remove classid and q fields from tcf_proto")
removed the classid we save for filters, we can just restore
this classid in tcf_block.

Steps to reproduce this:
 ip li set dev dummy0 up
 tc qd add dev dummy0 ingress
 tc filter add dev dummy0 parent ffff: protocol arp basic action pass
 tc filter show dev dummy0 root

Before this patch:
 filter protocol arp pref 49152 basic
 filter protocol arp pref 49152 basic handle 0x1
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1

After this patch:
 filter parent ffff: protocol arp pref 49152 basic
 filter parent ffff: protocol arp pref 49152 basic handle 0x1
 	action order 1: gact action pass
 	 random type none pass val 0
	 index 1 ref 1 bind 1

Fixes: a10fa20101 ("net: sched: propagate q and parent from caller down to tcf_fill_node")
Fixes: edf6711c98 ("net: sched: remove classid and q fields from tcf_proto")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:53:33 -07:00
Arnd Bergmann
071a43e660 cxgb4/chcr: avoid -Wreturn-local-addr warning
gcc-10 warns about functions that return a pointer to a stack
variable. In chcr_write_cpl_set_tcb_ulp(), this does not actually
happen, but it's too hard to see for the compiler:

drivers/crypto/chelsio/chcr_ktls.c: In function 'chcr_write_cpl_set_tcb_ulp.constprop':
drivers/crypto/chelsio/chcr_ktls.c:760:9: error: function may return address of local variable [-Werror=return-local-addr]
  760 |  return pos;
      |         ^~~
drivers/crypto/chelsio/chcr_ktls.c:712:5: note: declared here
  712 |  u8 buf[48] = {0};
      |     ^~~

Split the middle part of the function out into a helper to make
it easier to understand by both humans and compilers, which avoids
the warning.

Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:47:52 -07:00
Julian Wiedmann
c649c41d5d s390/qeth: fix cancelling of TX timer on dev_close()
With the introduction of TX coalescing, .ndo_start_xmit now potentially
starts the TX completion timer. So only kill the timer _after_ TX has
been disabled.

Fixes: ee1e52d1e4 ("s390/qeth: add TX IRQ coalescing support for IQD devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:21:40 -07:00
Linus Torvalds
9851a0dee7 Merge tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
 "GCC 10 fixes for gcc-plugins:

   - Adjust caller of cgraph_create_edge for GCC 10 argument usage

   - Update common headers to build under GCC 10 (Frédéric Pierret)"

* tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-common.h: Update for GCC 10
  gcc-plugins/stackleak: Avoid assignment for unused macro argument
2020-05-04 11:20:32 -07:00
Linus Torvalds
a16a47e98a Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "A couple of bug fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: vsock: kick send_pkt worker once device is started
  virtio-blk: handle block_device_operations callbacks after hot unplug
2020-05-04 11:10:24 -07:00
Dejin Zheng
d975cb7ea9 net: enetc: fix an issue about leak system resources
the related system resources were not released when enetc_hw_alloc()
return error in the enetc_pci_mdio_probe(), add iounmap() for error
handling label "err_hw_alloc" to fix it.

Fixes: 6517798dd3 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:51:20 -07:00
Linus Torvalds
67f852ef92 Merge tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flex-array reverts from Gustavo Silva:
 "This reverts flexible array changes in include/uapi/

  These structures can get embedded in other structures in user-space
  and cause all sorts of warnings and problems[1]. So, we better don't
  take any chances and keep the zero-length arrays in place for now"

[1] https://lore.kernel.org/lkml/20200424121553.GE26002@ziepe.ca/

* tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  uapi: revert flexible-array conversions
2020-05-04 10:47:41 -07:00
Tariq Toukan
40e473071d net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
When ENOSPC is set the idx is still valid and gets set to the global
MLX4_SINK_COUNTER_INDEX.  However gcc's static analysis cannot tell that
ENOSPC is impossible from mlx4_cmd_imm() and gives this warning:

drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be
used uninitialized in this function [-Wmaybe-uninitialized]
 2552 |    priv->def_counter[port] = idx;

Also, when ENOSPC is returned mlx4_allocate_default_counters should not
fail.

Fixes: 6de5f7f6a1 ("net/mlx4_core: Allocate default counter per port")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:42:06 -07:00
Aya Levin
bea0c5c942 devlink: Fix reporter's recovery condition
Devlink health core conditions the reporter's recovery with the
expiration of the grace period. This is not relevant for the first
recovery. Explicitly demand that the grace period will only apply to
recoveries other than the first.

Fixes: c8e1da0bf9 ("devlink: Add health report functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:40:39 -07:00
Maxim Petrov
f42234ffd5 stmmac: fix pointer check after utilization in stmmac_interrupt
The paranoidal pointer check in IRQ handler looks very strange - it
really protects us only against bogus drivers which request IRQ line
with null pointer dev_id. However, the code fragment is incorrect
because the dev pointer is used before the actual check which leads
to undefined behavior. Remove the check to avoid confusing people
with incorrect code.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:33:53 -07:00
Tuong Lien
980d69276f tipc: fix partial topology connection closure
When an application connects to the TIPC topology server and subscribes
to some services, a new connection is created along with some objects -
'tipc_subscription' to store related data correspondingly...
However, there is one omission in the connection handling that when the
connection or application is orderly shutdown (e.g. via SIGQUIT, etc.),
the connection is not closed in kernel, the 'tipc_subscription' objects
are not freed too.
This results in:
- The maximum number of subscriptions (65535) will be reached soon, new
subscriptions will be rejected;
- TIPC module cannot be removed (unless the objects  are somehow forced
to release first);

The commit fixes the issue by closing the connection if the 'recvmsg()'
returns '0' i.e. when the peer is shutdown gracefully. It also includes
the other unexpected cases.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:31:13 -07:00
Sage Weil
3a5ccecd9a MAINTAINERS: remove myself as ceph co-maintainer
Jeff, Ilya, and Dongsheng are doing all of the Ceph maintainance
these days.

[ idryomov: Remove Sage's git tree too, it hasn't been pushed to
  in years. ]

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04 19:31:08 +02:00
Florian Fainelli
86f8b1c01a net: dsa: Do not make user port errors fatal
Prior to 1d27732f41 ("net: dsa: setup and teardown ports"), we would
not treat failures to set-up an user port as fatal, but after this
commit we would, which is a regression for some systems where interfaces
may be declared in the Device Tree, but the underlying hardware may not
be present (pluggable daughter cards for instance).

Fixes: 1d27732f41 ("net: dsa: setup and teardown ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:29:47 -07:00
Wu Bo
4d8e28ff31 ceph: fix double unlock in handle_cap_export()
If the ceph_mdsc_open_export_target_session() return fails, it will
do a "goto retry", but the session mutex has already been unlocked.
Re-lock the mutex in that case to ensure that we don't unlock it
twice.

Signed-off-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04 19:14:23 +02:00
Wu Bo
7d8976afad ceph: fix special error code in ceph_try_get_caps()
There are 3 speical error codes: -EAGAIN/-EFBIG/-ESTALE.
After calling try_get_cap_refs, ceph_try_get_caps test for the
-EAGAIN twice. Ensure that it tests for -ESTALE instead.

Signed-off-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04 19:14:23 +02:00
Jeff Layton
0fa8263367 ceph: fix endianness bug when handling MDS session feature bits
Eduard reported a problem mounting cephfs on s390 arch. The feature
mask sent by the MDS is little-endian, so we need to convert it
before storing and testing against it.

Cc: stable@vger.kernel.org
Reported-and-Tested-by: Eduard Shishkin <edward6@linux.ibm.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04 19:14:23 +02:00
Shubhrajyoti Datta
2ae11c46d5 tty: xilinx_uartps: Fix missing id assignment to the console
When serial console has been assigned to ttyPS1 (which is serial1 alias)
console index is not updated property and pointing to index -1 (statically
initialized) which ends up in situation where nothing has been printed on
the port.

The commit 18cc7ac8a2 ("Revert "serial: uartps: Register own uart console
and driver structures"") didn't contain this line which was removed by
accident.

Fixes: 18cc7ac8a2 ("Revert "serial: uartps: Register own uart console and driver structures"")
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/ed3111533ef5bd342ee5ec504812240b870f0853.1588602446.git.michal.simek@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-04 18:55:45 +02:00
Gustavo A. R. Silva
1e6e9d0f48 uapi: revert flexible-array conversions
These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems. So, we better don't take
any chances and keep the zero-length arrays in place for now.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2020-05-04 11:30:15 -05:00
Paolo Bonzini
8be8f932e3 kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
Commit f458d039db ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
the following infinite loop:

BUG: stack guard page was hit at 000000008f595917 \
(stack is 00000000bdefe5a4..00000000ae2b06f5)
kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
Call Trace:
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
....

The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state.  That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd.  But if the interrupt has _not_ gone low then
ioapic_set_irq is invoked again, causing an infinite loop.

This can only happen for a level-triggered interrupt, otherwise
irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
and then low.  Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because
TMR is set.  Thus, fix the bug by restricting the lazy invocation
of the ack notifier to edge-triggered interrupts, the only ones that
need it.

Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: borisvk@bstnet.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://www.spinics.net/lists/kvm/msg213512.html
Fixes: f458d039db ("kvm: ioapic: Lazy update IOAPIC EOI")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 12:29:05 -04:00
Matt Jolly
78d6de3cfb USB: serial: qcserial: Add DW5816e support
Add support for Dell Wireless 5816e to drivers/usb/serial/qcserial.c

Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2020-05-04 18:23:54 +02:00
Suravee Suthikulpanit
637543a8d6 KVM: x86: Fixes posted interrupt check for IRQs delivery modes
Current logic incorrectly uses the enum ioapic_irq_destination_types
to check the posted interrupt destination types. However, the value was
set using APIC_DM_XXX macros, which are left-shifted by 8 bits.

Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead.

Fixes: (fdcf756213 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 12:16:51 -04:00
Linus Torvalds
9d82973e03 gcc-10 warnings: fix low-hanging fruit
Due to a bug-report that was compiler-dependent, I updated one of my
machines to gcc-10.  That shows a lot of new warnings.  Happily they
seem to be mostly the valid kind, but it's going to cause a round of
churn for getting rid of them..

This is the really low-hanging fruit of removing a couple of zero-sized
arrays in some core code.  We have had a round of these patches before,
and we'll have many more coming, and there is nothing special about
these except that they were particularly trivial, and triggered more
warnings than most.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-04 09:16:37 -07:00
Paolo Bonzini
7134fa0709 Merge tag 'kvmarm-fixes-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for Linux 5.7, take #2

- Fix compilation with Clang
- Correctly initialize GICv4.1 in the absence of a virtual ITS
- Move SP_EL0 save/restore to the guest entry/exit code
- Handle PC wrap around on 32bit guests, and narrow all 32bit
  registers on userspace access
2020-05-04 12:01:37 -04:00
Paolo Bonzini
9e5e19f585 Merge tag 'kvmarm-fixes-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for Linux 5.7, take #1

- Prevent the userspace API from interacting directly with the HW
  stage of the virtual GIC
- Fix a couple of vGIC memory leaks
- Tighten the rules around the use of the 32bit PSCI functions
  for 64bit guest, as well as the opposite situation (matches the
  specification)
2020-05-04 12:01:12 -04:00
Paolo Bonzini
dee919d15d KVM: SVM: fill in kvm_run->debug.arch.dr[67]
The corresponding code was added for VMX in commit 42dbaa5a05
("KVM: x86: Virtualize debug registers, 2008-12-15) but never for AMD.
Fix this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 11:59:03 -04:00
Sean Christopherson
f9336e3281 KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
Use BUG() in the impossible-to-hit default case when switching on the
scope of INVEPT to squash a warning with clang 11 due to clang treating
the BUG_ON() as conditional.

  >> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free'
     is used uninitialized whenever 'if' condition is false
     [-Wsometimes-uninitialized]
                   BUG_ON(1);

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ce8fe7b77b ("KVM: nVMX: Free only the affected contexts when emulating INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 11:58:55 -04:00
Xiaoguang Wang
d8f1b9716c io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files()
The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files()
are mismatched. Currently I don't see any issues related this bug, just
find it by learning codes.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-04 09:07:14 -06:00
Arnd Bergmann
3a3a71f97c sun6i: dsi: fix gcc-4.8
Older compilers warn about initializers with incorrect curly
braces:

drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c: In function 'sun6i_dsi_encoder_enable':
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:720:8: error: missing braces around initializer [-Werror=missing-braces]
  union phy_configure_opts opts = { 0 };
        ^
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:720:8: error: (near initialization for 'opts.mipi_dphy') [-Werror=missing-braces]

Use the GNU empty initializer extension to avoid this.

Fixes: bb3b6fcb68 ("sun6i: dsi: Convert to generic phy handling")
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428215105.3928459-1-arnd@arndb.de
2020-05-04 13:48:42 +02:00
H. Nikolaus Schaller
c59359a02d drm: ingenic-drm: add MODULE_DEVICE_TABLE
so that the driver can load by matching the device tree
if compiled as module.

Cc: stable@vger.kernel.org # v5.3+
Fixes: 90b86fcc47 ("DRM: Add KMS driver for the Ingenic JZ47xx SoCs")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1694a29b7a3449b6b662cec33d1b33f2ee0b174a.1588574111.git.hns@goldelico.com
2020-05-04 12:12:32 +02:00
Nicolas Pitre
57d38f26d8 vt: fix unicode console freeing with a common interface
By directly using kfree() in different places we risk missing one if
it is switched to using vfree(), especially if the corresponding
vmalloc() is hidden away within a common abstraction.

Oh wait, that's exactly what happened here.

So let's fix this by creating a common abstraction for the free case
as well.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Reported-by: syzbot+0bfda3ade1ee9288a1be@syzkaller.appspotmail.com
Fixes: 9a98e7a80f ("vt: don't use kmalloc() for the unicode screen buffer")
Cc: <stable@vger.kernel.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://lore.kernel.org/r/nycvar.YSQ.7.76.2005021043110.2671@knanqh.ubzr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-04 11:49:59 +02:00
Florian Fainelli
092a9f59bc Revert "tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart"
This reverts commit 580d952e44 ("tty:
serial: bcm63xx: fix missing clk_put() in bcm63xx_uart") because we
should not be doing a clk_put() if we were not successful in getting a
valid clock reference via clk_get() in the first place.

Fixes: 580d952e44 ("tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200501013904.1394-1-f.fainelli@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-04 11:49:57 +02:00
Hans de Goede
1e189f2670 HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
Add a HID_QUIRK_NO_INIT_REPORTS quirk for the Dell K12A keyboard-dock,
which can be used with various Dell Venue 11 models.

Without this quirk the keyboard/touchpad combo works fine when connected
at boot, but when hotplugged 9 out of 10 times it will not work properly.
Adding the quirk fixes this.

Cc: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-05-04 11:24:58 +02:00
Gurchetan Singh
c3e2850a9b drm/virtio: create context before RESOURCE_CREATE_2D in 3D mode
If 3D is enabled, but userspace requests a dumb buffer, we will
call CTX_ATTACH_RESOURCE before actually creating the context.

Fixes: 72b48ae800 ("drm/virtio: enqueue virtio_gpu_create_context after the first 3D ioctl")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20200501185557.740-1-gurchetansingh@chromium.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2020-05-04 09:53:37 +02:00
Dejin Zheng
b959c77dac net: macb: fix an issue about leak related system resources
A call of the function macb_init() can fail in the function
fu540_c000_init. The related system resources were not released
then. use devm_platform_ioremap_resource() to replace ioremap()
to fix it.

Fixes: c218ad5590 ("macb: Add support for SiFive FU540-C000")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Yash Shah <yash.shah@sifive.com>
Suggested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03 16:01:48 -07:00
Matt Jolly
57c7f2bd75 net: usb: qmi_wwan: add support for DW5816e
Add support for Dell Wireless 5816e to drivers/net/usb/qmi_wwan.c

Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03 15:57:29 -07:00
Eric Dumazet
2761121af8 net_sched: sch_skbprio: add message validation to skbprio_change()
Do not assume the attribute has the right size.

Fixes: aea5f654e6 ("net/sched: add skbprio scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03 15:52:13 -07:00
Josh Poimboeuf
fb9cbbc895 x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
Fix the following warnings seen with !CONFIG_MODULES:

  arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable]
     29 | static struct orc_entry *cur_orc_table = __start_orc_unwind;
        |                          ^~~~~~~~~~~~~
  arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable]
     28 | static int *cur_orc_ip_table = __start_orc_unwind_ip;
        |             ^~~~~~~~~~~~~~~~

Fixes: 153eb2223c ("x86/unwind/orc: Convert global variables to static")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Next Mailing List <linux-next@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
2020-05-03 13:23:28 +02:00
Jia He
0b84103062 vhost: vsock: kick send_pkt worker once device is started
Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.

The abnormal flow looks like:
host-userspace           vhost vsock                       guest vsock
==============           ===========                       ============
connect()     -------->  vhost_transport_send_pkt_work()   initializing
   |                     vq->private_data==NULL
   |                     will not be queued
   V
schedule_timeout(2s)
                         vhost_vsock_start()  <---------   device ready
                         set vq->private_data

wait for 2s and failed
connect() again          vq->private_data!=NULL         recv connecting pkt

Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
   initializing, hence the vhost_vsock_start has not been called. So
   vq->private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq->private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
   everything is fine.

As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.

After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917

Reported-by: Ning Bo <n.b@live.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-05-02 10:28:21 -04:00
Stefan Hajnoczi
90b5feb8c4 virtio-blk: handle block_device_operations callbacks after hot unplug
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug.  This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
  IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  PGD 800000003a92f067 PUD 3a930067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G           OE  ------------   3.10.0-1062.el7.x86_64 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000
  RIP: 0010:[<ffffffffc00e5450>]  [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  RSP: 0018:ffff9be5fa893dc8  EFLAGS: 00010246
  RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40
  RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680
  R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000
  FS:  00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk]
   [<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0
   [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20
   [<ffffffff8d488771>] block_ioctl+0x41/0x50
   [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0
   [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0

A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).

Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.

Fixes: 48e4043d45 ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby <ldigby@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-05-02 10:28:13 -04:00
Arnd Bergmann
dc30b4059f drop_monitor: work around gcc-10 stringop-overflow warning
The current gcc-10 snapshot produces a false-positive warning:

net/core/drop_monitor.c: In function 'trace_drop_common.constprop':
cc1: error: writing 8 bytes into a region of size 0 [-Werror=stringop-overflow=]
In file included from net/core/drop_monitor.c:23:
include/uapi/linux/net_dropmon.h:36:8: note: at offset 0 to object 'entries' with size 4 declared here
   36 |  __u32 entries;
      |        ^~~~~~~

I reported this in the gcc bugzilla, but in case it does not get
fixed in the release, work around it by using a temporary variable.

Fixes: 9a8afc8d39 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94881
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:45:16 -07:00
Yoshiyuki Kurauchi
846c68f7f1 gtp: set NLM_F_MULTI flag in gtp_genl_dump_pdp()
In drivers/net/gtp.c, gtp_genl_dump_pdp() should set NLM_F_MULTI
flag since it returns multipart message.
This patch adds a new arg "flags" in gtp_genl_fill_info() so that
flags can be set by the callers.

Signed-off-by: Yoshiyuki Kurauchi <ahochauwaaaaa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:34:09 -07:00
Jules Irenge
cae9566acb cxgb4: Add missing annotation for service_ofldq()
Sparse reports a warning at service_ofldq()

warning: context imbalance in service_ofldq() - unexpected unlock

The root cause is the missing annotation at service_ofldq()

Add the missing __must_hold(&q->sendq.lock) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:31:37 -07:00
Jacob Keller
709e7158f0 ice: cleanup language in ice.rst for fw.app
The documentation for the ice driver around "fw.app" has a spelling
mistake in variation. Additionally, the language of "shall have a unique
name" sounds like a requirement. Reword this to read more like
a description or property.

Reported-by: Benjamin Fisher <benjamin.l.fisher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kubakici@wp.pl>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:30:24 -07:00
Clay McClure
b6d49cab44 net: Make PTP-specific drivers depend on PTP_1588_CLOCK
Commit d1cbfd771c ("ptp_clock: Allow for it to be optional") changed
all PTP-capable Ethernet drivers from `select PTP_1588_CLOCK` to `imply
PTP_1588_CLOCK`, "in order to break the hard dependency between the PTP
clock subsystem and ethernet drivers capable of being clock providers."
As a result it is possible to build PTP-capable Ethernet drivers without
the PTP subsystem by deselecting PTP_1588_CLOCK. Drivers are required to
handle the missing dependency gracefully.

Some PTP-capable Ethernet drivers (e.g., TI_CPSW) factor their PTP code
out into separate drivers (e.g., TI_CPTS_MOD). The above commit also
changed these PTP-specific drivers to `imply PTP_1588_CLOCK`, making it
possible to build them without the PTP subsystem. But as Grygorii
Strashko noted in [1]:

On Wed, Apr 22, 2020 at 02:16:11PM +0300, Grygorii Strashko wrote:

> Another question is that CPTS completely nonfunctional in this case and
> it was never expected that somebody will even try to use/run such
> configuration (except for random build purposes).

In my view, enabling a PTP-specific driver without the PTP subsystem is
a configuration error made possible by the above commit. Kconfig should
not allow users to create a configuration with missing dependencies that
results in "completely nonfunctional" drivers.

I audited all network drivers that call ptp_clock_register() but merely
`imply PTP_1588_CLOCK` and found five PTP-specific drivers that are
likely nonfunctional without PTP_1588_CLOCK:

    NET_DSA_MV88E6XXX_PTP
    NET_DSA_SJA1105_PTP
    MACB_USE_HWSTAMP
    CAVIUM_PTP
    TI_CPTS_MOD

Note how these symbols all reference PTP or timestamping in their name;
this is a clue that they depend on PTP_1588_CLOCK.

Change them from `imply PTP_1588_CLOCK` [2] to `depends on PTP_1588_CLOCK`.
I'm not using `select PTP_1588_CLOCK` here because PTP_1588_CLOCK has
its own dependencies, which `select` would not transitively apply.

Additionally, remove the `select NET_PTP_CLASSIFY` from CPTS_TI_MOD;
PTP_1588_CLOCK already selects that.

[1]: https://lore.kernel.org/lkml/c04458ed-29ee-1797-3a11-7f3f560553e6@ti.com/

[2]: NET_DSA_SJA1105_PTP had never declared any type of dependency on
PTP_1588_CLOCK (`imply` or otherwise); adding a `depends on PTP_1588_CLOCK`
here seems appropriate.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: d1cbfd771c ("ptp_clock: Allow for it to be optional")
Signed-off-by: Clay McClure <clay@daemons.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:27:51 -07:00
Jakub Kicinski
610a9346c1 devlink: fix return value after hitting end in region read
Commit d5b90e99e1 ("devlink: report 0 after hitting end in region read")
fixed region dump, but region read still returns a spurious error:

$ devlink region read netdevsim/netdevsim1/dummy snapshot 0 addr 0 len 128
0000000000000000 a6 f4 c4 1c 21 35 95 a6 9d 34 c3 5b 87 5b 35 79
0000000000000010 f3 a0 d7 ee 4f 2f 82 7f c6 dd c4 f6 a5 c3 1b ae
0000000000000020 a4 fd c8 62 07 59 48 03 70 3b c7 09 86 88 7f 68
0000000000000030 6f 45 5d 6d 7d 0e 16 38 a9 d0 7a 4b 1e 1e 2e a6
0000000000000040 e6 1d ae 06 d6 18 00 85 ca 62 e8 7e 11 7e f6 0f
0000000000000050 79 7e f7 0f f3 94 68 bd e6 40 22 85 b6 be 6f b1
0000000000000060 af db ef 5e 34 f0 98 4b 62 9a e3 1b 8b 93 fc 17
devlink answers: Invalid argument
0000000000000070 61 e8 11 11 66 10 a5 f7 b1 ea 8d 40 60 53 ed 12

This is a minimal fix, I'll follow up with a restructuring
so we don't have two checks for the same condition.

Fixes: fdd41ec21e ("devlink: Return right error code in case of errors for region read")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:25:35 -07:00
Nathan Chancellor
7fdc66debe hv_netvsc: Fix netvsc_start_xmit's return type
netvsc_start_xmit is used as a callback function for the ndo_start_xmit
function pointer. ndo_start_xmit's return type is netdev_tx_t but
netvsc_start_xmit's return type is int.

This causes a failure with Control Flow Integrity (CFI), which requires
function pointer prototypes and callback function definitions to match
exactly. When CFI is in enforcing, the kernel panics. When booting a
CFI kernel with WSL 2, the VM is immediately terminated because of this.

The splat when CONFIG_CFI_PERMISSIVE is used:

[    5.916765] CFI failure (target: netvsc_start_xmit+0x0/0x10):
[    5.916771] WARNING: CPU: 8 PID: 0 at kernel/cfi.c:29 __cfi_check_fail+0x2e/0x40
[    5.916772] Modules linked in:
[    5.916774] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 5.7.0-rc3-next-20200424-microsoft-cbl-00001-ged4eb37d2c69-dirty #1
[    5.916776] RIP: 0010:__cfi_check_fail+0x2e/0x40
[    5.916777] Code: 48 c7 c7 70 98 63 a9 48 c7 c6 11 db 47 a9 e8 69 55 59 00 85 c0 75 02 5b c3 48 c7 c7 73 c6 43 a9 48 89 de 31 c0 e8 12 2d f0 ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 85 f6 74 25
[    5.916778] RSP: 0018:ffffa803c0260b78 EFLAGS: 00010246
[    5.916779] RAX: 712a1af25779e900 RBX: ffffffffa8cf7950 RCX: ffffffffa962cf08
[    5.916779] RDX: ffffffffa9c36b60 RSI: 0000000000000082 RDI: ffffffffa9c36b5c
[    5.916780] RBP: ffff8ffc4779c2c0 R08: 0000000000000001 R09: ffffffffa9c3c300
[    5.916781] R10: 0000000000000151 R11: ffffffffa9c36b60 R12: ffff8ffe39084000
[    5.916782] R13: ffffffffa8cf7950 R14: ffffffffa8d12cb0 R15: ffff8ffe39320140
[    5.916784] FS:  0000000000000000(0000) GS:ffff8ffe3bc00000(0000) knlGS:0000000000000000
[    5.916785] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.916786] CR2: 00007ffef5749408 CR3: 00000002f4f5e000 CR4: 0000000000340ea0
[    5.916787] Call Trace:
[    5.916788]  <IRQ>
[    5.916790]  __cfi_check+0x3ab58/0x450e0
[    5.916793]  ? dev_hard_start_xmit+0x11f/0x160
[    5.916795]  ? sch_direct_xmit+0xf2/0x230
[    5.916796]  ? __dev_queue_xmit.llvm.11471227737707190958+0x69d/0x8e0
[    5.916797]  ? neigh_resolve_output+0xdf/0x220
[    5.916799]  ? neigh_connected_output.cfi_jt+0x8/0x8
[    5.916801]  ? ip6_finish_output2+0x398/0x4c0
[    5.916803]  ? nf_nat_ipv6_out+0x10/0xa0
[    5.916804]  ? nf_hook_slow+0x84/0x100
[    5.916807]  ? ip6_input_finish+0x8/0x8
[    5.916807]  ? ip6_output+0x6f/0x110
[    5.916808]  ? __ip6_local_out.cfi_jt+0x8/0x8
[    5.916810]  ? mld_sendpack+0x28e/0x330
[    5.916811]  ? ip_rt_bug+0x8/0x8
[    5.916813]  ? mld_ifc_timer_expire+0x2db/0x400
[    5.916814]  ? neigh_proxy_process+0x8/0x8
[    5.916816]  ? call_timer_fn+0x3d/0xd0
[    5.916817]  ? __run_timers+0x2a9/0x300
[    5.916819]  ? rcu_core_si+0x8/0x8
[    5.916820]  ? run_timer_softirq+0x14/0x30
[    5.916821]  ? __do_softirq+0x154/0x262
[    5.916822]  ? native_x2apic_icr_write+0x8/0x8
[    5.916824]  ? irq_exit+0xba/0xc0
[    5.916825]  ? hv_stimer0_vector_handler+0x99/0xe0
[    5.916826]  ? hv_stimer0_callback_vector+0xf/0x20
[    5.916826]  </IRQ>
[    5.916828]  ? hv_stimer_global_cleanup.cfi_jt+0x8/0x8
[    5.916829]  ? raw_setsockopt+0x8/0x8
[    5.916830]  ? default_idle+0xe/0x10
[    5.916832]  ? do_idle.llvm.10446269078108580492+0xb7/0x130
[    5.916833]  ? raw_setsockopt+0x8/0x8
[    5.916833]  ? cpu_startup_entry+0x15/0x20
[    5.916835]  ? cpu_hotplug_enable.cfi_jt+0x8/0x8
[    5.916836]  ? start_secondary+0x188/0x190
[    5.916837]  ? secondary_startup_64+0xa5/0xb0
[    5.916838] ---[ end trace f2683fa869597ba5 ]---

Avoid this by using the right return type for netvsc_start_xmit.

Fixes: fceaf24a94 ("Staging: hv: add the Hyper-V virtual network driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1009
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:24:46 -07:00
David S. Miller
384649e79e Merge branch 'WoL-fixes-for-DP83822-and-DP83tc811'
Dan Murphy says:

====================
WoL fixes for DP83822 and DP83tc811

The WoL feature for each device was enabled during boot or when the PHY was
brought up which may be undesired.  These patches disable the WoL in the
config_init.  The disabling and enabling of the WoL is now done though the
set_wol call.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:23:44 -07:00
Dan Murphy
6c599044b0 net: phy: DP83TC811: Fix WoL in config init to be disabled
The WoL feature should be disabled when config_init is called and the
feature should turned on or off  when set_wol is called.

In addition updated the calls to modify the registers to use the set_bit
and clear_bit function calls.

Fixes: 6d749428788b ("net: phy: DP83TC811: Introduce support for the
DP83TC811 phy")
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:23:44 -07:00
Dan Murphy
600ac36b53 net: phy: DP83822: Fix WoL in config init to be disabled
The WoL feature should be disabled when config_init is called and the
feature should turned on or off  when set_wol is called.

In addition updated the calls to modify the registers to use the set_bit
and clear_bit function calls.

Fixes: 3b427751a9d0 ("net: phy: DP83822 initial driver submission")
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 15:23:44 -07:00
David Ahern
8f34e53b60 ipv6: Use global sernum for dst validation with nexthop objects
Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.

IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.

With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.

Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).

This problem only affects routes using the new, external nexthops.

Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.

Fixes: 5b98324ebe ("ipv6: Allow routes to use nexthop objects")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 12:46:30 -07:00
Thomas Gleixner
c84cb3735f x86/apic: Move TSC deadline timer debug printk
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a
lockdep splat due to a lock order violation between hrtimer_base::lock and
console_sem, when the 'once' condition is reset via
/sys/kernel/debug/clear_warn_once after boot.

The initial printk cannot trigger this because that happens during boot
when the local APIC timer is set up on the boot CPU.

Prevent it by moving the printk to a place which is guaranteed to be only
called once during boot.

Mark the deadline timer check related functions and data __init while at
it.

Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
2020-05-01 19:15:41 +02:00
Konstantin Khlebnikov
fdc63ff0e4 ftrace/x86: Fix trace event registration for syscalls without arguments
The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and
simply defines __abi_sys_$NAME as alias of __do_sys_$NAME.

As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match
with the declared trace event name.

See also commit 1c758a2202 ("tracing/x86: Update syscall trace events to
handle new prefixed syscall func names").

Add __do_sys_ to the valid prefixes which are checked in
arch_syscall_match_sym_name().

Fixes: d2b5de495e ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
2020-05-01 19:15:40 +02:00
Shuah Khan
66d69e081b selftests: fix kvm relocatable native/cross builds and installs
kvm test Makefile doesn't fully support cross-builds and installs.
UNAME_M = $(shell uname -m) variable is used to define the target
programs and libraries to be built from arch specific sources in
sub-directories.

For cross-builds to work, UNAME_M has to map to ARCH and arch specific
directories and targets in this Makefile.

UNAME_M variable to used to run the compiles pointing to the right arch
directories and build the right targets for these supported architectures.

TEST_GEN_PROGS and LIBKVM are set using UNAME_M variable.
LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.

x86_64 targets are named to include x86_64 as a suffix and directories
for includes are in x86_64 sub-directory. s390x and aarch64 follow the
same convention. "uname -m" doesn't result in the correct mapping for
s390x and aarch64. Fix it to set UNAME_M correctly for s390x and aarch64
cross-builds.

In addition, Makefile doesn't create arch sub-directories in the case of
relocatable builds and test programs under s390x and x86_64 directories
fail to build. This is a problem for native and cross-builds. Fix it to
create all necessary directories keying off of TEST_GEN_PROGS.

The following use-cases work with this change:

Native x86_64:
make O=/tmp/kselftest -C tools/testing/selftests TARGETS=kvm install \
 INSTALL_PATH=$HOME/x86_64

arm64 cross-build:
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
	CROSS_COMPILE=aarch64-linux-gnu- defconfig

make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
	CROSS_COMPILE=aarch64-linux-gnu- all

make kselftest-install TARGETS=kvm O=$HOME/arm64_build ARCH=arm64 \
	HOSTCC=gcc CROSS_COMPILE=aarch64-linux-gnu-

s390x cross-build:
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
	CROSS_COMPILE=s390x-linux-gnu- defconfig

make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
	CROSS_COMPILE=s390x-linux-gnu- all

make kselftest-install TARGETS=kvm O=$HOME/s390x_build/ ARCH=s390 \
	HOSTCC=gcc CROSS_COMPILE=s390x-linux-gnu- all

No regressions in the following use-cases:
make -C tools/testing/selftests TARGETS=kvm
make kselftest-all TARGETS=kvm

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:47:55 -06:00
Masami Hiramatsu
6734d211fe selftests/ftrace: Make XFAIL green color
Since XFAIL (Expected Failure) is expected to fail the test, which
means that test case works as we expected. IOW, XFAIL is same as
PASS. So make it green.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:31:38 -06:00
Alan Maguire
b730d66813 ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
Currently, ftracetest will return 1 (failure) if any unresolved cases
are encountered.  The unresolved status results from modules and
programs not being available, and as such does not indicate any
issues with ftrace itself.  As such, change the behaviour of
ftracetest in line with unsupported cases; if unsupported cases
happen, ftracetest still returns 0 unless --fail-unsupported.  Here
--fail-unresolved is added and the default is to return 0 if
unresolved results occur.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:26:51 -06:00
Alan Maguire
57c4cfd4a2 ftrace/selftests: workaround cgroup RT scheduling issues
wakeup_rt.tc and wakeup.tc tests in tracers/ subdirectory
fail due to the chrt command returning:

 chrt: failed to set pid 0's policy: Operation not permitted.

To work around this, temporarily disable grout RT scheduling
during ftracetest execution.  Restore original value on
test run completion.  With these changes in place, both
tests consistently pass.

Fixes: c575dea2c1 ("selftests/ftrace: Add wakeup_rt tracer testcase")
Fixes: c1edd060b4 ("selftests/ftrace: Add wakeup tracer testcase")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:25:39 -06:00
Daniel Kolesa
59dfb0c64d drm/amd/display: work around fp code being emitted outside of DC_FP_START/END
The dcn20_validate_bandwidth function would have code touching the
incorrect registers emitted outside of the boundaries of the
DC_FP_START/END macros, at least on ppc64le. Work around the
problem by wrapping the whole function instead.

Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.6.x
2020-05-01 09:24:40 -04:00
Michel Dänzer
f33a6dec4e drm/amdgpu/dc: Use WARN_ON_ONCE for ASSERT
Once should generally be enough for diagnosing what lead up to it,
repeating it over and over can be pretty annoying.

Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 09:24:40 -04:00
Evan Quan
f7b52890da drm/amdgpu: drop redundant cg/pg ungate on runpm enter
CG/PG ungate is already performed in ip_suspend_phase1. Otherwise,
the CG/PG ungate will be performed twice. That will cause gfxoff
disablement is performed twice also on runpm enter while gfxoff
enablemnt once on rump exit. That will put gfxoff into disabled
state.

Fixes: b2a7e9735a ("drm/amdgpu: fix the hw hang during perform system reboot and reset")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-05-01 09:23:49 -04:00
Evan Quan
c457a273e1 drm/amdgpu: move kfd suspend after ip_suspend_phase1
This sequence change should be safe as what did in ip_suspend_phase1
is to suspend DCE only. And this is a prerequisite for coming
redundant cg/pg ungate dropping.

Fixes: 487eca11a3 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-05-01 09:22:14 -04:00
Marc Zyngier
0225fd5e0a KVM: arm64: Fix 32bit PC wrap-around
In the unlikely event that a 32bit vcpu traps into the hypervisor
on an instruction that is located right at the end of the 32bit
range, the emulation of that instruction is going to increment
PC past the 32bit range. This isn't great, as userspace can then
observe this value and get a bit confused.

Conversly, userspace can do things like (in the context of a 64bit
guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
set PC to a 64bit value, change PSTATE to AArch32-USR, and observe
that PC hasn't been truncated. More confusion.

Fix both by:
- truncating PC increments for 32bit guests
- sanitizing all 32bit regs every time a core reg is changed by
  userspace, and that PSTATE indicates a 32bit mode.

Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-05-01 09:51:08 +01:00
Rahul Lakkireddy
69422a7e5d cxgb4: fix EOTID leak when disabling TC-MQPRIO offload
Under heavy load, the EOTID termination FLOWC request fails to get
enqueued to the end of the Tx ring due to lack of credits. This
results in EOTID leak.

When disabling TC-MQPRIO offload, the link is already brought down
to cleanup EOTIDs. So, flush any pending enqueued skbs that can't be
sent outside the wire, to make room for FLOWC request. Also, move the
FLOWC descriptor consumption logic closer to when the FLOWC request is
actually posted to hardware.

Fixes: 0e395b3cb1 ("cxgb4: add FLOWC based QoS offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 21:01:46 -07:00
Andy Shevchenko
ab1c637cc6 stmmac: intel: Fix kernel crash due to wrong error path
Unfortunately sometimes ->probe() may fail. The commit b9663b7ca6
("net: stmmac: Enable SERDES power up/down sequence")
messed up with error handling and thus:

[   12.811311] ------------[ cut here ]------------
[   12.811993] kernel BUG at net/core/dev.c:9937!

Fix this by properly crafted error path.

Fixes: b9663b7ca6 ("net: stmmac: Enable SERDES power up/down sequence")
Cc: Voon Weifeng <weifeng.voon@intel.com>
Cc: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 21:00:17 -07:00
Jiri Pirko
6ef4889fc0 mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly
Vregion helpers to get min and max priority depend on the correct
ordering of vchunks in the vregion list. However, the current code
always adds new chunk to the end of the list, no matter what the
priority is. Fix this by finding the correct place in the list and put
vchunk there.

Fixes: 22a677661f ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 20:41:14 -07:00
Toke Høiland-Jørgensen
b723748750 tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040
RFC 6040 recommends propagating an ECT(1) mark from an outer tunnel header
to the inner header if that inner header is already marked as ECT(0). When
RFC 6040 decapsulation was implemented, this case of propagation was not
added. This simply appears to be an oversight, so let's fix that.

Fixes: eccc1bb8d4 ("tunnel: drop packet if ECN present with not-ECT")
Reported-by: Bob Briscoe <ietf@bobbriscoe.net>
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 20:32:15 -07:00
KP Singh
54261af473 security: Fix the default value of fs_context_parse_param hook
security_fs_context_parse_param is called by vfs_parse_fs_param and
a succussful return value (i.e 0) implies that a parameter will be
consumed by the LSM framework. This stops all further parsing of the
parmeter by VFS. Furthermore, if an LSM hook returns a success, the
remaining LSM hooks are not invoked for the parameter.

The current default behavior of returning success means that all the
parameters are expected to be parsed by the LSM hook and none of them
end up being populated by vfs in fs_context

This was noticed when lsm=bpf is supplied on the command line before any
other LSM. As the bpf lsm uses this default value to implement a default
hook, this resulted in a failure to parse any fs_context parameters and
a failure to mount the root filesystem.

Fixes: 98e828a065 ("security: Refactor declaration of LSM hooks")
Reported-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: James Morris <jmorris@namei.org>
2020-04-30 20:29:34 -07:00
Andy Shevchenko
0ce205d466 net: macb: Fix runtime PM refcounting
The commit e6a41c23df, while trying to fix an issue,

    ("net: macb: ensure interface is not suspended on at91rm9200")

introduced a refcounting regression, because in error case refcounter
must be balanced. Fix it by calling pm_runtime_put_noidle() in error case.

While here, fix the same mistake in other couple of places.

Fixes: e6a41c23df ("net: macb: ensure interface is not suspended on at91rm9200")
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 20:27:48 -07:00
Christophe JAILLET
ee8d2267f0 net: moxa: Fix a potential double 'free_irq()'
Should an irq requested with 'devm_request_irq' be released explicitly,
it should be done by 'devm_free_irq()', not 'free_irq()'.

Fixes: 6c821bd9ed ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 20:23:49 -07:00
Scott Dial
ab046a5d4b net: macsec: preserve ingress frame ordering
MACsec decryption always occurs in a softirq context. Since
the FPU may not be usable in the softirq context, the call to
decrypt may be scheduled on the cryptd work queue. The cryptd
work queue does not provide ordering guarantees. Therefore,
preserving order requires masking out ASYNC implementations
of gcm(aes).

For instance, an Intel CPU with AES-NI makes available the
generic-gcm-aesni driver from the aesni_intel module to
implement gcm(aes). However, this implementation requires
the FPU, so it is not always available to use from a softirq
context, and will fallback to the cryptd work queue, which
does not preserve frame ordering. With this change, such a
system would select gcm_base(ctr(aes-aesni),ghash-generic).
While the aes-aesni implementation prefers to use the FPU, it
will fallback to the aes-asm implementation if unavailable.

By using a synchronous version of gcm(aes), the decryption
will complete before returning from crypto_aead_decrypt().
Therefore, the macsec_decrypt_done() callback will be called
before returning from macsec_decrypt(). Thus, the order of
calls to macsec_post_decrypt() for the frames is preserved.

While it's presumable that the pure AES-NI version of gcm(aes)
is more performant, the hybrid solution is capable of gigabit
speeds on modest hardware. Regardless, preserving the order
of frames is paramount for many network protocols (e.g.,
triggering TCP retries). Within the MACsec driver itself, the
replay protection is tripped by the out-of-order frames, and
can cause frames to be dropped.

This bug has been present in this code since it was added in
v4.6, however it may not have been noticed since not all CPUs
have FPU offload available. Additionally, the bug manifests
as occasional out-of-order packets that are easily
misattributed to other network phenomena.

When this code was added in v4.6, the crypto/gcm.c code did
not restrict selection of the ghash function based on the
ASYNC flag. For instance, x86 CPUs with PCLMULQDQ would
select the ghash-clmulni driver instead of ghash-generic,
which submits to the cryptd work queue if the FPU is busy.
However, this bug was was corrected in v4.8 by commit
b30bdfa864, and was backported
all the way back to the v3.14 stable branch, so this patch
should be applicable back to the v4.6 stable branch.

Signed-off-by: Scott Dial <scott@scottdial.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 20:20:34 -07:00
David S. Miller
b6f875a8d9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Do not update the UDP checksum when it's zero, from Guillaume Nault.

2) Fix return of local variable in nf_osf, from Arnd Bergmann.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:07:16 -07:00
David S. Miller
c778980a65 Merge branch 'net-ipa-three-bug-fixes'
Alex Elder says:

====================
net: ipa: three bug fixes

This series fixes three bugs in the Qualcomm IPA code.  The third
adds a missing error code initialization step.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:04:58 -07:00
Alex Elder
0b1ba18aec net: ipa: zero return code before issuing generic EE command
Zero the result code stored in a field of the scratch 0 register
before issuing a generic EE command.  This just guarantees that
the value we read later was actually written as a result of the
command.

Also add the definitions of two more possible result codes that can
be returned when issuing flow control enable or disable commands:
  INCORRECT_CHANNEL_STATE: - channel must be in started state
  INCORRECT_DIRECTION - flow control is only valid for TX channels

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:04:58 -07:00
Alex Elder
0721999f15 net: ipa: fix an error message in gsi_channel_init_one()
An error message about limiting the number of TREs used prints the
wrong value.  Fix this bug.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:04:58 -07:00
Alex Elder
713b6ebb4c net: ipa: fix a bug in ipa_endpoint_stop()
In ipa_endpoint_stop(), for TX endpoints we set the number of retries
to 0.  When we break out of the loop, retries being 0 means we return
EIO rather than the value of ret (which should be 0).

Fix this by using a non-zero retry count for both RX and TX
channels, and just break out of the loop after calling
gsi_channel_stop() for TX channels.  This way only RX channels
will retry, and the retry count will be non-zero at the end
for TX channels (so the proper value gets returned).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:04:58 -07:00
David S. Miller
de04604e23 Merge branch 'ionic-fw-upgrade-bug-fixes'
Shannon Nelson says:

====================
ionic: fw upgrade bug fixes

These patches address issues found in additional internal
fw-upgrade testing.

v2:
 - replaced extra state flag with postponing first link check
 - added device reset patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:02:46 -07:00
Shannon Nelson
6bc977faa0 ionic: add device reset to fw upgrade down
Doing a device reset addresses an obscure FW timing issue in
the FW upgrade process.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:02:46 -07:00
Shannon Nelson
1d53aedcf9 ionic: refresh devinfo after fw-upgrade
Make sure we can report the new FW version after a
fw-upgrade has finished by re-reading the device's
fw version information.

Fixes: c672412f61 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:02:46 -07:00
Shannon Nelson
16f3fd3dcc ionic: no link check until after probe
Don't bother with the link check during probe, let
the watchdog notice the first link-up.  This allows
probe to finish cleanly without any interruptions
from over excited user programs opening the device
as soon as it is registered.

Fixes: c672412f61 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 18:02:46 -07:00
Julia Lawall
865308373e dp83640: reverse arguments to list_add_tail
In this code, it appears that phyter_clocks is a list head, based on
the previous list_for_each, and that clock->list is intended to be a
list element, given that it has just been initialized in
dp83640_clock_init.  Accordingly, switch the arguments to
list_add_tail, which takes the list head as the second argument.

Fixes: cb646e2b02 ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 17:48:26 -07:00
Ido Schimmel
7979457b1d net: bridge: vlan: Add a schedule point during VLAN processing
User space can request to delete a range of VLANs from a bridge slave in
one netlink request. For each deleted VLAN the FDB needs to be traversed
in order to flush all the affected entries.

If a large range of VLANs is deleted and the number of FDB entries is
large or the FDB lock is contented, it is possible for the kernel to
loop through the deleted VLANs for a long time. In case preemption is
disabled, this can result in a soft lockup.

Fix this by adding a schedule point after each VLAN is deleted to yield
the CPU, if needed. This is safe because the VLANs are traversed in
process context.

Fixes: bdced7ef78 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Tested-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 17:45:41 -07:00
Juliet Kim
f9c6cea0b3 ibmvnic: Skip fatal error reset after passive init
During MTU change, the following events may happen.
Client-driven CRQ initialization fails due to partner’s CRQ closed,
causing client to enqueue a reset task for FATAL_ERROR. Then passive
(server-driven) CRQ initialization succeeds, causing client to
release CRQ and enqueue a reset task for failover. If the passive
CRQ initialization occurs before the FATAL reset task is processed,
the FATAL error reset task would try to access a CRQ message queue
that was freed, causing an oops. The problem may be most likely to
occur during DLPAR add vNIC with a non-default MTU, because the DLPAR
process will automatically issue a change MTU request.

Fix this by not processing fatal error reset if CRQ is passively
initialized after client-driven CRQ initialization fails.

Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 13:27:17 -07:00
David S. Miller
81d6bc44fa Merge tag 'mlx5-fixes-2020-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-04-29

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

v2:
 - Dropped the ktls patch, Tariq has to check if it is fixable in the stack

For -stable v4.12
 ('net/mlx5: Fix forced completion access non initialized command entry')
 ('net/mlx5: Fix command entry leak in Internal Error State')

For -stable v5.4
 ('net/mlx5: DR, On creation set CQ's arm_db member to right value')

For -stable v5.6
 ('net/mlx5e: Fix q counters on uplink representors')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:58:11 -07:00
Paolo Abeni
ac2b47fb92 mptcp: fix uninitialized value access
tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning
a not NULL 'child', let's check 'own_req' only if child is
available to avoid an - unharmful - UBSAN splat.

v1 -> v2:
 - reference the correct hash

Fixes: 4c8941de78 ("mptcp: avoid flipping mp_capable field in syn_recv_sock()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:34:07 -07:00
David S. Miller
8c75595360 Merge branch 'mptcp-fix-incoming-options-parsing'
Paolo Abeni says:

====================
mptcp: fix incoming options parsing

This series addresses a serious issue in MPTCP option parsing.

This is bigger than the usual -net change, but I was unable to find a
working, sane, smaller fix.

The core change is inside patch 2/5 which moved MPTCP options parsing from
the TCP code inside existing MPTCP hooks and clean MPTCP options status on
each processed packet.

The patch 1/5 is a needed pre-requisite, and patches 3,4,5 are smaller,
related fixes.

v1 -> v2:
 - cleaned-up patch 1/5
 - rebased on top of current -net
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:23 -07:00
Paolo Abeni
a77895dbc0 mptcp: initialize the data_fin field for mpc packets
When parsing MPC+data packets we set the dss field, so
we must also initialize the data_fin, or we can find stray
value there.

Fixes: 9a19371bf0 ("mptcp: fix data_fin handing in RX path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:23 -07:00
Paolo Abeni
5a91e32b40 mptcp: fix 'use_ack' option access.
The mentioned RX option field is initialized only for DSS
packet, we must access it only if 'dss' is set too, or
the subflow will end-up in a bad status, leading to
RFC violations.

Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
Paolo Abeni
d6085fe19b mptcp: avoid a WARN on bad input.
Syzcaller has found a way to trigger the WARN_ON_ONCE condition
in check_fully_established().

The root cause is a legit fallback to TCP scenario, so replace
the WARN with a plain message on a more strict condition.

Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
Paolo Abeni
cfde141ea3 mptcp: move option parsing into mptcp_incoming_options()
The mptcp_options_received structure carries several per
packet flags (mp_capable, mp_join, etc.). Such fields must
be cleared on each packet, even on dropped ones or packet
not carrying any MPTCP options, but the current mptcp
code clears them only on TCP option reset.

On several races/corner cases we end-up with stray bits in
incoming options, leading to WARN_ON splats. e.g.:

[  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
[  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
[  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
[  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
[  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
[  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
[  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
[  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
[  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
[  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
[  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
[  171.232586] Call Trace:
[  171.233109]  <IRQ>
[  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
[  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
[  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
[  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
[  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
[  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
[  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
[  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
[  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
[  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
[  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
[  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
[  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
[  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
[  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
[  171.282358]  </IRQ>

We could address the issue clearing explicitly the relevant fields
in several places - tcp_parse_option, tcp_fast_parse_options,
possibly others.

Instead we move the MPTCP option parsing into the already existing
mptcp ingress hook, so that we need to clear the fields in a single
place.

This allows us dropping an MPTCP hook from the TCP code and
removing the quite large mptcp_options_received from the tcp_sock
struct. On the flip side, the MPTCP sockets will traverse the
option space twice (in tcp_parse_option() and in
mptcp_incoming_options(). That looks acceptable: we already
do that for syn and 3rd ack packets, plain TCP socket will
benefit from it, and even MPTCP sockets will experience better
code locality, reducing the jumps between TCP and MPTCP code.

v1 -> v2:
 - rebased on current '-net' tree

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
Paolo Abeni
263e1201a2 mptcp: consolidate synack processing.
Currently the MPTCP code uses 2 hooks to process syn-ack
packets, mptcp_rcv_synsent() and the sk_rx_dst_set()
callback.

We can drop the first, moving the relevant code into the
latter, reducing the hooking into the TCP code. This is
also needed by the next patch.

v1 -> v2:
 - use local tcp sock ptr instead of casting the sk variable
   several times - DaveM

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
Rick Edgecombe
ab5130186d x86/mm/cpa: Flush direct map alias during cpa
As an optimization, cpa_flush() was changed to optionally only flush
the range in @cpa if it was small enough.  However, this range does
not include any direct map aliases changed in cpa_process_alias(). So
small set_memory_() calls that touch that alias don't get the direct
map changes flushed. This situation can happen when the virtual
address taking variants are passed an address in vmalloc or modules
space.

In these cases, force a full TLB flush.

Note this issue does not extend to cases where the set_memory_() calls are
passed a direct map address, or page array, etc, as the primary target. In
those cases the direct map would be flushed.

Fixes: 935f583982 ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net
2020-04-30 20:14:30 +02:00
Roi Dayan
67b38de646 net/mlx5e: Fix q counters on uplink representors
Need to allocate the q counters before init_rx which needs them
when creating the rq.

Fixes: 8520fa57a4 ("net/mlx5e: Create q counters on uplink representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:33 -07:00
Moshe Shemesh
cece6f432c net/mlx5: Fix command entry leak in Internal Error State
Processing commands by cmd_work_handler() while already in Internal
Error State will result in entry leak, since the handler process force
completion without doorbell. Forced completion doesn't release the entry
and event completion will never arrive, so entry should be released.

Fixes: 73dd3a4839 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:33 -07:00
Moshe Shemesh
f3cb3cebe2 net/mlx5: Fix forced completion access non initialized command entry
mlx5_cmd_flush() will trigger forced completions to all valid command
entries. Triggered by an asynch event such as fast teardown it can
happen at any stage of the command, including command initialization.
It will trigger forced completion and that can lead to completion on an
uninitialized command entry.

Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is
initialized will ensure force completion is treated only if command
entry is initialized.

Fixes: 73dd3a4839 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:32 -07:00
Erez Shitrit
8075411d93 net/mlx5: DR, On creation set CQ's arm_db member to right value
In polling mode, set arm_db member to a value that will avoid CQ
event recovery by the HW.
Otherwise we might get event without completion function.
In addition,empty completion function to was added to protect from
unexpected events.

Fixes: 297cccebdc ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:32 -07:00
Parav Pandit
f8d1eddaf9 net/mlx5: E-switch, Fix mutex init order
In cited patch mutex is initialized after its used.
Below call trace is observed.
Fix the order to initialize the mutex early enough.
Similarly follow mirror sequence during cleanup.

kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock)
kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938
__mutex_lock+0x7d6/0x8a0
kernel: Call Trace:
kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
kernel: ? mark_held_locks+0x55/0x70
kernel: ? __slab_free+0x274/0x400
kernel: ? lockdep_hardirqs_on+0x140/0x1d0
kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core]
kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core]
kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core]
kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core]
kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core]
kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core]
kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0

Fixes: 96e326878f ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:32 -07:00
Parav Pandit
e986453905 net/mlx5: E-switch, Fix printing wrong error value
When mlx5_modify_header_alloc() fails, instead of printing the error
value returned, current error log prints 0.

Fix by printing correct error value returned by
mlx5_modify_header_alloc().

Fixes: 6724e66b90 ("net/mlx5: E-Switch, Get reg_c1 value on miss")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:32 -07:00
Parav Pandit
799499850a net/mlx5: E-switch, Fix error unwinding flow for steering init failure
Error unwinding is done incorrectly in the cited commit.
When steering init fails, there is no need to perform steering cleanup.
When vport error exists, error cleanup should be mirror of the setup
routine, i.e. to perform steering cleanup before metadata cleanup.

This avoids the call trace in accessing uninitialized objects which are
skipped during steering_init() due to failure in steering_init().

Call trace:
mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header
actions 1, max supported 0
E-Switch: Failed to create restore mod header

BUG: kernel NULL pointer dereference, address: 00000000000000d0
[  677.263079]  mlx5_destroy_flow_group+0x13/0x80 [mlx5_core]
[  677.268921]  esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core]
[  677.275281]  esw_offloads_enable+0x1a5/0x800 [mlx5_core]
[  677.280949]  mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
[  677.287227]  mlx5_devlink_eswitch_mode_set+0x1af/0x320
[  677.293741]  devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
[  677.299217]  genl_rcv_msg+0x1eb/0x430

Fixes: 7983a675ba ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:20:31 -07:00
Greg Kroah-Hartman
73faaa623f Merge tag 'phy-for-5.7-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into char-misc-linus
phy: for 5.7 -rc

*) Update MAINTAINER to include Vinod Koul as co-maintainer of PHY
*) Fix Kconfig dependencies in seen with PHY_TEGRA_XUSB
*) Re-add "qcom,sdm845-qusb2-phy" compatible in phy-qcom-qusb2.c to
   make it work with existing dtbs
*) Move clock enable from ->poweron() to ->init() in Qualcomm
   usb-hs-28nm driver to initialize HW in ->init()

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>

* tag 'phy-for-5.7-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: qualcomm: usb-hs-28nm: Prepare clocks in init
  MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer
  phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string
  phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
2020-04-30 15:29:49 +02:00
Marc Zyngier
958e8e14fd KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM now expects to be able to use HW-accelerated delivery of vSGIs
as soon as the guest has enabled thm. Unfortunately, we only
initialize the GICv4 context if we have a virtual ITS exposed to
the guest.

Fix it by always initializing the GICv4.1 context if it is
available on the host.

Fixes: 2291ff2f2a ("KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor")
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-30 12:50:23 +01:00
Marc Zyngier
6e977984f6 KVM: arm64: Save/restore sp_el0 as part of __guest_enter
We currently save/restore sp_el0 in C code. This is a bit unsafe,
as a lot of the C code expects 'current' to be accessible from
there (and the opportunity to run kernel code in HYP is specially
great with VHE).

Instead, let's move the save/restore of sp_el0 to the assembly
code (in __guest_enter), making sure that sp_el0 is correct
very early on when we exit the guest, and is preserved as long
as possible to its host value when we enter the guest.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-30 12:01:35 +01:00
Fangrui Song
6aea9e0503 KVM: arm64: Delete duplicated label in invalid_vector
SYM_CODE_START defines \label , so it is redundant to define \label again.
A redefinition at the same place is accepted by GNU as
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=159fbb6088f17a341bcaaac960623cab881b4981)
but rejected by the clang integrated assembler.

Fixes: 617a2f392c ("arm64: kvm: Annotate assembly using modern annoations")
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/988
Link: https://lore.kernel.org/r/20200413231016.250737-1-maskray@google.com
2020-04-30 11:17:00 +01:00
Oliver Neukum
9f04db234a USB: uas: add quirk for LaCie 2Big Quadra
This device needs US_FL_NO_REPORT_OPCODES to avoid going
through prolonged error handling on enumeration.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: Julian Groß <julian.g@posteo.de>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200429155218.7308-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-30 09:28:43 +02:00
Bjorn Andersson
820eeb9de6 phy: qualcomm: usb-hs-28nm: Prepare clocks in init
The AHB clock must be on for qcom_snps_hsphy_init() to be able to write
the initialization sequence to the hardware, so move the clock
enablement to phy init and exit.

Fixes: 67b27dbeac ("phy: qualcomm: Add Synopsys 28nm Hi-Speed USB PHY driver")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-04-30 12:10:49 +05:30
Kishon Vijay Abraham I
6f8280cec1 MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer
Add Vinod Koul as Generic PHY Subsystem co-maintainer and move
the linux-phy to a shared repository.

Cc: Vinod Koul <vkoul@kernel.org>
Acked-By: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-04-30 12:10:43 +05:30
Jason A. Donenfeld
a9a8ba90fa crypto: arch/nhpoly1305 - process in explicit 4k chunks
Rather than chunking via PAGE_SIZE, this commit changes the arch
implementations to chunk in explicit 4k parts, so that calculations on
maximum acceptable latency don't suddenly become invalid on platforms
where PAGE_SIZE isn't 4k, such as arm64.

Fixes: 0f961f9f67 ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Fixes: 012c82388c ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305")
Fixes: a00fa0c887 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305")
Fixes: 16aae3595a ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30 15:16:59 +10:00
Jason A. Donenfeld
706024a52c crypto: arch/lib - limit simd usage to 4k chunks
The initial Zinc patchset, after some mailing list discussion, contained
code to ensure that kernel_fpu_enable would not be kept on for more than
a 4k chunk, since it disables preemption. The choice of 4k isn't totally
scientific, but it's not a bad guess either, and it's what's used in
both the x86 poly1305, blake2s, and nhpoly1305 code already (in the form
of PAGE_SIZE, which this commit corrects to be explicitly 4k for the
former two).

Ard did some back of the envelope calculations and found that
at 5 cycles/byte (overestimate) on a 1ghz processor (pretty slow), 4k
means we have a maximum preemption disabling of 20us, which Sebastian
confirmed was probably a good limit.

Unfortunately the chunking appears to have been left out of the final
patchset that added the glue code. So, this commit adds it back in.

Fixes: 84e03fa39f ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Fixes: b3aad5bad2 ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
Fixes: a44a3430d7 ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
Fixes: d7d7b85356 ("crypto: x86/poly1305 - wire up faster implementations for kernel")
Fixes: f569ca1647 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: a6b803b3dd ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: ed0356eda1 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30 15:16:59 +10:00
David S. Miller
30724ccbfc Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:

====================
wireguard fixes for 5.7-rc4

This series contains two fixes and a cleanup for wireguard:

1) Removal of a spurious newline, from Sultan Alsawaf.

2) Fix for a memory leak in an error path, in which memory allocated
   prior to the error wasn't freed, reported by Sultan Alsawaf.

3) Fix to ECN support to use RFC6040 properly like all the other tunnel
   drivers, from Toke Høiland-Jørgensen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 14:23:05 -07:00
Toke Høiland-Jørgensen
eebabcb26e wireguard: receive: use tunnel helpers for decapsulating ECN markings
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 14:23:05 -07:00
Jason A. Donenfeld
130c586061 wireguard: queueing: cleanup ptr_ring in error path of packet_queue_init
Prior, if the alloc_percpu of packet_percpu_multicore_worker_alloc
failed, the previously allocated ptr_ring wouldn't be freed. This commit
adds the missing call to ptr_ring_cleanup in the error case.

Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 14:23:05 -07:00
Sultan Alsawaf
d6833e4278 wireguard: send: remove errant newline from packet_encrypt_worker
This commit removes a useless newline at the end of a scope, which
doesn't add anything in the way of organization or readability.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 14:23:05 -07:00
Gwendal Grignou
b31d1d2b1c platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
Allocate callbacks array before enumerating the sensors: The probe routine
for these sensors (for instance cros_ec_sensors_probe) can be called
within the sensorhub probe routine (cros_ec_sensors_probe())

Fixes: 145d59baff ("platform/chrome: cros_ec_sensorhub: Add FIFO support")
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
2020-04-29 23:17:45 +02:00
Arnd Bergmann
2465f0d5c9 HID: mcp2221: add gpiolib dependency
Without gpiolib, this driver fails to link:

arm-linux-gnueabi-ld: drivers/hid/hid-mcp2221.o: in function `mcp2221_probe':
hid-mcp2221.c:(.text+0x1b0): undefined reference to `devm_gpiochip_add_data'
arm-linux-gnueabi-ld: drivers/hid/hid-mcp2221.o: in function `mcp_gpio_get':
hid-mcp2221.c:(.text+0x30c): undefined reference to `gpiochip_get_data'

Fixes: 328de1c519 ("HID: mcp2221: add GPIO functionality support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Rishi Gupta <gupt21@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-29 21:47:41 +02:00
Daniel Playfair Cal
538f67407e HID: i2c-hid: reset Synaptics SYNA2393 on resume
On the Dell XPS 9570, the Synaptics SYNA2393 touchpad generates spurious
interrupts after resuming from suspend until it receives some input or
is reset. Add it to the quirk I2C_HID_QUIRK_RESET_ON_RESUME so that it
is reset when resuming from suspend.

More information about the bug can be found in this mailing list
discussion: https://www.spinics.net/lists/linux-input/msg59530.html

Signed-off-by: Daniel Playfair Cal <daniel.playfair.cal@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-29 21:44:48 +02:00
Jason Gerecke
dcce8ef8f7 HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
The state of the center button was not reported to userspace for the
2nd-gen Intuos Pro S when used over Bluetooth due to the pad handling
code not being updated to support its reduced number of buttons. This
patch uses the actual number of buttons present on the tablet to
assemble a button state bitmap.

Link: https://github.com/linuxwacom/xf86-input-wacom/issues/112
Fixes: cd47de45b855 ("HID: wacom: Add 2nd gen Intuos Pro Small support")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-29 21:40:53 +02:00
Florian Westphal
42c556fef9 mptcp: replace mptcp_disconnect with a stub
Paolo points out that mptcp_disconnect is bogus:
"lock_sock(sk);
looks suspicious (lock should be already held by the caller)
And call to: tcp_disconnect(sk, flags); too, sk is not a tcp
socket".

->disconnect() gets called from e.g. inet_stream_connect when
one tries to disassociate a connected socket again (to re-connect
without closing the socket first).
MPTCP however uses mptcp_stream_connect, not inet_stream_connect,
for the mptcp-socket connect call.

inet_stream_connect only gets called indirectly, for the tcp socket,
so any ->disconnect() calls end up calling tcp_disconnect for that
tcp subflow sk.

This also explains why syzkaller has not yet reported a problem
here.  So for now replace this with a stub that doesn't do anything.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 12:39:49 -07:00
Arnd Bergmann
c165d57b55 netfilter: nf_osf: avoid passing pointer to local var
gcc-10 points out that a code path exists where a pointer to a stack
variable may be passed back to the caller:

net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init':
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
net/netfilter/nfnetlink_osf.c:171:16: note: declared here
  171 |  struct tcphdr _tcph;
      |                ^~~~~

I am not sure whether this can happen in practice, but moving the
variable declaration into the callers avoids the problem.

Fixes: 31a9c29210 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-29 21:17:57 +02:00
Jason Yan
9812307491 net: dsa: mv88e6xxx: remove duplicate assignment of struct members
These struct members named 'phylink_validate' was assigned twice:

static const struct mv88e6xxx_ops mv88e6190_ops = {
	......
	.phylink_validate = mv88e6390_phylink_validate,
	......
	.phylink_validate = mv88e6390_phylink_validate,
};

static const struct mv88e6xxx_ops mv88e6190x_ops = {
	......
	.phylink_validate = mv88e6390_phylink_validate,
	......
	.phylink_validate = mv88e6390x_phylink_validate,
};

static const struct mv88e6xxx_ops mv88e6191_ops = {
	......
	.phylink_validate = mv88e6390_phylink_validate,
	......
	.phylink_validate = mv88e6390_phylink_validate,
};

static const struct mv88e6xxx_ops mv88e6290_ops = {
	......
	.phylink_validate = mv88e6390_phylink_validate,
	......
	.phylink_validate = mv88e6390_phylink_validate,
};

Remove all the first one and leave the second one which are been used in
fact. Be aware that for 'mv88e6190x_ops' the assignment functions is
different while the others are all the same. This fixes the following
coccicheck warning:

drivers/net/dsa/mv88e6xxx/chip.c:3911:48-49: phylink_validate: first
occurrence line 3965, second occurrence line 3967
drivers/net/dsa/mv88e6xxx/chip.c:3970:49-50: phylink_validate: first
occurrence line 4024, second occurrence line 4026
drivers/net/dsa/mv88e6xxx/chip.c:4029:48-49: phylink_validate: first
occurrence line 4082, second occurrence line 4085
drivers/net/dsa/mv88e6xxx/chip.c:4184:48-49: phylink_validate: first
occurrence line 4238, second occurrence line 4242

Fixes: 4262c38dc4 ("net: dsa: mv88e6xxx: Add SERDES stats counters to all 6390 family members")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 12:14:48 -07:00
John Stultz
2a15483b40 regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work"
This reverts commit dca0b44957 ("regulator: Use
driver_deferred_probe_timeout for regulator_init_complete_work"),
as we ended up reverting the default deferred_probe_timeout
value back to zero, to preserve behavior with 5.6 we need to
decouple the regulator timeout which was previously 30 seconds.

This avoids breaking some systems that depend on the regulator
timeout but don't require the deferred probe timeout.

Cc: linux-pm@vger.kernel.org
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kevin Hilman <khilman@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200429172349.55979-1-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29 19:57:45 +02:00
Alan Stern
0ed08faded HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
The syzbot fuzzer discovered a bad race between in the usbhid driver
between usbhid_stop() and usbhid_close().  In particular,
usbhid_stop() does:

	usb_free_urb(usbhid->urbin);
	...
	usbhid->urbin = NULL; /* don't mess up next start */

and usbhid_close() does:

	usb_kill_urb(usbhid->urbin);

with no mutual exclusion.  If the two routines happen to run
concurrently so that usb_kill_urb() is called in between the
usb_free_urb() and the NULL assignment, it will access the
deallocated urb structure -- a use-after-free bug.

This patch adds a mutex to the usbhid private structure and uses it to
enforce mutual exclusion of the usbhid_start(), usbhid_stop(),
usbhid_open() and usbhid_close() callbacks.

Reported-and-tested-by: syzbot+7bf5a7b0f0a1f9446f4c@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-29 16:24:26 +02:00
ChenTao
5409e0cca5 interconnect: qcom: Move the static keyword to the front of declaration
Fix the following warning:

Move the static keyword to the front of declaration of sdm845_icc_osm_l3
sdm845_aggre1_noc sc7180_icc_osm_l3 sdm845_aggre2_noc sdm845_config_noc
sdm845_dc_noc sdm845_gladiator_noc sdm845_mem_noc sdm845_mmss_noc and
sdm845_system_noc, resolve the following compiler warning that can be
when building with warnings enabled (W=1):

drivers/interconnect/qcom/osm-l3.c:81:1: warning:
 const static struct qcom_icc_desc sdm845_icc_osm_l3 = {
drivers/interconnect/qcom/osm-l3.c:94:1: warning:
 const static struct qcom_icc_desc sc7180_icc_osm_l3 = {
drivers/interconnect/qcom/sdm845.c:195:1: warning:
 const static struct qcom_icc_desc sdm845_aggre1_noc = {
drivers/interconnect/qcom/sdm845.c:223:1: warning:
 const static struct qcom_icc_desc sdm845_aggre2_noc = {
drivers/interconnect/qcom/sdm845.c:284:1: warning:
 const static struct qcom_icc_desc sdm845_config_noc = {
drivers/interconnect/qcom/sdm845.c:300:1: warning:
 const static struct qcom_icc_desc sdm845_dc_noc = {
drivers/interconnect/qcom/sdm845.c:318:1: warning:
 const static struct qcom_icc_desc sdm845_gladiator_noc = {
drivers/interconnect/qcom/sdm845.c:353:1: warning:
 const static struct qcom_icc_desc sdm845_mem_noc = {
drivers/interconnect/qcom/sdm845.c:387:1: warning:
 const static struct qcom_icc_desc sdm845_mmss_noc = {
drivers/interconnect/qcom/sdm845.c:433:1: warning:
 const static struct qcom_icc_desc sdm845_system_noc = {

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ChenTao <chentao107@huawei.com>
Link: https://lore.kernel.org/r/20200423132142.45174-1-chentao107@huawei.com
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Link: https://lore.kernel.org/r/20200429101904.5771-2-georgi.djakov@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29 13:11:44 +02:00
Lukas Bulwahn
0c1d3f2c9a MAINTAINERS: remove entry after hp100 driver removal
Commit a10079c662 ("staging: remove hp100 driver") removed all files
from ./drivers/staging/hp/, but missed to adjust MAINTAINERS.

Since then, ./scripts/get_maintainer.pl --self-test=patterns complains:

  warning: no file matches F: drivers/staging/hp/hp100.*

So, drop HP100 Driver entry in MAINTAINERS now.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20200429042116.29126-1-lukas.bulwahn@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29 09:37:44 +02:00
Tyrel Datwyler
b36522150e scsi: ibmvscsi: Fix WARN_ON during event pool release
While removing an ibmvscsi client adapter a WARN_ON like the following is
seen in the kernel log:

drmgr: drmgr: -r -c slot -s U9080.M9S.783AEC8-V11-C11 -w 5 -d 1
WARNING: CPU: 9 PID: 24062 at ../kernel/dma/mapping.c:311 dma_free_attrs+0x78/0x110
Supported: No, Unreleased kernel
CPU: 9 PID: 24062 Comm: drmgr Kdump: loaded Tainted: G               X 5.3.18-12-default
NIP:  c0000000001fa758 LR: c0000000001fa744 CTR: c0000000001fa6e0
REGS: c0000002173375d0 TRAP: 0700   Tainted: G               X (5.3.18-12-default)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28088282  XER: 20000000
CFAR: c0000000001fbf0c IRQMASK: 1
GPR00: c0000000001fa744 c000000217337860 c00000000161ab00 0000000000000000
GPR04: 0000000000000000 c000011e12250000 0000000018010000 0000000000000000
GPR08: 0000000000000000 0000000000000001 0000000000000001 c0080000190f4fa8
GPR12: c0000000001fa6e0 c000000007fc2a00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 000000011420e310 0000000000000000 0000000000000000 0000000018010000
GPR28: c00000000159de50 c000011e12250000 0000000000006600 c000011e5c994848
NIP [c0000000001fa758] dma_free_attrs+0x78/0x110
LR [c0000000001fa744] dma_free_attrs+0x64/0x110
Call Trace:
[c000000217337860] [000000011420e310] 0x11420e310 (unreliable)
[c0000002173378b0] [c0080000190f0280] release_event_pool+0xd8/0x120 [ibmvscsi]
[c000000217337930] [c0080000190f3f74] ibmvscsi_remove+0x6c/0x160 [ibmvscsi]
[c000000217337960] [c0000000000f3cac] vio_bus_remove+0x5c/0x100
[c0000002173379a0] [c00000000087a0a4] device_release_driver_internal+0x154/0x280
[c0000002173379e0] [c0000000008777cc] bus_remove_device+0x11c/0x220
[c000000217337a60] [c000000000870fc4] device_del+0x1c4/0x470
[c000000217337b10] [c0000000008712a0] device_unregister+0x30/0xa0
[c000000217337b80] [c0000000000f39ec] vio_unregister_device+0x2c/0x60
[c000000217337bb0] [c00800001a1d0964] dlpar_remove_slot+0x14c/0x250 [rpadlpar_io]
[c000000217337c50] [c00800001a1d0bcc] remove_slot_store+0xa4/0x110 [rpadlpar_io]
[c000000217337cd0] [c000000000c091a0] kobj_attr_store+0x30/0x50
[c000000217337cf0] [c00000000057c934] sysfs_kf_write+0x64/0x90
[c000000217337d10] [c00000000057be10] kernfs_fop_write+0x1b0/0x290
[c000000217337d60] [c000000000488c4c] __vfs_write+0x3c/0x70
[c000000217337d80] [c00000000048c648] vfs_write+0xd8/0x260
[c000000217337dd0] [c00000000048ca8c] ksys_write+0xdc/0x130
[c000000217337e20] [c00000000000b488] system_call+0x5c/0x70
Instruction dump:
7c840074 f8010010 f821ffb1 20840040 eb830218 7c8407b4 48002019 60000000
2fa30000 409e003c 892d0988 792907e0 <0b090000> 2fbd0000 419e0028 2fbc0000
---[ end trace 5955b3c0cc079942 ]---
rpadlpar_io: slot U9080.M9S.783AEC8-V11-C11 removed

This is tripped as a result of irqs being disabled during the call to
dma_free_coherent() by release_event_pool(). At this point in the code path
we have quiesced the adapter and it is overly paranoid to be holding the
host lock.

[mkp: fixed build warning reported by sfr]

Link: https://lore.kernel.org/r/1588027793-17952-1-git-send-email-tyreld@linux.ibm.com
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-04-28 22:59:55 -04:00
YueHaibing
8999dc8949 net/x25: Fix null-ptr-deref in x25_disconnect
We should check null before do x25_neigh_put in x25_disconnect,
otherwise may cause null-ptr-deref like this:

 #include <sys/socket.h>
 #include <linux/x25.h>

 int main() {
    int sck_x25;
    sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0);
    close(sck_x25);
    return 0;
 }

BUG: kernel NULL pointer dereference, address: 00000000000000d8
CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-
RIP: 0010:x25_disconnect+0x91/0xe0
Call Trace:
 x25_release+0x18a/0x1b0
 __sock_release+0x3d/0xc0
 sock_close+0x13/0x20
 __fput+0x107/0x270
 ____fput+0x9/0x10
 task_work_run+0x6d/0xb0
 exit_to_usermode_loop+0x102/0x110
 do_syscall_64+0x23c/0x260
 entry_SYSCALL_64_after_hwframe+0x49/0xb3

Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com
Fixes: 4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 14:08:59 -07:00
Gavin Shan
caec66198d net/ena: Fix build warning in ena_xdp_set()
This fixes the following build warning in ena_xdp_set(), which is
observed on aarch64 with 64KB page size.

   In file included from ./include/net/inet_sock.h:19,
      from ./include/net/ip.h:27,
      from drivers/net/ethernet/amazon/ena/ena_netdev.c:46:
   drivers/net/ethernet/amazon/ena/ena_netdev.c: In function         \
   ‘ena_xdp_set’:                                                    \
   drivers/net/ethernet/amazon/ena/ena_netdev.c:557:6: warning:      \
   format ‘%lu’                                                      \
   expects argument of type ‘long unsigned int’, but argument 4      \
   has type ‘int’                                                    \
   [-Wformat=] "Failed to set xdp program, the current MTU (%d) is   \
   larger than the maximum allowed MTU (%lu) while xdp is on",

Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:58:16 -07:00
Mika Westerberg
c3bf993092 thunderbolt: Check return value of tb_sw_read() in usb4_switch_op()
The function misses checking return value of tb_sw_read() before it
accesses the value that was read. Fix this by checking the return value
first.

Fixes: b04079837b ("thunderbolt: Add initial support for USB4")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkelshb@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 19:00:59 +02:00
John Stultz
35a672363a driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
In commit c8c43cee29 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we set the default
driver_deferred_probe_timeout value to 30 seconds to allow for
drivers that are missing dependencies to have some time so that
the dependency may be loaded from userland after initcalls_done
is set.

However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.

This patch tries to fix the issue by creating a waitqueue
for the driver_deferred_probe_timeout, and calling wait_event()
to make sure driver_deferred_probe_timeout is zero in
wait_for_device_probe() to make sure all the probing is
finished.

The downside to this solution is that kernel functionality that
uses wait_for_device_probe(), will block until the
driver_deferred_probe_timeout fires, regardless of if there is
any missing dependencies.

However, the previous patch reverts the default timeout value to
zero, so this side-effect will only affect users who specify a
driver_deferred_probe_timeout= value as a boot argument, where
the additional delay would be beneficial to allow modules to
load later during boot.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200422203245.83244-4-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:57:13 +02:00
John Stultz
4ccc03e28e driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
In commit c8c43cee29 ("driver core: Fix
driver_deferred_probe_check_state() logic") and following
changes the logic was changes slightly so that if there is no
driver to match whats found in the dtb, we wait the sepcified
seconds for modules to be loaded by userland, and then timeout,
where as previously we'd print "ignoring dependency for device,
assuming no driver" and immediately return -ENODEV after
initcall_done.

However, in the timeout case (which previously existed but was
practicaly un-used without a boot argument), the timeout message
uses dev_WARN(). This means folks are now seeing a big backtrace
in their boot logs if there a entry in their dts that doesn't
have a driver.

To fix this, lets use dev_warn(), instead of dev_WARN() to match
the previous error path.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: linux-pm@vger.kernel.org
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200422203245.83244-3-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:57:13 +02:00
John Stultz
ce68929f07 driver core: Revert default driver_deferred_probe_timeout value to 0
This patch addresses a regression in 5.7-rc1+

In commit c8c43cee29 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we both cleaned up
the logic and also set the default driver_deferred_probe_timeout
value to 30 seconds to allow for drivers that are missing
dependencies to have some time so that the dependency may be
loaded from userland after initcalls_done is set.

However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.

Fixing that issue is possible, but could also introduce 30
second delays in bootups for users who don't have any
missing dependencies, which is not ideal.

So I think the best solution to avoid any regressions is to
revert back to a default timeout value of zero, and allow
systems that need to utilize the timeout in order for userland
to load any modules that supply misisng dependencies in the dts
to specify the timeout length via the exiting documented boot
argument.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: c8c43cee29 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200422203245.83244-2-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:56:26 +02:00
James Hilliard
7706b0a76a component: Silence bind error on -EPROBE_DEFER
If a component fails to bind due to -EPROBE_DEFER we should not log an
error as this is not a real failure.

Fixes messages like:
vc4-drm soc:gpu: failed to bind 3f902000.hdmi (ops vc4_hdmi_ops): -517
vc4-drm soc:gpu: master bind failed: -517

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/r/20200411190241.89404-1-james.hilliard1@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:54:15 +02:00
Saravana Kannan
00b2475578 driver core: Fix handling of fw_devlink=permissive
When commit 8375e74f2b ("driver core: Add fw_devlink kernel
commandline option") added fw_devlink, it didn't implement "permissive"
mode correctly.

That commit got the device links flags correct to make sure unprobed
suppliers don't block the probing of a consumer. However, if a consumer
is waiting for mandatory suppliers to register, that could still block a
consumer from probing.

This commit fixes that by making sure in permissive mode, all suppliers
to a consumer are treated as a optional suppliers. So, even if a
consumer is waiting for suppliers to register and link itself (using the
DL_FLAG_SYNC_STATE_ONLY flag) to the supplier, the consumer is never
blocked from probing.

Fixes: 8375e74f2b ("driver core: Add fw_devlink kernel commandline option")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200331022832.209618-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:54:13 +02:00
Luis Chamberlain
3740d93e37 coredump: fix crash when umh is disabled
Commit 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()") added the optiont to disable all
call_usermodehelper() calls by setting STATIC_USERMODEHELPER_PATH to
an empty string. When this is done, and crashdump is triggered, it
will crash on null pointer dereference, since we make assumptions
over what call_usermodehelper_exec() did.

This has been reported by Sergey when one triggers a a coredump
with the following configuration:

```
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
```

The way disabling the umh was designed was that call_usermodehelper_exec()
would just return early, without an error. But coredump assumes
certain variables are set up for us when this happens, and calls
ile_start_write(cprm.file) with a NULL file.

[    2.819676] BUG: kernel NULL pointer dereference, address: 0000000000000020
[    2.819859] #PF: supervisor read access in kernel mode
[    2.820035] #PF: error_code(0x0000) - not-present page
[    2.820188] PGD 0 P4D 0
[    2.820305] Oops: 0000 [#1] SMP PTI
[    2.820436] CPU: 2 PID: 89 Comm: a Not tainted 5.7.0-rc1+ #7
[    2.820680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
[    2.821150] RIP: 0010:do_coredump+0xd80/0x1060
[    2.821385] Code: e8 95 11 ed ff 48 c7 c6 cc a7 b4 81 48 8d bd 28 ff
ff ff 89 c2 e8 70 f1 ff ff 41 89 c2 85 c0 0f 84 72 f7 ff ff e9 b4 fe ff
ff <48> 8b 57 20 0f b7 02 66 25 00 f0 66 3d 00 8
0 0f 84 9c 01 00 00 44
[    2.822014] RSP: 0000:ffffc9000029bcb8 EFLAGS: 00010246
[    2.822339] RAX: 0000000000000000 RBX: ffff88803f860000 RCX: 000000000000000a
[    2.822746] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000000
[    2.823141] RBP: ffffc9000029bde8 R08: 0000000000000000 R09: ffffc9000029bc00
[    2.823508] R10: 0000000000000001 R11: ffff88803dec90be R12: ffffffff81c39da0
[    2.823902] R13: ffff88803de84400 R14: 0000000000000000 R15: 0000000000000000
[    2.824285] FS:  00007fee08183540(0000) GS:ffff88803e480000(0000) knlGS:0000000000000000
[    2.824767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.825111] CR2: 0000000000000020 CR3: 000000003f856005 CR4: 0000000000060ea0
[    2.825479] Call Trace:
[    2.825790]  get_signal+0x11e/0x720
[    2.826087]  do_signal+0x1d/0x670
[    2.826361]  ? force_sig_info_to_task+0xc1/0xf0
[    2.826691]  ? force_sig_fault+0x3c/0x40
[    2.826996]  ? do_trap+0xc9/0x100
[    2.827179]  exit_to_usermode_loop+0x49/0x90
[    2.827359]  prepare_exit_to_usermode+0x77/0xb0
[    2.827559]  ? invalid_op+0xa/0x30
[    2.827747]  ret_from_intr+0x20/0x20
[    2.827921] RIP: 0033:0x55e2c76d2129
[    2.828107] Code: 2d ff ff ff e8 68 ff ff ff 5d c6 05 18 2f 00 00 01
c3 0f 1f 80 00 00 00 00 c3 0f 1f 80 00 00 00 00 e9 7b ff ff ff 55 48 89
e5 <0f> 0b b8 00 00 00 00 5d c3 66 2e 0f 1f 84 0
0 00 00 00 00 0f 1f 40
[    2.828603] RSP: 002b:00007fffeba5e080 EFLAGS: 00010246
[    2.828801] RAX: 000055e2c76d2125 RBX: 0000000000000000 RCX: 00007fee0817c718
[    2.829034] RDX: 00007fffeba5e188 RSI: 00007fffeba5e178 RDI: 0000000000000001
[    2.829257] RBP: 00007fffeba5e080 R08: 0000000000000000 R09: 00007fee08193c00
[    2.829482] R10: 0000000000000009 R11: 0000000000000000 R12: 000055e2c76d2040
[    2.829727] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.829964] CR2: 0000000000000020
[    2.830149] ---[ end trace ceed83d8c68a1bf1 ]---
```

Cc: <stable@vger.kernel.org> # v4.11+
Fixes: 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199795
Reported-by: Tony Vroon <chainsaw@gentoo.org>
Reported-by: Sergey Kvachonok <ravenexp@gmail.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200416162859.26518-1-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:54:13 +02:00
Ulf Hansson
f458488425 amba: Initialize dma_parms for amba devices
It's currently the amba driver's responsibility to initialize the pointer,
dma_parms, for its corresponding struct device. The benefit with this
approach allows us to avoid the initialization and to not waste memory for
the struct device_dma_parameters, as this can be decided on a case by case
basis.

However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.

For these reasons, let's do the initialization from the common amba bus at
the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().

Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Tested-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200422101013.31267-1-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:44:34 +02:00
Ulf Hansson
9495b7e92f driver core: platform: Initialize dma_parms for platform devices
It's currently the platform driver's responsibility to initialize the
pointer, dma_parms, for its corresponding struct device. The benefit with
this approach allows us to avoid the initialization and to not waste memory
for the struct device_dma_parameters, as this can be decided on a case by
case basis.

However, it has turned out that this approach is not very practical.  Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.

For these reasons, let's do the initialization from the common platform bus
at the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().

Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Tested-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200422100954.31211-1-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 17:44:33 +02:00
Christian Gromm
5e56bc06e1 most: core: use function subsys_initcall()
This patch replaces function module_init() with subsys_initcall().
It is needed to ensure that the core module of the driver is
initialized before a component tries to register with the core. This
leads to a NULL pointer dereference if the driver is configured as
in-tree.

Signed-off-by: Christian Gromm <christian.gromm@microchip.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/1587741394-22021-1-git-send-email-christian.gromm@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 15:04:09 +02:00
Dan Carpenter
522587e7c0 bus: mhi: core: Fix a NULL vs IS_ERR check in mhi_create_devices()
The mhi_alloc_device() function never returns NULL, it returns error
pointers.

Fixes: da1c4f8569 ("bus: mhi: core: Add support for creating and destroying MHI devices")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200407093133.GM68494@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-28 15:04:09 +02:00
Brian King
66bb7fa81e scsi: ibmvfc: Don't send implicit logouts prior to NPIV login
Commit ed830385a2 ("scsi: ibmvfc: Avoid loss of all paths during SVC node
reboot") introduced a regression where when the client resets or re-enables
its CRQ with the hypervisor there is a chance that if the server side
doesn't issue its INIT handshake quick enough the client can issue an
Implicit Logout prior to doing an NPIV Login. The server treats this
scenario as a protocol violation and closes the CRQ on its end forcing the
client through a reset that gets the client host state and next host action
out of agreement leading to a BUG assert.

ibmvfc 30000003: Partner initialization complete
ibmvfc 30000002: Partner initialization complete
ibmvfc 30000002: Host partner adapter deregistered or failed (rc=2)
ibmvfc 30000002: Partner initialized
------------[ cut here ]------------
kernel BUG at ../drivers/scsi/ibmvscsi/ibmvfc.c:4489!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Supported: No, Unreleased kernel
CPU: 16 PID: 1290 Comm: ibmvfc_0 Tainted: G           OE  X   5.3.18-12-default
NIP:  c00800000d84a2b4 LR: c00800000d84a040 CTR: c00800000d84a2a0
REGS: c00000000cb57a00 TRAP: 0700   Tainted: G           OE  X    (5.3.18-12-default)
MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000848  XER: 00000001
CFAR: c00800000d84a070 IRQMASK: 1
GPR00: c00800000d84a040 c00000000cb57c90 c00800000d858e00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000000a0
GPR08: c00800000d84a074 0000000000000001 0000000000000014 c00800000d84d7d0
GPR12: 0000000000000000 c00000001ea28200 c00000000016cd98 0000000000000000
GPR16: c00800000d84b7b8 0000000000000000 0000000000000000 c00000542c706d68
GPR20: 0000000000000005 c00000542c706d88 5deadbeef0000100 5deadbeef0000122
GPR24: 000000000000000c 000000000000000b c00800000d852180 0000000000000001
GPR28: 0000000000000000 c00000542c706da0 c00000542c706860 c00000542c706828
NIP [c00800000d84a2b4] ibmvfc_work+0x3ac/0xc90 [ibmvfc]
LR [c00800000d84a040] ibmvfc_work+0x138/0xc90 [ibmvfc]

This scenario can be prevented by rejecting any attempt to send an Implicit
Logout if the client adapter is not logged in yet.

Link: https://lore.kernel.org/r/20200427214824.6890-1-tyreld@linux.ibm.com
Fixes: ed830385a2 ("scsi: ibmvfc: Avoid loss of all paths during SVC node reboot")
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-04-27 21:42:50 -04:00
David S. Miller
37255e7a8f Merge tag 'batadv-net-for-davem-20200427' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - fix random number generation in network coding, by George Spelvin

 - fix reference counter leaks, by Xiyu Yang (3 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 13:02:55 -07:00
Christophe JAILLET
10e3cc180e net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
A call to 'dma_alloc_coherent()' is hidden in 'sonic_alloc_descriptors()',
called from 'sonic_probe1()'.

This is correctly freed in the remove function, but not in the error
handling path of the probe function.
Fix it and add the missing 'dma_free_coherent()' call.

While at it, rename a label in order to be slightly more informative.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 12:07:46 -07:00
Anthony Felice
4b5b71f770 net: tc35815: Fix phydev supported/advertising mask
Commit 3c1bcc8614 ("net: ethernet: Convert phydev advertize and
supported from u32 to link mode") updated ethernet drivers to use a
linkmode bitmap. It mistakenly dropped a bitwise negation in the
tc35815 ethernet driver on a bitmask to set the supported/advertising
flags.

Found by Anthony via code inspection, not tested as I do not have the
required hardware.

Fixes: 3c1bcc8614 ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
Signed-off-by: Anthony Felice <tony.felice@timesys.com>
Reviewed-by: Akshay Bhat <akshay.bhat@timesys.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:52:10 -07:00
Eric Dumazet
df4953e4e9 sch_sfq: validate silly quantum values
syzbot managed to set up sfq so that q->scaled_quantum was zero,
triggering an infinite loop in sfq_dequeue()

More generally, we must only accept quantum between 1 and 2^18 - 7,
meaning scaled_quantum must be in [1, 0x7FFF] range.

Otherwise, we also could have a loop in sfq_dequeue()
if scaled_quantum happens to be 0x8000, since slot->allot
could indefinitely switch between 0 and 0x8000.

Fixes: eeaeb068f1 ("sch_sfq: allow big packets and be fair")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:49:57 -07:00
David S. Miller
cf7fc3af87 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes.

A collection of 5 miscellaneous bug fixes covering VF anti-spoof setup
issues, devlink MSIX max value, AER, context memory allocation error
path, and VLAN acceleration logic.

Please queue for -stable.  Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:06 -07:00
Michael Chan
c72cb303aa bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features().
The current logic in bnxt_fix_features() will inadvertently turn on both
CTAG and STAG VLAN offload if the user tries to disable both.  Fix it
by checking that the user is trying to enable CTAG or STAG before
enabling both.  The logic is supposed to enable or disable both CTAG and
STAG together.

Fixes: 5a9f6b238e ("bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:05 -07:00
Michael Chan
bbf211b1ec bnxt_en: Return error when allocating zero size context memory.
bnxt_alloc_ctx_pg_tbls() should return error when the memory size of the
context memory to set up is zero.  By returning success (0), the caller
may proceed normally and may crash later when it tries to set up the
memory.

Fixes: 08fe9d1816 ("bnxt_en: Add Level 2 context memory paging support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:05 -07:00
Michael Chan
bae361c54f bnxt_en: Improve AER slot reset.
Improve the slot reset sequence by disabling the device to prevent bad
DMAs if slot reset fails.  Return the proper result instead of always
PCI_ERS_RESULT_RECOVERED to the caller.

Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:05 -07:00
Vasundhara Volam
9e68cb0359 bnxt_en: Reduce BNXT_MSIX_VEC_MAX value to supported CQs per PF.
Broadcom adapters support only maximum of 512 CQs per PF. If user sets
MSIx vectors more than supported CQs, firmware is setting incorrect value
for msix_vec_per_pf_max parameter. Fix it by reducing the BNXT_MSIX_VEC_MAX
value to 512, even though the maximum # of MSIx vectors supported by adapter
are 1280.

Fixes: f399e84978 ("bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink params.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:05 -07:00
Michael Chan
c71c4e49af bnxt_en: Fix VF anti-spoof filter setup.
Fix the logic that sets the enable/disable flag for the source MAC
filter according to firmware spec 1.7.1.

In the original firmware spec. before 1.7.1, the VF spoof check flags
were not latched after making the HWRM_FUNC_CFG call, so there was a
need to keep the func_flags so that subsequent calls would perserve
the VF spoof check setting.  A change was made in the 1.7.1 spec
so that the flags became latched.  So we now set or clear the anti-
spoof setting directly without retrieving the old settings in the
stored vf->func_flags which are no longer valid.  We also remove the
unneeded vf->func_flags.

Fixes: 8eb992e876 ("bnxt_en: Update firmware interface spec to 1.7.6.2.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:44:05 -07:00
Baruch Siach
c3e302edca net: phy: marvell10g: fix temperature sensor on 2110
Read the temperature sensor register from the correct location for the
88E2110 PHY. There is no enable/disable bit on 2110, so make
mv3310_hwmon_config() run on 88X3310 only.

Fixes: 62d0153547 ("net: phy: marvell10g: add support for the 88x2110 PHY")
Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:38:19 -07:00
Eric Dumazet
8738c85c72 sch_choke: avoid potential panic in choke_reset()
If choke_init() could not allocate q->tab, we would crash later
in choke_reset().

BUG: KASAN: null-ptr-deref in memset include/linux/string.h:366 [inline]
BUG: KASAN: null-ptr-deref in choke_reset+0x208/0x340 net/sched/sch_choke.c:326
Write of size 8 at addr 0000000000000000 by task syz-executor822/7022

CPU: 1 PID: 7022 Comm: syz-executor822 Not tainted 5.7.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 __kasan_report.cold+0x5/0x4d mm/kasan/report.c:515
 kasan_report+0x33/0x50 mm/kasan/common.c:625
 check_memory_region_inline mm/kasan/generic.c:187 [inline]
 check_memory_region+0x141/0x190 mm/kasan/generic.c:193
 memset+0x20/0x40 mm/kasan/common.c:85
 memset include/linux/string.h:366 [inline]
 choke_reset+0x208/0x340 net/sched/sch_choke.c:326
 qdisc_reset+0x6b/0x520 net/sched/sch_generic.c:910
 dev_deactivate_queue.constprop.0+0x13c/0x240 net/sched/sch_generic.c:1138
 netdev_for_each_tx_queue include/linux/netdevice.h:2197 [inline]
 dev_deactivate_many+0xe2/0xba0 net/sched/sch_generic.c:1195
 dev_deactivate+0xf8/0x1c0 net/sched/sch_generic.c:1233
 qdisc_graft+0xd25/0x1120 net/sched/sch_api.c:1051
 tc_modify_qdisc+0xbab/0x1a00 net/sched/sch_api.c:1670
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454
 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
 netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
 ___sys_sendmsg+0x100/0x170 net/socket.c:2416
 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295

Fixes: 77e62da6e6 ("sch_choke: drop all packets in queue during reset")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:35:27 -07:00
Eric Dumazet
14695212d4 fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks
My intent was to not let users set a zero drop_batch_size,
it seems I once again messed with min()/max().

Fixes: 9d18562a22 ("fq_codel: add batch ability to fq_codel_drop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:34:04 -07:00
Xiyu Yang
62b4011fa7 net/tls: Fix sk_psock refcnt leak when in tls_data_ready()
tls_data_ready() invokes sk_psock_get(), which returns a reference of
the specified sk_psock object to "psock" with increased refcnt.

When tls_data_ready() returns, local variable "psock" becomes invalid,
so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
tls_data_ready(). When "psock->ingress_msg" is empty but "psock" is not
NULL, the function forgets to decrease the refcnt increased by
sk_psock_get(), causing a refcnt leak.

Fix this issue by calling sk_psock_put() on all paths when "psock" is
not NULL.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:22:38 -07:00
Xiyu Yang
4becb7ee5b net/x25: Fix x25_neigh refcnt leak when x25 disconnect
x25_connect() invokes x25_get_neigh(), which returns a reference of the
specified x25_neigh object to "x25->neighbour" with increased refcnt.

When x25 connect success and returns, the reference still be hold by
"x25->neighbour", so the refcount should be decreased in
x25_disconnect() to keep refcount balanced.

The reference counting issue happens in x25_disconnect(), which forgets
to decrease the refcnt increased by x25_get_neigh() in x25_connect(),
causing a refcnt leak.

Fix this issue by calling x25_neigh_put() before x25_disconnect()
returns.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:20:30 -07:00
Xiyu Yang
095f5614bf net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict()
bpf_exec_tx_verdict() invokes sk_psock_get(), which returns a reference
of the specified sk_psock object to "psock" with increased refcnt.

When bpf_exec_tx_verdict() returns, local variable "psock" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
bpf_exec_tx_verdict(). When "policy" equals to NULL but "psock" is not
NULL, the function forgets to decrease the refcnt increased by
sk_psock_get(), causing a refcnt leak.

Fix this issue by calling sk_psock_put() on this error path before
bpf_exec_tx_verdict() returns.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:17:16 -07:00
Richard Clark
6de556c310 aquantia: Fix the media type of AQC100 ethernet controller in the driver
The Aquantia AQC100 controller enables a SFP+ port, so the driver should
configure the media type as '_TYPE_FIBRE' instead of '_TYPE_TP'.

Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com>
Cc: Igor Russkikh <irusskikh@marvell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:13:15 -07:00
David S. Miller
18e6719c14 Merge branch 'vsock-virtio-fixes-about-packet-delivery-to-monitoring-devices'
Stefano Garzarella says:

====================
vsock/virtio: fixes about packet delivery to monitoring devices

During the review of v1, Stefan pointed out an issue introduced by
that patch, where replies can appear in the packet capture before
the transmitted packet.

While fixing my patch, reverting it and adding a new flag in
'struct virtio_vsock_pkt' (patch 2/2), I found that we already had
that issue in vhost-vsock, so I fixed it (patch 1/2).

v1 -> v2:
- reverted the v1 patch, to avoid that replies can appear in the
  packet capture before the transmitted packet [Stefan]
- added patch to fix packet delivering to monitoring devices in
  vhost-vsock
- added patch to check if the packet is already delivered to
  monitoring devices

v1: https://patchwork.ozlabs.org/project/netdev/patch/20200421092527.41651-1-sgarzare@redhat.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 10:18:01 -07:00
Stefano Garzarella
a78d163978 vsock/virtio: fix multiple packet delivery to monitoring devices
In virtio_transport.c, if the virtqueue is full, the transmitting
packet is queued up and it will be sent in the next iteration.
This causes the same packet to be delivered multiple times to
monitoring devices.

We want to continue to deliver packets to monitoring devices before
it is put in the virtqueue, to avoid that replies can appear in the
packet capture before the transmitted packet.

This patch fixes the issue, adding a new flag (tap_delivered) in
struct virtio_vsock_pkt, to check if the packet is already delivered
to monitoring devices.

In vhost/vsock.c, we are splitting packets, so we must set
'tap_delivered' to false when we queue up the same virtio_vsock_pkt
to handle the remaining bytes.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 10:18:01 -07:00
Stefano Garzarella
107bc0766b vhost/vsock: fix packet delivery order to monitoring devices
We want to deliver packets to monitoring devices before it is
put in the virtqueue, to avoid that replies can appear in the
packet capture before the transmitted packet.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 10:18:01 -07:00
John Stultz
67321e02fb phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string
This patch fixes a regression in 5.7-rc1+

In commit 8fe75cd4cd ("phy: qcom-qusb2: Add generic QUSB2 V2
PHY support"), the change was made to add "qcom,qusb2-v2-phy"
as a generic compat string. However the change also removed
the "qcom,sdm845-qusb2-phy" compat string, which is documented
in the binding and already in use.

This patch re-adds the "qcom,sdm845-qusb2-phy" compat string
which allows the driver to continue to work with existing dts
entries such as found on the db845c.

Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Manu Gautam <mgautam@codeaurora.org>
Cc: Sandeep Maheswaram <sanm@codeaurora.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: devicetree@vger.kernel.org
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Fixes: 8fe75cd4cd ("phy: qcom-qusb2: Add generic QUSB2 V2 PHY support")
Reported-by: YongQin Liu <yongqin.liu@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-04-27 18:23:29 +05:30
Xiyu Yang
8aebfffacf configfs: fix config_item refcnt leak in configfs_rmdir()
configfs_rmdir() invokes configfs_get_config_item(), which returns a
reference of the specified config_item object to "parent_item" with
increased refcnt.

When configfs_rmdir() returns, local variable "parent_item" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
configfs_rmdir(). When down_write_killable() fails, the function forgets
to decrease the refcnt increased by configfs_get_config_item(), causing
a refcnt leak.

Fix this issue by calling config_item_put() when down_write_killable()
fails.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-27 08:17:10 +02:00
Guillaume Nault
ea64d8d6c6 netfilter: nat: never update the UDP checksum when it's 0
If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
device has disabled UDP checksums and enabled Tx checksum offloading,
then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
checksum offloaded).

Because of the ->ip_summed value, udp_manip_pkt() tries to update the
outer checksum with the new address and port, leading to an invalid
checksum sent on the wire, as the original null checksum obviously
didn't take the old address and port into account.

So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
might not refer to the checksum we're acting on. Instead, we can base
the decision to update the UDP checksum entirely on the value of
hdr->check, because it's null if and only if checksum is disabled:

  * A fully computed checksum can't be 0, since a 0 checksum is
    represented by the CSUM_MANGLED_0 value instead.

  * A partial checksum can't be 0, since the pseudo-header always adds
    at least one non-zero value (the UDP protocol type 0x11) and adding
    more values to the sum can't make it wrap to 0 as the carry is then
    added to the wrapped number.

  * A disabled checksum uses the special value 0.

The problem seems to be there from day one, although it was probably
not visible before UDP tunnels were implemented.

Fixes: 5b1158e909 ("[NETFILTER]: Add NAT support for nf_conntrack")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-26 23:57:18 +02:00
Josh Poimboeuf
53fb6e990d objtool: Fix infinite loop in for_offset_range()
Randy reported that objtool got stuck in an infinite loop when
processing drivers/i2c/busses/i2c-parport.o.  It was caused by the
following code:

  00000000000001fd <line_set>:
   1fd:	48 b8 00 00 00 00 00	movabs $0x0,%rax
   204:	00 00 00
			1ff: R_X86_64_64	.rodata-0x8
   207:	41 55                	push   %r13
   209:	41 89 f5             	mov    %esi,%r13d
   20c:	41 54                	push   %r12
   20e:	49 89 fc             	mov    %rdi,%r12
   211:	55                   	push   %rbp
   212:	48 89 d5             	mov    %rdx,%rbp
   215:	53                   	push   %rbx
   216:	0f b6 5a 01          	movzbl 0x1(%rdx),%ebx
   21a:	48 8d 34 dd 00 00 00 	lea    0x0(,%rbx,8),%rsi
   221:	00
			21e: R_X86_64_32S	.rodata
   222:	48 89 f1             	mov    %rsi,%rcx
   225:	48 29 c1             	sub    %rax,%rcx

find_jump_table() saw the .rodata reference and tried to find a jump
table associated with it (though there wasn't one).  The -0x8 rela
addend is unusual.  It caused find_jump_table() to send a negative
table_offset (unsigned 0xfffffffffffffff8) to find_rela_by_dest().

The negative offset should have been harmless, but it actually threw
for_offset_range() for a loop... literally.  When the mask value got
incremented past the end value, it also wrapped to zero, causing the
loop exit condition to remain true forever.

Prevent this scenario from happening by ensuring the incremented value
is always >= the starting value.

Fixes: 74b873e49d ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/02b719674b031800b61e33c30b2e823183627c19.1587842122.git.jpoimboe@redhat.com
2020-04-26 09:28:14 +02:00
Eric Dumazet
52a90612fa net: remove obsolete comment
Commit b656722906 ("net: Increase the size of skb_frag_t")
removed the 16bit limitation of a frag on some 32bit arches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:49:32 -07:00
Paolo Abeni
1200832c6e mptcp: fix race in msk status update
Currently subflow_finish_connect() changes unconditionally
any msk socket status other than TCP_ESTABLISHED.

If an unblocking connect() races with close(), we can end-up
triggering:

IPv4: Attempt to release TCP socket in state 1 00000000e32b8b7e

when the msk socket is disposed.

Be sure to enter the established status only from SYN_SENT.

Fixes: c3c123d16c ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:38:54 -07:00
Josh Poimboeuf
81b67439d1 x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
The following execution path is possible:

  fsnotify()
    [ realign the stack and store previous SP in R10 ]
    <IRQ>
      [ only IRET regs saved ]
      common_interrupt()
        interrupt_entry()
	  <NMI>
	    [ full pt_regs saved ]
	    ...
	    [ unwind stack ]

When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers.  So the unwind stops
prematurely.

However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.

Handle this case properly.  When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.

Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers.  For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between.  So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Josh Poimboeuf
a0f81bf268 x86/unwind/orc: Fix error path for bad ORC entry type
If the ORC entry type is unknown, nothing else can be done other than
reporting an error.  Exit the function instead of breaking out of the
switch statement.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Josh Poimboeuf
98d0c8ebf7 x86/unwind/orc: Prevent unwinding before ORC initialization
If the unwinder is called before the ORC data has been initialized,
orc_find() returns NULL, and it tries to fall back to using frame
pointers.  This can cause some unexpected warnings during boot.

Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
doesn't even try to unwind from an uninitialized state.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Miroslav Benes
f1d9a2abff x86/unwind/orc: Don't skip the first frame for inactive tasks
When unwinding an inactive task, the ORC unwinder skips the first frame
by default.  If both the 'regs' and 'first_frame' parameters of
unwind_start() are NULL, 'state->sp' and 'first_frame' are later
initialized to the same value for an inactive task.  Given there is a
"less than or equal to" comparison used at the end of __unwind_start()
for skipping stack frames, the first frame is skipped.

Drop the equal part of the comparison and make the behavior equivalent
to the frame pointer unwinder.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Josh Poimboeuf
b08418b548 x86/unwind: Prevent false warnings for non-current tasks
There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive.  If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:28 +02:00
Josh Poimboeuf
153eb2223c x86/unwind/orc: Convert global variables to static
These variables aren't used outside of unwind_orc.c, make them static.

Also annotate some of them with '__ro_after_init', as applicable.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:28 +02:00
Jann Horn
f977df7b7c x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer
directly below the pt_regs at the top of the task stack before calling
do_exit(). Tell the unwinder to expect pt_regs.

Fixes: 8c1f75587a ("x86/entry/64: Add unwind hint annotations")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:28 +02:00
Josh Poimboeuf
96c64806b4 x86/entry/64: Fix unwind hints in __switch_to_asm()
UNWIND_HINT_FUNC has some limitations: specifically, it doesn't reset
all the registers to undefined.  This causes objtool to get confused
about the RBP push in __switch_to_asm(), resulting in bad ORC data.

While __switch_to_asm() does do some stack magic, it's otherwise a
normal callable-from-C function, so just annotate it as a function,
which makes objtool happy and allows it to produces the correct hints
automatically.

Fixes: 8c1f75587a ("x86/entry/64: Add unwind hint annotations")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/03d0411920d10f7418f2e909210d8e9a3b2ab081.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:28 +02:00
Josh Poimboeuf
1fb143634a x86/entry/64: Fix unwind hints in kernel exit path
In swapgs_restore_regs_and_return_to_usermode, after the stack is
switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is
no longer valid, which can result in the following ORC unwinder warning:

  WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0

For full correctness, we could try to add complicated unwind hints so
the unwinder could continue to find the registers, but when when it's
this close to kernel exit, unwind hints aren't really needed anymore and
it's fine to just use an empty hint which tells the unwinder to stop.

For consistency, also move the UNWIND_HINT_EMPTY in
entry_SYSCALL_64_after_hwframe to a similar location.

Fixes: 3e3b9293d3 ("x86/entry/64: Return to userspace from the trampoline stack")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:27 +02:00
Josh Poimboeuf
06a9750edc x86/entry/64: Fix unwind hints in register clearing code
The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after
pushing it.  If an NMI or exception hits after a register is cleared,
but before the UNWIND_HINT_REGS annotation, the ORC unwinder will
wrongly think the previous value of the register was zero.  This can
confuse the unwinding process and cause it to exit early.

Because ORC is simpler than DWARF, there are a limited number of unwind
annotation states, so it's not possible to add an individual unwind hint
after each push/clear combination.  Instead, the register clearing
instructions need to be consolidated and moved to after the
UNWIND_HINT_REGS annotation.

Fixes: 3f01daecd5 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:27 +02:00
Josh Poimboeuf
d8dd25a461 objtool: Fix stack offset tracking for indirect CFAs
When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops.  This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.

This fixes the following unwinder warning:

  WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0

Fixes: 627fce1480 ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:27 +02:00
Quinn Tran
c48f849d3f scsi: qla2xxx: Delete all sessions before unregister local nvme port
Delete all sessions before unregistering local nvme port.  This allows nvme
layer to decrement all active rport count down to zero.  Once the count is
down to zero, nvme would call qla to continue with the npiv port deletion.

PID: 27448  TASK: ffff9e34b777c1c0  CPU: 0   COMMAND: "qaucli"
 0 [ffff9e25e84abbd8] __schedule at ffffffff977858ca
 1 [ffff9e25e84abc68] schedule at ffffffff97785d79
 2 [ffff9e25e84abc78] schedule_timeout at ffffffff97783881
 3 [ffff9e25e84abd28] wait_for_completion at ffffffff9778612d
 4 [ffff9e25e84abd88] qla_nvme_delete at ffffffffc0e3024e [qla2xxx]
 5 [ffff9e25e84abda8] qla24xx_vport_delete at ffffffffc0e024b9 [qla2xxx]
 6 [ffff9e25e84abdf0] fc_vport_terminate at ffffffffc011c247 [scsi_transport_fc]
 7 [ffff9e25e84abe28] store_fc_host_vport_delete at ffffffffc011cd94 [scsi_transport_fc]
 8 [ffff9e25e84abe70] dev_attr_store at ffffffff974b376b
 9 [ffff9e25e84abe80] sysfs_kf_write at ffffffff972d9a92
10 [ffff9e25e84abe90] kernfs_fop_write at ffffffff972d907b
11 [ffff9e25e84abec8] vfs_write at ffffffff9724c790
12 [ffff9e25e84abf08] sys_write at ffffffff9724d55f
13 [ffff9e25e84abf50] system_call_fastpath at ffffffff97792ed2
    RIP: 00007fc0bd81a6fd  RSP: 00007ffff78d9648  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: 0000000000000022  RCX: 00007ffff78d96e0
    RDX: 0000000000000022  RSI: 00007ffff78d94e0  RDI: 0000000000000008
    RBP: 00007ffff78d9440   R8: 0000000000000000   R9: 00007fc0bd48b2cd
    R10: 0000000000000017  R11: 0000000000000293  R12: 0000000000000000
    R13: 00005624e4dac840  R14: 00005624e4da9a10  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

Link: https://lore.kernel.org/r/20200331104015.24868-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-04-24 12:17:06 -04:00
Arun Easi
45a76264c2 scsi: qla2xxx: Fix hang when issuing nvme disconnect-all in NPIV
In NPIV environment, a NPIV host may use a queue pair created by base host
or other NPIVs, so the check for a queue pair created by this NPIV is not
correct, and can cause an abort to fail, which in turn means the NVME
command not returned.  This leads to hang in nvme_fc layer in
nvme_fc_delete_association() which waits for all I/Os to be returned, which
is seen as hang in the application.

Link: https://lore.kernel.org/r/20200331104015.24868-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-04-24 12:16:57 -04:00
Thierry Reding
b61ad5c0e2 phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
The usb_get_maximum_speed() function is part of the usb-common module,
so enable it by selecting the corresponding Kconfig symbol.

While at it, also make sure to depend on USB_SUPPORT because USB_PHY
requires that. This can lead to Kconfig conflicts if USB_SUPPORT is not
enabled while attempting to enable PHY_TEGRA_XUSB.

Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-04-24 13:12:14 +05:30
Marc Zyngier
446c0768f5 Merge branch 'kvm-arm64/vgic-fixes-5.7' into kvmarm-master/master 2020-04-23 16:27:33 +01:00
Marc Zyngier
66f63474da Merge branch 'kvm-arm64/psci-fixes-5.7' into kvmarm-master/master 2020-04-23 16:27:26 +01:00
Zenghui Yu
57bdb436ce KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
If we're going to fail out the vgic_add_lpi(), let's make sure the
allocated vgic_irq memory is also freed. Though it seems that both
cases are unlikely to fail.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200414030349.625-3-yuzenghui@huawei.com
2020-04-23 16:26:56 +01:00
Zenghui Yu
969ce8b526 KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
It's likely that the vcpu fails to handle all virtual interrupts if
userspace decides to destroy it, leaving the pending ones stay in the
ap_list. If the un-handled one is a LPI, its vgic_irq structure will
be eventually leaked because of an extra refcount increment in
vgic_queue_irq_unlock().

This was detected by kmemleak on almost every guest destroy, the
backtrace is as follows:

unreferenced object 0xffff80725aed5500 (size 128):
comm "CPU 5/KVM", pid 40711, jiffies 4298024754 (age 166366.512s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 08 01 a9 73 6d 80 ff ff ...........sm...
c8 61 ee a9 00 20 ff ff 28 1e 55 81 6c 80 ff ff .a... ..(.U.l...
backtrace:
[<000000004bcaa122>] kmem_cache_alloc_trace+0x2dc/0x418
[<0000000069c7dabb>] vgic_add_lpi+0x88/0x418
[<00000000bfefd5c5>] vgic_its_cmd_handle_mapi+0x4dc/0x588
[<00000000cf993975>] vgic_its_process_commands.part.5+0x484/0x1198
[<000000004bd3f8e3>] vgic_its_process_commands+0x50/0x80
[<00000000b9a65b2b>] vgic_mmio_write_its_cwriter+0xac/0x108
[<0000000009641ebb>] dispatch_mmio_write+0xd0/0x188
[<000000008f79d288>] __kvm_io_bus_write+0x134/0x240
[<00000000882f39ac>] kvm_io_bus_write+0xe0/0x150
[<0000000078197602>] io_mem_abort+0x484/0x7b8
[<0000000060954e3c>] kvm_handle_guest_abort+0x4cc/0xa58
[<00000000e0d0cd65>] handle_exit+0x24c/0x770
[<00000000b44a7fad>] kvm_arch_vcpu_ioctl_run+0x460/0x1988
[<0000000025fb897c>] kvm_vcpu_ioctl+0x4f8/0xee0
[<000000003271e317>] do_vfs_ioctl+0x160/0xcd8
[<00000000e7f39607>] ksys_ioctl+0x98/0xd8

Fix it by retiring all pending LPIs in the ap_list on the destroy path.

p.s. I can also reproduce it on a normal guest shutdown. It is because
userspace still send LPIs to vcpu (through KVM_SIGNAL_MSI ioctl) while
the guest is being shutdown and unable to handle it. A little strange
though and haven't dig further...

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
[maz: moved the distributor deallocation down to avoid an UAF splat]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200414030349.625-2-yuzenghui@huawei.com
2020-04-23 16:26:56 +01:00
Marc Zyngier
ba1ed9e17b KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
There is no point in accessing the HW when writing to any of the
ISPENDR/ICPENDR registers from userspace, as only the guest should
be allowed to change the HW state.

Introduce new userspace-specific accessors that deal solely with
the virtual state. Note that the API differs from that of GICv3,
where userspace exclusively uses ISPENDR to set the state. Too
bad we can't reuse it.

Fixes: 82e40f558d ("KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI")
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-23 16:26:31 +01:00
Marc Zyngier
41ee52ecbc KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
There is no point in accessing the HW when writing to any of the
ISENABLER/ICENABLER registers from userspace, as only the guest
should be allowed to change the HW state.

Introduce new userspace-specific accessors that deal solely with
the virtual state.

Reported-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-22 17:13:30 +01:00
Marc Zyngier
9a50ebbffa KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
When a guest tries to read the active state of its interrupts,
we currently just return whatever state we have in memory. This
means that if such an interrupt lives in a List Register on another
CPU, we fail to obsertve the latest active state for this interrupt.

In order to remedy this, stop all the other vcpus so that they exit
and we can observe the most recent value for the state. This is
similar to what we are doing for the write side of the same
registers, and results in new MMIO handlers for userspace (which
do not need to stop the guest, as it is supposed to be stopped
already).

Reported-by: Julien Grall <julien@xen.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-22 17:13:16 +01:00
Oliver Neukum
e9b3c610a0 USB: serial: garmin_gps: add sanity checking for data length
We must not process packets shorter than a packet ID

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-and-tested-by: syzbot+d29e9263e13ce0b9f4fd@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2020-04-22 09:10:36 +02:00
Xiyu Yang
6f91a3f7af batman-adv: Fix refcnt leak in batadv_v_ogm_process
batadv_v_ogm_process() invokes batadv_hardif_neigh_get(), which returns
a reference of the neighbor object to "hardif_neigh" with increased
refcount.

When batadv_v_ogm_process() returns, "hardif_neigh" becomes invalid, so
the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling paths of
batadv_v_ogm_process(). When batadv_v_ogm_orig_get() fails to get the
orig node and returns NULL, the refcnt increased by
batadv_hardif_neigh_get() is not decreased, causing a refcnt leak.

Fix this issue by jumping to "out" label when batadv_v_ogm_orig_get()
fails to get the orig node.

Fixes: 9323158ef9 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21 10:08:05 +02:00
Xiyu Yang
6107c5da0f batman-adv: Fix refcnt leak in batadv_store_throughput_override
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(),
which gets a batadv_hard_iface object from net_dev with increased refcnt
and its reference is assigned to a local pointer 'hard_iface'.

When batadv_store_throughput_override() returns, "hard_iface" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The issue happens in one error path of
batadv_store_throughput_override(). When batadv_parse_throughput()
returns NULL, the refcnt increased by batadv_hardif_get_by_netdev() is
not decreased, causing a refcnt leak.

Fix this issue by jumping to "out" label when batadv_parse_throughput()
returns NULL.

Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21 10:08:05 +02:00
Xiyu Yang
f872de8185 batman-adv: Fix refcnt leak in batadv_show_throughput_override
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(),
which gets a batadv_hard_iface object from net_dev with increased refcnt
and its reference is assigned to a local pointer 'hard_iface'.

When batadv_show_throughput_override() returns, "hard_iface" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The issue happens in the normal path of
batadv_show_throughput_override(), which forgets to decrease the refcnt
increased by batadv_hardif_get_by_netdev() before the function returns,
causing a refcnt leak.

Fix this issue by calling batadv_hardif_put() before the
batadv_show_throughput_override() returns in the normal path.

Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21 10:08:05 +02:00
George Spelvin
fd0c42c4de batman-adv: fix batadv_nc_random_weight_tq
and change to pseudorandom numbers, as this is a traffic dithering
operation that doesn't need crypto-grade.

The previous code operated in 4 steps:

1. Generate a random byte 0 <= rand_tq <= 255
2. Multiply it by BATADV_TQ_MAX_VALUE - tq
3. Divide by 255 (= BATADV_TQ_MAX_VALUE)
4. Return BATADV_TQ_MAX_VALUE - rand_tq

This would apperar to scale (BATADV_TQ_MAX_VALUE - tq) by a random
value between 0/255 and 255/255.

But!  The intermediate value between steps 3 and 4 is stored in a u8
variable.  So it's truncated, and most of the time, is less than 255, after
which the division produces 0.  Specifically, if tq is odd, the product is
always even, and can never be 255.  If tq is even, there's exactly one
random byte value that will produce a product byte of 255.

Thus, the return value is 255 (511/512 of the time) or 254 (1/512
of the time).

If we assume that the truncation is a bug, and the code is meant to scale
the input, a simpler way of looking at it is that it's returning a random
value between tq and BATADV_TQ_MAX_VALUE, inclusive.

Well, we have an optimized function for doing just that.

Fixes: 3c12de9a5c ("batman-adv: network coding - code and transmit packets if possible")
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21 10:08:04 +02:00
Jason Yan
f585c9d543 platform/x86/intel-uncore-freq: make uncore_root_kobj static
Fix the following sparse warning:

drivers/platform/x86/intel-uncore-frequency.c:56:16: warning: symbol
'uncore_root_kobj' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-04-17 16:48:06 +03:00
YueHaibing
713df99a9e platform/x86: wmi: Make two functions static
Fix sparse warnings:

drivers/platform/x86/xiaomi-wmi.c:26:5: warning: symbol 'xiaomi_wmi_probe' was not declared. Should it be static?
drivers/platform/x86/xiaomi-wmi.c:51:6: warning: symbol 'xiaomi_wmi_notify' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-04-17 16:47:26 +03:00
Dan Carpenter
4dbccb873f platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe
The i2c_acpi_new_device() function never returns NULL, it returns error
pointers.

Fixes: b1f81b496b ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-04-17 16:47:26 +03:00
Marc Zyngier
fdc9999e20 KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
Implementing (and even advertising) 64bit PSCI functions to 32bit
guests is at least a bit odd, if not altogether violating the
spec which says ("5.2.1 Register usage in arguments and return values"):

"Adherence to the SMC Calling Conventions implies that any AArch32
caller of an SMC64 function will get a return code of 0xFFFFFFFF(int32).
This matches the NOT_SUPPORTED error code used in PSCI"

Tighten the implementation by pretending these functions are not
there for 32bit guests.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-17 09:51:45 +01:00
Marc Zyngier
2890ac993d KVM: arm64: PSCI: Narrow input registers when using 32bit functions
When a guest delibarately uses an SMC32 function number (which is allowed),
we should make sure we drop the top 32bits from the input arguments, as they
could legitimately be junk.

Reported-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-17 09:51:45 +01:00
Jason Gerecke
b43f977dd2 Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
This reverts commit 15893fa401.

The referenced commit broke pen and touch input for a variety of devices
such as the Cintiq Pro 32. Affected devices may appear to work normally
for a short amount of time, but eventually loose track of actual touch
state and can leave touch arbitration enabled which prevents the pen
from working. The commit is not itself required for any currently-available
Bluetooth device, and so we revert it to correct the behavior of broken
devices.

This breakage occurs due to a mismatch between the order of collections
and the order of usages on some devices. This commit tries to read the
contact count before processing events, but will fail if the contact
count does not occur prior to the first logical finger collection. This
is the case for devices like the Cintiq Pro 32 which place the contact
count at the very end of the report.

Without the contact count set, touches will only be partially processed.
The `wacom_wac_finger_slot` function will not open any slots since the
number of contacts seen is greater than the expectation of 0, but we will
still end up calling `input_mt_sync_frame` for each finger anyway. This
can cause problems for userspace separate from the issue currently taking
place in the kernel. Only once all of the individual finger collections
have been processed do we finally get to the enclosing collection which
contains the contact count. The value ends up being used for the *next*
report, however.

This delayed use of the contact count can cause the driver to loose track
of the actual touch state and believe that there are contacts down when
there aren't. This leaves touch arbitration enabled and prevents the pen
from working. It can also cause userspace to incorrectly treat single-
finger input as gestures.

Link: https://github.com/linuxwacom/input-wacom/issues/146
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Fixes: 15893fa401 ("HID: wacom: generic: read the number of expected touches on a per collection basis")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-16 21:38:47 +02:00
Marc Zyngier
1c32ca5dc6 KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER
When deciding whether a guest has to be stopped we check whether this
is a private interrupt or not. Unfortunately, there's an off-by-one bug
here, and we fail to recognize a whole range of interrupts as being
global (GICv2 SPIs 32-63).

Fix the condition from > to be >=.

Cc: stable@vger.kernel.org
Fixes: abd7229626 ("KVM: arm/arm64: Simplify active_change_prepare and plug race")
Reported-by: André Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-15 14:56:14 +01:00
Jiri Kosina
185af3e775 HID: alps: ALPS_1657 is too specific; use U1_UNICORN_LEGACY instead
HID_DEVICE_ID_ALPS_1657 PID is too specific, as there are many other
ALPS hardware IDs using this particular touchpad.

Rename the identifier to HID_DEVICE_ID_ALPS_U1_UNICORN_LEGACY in order
to describe reality better.

Fixes: 640e403b1f ("HID: alps: Add AUI1657 device ID")
Reported-by: Xiaojian Cao <xiaojian.cao@cn.alps.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-15 14:51:42 +02:00
Artem Borisov
640e403b1f HID: alps: Add AUI1657 device ID
This device is used on Lenovo V130-15IKB variants and uses
the same registers as U1.

Signed-off-by: Artem Borisov <dedsa2002@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-14 11:21:48 +02:00
Fabian Schindlatz
b1bd0f7528 HID: logitech: Add support for Logitech G11 extra keys
The Logitech G11 keyboard is a cheap variant of the G15 without the LCD
screen. It uses the same layout for its extra and macro keys (G1 - G18,
M1-M3, MR) and - from the input subsystem's perspective - behaves just
like the G15, so we can treat it as such.

Tested it with my own keyboard.

Signed-off-by: Fabian Schindlatz <fabian.schindlatz@fau.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-14 11:15:36 +02:00
Sebastian Reichel
f9e82295ee HID: multitouch: add eGalaxTouch P80H84 support
Add support for P80H84 touchscreen from eGalaxy:

  idVendor           0x0eef D-WAV Scientific Co., Ltd
  idProduct          0xc002
  iManufacturer           1 eGalax Inc.
  iProduct                2 eGalaxTouch P80H84 2019 vDIVA_1204_T01 k4.02.146

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-04-14 11:14:11 +02:00
Frédéric Pierret (fepitre)
c7527373fe gcc-common.h: Update for GCC 10
Remove "params.h" include, which has been dropped in GCC 10.

Remove is_a_helper() macro, which is now defined in gimple.h, as seen
when running './scripts/gcc-plugin.sh g++ g++ gcc':

In file included from <stdin>:1:
./gcc-plugins/gcc-common.h:852:13: error: redefinition of ‘static bool is_a_helper<T>::test(U*) [with U = const gimple; T = const ggoto*]’
  852 | inline bool is_a_helper<const ggoto *>::test(const_gimple gs)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ./gcc-plugins/gcc-common.h:125,
                 from <stdin>:1:
/usr/lib/gcc/x86_64-redhat-linux/10/plugin/include/gimple.h:1037:1: note: ‘static bool is_a_helper<T>::test(U*) [with U = const gimple; T = const ggoto*]’ previously declared here
 1037 | is_a_helper <const ggoto *>::test (const gimple *gs)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Add -Wno-format-diag to scripts/gcc-plugins/Makefile to avoid
meaningless warnings from error() formats used by plugins:

scripts/gcc-plugins/structleak_plugin.c: In function ‘int plugin_init(plugin_name_args*, plugin_gcc_version*)’:
scripts/gcc-plugins/structleak_plugin.c:253:12: warning: unquoted sequence of 2 consecutive punctuation characters ‘'-’ in format [-Wformat-diag]
  253 |   error(G_("unknown option '-fplugin-arg-%s-%s'"), plugin_name, argv[i].key);
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Frédéric Pierret (fepitre) <frederic.pierret@qubes-os.org>
Link: https://lore.kernel.org/r/20200407113259.270172-1-frederic.pierret@qubes-os.org
[kees: include -Wno-format-diag for plugin builds]
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-04-13 10:19:20 -07:00
Kees Cook
8d97fb393c gcc-plugins/stackleak: Avoid assignment for unused macro argument
With GCC version >= 8, the cgraph_create_edge() macro argument using
"frequency" goes unused. Instead of assigning a temporary variable for
the argument, pass the compute_call_stmt_bb_frequency() call directly
as the macro argument so that it will just not be called when it is
not wanted by the macros.

Silences the warning:

scripts/gcc-plugins/stackleak_plugin.c:54:6: warning: variable ‘frequency’ set but not used [-Wunused-but-set-variable]

Now builds cleanly with gcc-7 and gcc-9. Both boot and pass
STACKLEAK_ERASING LKDTM test.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-04-13 10:17:44 -07:00
Jason Gerecke
778fbf4179 HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
We've recently switched from extracting the value of HID_DG_CONTACTMAX
at a fixed offset (which may not be correct for all tablets) to
injecting the report into the driver for the generic codepath to handle.
Unfortunately, this change was made for *all* tablets, even those which
aren't generic. Because `wacom_wac_report` ignores reports from non-
generic devices, the contact count never gets initialized. Ultimately
this results in the touch device itself failing to probe, and thus the
loss of touch input.

This commit adds back the fixed-offset extraction for non-generic devices.

Link: https://github.com/linuxwacom/input-wacom/issues/155
Fixes: 184eccd403 ("HID: wacom: generic: read HID_DG_CONTACTMAX from any feature report")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
CC: stable@vger.kernel.org # 5.3+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2020-04-03 12:10:22 +02:00
306 changed files with 3101 additions and 1437 deletions

View File

@@ -61,8 +61,8 @@ The ``ice`` driver reports the following versions
- running
- ICE OS Default Package
- The name of the DDP package that is active in the device. The DDP
package is loaded by the driver during initialization. Each varation
of DDP package shall have a unique name.
package is loaded by the driver during initialization. Each
variation of the DDP package has a unique name.
* - ``fw.app``
- running
- 1.3.1.0

View File

@@ -28,3 +28,5 @@ KVM
arm/index
devices/index
running-nested-guests

View File

@@ -0,0 +1,276 @@
==============================
Running nested guests with KVM
==============================
A nested guest is the ability to run a guest inside another guest (it
can be KVM-based or a different hypervisor). The straightforward
example is a KVM guest that in turn runs on a KVM guest (the rest of
this document is built on this example)::
.----------------. .----------------.
| | | |
| L2 | | L2 |
| (Nested Guest) | | (Nested Guest) |
| | | |
|----------------'--'----------------|
| |
| L1 (Guest Hypervisor) |
| KVM (/dev/kvm) |
| |
.------------------------------------------------------.
| L0 (Host Hypervisor) |
| KVM (/dev/kvm) |
|------------------------------------------------------|
| Hardware (with virtualization extensions) |
'------------------------------------------------------'
Terminology:
- L0 level-0; the bare metal host, running KVM
- L1 level-1 guest; a VM running on L0; also called the "guest
hypervisor", as it itself is capable of running KVM.
- L2 level-2 guest; a VM running on L1, this is the "nested guest"
.. note:: The above diagram is modelled after the x86 architecture;
s390x, ppc64 and other architectures are likely to have
a different design for nesting.
For example, s390x always has an LPAR (LogicalPARtition)
hypervisor running on bare metal, adding another layer and
resulting in at least four levels in a nested setup — L0 (bare
metal, running the LPAR hypervisor), L1 (host hypervisor), L2
(guest hypervisor), L3 (nested guest).
This document will stick with the three-level terminology (L0,
L1, and L2) for all architectures; and will largely focus on
x86.
Use Cases
---------
There are several scenarios where nested KVM can be useful, to name a
few:
- As a developer, you want to test your software on different operating
systems (OSes). Instead of renting multiple VMs from a Cloud
Provider, using nested KVM lets you rent a large enough "guest
hypervisor" (level-1 guest). This in turn allows you to create
multiple nested guests (level-2 guests), running different OSes, on
which you can develop and test your software.
- Live migration of "guest hypervisors" and their nested guests, for
load balancing, disaster recovery, etc.
- VM image creation tools (e.g. ``virt-install``, etc) often run
their own VM, and users expect these to work inside a VM.
- Some OSes use virtualization internally for security (e.g. to let
applications run safely in isolation).
Enabling "nested" (x86)
-----------------------
From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled
by default for Intel and AMD. (Though your Linux distribution might
override this default.)
In case you are running a Linux kernel older than v4.19, to enable
nesting, set the ``nested`` KVM module parameter to ``Y`` or ``1``. To
persist this setting across reboots, you can add it in a config file, as
shown below:
1. On the bare metal host (L0), list the kernel modules and ensure that
the KVM modules::
$ lsmod | grep -i kvm
kvm_intel 133627 0
kvm 435079 1 kvm_intel
2. Show information for ``kvm_intel`` module::
$ modinfo kvm_intel | grep -i nested
parm: nested:bool
3. For the nested KVM configuration to persist across reboots, place the
below in ``/etc/modprobed/kvm_intel.conf`` (create the file if it
doesn't exist)::
$ cat /etc/modprobe.d/kvm_intel.conf
options kvm-intel nested=y
4. Unload and re-load the KVM Intel module::
$ sudo rmmod kvm-intel
$ sudo modprobe kvm-intel
5. Verify if the ``nested`` parameter for KVM is enabled::
$ cat /sys/module/kvm_intel/parameters/nested
Y
For AMD hosts, the process is the same as above, except that the module
name is ``kvm-amd``.
Additional nested-related kernel parameters (x86)
-------------------------------------------------
If your hardware is sufficiently advanced (Intel Haswell processor or
higher, which has newer hardware virt extensions), the following
additional features will also be enabled by default: "Shadow VMCS
(Virtual Machine Control Structure)", APIC Virtualization on your bare
metal host (L0). Parameters for Intel hosts::
$ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
Y
$ cat /sys/module/kvm_intel/parameters/enable_apicv
Y
$ cat /sys/module/kvm_intel/parameters/ept
Y
.. note:: If you suspect your L2 (i.e. nested guest) is running slower,
ensure the above are enabled (particularly
``enable_shadow_vmcs`` and ``ept``).
Starting a nested guest (x86)
-----------------------------
Once your bare metal host (L0) is configured for nesting, you should be
able to start an L1 guest with::
$ qemu-kvm -cpu host [...]
The above will pass through the host CPU's capabilities as-is to the
gues); or for better live migration compatibility, use a named CPU
model supported by QEMU. e.g.::
$ qemu-kvm -cpu Haswell-noTSX-IBRS,vmx=on
then the guest hypervisor will subsequently be capable of running a
nested guest with accelerated KVM.
Enabling "nested" (s390x)
-------------------------
1. On the host hypervisor (L0), enable the ``nested`` parameter on
s390x::
$ rmmod kvm
$ modprobe kvm nested=1
.. note:: On s390x, the kernel parameter ``hpage`` is mutually exclusive
with the ``nested`` paramter — i.e. to be able to enable
``nested``, the ``hpage`` parameter *must* be disabled.
2. The guest hypervisor (L1) must be provided with the ``sie`` CPU
feature — with QEMU, this can be done by using "host passthrough"
(via the command-line ``-cpu host``).
3. Now the KVM module can be loaded in the L1 (guest hypervisor)::
$ modprobe kvm
Live migration with nested KVM
------------------------------
Migrating an L1 guest, with a *live* nested guest in it, to another
bare metal host, works as of Linux kernel 5.3 and QEMU 4.2.0 for
Intel x86 systems, and even on older versions for s390x.
On AMD systems, once an L1 guest has started an L2 guest, the L1 guest
should no longer be migrated or saved (refer to QEMU documentation on
"savevm"/"loadvm") until the L2 guest shuts down. Attempting to migrate
or save-and-load an L1 guest while an L2 guest is running will result in
undefined behavior. You might see a ``kernel BUG!`` entry in ``dmesg``, a
kernel 'oops', or an outright kernel panic. Such a migrated or loaded L1
guest can no longer be considered stable or secure, and must be restarted.
Migrating an L1 guest merely configured to support nesting, while not
actually running L2 guests, is expected to function normally even on AMD
systems but may fail once guests are started.
Migrating an L2 guest is always expected to succeed, so all the following
scenarios should work even on AMD systems:
- Migrating a nested guest (L2) to another L1 guest on the *same* bare
metal host.
- Migrating a nested guest (L2) to another L1 guest on a *different*
bare metal host.
- Migrating a nested guest (L2) to a bare metal host.
Reporting bugs from nested setups
-----------------------------------
Debugging "nested" problems can involve sifting through log files across
L0, L1 and L2; this can result in tedious back-n-forth between the bug
reporter and the bug fixer.
- Mention that you are in a "nested" setup. If you are running any kind
of "nesting" at all, say so. Unfortunately, this needs to be called
out because when reporting bugs, people tend to forget to even
*mention* that they're using nested virtualization.
- Ensure you are actually running KVM on KVM. Sometimes people do not
have KVM enabled for their guest hypervisor (L1), which results in
them running with pure emulation or what QEMU calls it as "TCG", but
they think they're running nested KVM. Thus confusing "nested Virt"
(which could also mean, QEMU on KVM) with "nested KVM" (KVM on KVM).
Information to collect (generic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following is not an exhaustive list, but a very good starting point:
- Kernel, libvirt, and QEMU version from L0
- Kernel, libvirt and QEMU version from L1
- QEMU command-line of L1 -- when using libvirt, you'll find it here:
``/var/log/libvirt/qemu/instance.log``
- QEMU command-line of L2 -- as above, when using libvirt, get the
complete libvirt-generated QEMU command-line
- ``cat /sys/cpuinfo`` from L0
- ``cat /sys/cpuinfo`` from L1
- ``lscpu`` from L0
- ``lscpu`` from L1
- Full ``dmesg`` output from L0
- Full ``dmesg`` output from L1
x86-specific info to collect
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Both the below commands, ``x86info`` and ``dmidecode``, should be
available on most Linux distributions with the same name:
- Output of: ``x86info -a`` from L0
- Output of: ``x86info -a`` from L1
- Output of: ``dmidecode`` from L0
- Output of: ``dmidecode`` from L1
s390x-specific info to collect
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Along with the earlier mentioned generic details, the below is
also recommended:
- ``/proc/sysinfo`` from L1; this will also include the info from L0

View File

@@ -3936,11 +3936,9 @@ F: arch/powerpc/platforms/cell/
CEPH COMMON CODE (LIBCEPH)
M: Ilya Dryomov <idryomov@gmail.com>
M: Jeff Layton <jlayton@kernel.org>
M: Sage Weil <sage@redhat.com>
L: ceph-devel@vger.kernel.org
S: Supported
W: http://ceph.com/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
T: git git://github.com/ceph/ceph-client.git
F: include/linux/ceph/
F: include/linux/crush/
@@ -3948,12 +3946,10 @@ F: net/ceph/
CEPH DISTRIBUTED FILE SYSTEM CLIENT (CEPH)
M: Jeff Layton <jlayton@kernel.org>
M: Sage Weil <sage@redhat.com>
M: Ilya Dryomov <idryomov@gmail.com>
L: ceph-devel@vger.kernel.org
S: Supported
W: http://ceph.com/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
T: git git://github.com/ceph/ceph-client.git
F: Documentation/filesystems/ceph.rst
F: fs/ceph/
@@ -5935,9 +5931,9 @@ F: lib/dynamic_debug.c
DYNAMIC INTERRUPT MODERATION
M: Tal Gilboa <talgi@mellanox.com>
S: Maintained
F: Documentation/networking/net_dim.rst
F: include/linux/dim.h
F: lib/dim/
F: Documentation/networking/net_dim.rst
DZ DECSTATION DZ11 SERIAL DRIVER
M: "Maciej W. Rozycki" <macro@linux-mips.org>
@@ -7119,9 +7115,10 @@ F: include/uapi/asm-generic/
GENERIC PHY FRAMEWORK
M: Kishon Vijay Abraham I <kishon@ti.com>
M: Vinod Koul <vkoul@kernel.org>
L: linux-kernel@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git
F: Documentation/devicetree/bindings/phy/
F: drivers/phy/
F: include/linux/phy/
@@ -7746,11 +7743,6 @@ L: platform-driver-x86@vger.kernel.org
S: Orphan
F: drivers/platform/x86/tc1100-wmi.c
HP100: Driver for HP 10/100 Mbit/s Voice Grade Network Adapter Series
M: Jaroslav Kysela <perex@perex.cz>
S: Obsolete
F: drivers/staging/hp/hp100.*
HPET: High Precision Event Timers driver
M: Clemens Ladisch <clemens@ladisch.de>
S: Maintained
@@ -14102,12 +14094,10 @@ F: drivers/media/radio/radio-tea5777.c
RADOS BLOCK DEVICE (RBD)
M: Ilya Dryomov <idryomov@gmail.com>
M: Sage Weil <sage@redhat.com>
R: Dongsheng Yang <dongsheng.yang@easystack.cn>
L: ceph-devel@vger.kernel.org
S: Supported
W: http://ceph.com/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
T: git git://github.com/ceph/ceph-client.git
F: Documentation/ABI/testing/sysfs-bus-rbd
F: drivers/block/rbd.c

View File

@@ -2,7 +2,7 @@
VERSION = 5
PATCHLEVEL = 7
SUBLEVEL = 0
EXTRAVERSION = -rc4
EXTRAVERSION = -rc5
NAME = Kleptomaniac Octopus
# *DOCUMENTATION*
@@ -729,10 +729,6 @@ else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
endif
ifdef CONFIG_CC_DISABLE_WARN_MAYBE_UNINITIALIZED
KBUILD_CFLAGS += -Wno-maybe-uninitialized
endif
# Tell gcc to never replace conditional load with a non-conditional one
KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)
KBUILD_CFLAGS += $(call cc-option,-fno-allow-store-data-races)
@@ -881,6 +877,17 @@ KBUILD_CFLAGS += -Wno-pointer-sign
# disable stringop warnings in gcc 8+
KBUILD_CFLAGS += $(call cc-disable-warning, stringop-truncation)
# We'll want to enable this eventually, but it's not going away for 5.7 at least
KBUILD_CFLAGS += $(call cc-disable-warning, zero-length-bounds)
KBUILD_CFLAGS += $(call cc-disable-warning, array-bounds)
KBUILD_CFLAGS += $(call cc-disable-warning, stringop-overflow)
# Another good warning that we'll want to enable eventually
KBUILD_CFLAGS += $(call cc-disable-warning, restrict)
# Enabled with W=2, disabled by default as noisy
KBUILD_CFLAGS += $(call cc-disable-warning, maybe-uninitialized)
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)

View File

@@ -91,9 +91,17 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
return;
}
kernel_neon_begin();
chacha_doneon(state, dst, src, bytes, nrounds);
kernel_neon_end();
do {
unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
kernel_neon_begin();
chacha_doneon(state, dst, src, todo, nrounds);
kernel_neon_end();
bytes -= todo;
src += todo;
dst += todo;
} while (bytes);
}
EXPORT_SYMBOL(chacha_crypt_arch);

View File

@@ -30,7 +30,7 @@ static int nhpoly1305_neon_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
unsigned int n = min_t(unsigned int, srclen, SZ_4K);
kernel_neon_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);

View File

@@ -160,13 +160,20 @@ void poly1305_update_arch(struct poly1305_desc_ctx *dctx, const u8 *src,
unsigned int len = round_down(nbytes, POLY1305_BLOCK_SIZE);
if (static_branch_likely(&have_neon) && do_neon) {
kernel_neon_begin();
poly1305_blocks_neon(&dctx->h, src, len, 1);
kernel_neon_end();
do {
unsigned int todo = min_t(unsigned int, len, SZ_4K);
kernel_neon_begin();
poly1305_blocks_neon(&dctx->h, src, todo, 1);
kernel_neon_end();
len -= todo;
src += todo;
} while (len);
} else {
poly1305_blocks_arm(&dctx->h, src, len, 1);
src += len;
}
src += len;
nbytes %= POLY1305_BLOCK_SIZE;
}

View File

@@ -165,8 +165,13 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
preempt_enable();
#endif
if (!ret)
*oval = oldval;
/*
* Store unconditionally. If ret != 0 the extra store is the least
* of the worries but GCC cannot figure out that __futex_atomic_op()
* is either setting ret to -EFAULT or storing the old value in
* oldval which results in a uninitialized warning at the call site.
*/
*oval = oldval;
return ret;
}

View File

@@ -87,9 +87,17 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
!crypto_simd_usable())
return chacha_crypt_generic(state, dst, src, bytes, nrounds);
kernel_neon_begin();
chacha_doneon(state, dst, src, bytes, nrounds);
kernel_neon_end();
do {
unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
kernel_neon_begin();
chacha_doneon(state, dst, src, todo, nrounds);
kernel_neon_end();
bytes -= todo;
src += todo;
dst += todo;
} while (bytes);
}
EXPORT_SYMBOL(chacha_crypt_arch);

View File

@@ -30,7 +30,7 @@ static int nhpoly1305_neon_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
unsigned int n = min_t(unsigned int, srclen, SZ_4K);
kernel_neon_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);

View File

@@ -143,13 +143,20 @@ void poly1305_update_arch(struct poly1305_desc_ctx *dctx, const u8 *src,
unsigned int len = round_down(nbytes, POLY1305_BLOCK_SIZE);
if (static_branch_likely(&have_neon) && crypto_simd_usable()) {
kernel_neon_begin();
poly1305_blocks_neon(&dctx->h, src, len, 1);
kernel_neon_end();
do {
unsigned int todo = min_t(unsigned int, len, SZ_4K);
kernel_neon_begin();
poly1305_blocks_neon(&dctx->h, src, todo, 1);
kernel_neon_end();
len -= todo;
src += todo;
} while (len);
} else {
poly1305_blocks(&dctx->h, src, len, 1);
src += len;
}
src += len;
nbytes %= POLY1305_BLOCK_SIZE;
}

View File

@@ -200,6 +200,13 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
}
memcpy((u32 *)regs + off, valp, KVM_REG_SIZE(reg->id));
if (*vcpu_cpsr(vcpu) & PSR_MODE32_BIT) {
int i;
for (i = 0; i < 16; i++)
*vcpu_reg32(vcpu, i) = (u32)*vcpu_reg32(vcpu, i);
}
out:
return err;
}

View File

@@ -18,6 +18,7 @@
#define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
#define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
#define CPU_SP_EL0_OFFSET (CPU_XREG_OFFSET(30) + 8)
.text
.pushsection .hyp.text, "ax"
@@ -47,6 +48,16 @@
ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)]
.endm
.macro save_sp_el0 ctxt, tmp
mrs \tmp, sp_el0
str \tmp, [\ctxt, #CPU_SP_EL0_OFFSET]
.endm
.macro restore_sp_el0 ctxt, tmp
ldr \tmp, [\ctxt, #CPU_SP_EL0_OFFSET]
msr sp_el0, \tmp
.endm
/*
* u64 __guest_enter(struct kvm_vcpu *vcpu,
* struct kvm_cpu_context *host_ctxt);
@@ -60,6 +71,9 @@ SYM_FUNC_START(__guest_enter)
// Store the host regs
save_callee_saved_regs x1
// Save the host's sp_el0
save_sp_el0 x1, x2
// Now the host state is stored if we have a pending RAS SError it must
// affect the host. If any asynchronous exception is pending we defer
// the guest entry. The DSB isn't necessary before v8.2 as any SError
@@ -83,6 +97,9 @@ alternative_else_nop_endif
// when this feature is enabled for kernel code.
ptrauth_switch_to_guest x29, x0, x1, x2
// Restore the guest's sp_el0
restore_sp_el0 x29, x0
// Restore guest regs x0-x17
ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x29, #CPU_XREG_OFFSET(2)]
@@ -130,6 +147,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// Store the guest regs x18-x29, lr
save_callee_saved_regs x1
// Store the guest's sp_el0
save_sp_el0 x1, x2
get_host_ctxt x2, x3
// Macro ptrauth_switch_to_guest format:
@@ -139,6 +159,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// when this feature is enabled for kernel code.
ptrauth_switch_to_host x1, x2, x3, x4, x5
// Restore the hosts's sp_el0
restore_sp_el0 x2, x3
// Now restore the host regs
restore_callee_saved_regs x2

View File

@@ -198,7 +198,6 @@ SYM_CODE_END(__hyp_panic)
.macro invalid_vector label, target = __hyp_panic
.align 2
SYM_CODE_START(\label)
\label:
b \target
SYM_CODE_END(\label)
.endm

View File

@@ -15,8 +15,9 @@
/*
* Non-VHE: Both host and guest must save everything.
*
* VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
* which are handled as part of the el2 return state) on every switch.
* VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
* pstate, which are handled as part of the el2 return state) on every
* switch (sp_el0 is being dealt with in the assembly code).
* tpidr_el0 and tpidrro_el0 only need to be switched when going
* to host userspace or a different VCPU. EL1 registers only need to be
* switched when potentially going to run a different VCPU. The latter two
@@ -26,12 +27,6 @@
static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
/*
* The host arm64 Linux uses sp_el0 to point to 'current' and it must
* therefore be saved/restored on every entry/exit to/from the guest.
*/
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
}
static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -99,12 +94,6 @@ NOKPROBE_SYMBOL(sysreg_save_guest_state_vhe);
static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
/*
* The host arm64 Linux uses sp_el0 to point to 'current' and it must
* therefore be saved/restored on every entry/exit to/from the guest.
*/
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
}
static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)

View File

@@ -230,6 +230,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
ptep = (pte_t *)pudp;
} else if (sz == (CONT_PTE_SIZE)) {
pmdp = pmd_alloc(mm, pudp, addr);
if (!pmdp)
return NULL;
WARN_ON(addr & (sz - 1));
/*

View File

@@ -521,6 +521,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
case KVM_CAP_PPC_GUEST_DEBUG_SSTEP:

View File

@@ -51,13 +51,10 @@
#define CAUSE_IRQ_FLAG (_AC(1, UL) << (__riscv_xlen - 1))
/* Interrupt causes (minus the high bit) */
#define IRQ_U_SOFT 0
#define IRQ_S_SOFT 1
#define IRQ_M_SOFT 3
#define IRQ_U_TIMER 4
#define IRQ_S_TIMER 5
#define IRQ_M_TIMER 7
#define IRQ_U_EXT 8
#define IRQ_S_EXT 9
#define IRQ_M_EXT 11

View File

@@ -8,6 +8,7 @@
#ifndef _ASM_RISCV_HWCAP_H
#define _ASM_RISCV_HWCAP_H
#include <linux/bits.h>
#include <uapi/asm/hwcap.h>
#ifndef __ASSEMBLY__
@@ -22,6 +23,27 @@ enum {
};
extern unsigned long elf_hwcap;
#define RISCV_ISA_EXT_a ('a' - 'a')
#define RISCV_ISA_EXT_c ('c' - 'a')
#define RISCV_ISA_EXT_d ('d' - 'a')
#define RISCV_ISA_EXT_f ('f' - 'a')
#define RISCV_ISA_EXT_h ('h' - 'a')
#define RISCV_ISA_EXT_i ('i' - 'a')
#define RISCV_ISA_EXT_m ('m' - 'a')
#define RISCV_ISA_EXT_s ('s' - 'a')
#define RISCV_ISA_EXT_u ('u' - 'a')
#define RISCV_ISA_EXT_MAX 64
unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap);
#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext)
bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit);
#define riscv_isa_extension_available(isa_bitmap, ext) \
__riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext)
#endif
#endif /* _ASM_RISCV_HWCAP_H */

View File

@@ -22,14 +22,6 @@ static inline int set_memory_x(unsigned long addr, int numpages) { return 0; }
static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
#endif
#ifdef CONFIG_STRICT_KERNEL_RWX
void set_kernel_text_ro(void);
void set_kernel_text_rw(void);
#else
static inline void set_kernel_text_ro(void) { }
static inline void set_kernel_text_rw(void) { }
#endif
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);

View File

@@ -15,8 +15,8 @@
const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
void *__cpu_up_stack_pointer[NR_CPUS];
void *__cpu_up_task_pointer[NR_CPUS];
void *__cpu_up_stack_pointer[NR_CPUS] __section(.data);
void *__cpu_up_task_pointer[NR_CPUS] __section(.data);
extern const struct cpu_operations cpu_ops_sbi;
extern const struct cpu_operations cpu_ops_spinwait;

View File

@@ -6,6 +6,7 @@
* Copyright (C) 2017 SiFive
*/
#include <linux/bitmap.h>
#include <linux/of.h>
#include <asm/processor.h>
#include <asm/hwcap.h>
@@ -13,15 +14,57 @@
#include <asm/switch_to.h>
unsigned long elf_hwcap __read_mostly;
/* Host ISA bitmap */
static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
#ifdef CONFIG_FPU
bool has_fpu __read_mostly;
#endif
/**
* riscv_isa_extension_base() - Get base extension word
*
* @isa_bitmap: ISA bitmap to use
* Return: base extension word as unsigned long value
*
* NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
*/
unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap)
{
if (!isa_bitmap)
return riscv_isa[0];
return isa_bitmap[0];
}
EXPORT_SYMBOL_GPL(riscv_isa_extension_base);
/**
* __riscv_isa_extension_available() - Check whether given extension
* is available or not
*
* @isa_bitmap: ISA bitmap to use
* @bit: bit position of the desired extension
* Return: true or false
*
* NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
*/
bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit)
{
const unsigned long *bmap = (isa_bitmap) ? isa_bitmap : riscv_isa;
if (bit >= RISCV_ISA_EXT_MAX)
return false;
return test_bit(bit, bmap) ? true : false;
}
EXPORT_SYMBOL_GPL(__riscv_isa_extension_available);
void riscv_fill_hwcap(void)
{
struct device_node *node;
const char *isa;
size_t i;
char print_str[BITS_PER_LONG + 1];
size_t i, j, isa_len;
static unsigned long isa2hwcap[256] = {0};
isa2hwcap['i'] = isa2hwcap['I'] = COMPAT_HWCAP_ISA_I;
@@ -33,8 +76,11 @@ void riscv_fill_hwcap(void)
elf_hwcap = 0;
bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
for_each_of_cpu_node(node) {
unsigned long this_hwcap = 0;
unsigned long this_isa = 0;
if (riscv_of_processor_hartid(node) < 0)
continue;
@@ -44,8 +90,24 @@ void riscv_fill_hwcap(void)
continue;
}
for (i = 0; i < strlen(isa); ++i)
i = 0;
isa_len = strlen(isa);
#if IS_ENABLED(CONFIG_32BIT)
if (!strncmp(isa, "rv32", 4))
i += 4;
#elif IS_ENABLED(CONFIG_64BIT)
if (!strncmp(isa, "rv64", 4))
i += 4;
#endif
for (; i < isa_len; ++i) {
this_hwcap |= isa2hwcap[(unsigned char)(isa[i])];
/*
* TODO: X, Y and Z extension parsing for Host ISA
* bitmap will be added in-future.
*/
if ('a' <= isa[i] && isa[i] < 'x')
this_isa |= (1UL << (isa[i] - 'a'));
}
/*
* All "okay" hart should have same isa. Set HWCAP based on
@@ -56,6 +118,11 @@ void riscv_fill_hwcap(void)
elf_hwcap &= this_hwcap;
else
elf_hwcap = this_hwcap;
if (riscv_isa[0])
riscv_isa[0] &= this_isa;
else
riscv_isa[0] = this_isa;
}
/* We don't support systems with F but without D, so mask those out
@@ -65,7 +132,17 @@ void riscv_fill_hwcap(void)
elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
}
pr_info("elf_hwcap is 0x%lx\n", elf_hwcap);
memset(print_str, 0, sizeof(print_str));
for (i = 0, j = 0; i < BITS_PER_LONG; i++)
if (riscv_isa[0] & BIT_MASK(i))
print_str[j++] = (char)('a' + i);
pr_info("riscv: ISA extensions %s\n", print_str);
memset(print_str, 0, sizeof(print_str));
for (i = 0, j = 0; i < BITS_PER_LONG; i++)
if (elf_hwcap & BIT_MASK(i))
print_str[j++] = (char)('a' + i);
pr_info("riscv: ELF capabilities %s\n", print_str);
#ifdef CONFIG_FPU
if (elf_hwcap & (COMPAT_HWCAP_ISA_F | COMPAT_HWCAP_ISA_D))

View File

@@ -10,6 +10,7 @@
#include <linux/cpu.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/profile.h>
#include <linux/smp.h>
#include <linux/sched.h>
@@ -63,6 +64,7 @@ void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out)
for_each_cpu(cpu, in)
cpumask_set_cpu(cpuid_to_hartid_map(cpu), out);
}
EXPORT_SYMBOL_GPL(riscv_cpuid_to_hartid_mask);
bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
{

View File

@@ -12,7 +12,7 @@ vdso-syms += getcpu
vdso-syms += flush_icache
# Files to link into the vdso
obj-vdso = $(patsubst %, %.o, $(vdso-syms))
obj-vdso = $(patsubst %, %.o, $(vdso-syms)) note.o
# Build rules
targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-dummy.o

View File

@@ -0,0 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
* Here we can supply some information useful to userland.
*/
#include <linux/elfnote.h>
#include <linux/version.h>
ELFNOTE_START(Linux, 0, "a")
.long LINUX_VERSION_CODE
ELFNOTE_END

View File

@@ -150,7 +150,8 @@ void __init setup_bootmem(void)
memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);
set_max_mapnr(PFN_DOWN(mem_size));
max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());
max_pfn = PFN_DOWN(memblock_end_of_DRAM());
max_low_pfn = max_pfn;
#ifdef CONFIG_BLK_DEV_INITRD
setup_initrd();
@@ -501,22 +502,6 @@ static inline void setup_vm_final(void)
#endif /* CONFIG_MMU */
#ifdef CONFIG_STRICT_KERNEL_RWX
void set_kernel_text_rw(void)
{
unsigned long text_start = (unsigned long)_text;
unsigned long text_end = (unsigned long)_etext;
set_memory_rw(text_start, (text_end - text_start) >> PAGE_SHIFT);
}
void set_kernel_text_ro(void)
{
unsigned long text_start = (unsigned long)_text;
unsigned long text_end = (unsigned long)_etext;
set_memory_ro(text_start, (text_end - text_start) >> PAGE_SHIFT);
}
void mark_rodata_ro(void)
{
unsigned long text_start = (unsigned long)_text;

View File

@@ -545,6 +545,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_AIS:
case KVM_CAP_S390_AIS_MIGRATION:
case KVM_CAP_S390_VCPU_RESETS:
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
case KVM_CAP_S390_HPAGE_1M:

View File

@@ -626,10 +626,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
* available for the guest are AQIC and TAPQ with the t bit set
* since we do not set IC.3 (FIII) we currently will only intercept
* the AQIC function code.
* Note: running nested under z/VM can result in intercepts for other
* function codes, e.g. PQAP(QCI). We do not support this and bail out.
*/
reg0 = vcpu->run->s.regs.gprs[0];
fc = (reg0 >> 24) & 0xff;
if (WARN_ON_ONCE(fc != 0x03))
if (fc != 0x03)
return -EOPNOTSUPP;
/* PQAP instruction is allowed for guest kernel only */

View File

@@ -32,16 +32,16 @@ void blake2s_compress_arch(struct blake2s_state *state,
const u32 inc)
{
/* SIMD disables preemption, so relax after processing each page. */
BUILD_BUG_ON(PAGE_SIZE / BLAKE2S_BLOCK_SIZE < 8);
BUILD_BUG_ON(SZ_4K / BLAKE2S_BLOCK_SIZE < 8);
if (!static_branch_likely(&blake2s_use_ssse3) || !crypto_simd_usable()) {
blake2s_compress_generic(state, block, nblocks, inc);
return;
}
for (;;) {
do {
const size_t blocks = min_t(size_t, nblocks,
PAGE_SIZE / BLAKE2S_BLOCK_SIZE);
SZ_4K / BLAKE2S_BLOCK_SIZE);
kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) &&
@@ -52,10 +52,8 @@ void blake2s_compress_arch(struct blake2s_state *state,
kernel_fpu_end();
nblocks -= blocks;
if (!nblocks)
break;
block += blocks * BLAKE2S_BLOCK_SIZE;
}
} while (nblocks);
}
EXPORT_SYMBOL(blake2s_compress_arch);

View File

@@ -153,9 +153,17 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
bytes <= CHACHA_BLOCK_SIZE)
return chacha_crypt_generic(state, dst, src, bytes, nrounds);
kernel_fpu_begin();
chacha_dosimd(state, dst, src, bytes, nrounds);
kernel_fpu_end();
do {
unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
kernel_fpu_begin();
chacha_dosimd(state, dst, src, todo, nrounds);
kernel_fpu_end();
bytes -= todo;
src += todo;
dst += todo;
} while (bytes);
}
EXPORT_SYMBOL(chacha_crypt_arch);

View File

@@ -29,7 +29,7 @@ static int nhpoly1305_avx2_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
unsigned int n = min_t(unsigned int, srclen, SZ_4K);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);

View File

@@ -29,7 +29,7 @@ static int nhpoly1305_sse2_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
unsigned int n = min_t(unsigned int, srclen, SZ_4K);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);

View File

@@ -91,8 +91,8 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
struct poly1305_arch_internal *state = ctx;
/* SIMD disables preemption, so relax after processing each page. */
BUILD_BUG_ON(PAGE_SIZE < POLY1305_BLOCK_SIZE ||
PAGE_SIZE % POLY1305_BLOCK_SIZE);
BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
SZ_4K % POLY1305_BLOCK_SIZE);
if (!static_branch_likely(&poly1305_use_avx) ||
(len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
@@ -102,8 +102,8 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
return;
}
for (;;) {
const size_t bytes = min_t(size_t, len, PAGE_SIZE);
do {
const size_t bytes = min_t(size_t, len, SZ_4K);
kernel_fpu_begin();
if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512))
@@ -113,11 +113,10 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
else
poly1305_blocks_avx(ctx, inp, bytes, padbit);
kernel_fpu_end();
len -= bytes;
if (!len)
break;
inp += bytes;
}
} while (len);
}
static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],

View File

@@ -98,13 +98,6 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
/*
* Push registers and sanitize registers of values that a
* speculation attack might otherwise want to exploit. The
* lower registers are likely clobbered well before they
* could be put to use in a speculative execution gadget.
* Interleave XOR with PUSH for better uop scheduling:
*/
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
@@ -114,34 +107,43 @@ For 32-bit we have the following conventions - kernel is built with
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
xorl %edx, %edx /* nospec dx */
pushq %rcx /* pt_regs->cx */
xorl %ecx, %ecx /* nospec cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
pushq %r9 /* pt_regs->r9 */
xorl %r9d, %r9d /* nospec r9 */
pushq %r10 /* pt_regs->r10 */
xorl %r10d, %r10d /* nospec r10 */
pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
xorl %ebx, %ebx /* nospec rbx*/
pushq %rbp /* pt_regs->rbp */
xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
xorl %r12d, %r12d /* nospec r12*/
pushq %r13 /* pt_regs->r13 */
xorl %r13d, %r13d /* nospec r13*/
pushq %r14 /* pt_regs->r14 */
xorl %r14d, %r14d /* nospec r14*/
pushq %r15 /* pt_regs->r15 */
xorl %r15d, %r15d /* nospec r15*/
UNWIND_HINT_REGS
.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
/*
* Sanitize registers of values that a speculation attack might
* otherwise want to exploit. The lower registers are likely clobbered
* well before they could be put to use in a speculative execution
* gadget.
*/
xorl %edx, %edx /* nospec dx */
xorl %ecx, %ecx /* nospec cx */
xorl %r8d, %r8d /* nospec r8 */
xorl %r9d, %r9d /* nospec r9 */
xorl %r10d, %r10d /* nospec r10 */
xorl %r11d, %r11d /* nospec r11 */
xorl %ebx, %ebx /* nospec rbx */
xorl %ebp, %ebp /* nospec rbp */
xorl %r12d, %r12d /* nospec r12 */
xorl %r13d, %r13d /* nospec r13 */
xorl %r14d, %r14d /* nospec r14 */
xorl %r15d, %r15d /* nospec r15 */
.endm
.macro POP_REGS pop_rdi=1 skip_r11rcx=0

View File

@@ -249,7 +249,6 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
*/
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
UNWIND_HINT_EMPTY
POP_REGS pop_rdi=0 skip_r11rcx=1
/*
@@ -258,6 +257,7 @@ syscall_return_via_sysret:
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
UNWIND_HINT_EMPTY
pushq RSP-RDI(%rdi) /* RSP */
pushq (%rdi) /* RDI */
@@ -279,8 +279,7 @@ SYM_CODE_END(entry_SYSCALL_64)
* %rdi: prev task
* %rsi: next task
*/
SYM_CODE_START(__switch_to_asm)
UNWIND_HINT_FUNC
SYM_FUNC_START(__switch_to_asm)
/*
* Save callee-saved registers
* This must match the order in inactive_task_frame
@@ -321,7 +320,7 @@ SYM_CODE_START(__switch_to_asm)
popq %rbp
jmp __switch_to
SYM_CODE_END(__switch_to_asm)
SYM_FUNC_END(__switch_to_asm)
/*
* A newly forked process directly context switches into this address.
@@ -512,7 +511,7 @@ SYM_CODE_END(spurious_entries_start)
* +----------------------------------------------------+
*/
SYM_CODE_START(interrupt_entry)
UNWIND_HINT_FUNC
UNWIND_HINT_IRET_REGS offset=16
ASM_CLAC
cld
@@ -544,9 +543,9 @@ SYM_CODE_START(interrupt_entry)
pushq 5*8(%rdi) /* regs->eflags */
pushq 4*8(%rdi) /* regs->cs */
pushq 3*8(%rdi) /* regs->ip */
UNWIND_HINT_IRET_REGS
pushq 2*8(%rdi) /* regs->orig_ax */
pushq 8(%rdi) /* return address */
UNWIND_HINT_FUNC
movq (%rdi), %rdi
jmp 2f
@@ -637,6 +636,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
UNWIND_HINT_EMPTY
/* Copy the IRET frame to the trampoline stack. */
pushq 6*8(%rdi) /* SS */
@@ -1739,7 +1739,7 @@ SYM_CODE_START(rewind_stack_do_exit)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rax
leaq -PTREGS_SIZE(%rax), %rsp
UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE
UNWIND_HINT_REGS
call do_exit
SYM_CODE_END(rewind_stack_do_exit)

View File

@@ -61,11 +61,12 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
{
/*
* Compare the symbol name with the system call name. Skip the
* "__x64_sys", "__ia32_sys" or simple "sys" prefix.
* "__x64_sys", "__ia32_sys", "__do_sys" or simple "sys" prefix.
*/
return !strcmp(sym + 3, name + 3) ||
(!strncmp(sym, "__x64_", 6) && !strcmp(sym + 9, name + 3)) ||
(!strncmp(sym, "__ia32_", 7) && !strcmp(sym + 10, name + 3));
(!strncmp(sym, "__ia32_", 7) && !strcmp(sym + 10, name + 3)) ||
(!strncmp(sym, "__do_sys", 8) && !strcmp(sym + 8, name + 3));
}
#ifndef COMPILE_OFFSETS

View File

@@ -1663,8 +1663,8 @@ void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
{
/* We can only post Fixed and LowPrio IRQs */
return (irq->delivery_mode == dest_Fixed ||
irq->delivery_mode == dest_LowestPrio);
return (irq->delivery_mode == APIC_DM_FIXED ||
irq->delivery_mode == APIC_DM_LOWEST);
}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)

View File

@@ -19,7 +19,7 @@ struct unwind_state {
#if defined(CONFIG_UNWINDER_ORC)
bool signal, full_regs;
unsigned long sp, bp, ip;
struct pt_regs *regs;
struct pt_regs *regs, *prev_regs;
#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
bool got_irq;
unsigned long *bp, *orig_sp, ip;

View File

@@ -352,8 +352,6 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
* According to Intel, MFENCE can do the serialization here.
*/
asm volatile("mfence" : : : "memory");
printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
return;
}
@@ -546,7 +544,7 @@ static struct clock_event_device lapic_clockevent = {
};
static DEFINE_PER_CPU(struct clock_event_device, lapic_events);
static u32 hsx_deadline_rev(void)
static __init u32 hsx_deadline_rev(void)
{
switch (boot_cpu_data.x86_stepping) {
case 0x02: return 0x3a; /* EP */
@@ -556,7 +554,7 @@ static u32 hsx_deadline_rev(void)
return ~0U;
}
static u32 bdx_deadline_rev(void)
static __init u32 bdx_deadline_rev(void)
{
switch (boot_cpu_data.x86_stepping) {
case 0x02: return 0x00000011;
@@ -568,7 +566,7 @@ static u32 bdx_deadline_rev(void)
return ~0U;
}
static u32 skx_deadline_rev(void)
static __init u32 skx_deadline_rev(void)
{
switch (boot_cpu_data.x86_stepping) {
case 0x03: return 0x01000136;
@@ -581,7 +579,7 @@ static u32 skx_deadline_rev(void)
return ~0U;
}
static const struct x86_cpu_id deadline_match[] = {
static const struct x86_cpu_id deadline_match[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL( HASWELL_X, &hsx_deadline_rev),
X86_MATCH_INTEL_FAM6_MODEL( BROADWELL_X, 0x0b000020),
X86_MATCH_INTEL_FAM6_MODEL( BROADWELL_D, &bdx_deadline_rev),
@@ -603,18 +601,19 @@ static const struct x86_cpu_id deadline_match[] = {
{},
};
static void apic_check_deadline_errata(void)
static __init bool apic_validate_deadline_timer(void)
{
const struct x86_cpu_id *m;
u32 rev;
if (!boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER) ||
boot_cpu_has(X86_FEATURE_HYPERVISOR))
return;
if (!boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
return false;
if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
return true;
m = x86_match_cpu(deadline_match);
if (!m)
return;
return true;
/*
* Function pointers will have the MSB set due to address layout,
@@ -626,11 +625,12 @@ static void apic_check_deadline_errata(void)
rev = (u32)m->driver_data;
if (boot_cpu_data.microcode >= rev)
return;
return true;
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; "
"please update microcode to version: 0x%x (or later)\n", rev);
return false;
}
/*
@@ -2092,7 +2092,8 @@ void __init init_apic_mappings(void)
{
unsigned int new_apicid;
apic_check_deadline_errata();
if (apic_validate_deadline_timer())
pr_debug("TSC deadline timer available\n");
if (x2apic_mode) {
boot_cpu_physical_apicid = read_apic_id();

View File

@@ -183,7 +183,8 @@ recursion_check:
*/
if (visit_mask) {
if (*visit_mask & (1UL << info->type)) {
printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
if (task == current)
printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
goto unknown;
}
*visit_mask |= 1UL << info->type;

View File

@@ -344,6 +344,9 @@ bad_address:
if (IS_ENABLED(CONFIG_X86_32))
goto the_end;
if (state->task != current)
goto the_end;
if (state->regs) {
printk_deferred_once(KERN_WARNING
"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",

View File

@@ -8,19 +8,21 @@
#include <asm/orc_lookup.h>
#define orc_warn(fmt, ...) \
printk_deferred_once(KERN_WARNING pr_fmt("WARNING: " fmt), ##__VA_ARGS__)
printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
#define orc_warn_current(args...) \
({ \
if (state->task == current) \
orc_warn(args); \
})
extern int __start_orc_unwind_ip[];
extern int __stop_orc_unwind_ip[];
extern struct orc_entry __start_orc_unwind[];
extern struct orc_entry __stop_orc_unwind[];
static DEFINE_MUTEX(sort_mutex);
int *cur_orc_ip_table = __start_orc_unwind_ip;
struct orc_entry *cur_orc_table = __start_orc_unwind;
unsigned int lookup_num_blocks;
bool orc_init;
static bool orc_init __ro_after_init;
static unsigned int lookup_num_blocks __ro_after_init;
static inline unsigned long orc_ip(const int *ip)
{
@@ -142,9 +144,6 @@ static struct orc_entry *orc_find(unsigned long ip)
{
static struct orc_entry *orc;
if (!orc_init)
return NULL;
if (ip == 0)
return &null_orc_entry;
@@ -189,6 +188,10 @@ static struct orc_entry *orc_find(unsigned long ip)
#ifdef CONFIG_MODULES
static DEFINE_MUTEX(sort_mutex);
static int *cur_orc_ip_table = __start_orc_unwind_ip;
static struct orc_entry *cur_orc_table = __start_orc_unwind;
static void orc_sort_swap(void *_a, void *_b, int size)
{
struct orc_entry *orc_a, *orc_b;
@@ -381,9 +384,38 @@ static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr
return true;
}
/*
* If state->regs is non-NULL, and points to a full pt_regs, just get the reg
* value from state->regs.
*
* Otherwise, if state->regs just points to IRET regs, and the previous frame
* had full regs, it's safe to get the value from the previous regs. This can
* happen when early/late IRQ entry code gets interrupted by an NMI.
*/
static bool get_reg(struct unwind_state *state, unsigned int reg_off,
unsigned long *val)
{
unsigned int reg = reg_off/8;
if (!state->regs)
return false;
if (state->full_regs) {
*val = ((unsigned long *)state->regs)[reg];
return true;
}
if (state->prev_regs) {
*val = ((unsigned long *)state->prev_regs)[reg];
return true;
}
return false;
}
bool unwind_next_frame(struct unwind_state *state)
{
unsigned long ip_p, sp, orig_ip = state->ip, prev_sp = state->sp;
unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
enum stack_type prev_type = state->stack_info.type;
struct orc_entry *orc;
bool indirect = false;
@@ -445,43 +477,39 @@ bool unwind_next_frame(struct unwind_state *state)
break;
case ORC_REG_R10:
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg R10 at ip %pB\n",
(void *)state->ip);
if (!get_reg(state, offsetof(struct pt_regs, r10), &sp)) {
orc_warn_current("missing R10 value at %pB\n",
(void *)state->ip);
goto err;
}
sp = state->regs->r10;
break;
case ORC_REG_R13:
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg R13 at ip %pB\n",
(void *)state->ip);
if (!get_reg(state, offsetof(struct pt_regs, r13), &sp)) {
orc_warn_current("missing R13 value at %pB\n",
(void *)state->ip);
goto err;
}
sp = state->regs->r13;
break;
case ORC_REG_DI:
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg DI at ip %pB\n",
(void *)state->ip);
if (!get_reg(state, offsetof(struct pt_regs, di), &sp)) {
orc_warn_current("missing RDI value at %pB\n",
(void *)state->ip);
goto err;
}
sp = state->regs->di;
break;
case ORC_REG_DX:
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg DX at ip %pB\n",
(void *)state->ip);
if (!get_reg(state, offsetof(struct pt_regs, dx), &sp)) {
orc_warn_current("missing DX value at %pB\n",
(void *)state->ip);
goto err;
}
sp = state->regs->dx;
break;
default:
orc_warn("unknown SP base reg %d for ip %pB\n",
orc_warn("unknown SP base reg %d at %pB\n",
orc->sp_reg, (void *)state->ip);
goto err;
}
@@ -504,44 +532,48 @@ bool unwind_next_frame(struct unwind_state *state)
state->sp = sp;
state->regs = NULL;
state->prev_regs = NULL;
state->signal = false;
break;
case ORC_TYPE_REGS:
if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
orc_warn_current("can't access registers at %pB\n",
(void *)orig_ip);
goto err;
}
state->regs = (struct pt_regs *)sp;
state->prev_regs = NULL;
state->full_regs = true;
state->signal = true;
break;
case ORC_TYPE_REGS_IRET:
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference iret registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
orc_warn_current("can't access iret registers at %pB\n",
(void *)orig_ip);
goto err;
}
if (state->full_regs)
state->prev_regs = state->regs;
state->regs = (void *)sp - IRET_FRAME_OFFSET;
state->full_regs = false;
state->signal = true;
break;
default:
orc_warn("unknown .orc_unwind entry type %d for ip %pB\n",
orc_warn("unknown .orc_unwind entry type %d at %pB\n",
orc->type, (void *)orig_ip);
break;
goto err;
}
/* Find BP: */
switch (orc->bp_reg) {
case ORC_REG_UNDEFINED:
if (state->regs && state->full_regs)
state->bp = state->regs->bp;
if (get_reg(state, offsetof(struct pt_regs, bp), &tmp))
state->bp = tmp;
break;
case ORC_REG_PREV_SP:
@@ -564,8 +596,8 @@ bool unwind_next_frame(struct unwind_state *state)
if (state->stack_info.type == prev_type &&
on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
state->sp <= prev_sp) {
orc_warn("stack going in the wrong direction? ip=%pB\n",
(void *)orig_ip);
orc_warn_current("stack going in the wrong direction? at %pB\n",
(void *)orig_ip);
goto err;
}
@@ -585,6 +617,9 @@ EXPORT_SYMBOL_GPL(unwind_next_frame);
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long *first_frame)
{
if (!orc_init)
goto done;
memset(state, 0, sizeof(*state));
state->task = task;
@@ -651,7 +686,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
/* Otherwise, skip ahead to the user-specified starting frame: */
while (!unwind_done(state) &&
(!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
state->sp <= (unsigned long)first_frame))
state->sp < (unsigned long)first_frame))
unwind_next_frame(state);
return;

View File

@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}
/*
* AMD SVM AVIC accelerate EOI write and do not trap,
* in-kernel IOAPIC will not be able to receive the EOI.
* In this case, we do lazy update of the pending EOI when
* trying to set IOAPIC irq.
* AMD SVM AVIC accelerate EOI write iff the interrupt is edge
* triggered, in which case the in-kernel IOAPIC will not be able
* to receive the EOI. In this case, we do a lazy update of the
* pending EOI when trying to set IOAPIC irq.
*/
if (kvm_apicv_activated(ioapic->kvm))
if (edge && kvm_apicv_activated(ioapic->kvm))
ioapic_lazy_update_eoi(ioapic, irq);
/*

View File

@@ -345,7 +345,7 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
return NULL;
/* Pin the user virtual address. */
npinned = get_user_pages_fast(uaddr, npages, FOLL_WRITE, pages);
npinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
if (npinned != npages) {
pr_err("SEV: Failure locking %lu pages.\n", npages);
goto err;

View File

@@ -1752,6 +1752,8 @@ static int db_interception(struct vcpu_svm *svm)
if (svm->vcpu.guest_debug &
(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) {
kvm_run->exit_reason = KVM_EXIT_DEBUG;
kvm_run->debug.arch.dr6 = svm->vmcb->save.dr6;
kvm_run->debug.arch.dr7 = svm->vmcb->save.dr7;
kvm_run->debug.arch.pc =
svm->vmcb->save.cs.base + svm->vmcb->save.rip;
kvm_run->debug.arch.exception = DB_VECTOR;

View File

@@ -5165,7 +5165,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
*/
break;
default:
BUG_ON(1);
BUG();
break;
}

View File

@@ -82,6 +82,9 @@ SYM_FUNC_START(vmx_vmexit)
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
/* Clear RFLAGS.CF and RFLAGS.ZF to preserve VM-Exit, i.e. !VM-Fail. */
or $1, %_ASM_AX
pop %_ASM_AX
.Lvmexit_skip_rsb:
#endif

View File

@@ -926,19 +926,6 @@ EXPORT_SYMBOL_GPL(kvm_set_xcr);
__reserved_bits; \
})
static u64 kvm_host_cr4_reserved_bits(struct cpuinfo_x86 *c)
{
u64 reserved_bits = __cr4_reserved_bits(cpu_has, c);
if (kvm_cpu_cap_has(X86_FEATURE_LA57))
reserved_bits &= ~X86_CR4_LA57;
if (kvm_cpu_cap_has(X86_FEATURE_UMIP))
reserved_bits &= ~X86_CR4_UMIP;
return reserved_bits;
}
static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
if (cr4 & cr4_reserved_bits)
@@ -3385,6 +3372,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_GET_MSR_FEATURES:
case KVM_CAP_MSR_PLATFORM_INFO:
case KVM_CAP_EXCEPTION_PAYLOAD:
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
case KVM_CAP_SYNC_REGS:
@@ -9675,7 +9663,9 @@ int kvm_arch_hardware_setup(void *opaque)
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
supported_xss = 0;
cr4_reserved_bits = kvm_host_cr4_reserved_bits(&boot_cpu_data);
#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
#undef __kvm_cpu_cap_has
if (kvm_has_tsc_control) {
/*
@@ -9707,7 +9697,8 @@ int kvm_arch_check_processor_compat(void *opaque)
WARN_ON(!irqs_disabled());
if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;
return ops->check_processor_compatibility();

View File

@@ -43,7 +43,8 @@ struct cpa_data {
unsigned long pfn;
unsigned int flags;
unsigned int force_split : 1,
force_static_prot : 1;
force_static_prot : 1,
force_flush_all : 1;
struct page **pages;
};
@@ -355,10 +356,10 @@ static void cpa_flush(struct cpa_data *data, int cache)
return;
}
if (cpa->numpages <= tlb_single_page_flush_ceiling)
on_each_cpu(__cpa_flush_tlb, cpa, 1);
else
if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
flush_tlb_all();
else
on_each_cpu(__cpa_flush_tlb, cpa, 1);
if (!cache)
return;
@@ -1598,6 +1599,8 @@ static int cpa_process_alias(struct cpa_data *cpa)
alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
alias_cpa.curpage = 0;
cpa->force_flush_all = 1;
ret = __change_page_attr_set_clr(&alias_cpa, 0);
if (ret)
return ret;
@@ -1618,6 +1621,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
alias_cpa.curpage = 0;
cpa->force_flush_all = 1;
/*
* The high mapping range is imprecise, so ignore the
* return value.

View File

@@ -123,6 +123,7 @@
#include <linux/ioprio.h>
#include <linux/sbitmap.h>
#include <linux/delay.h>
#include <linux/backing-dev.h>
#include "blk.h"
#include "blk-mq.h"
@@ -4976,8 +4977,9 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
switch (ioprio_class) {
default:
dev_err(bfqq->bfqd->queue->backing_dev_info->dev,
"bfq: bad prio class %d\n", ioprio_class);
pr_err("bdi %s: bfq: bad prio class %d\n",
bdi_dev_name(bfqq->bfqd->queue->backing_dev_info),
ioprio_class);
/* fall through */
case IOPRIO_CLASS_NONE:
/*

View File

@@ -496,7 +496,7 @@ const char *blkg_dev_name(struct blkcg_gq *blkg)
{
/* some drivers (floppy) instantiate a queue w/o disk registered */
if (blkg->q->backing_dev_info->dev)
return dev_name(blkg->q->backing_dev_info->dev);
return bdi_dev_name(blkg->q->backing_dev_info);
return NULL;
}

View File

@@ -466,7 +466,7 @@ struct ioc_gq {
*/
atomic64_t vtime;
atomic64_t done_vtime;
atomic64_t abs_vdebt;
u64 abs_vdebt;
u64 last_vtime;
/*
@@ -1142,7 +1142,7 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
struct iocg_wake_ctx ctx = { .iocg = iocg };
u64 margin_ns = (u64)(ioc->period_us *
WAITQ_TIMER_MARGIN_PCT / 100) * NSEC_PER_USEC;
u64 abs_vdebt, vdebt, vshortage, expires, oexpires;
u64 vdebt, vshortage, expires, oexpires;
s64 vbudget;
u32 hw_inuse;
@@ -1152,18 +1152,15 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
vbudget = now->vnow - atomic64_read(&iocg->vtime);
/* pay off debt */
abs_vdebt = atomic64_read(&iocg->abs_vdebt);
vdebt = abs_cost_to_cost(abs_vdebt, hw_inuse);
vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
if (vdebt && vbudget > 0) {
u64 delta = min_t(u64, vbudget, vdebt);
u64 abs_delta = min(cost_to_abs_cost(delta, hw_inuse),
abs_vdebt);
iocg->abs_vdebt);
atomic64_add(delta, &iocg->vtime);
atomic64_add(delta, &iocg->done_vtime);
atomic64_sub(abs_delta, &iocg->abs_vdebt);
if (WARN_ON_ONCE(atomic64_read(&iocg->abs_vdebt) < 0))
atomic64_set(&iocg->abs_vdebt, 0);
iocg->abs_vdebt -= abs_delta;
}
/*
@@ -1219,12 +1216,18 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
u64 expires, oexpires;
u32 hw_inuse;
lockdep_assert_held(&iocg->waitq.lock);
/* debt-adjust vtime */
current_hweight(iocg, NULL, &hw_inuse);
vtime += abs_cost_to_cost(atomic64_read(&iocg->abs_vdebt), hw_inuse);
vtime += abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
/* clear or maintain depending on the overage */
if (time_before_eq64(vtime, now->vnow)) {
/*
* Clear or maintain depending on the overage. Non-zero vdebt is what
* guarantees that @iocg is online and future iocg_kick_delay() will
* clear use_delay. Don't leave it on when there's no vdebt.
*/
if (!iocg->abs_vdebt || time_before_eq64(vtime, now->vnow)) {
blkcg_clear_delay(blkg);
return false;
}
@@ -1258,9 +1261,12 @@ static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)
{
struct ioc_gq *iocg = container_of(timer, struct ioc_gq, delay_timer);
struct ioc_now now;
unsigned long flags;
spin_lock_irqsave(&iocg->waitq.lock, flags);
ioc_now(iocg->ioc, &now);
iocg_kick_delay(iocg, &now, 0);
spin_unlock_irqrestore(&iocg->waitq.lock, flags);
return HRTIMER_NORESTART;
}
@@ -1368,14 +1374,13 @@ static void ioc_timer_fn(struct timer_list *timer)
* should have woken up in the last period and expire idle iocgs.
*/
list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) {
if (!waitqueue_active(&iocg->waitq) &&
!atomic64_read(&iocg->abs_vdebt) && !iocg_is_idle(iocg))
if (!waitqueue_active(&iocg->waitq) && iocg->abs_vdebt &&
!iocg_is_idle(iocg))
continue;
spin_lock(&iocg->waitq.lock);
if (waitqueue_active(&iocg->waitq) ||
atomic64_read(&iocg->abs_vdebt)) {
if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) {
/* might be oversleeping vtime / hweight changes, kick */
iocg_kick_waitq(iocg, &now);
iocg_kick_delay(iocg, &now, 0);
@@ -1718,28 +1723,49 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
* tests are racy but the races aren't systemic - we only miss once
* in a while which is fine.
*/
if (!waitqueue_active(&iocg->waitq) &&
!atomic64_read(&iocg->abs_vdebt) &&
if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
time_before_eq64(vtime + cost, now.vnow)) {
iocg_commit_bio(iocg, bio, cost);
return;
}
/*
* We're over budget. If @bio has to be issued regardless,
* remember the abs_cost instead of advancing vtime.
* iocg_kick_waitq() will pay off the debt before waking more IOs.
* We activated above but w/o any synchronization. Deactivation is
* synchronized with waitq.lock and we won't get deactivated as long
* as we're waiting or has debt, so we're good if we're activated
* here. In the unlikely case that we aren't, just issue the IO.
*/
spin_lock_irq(&iocg->waitq.lock);
if (unlikely(list_empty(&iocg->active_list))) {
spin_unlock_irq(&iocg->waitq.lock);
iocg_commit_bio(iocg, bio, cost);
return;
}
/*
* We're over budget. If @bio has to be issued regardless, remember
* the abs_cost instead of advancing vtime. iocg_kick_waitq() will pay
* off the debt before waking more IOs.
*
* This way, the debt is continuously paid off each period with the
* actual budget available to the cgroup. If we just wound vtime,
* we would incorrectly use the current hw_inuse for the entire
* amount which, for example, can lead to the cgroup staying
* blocked for a long time even with substantially raised hw_inuse.
* actual budget available to the cgroup. If we just wound vtime, we
* would incorrectly use the current hw_inuse for the entire amount
* which, for example, can lead to the cgroup staying blocked for a
* long time even with substantially raised hw_inuse.
*
* An iocg with vdebt should stay online so that the timer can keep
* deducting its vdebt and [de]activate use_delay mechanism
* accordingly. We don't want to race against the timer trying to
* clear them and leave @iocg inactive w/ dangling use_delay heavily
* penalizing the cgroup and its descendants.
*/
if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
atomic64_add(abs_cost, &iocg->abs_vdebt);
iocg->abs_vdebt += abs_cost;
if (iocg_kick_delay(iocg, &now, cost))
blkcg_schedule_throttle(rqos->q,
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
spin_unlock_irq(&iocg->waitq.lock);
return;
}
@@ -1756,20 +1782,6 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
* All waiters are on iocg->waitq and the wait states are
* synchronized using waitq.lock.
*/
spin_lock_irq(&iocg->waitq.lock);
/*
* We activated above but w/o any synchronization. Deactivation is
* synchronized with waitq.lock and we won't get deactivated as
* long as we're waiting, so we're good if we're activated here.
* In the unlikely case that we are deactivated, just issue the IO.
*/
if (unlikely(list_empty(&iocg->active_list))) {
spin_unlock_irq(&iocg->waitq.lock);
iocg_commit_bio(iocg, bio, cost);
return;
}
init_waitqueue_func_entry(&wait.wait, iocg_wake_fn);
wait.wait.private = current;
wait.bio = bio;
@@ -1801,6 +1813,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
struct ioc_now now;
u32 hw_inuse;
u64 abs_cost, cost;
unsigned long flags;
/* bypass if disabled or for root cgroup */
if (!ioc->enabled || !iocg->level)
@@ -1820,15 +1833,28 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
iocg->cursor = bio_end;
/*
* Charge if there's enough vtime budget and the existing request
* has cost assigned. Otherwise, account it as debt. See debt
* handling in ioc_rqos_throttle() for details.
* Charge if there's enough vtime budget and the existing request has
* cost assigned.
*/
if (rq->bio && rq->bio->bi_iocost_cost &&
time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow))
time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow)) {
iocg_commit_bio(iocg, bio, cost);
else
atomic64_add(abs_cost, &iocg->abs_vdebt);
return;
}
/*
* Otherwise, account it as debt if @iocg is online, which it should
* be for the vast majority of cases. See debt handling in
* ioc_rqos_throttle() for details.
*/
spin_lock_irqsave(&iocg->waitq.lock, flags);
if (likely(!list_empty(&iocg->active_list))) {
iocg->abs_vdebt += abs_cost;
iocg_kick_delay(iocg, &now, cost);
} else {
iocg_commit_bio(iocg, bio, cost);
}
spin_unlock_irqrestore(&iocg->waitq.lock, flags);
}
static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
@@ -1998,7 +2024,6 @@ static void ioc_pd_init(struct blkg_policy_data *pd)
iocg->ioc = ioc;
atomic64_set(&iocg->vtime, now.vnow);
atomic64_set(&iocg->done_vtime, now.vnow);
atomic64_set(&iocg->abs_vdebt, 0);
atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period));
INIT_LIST_HEAD(&iocg->active_list);
iocg->hweight_active = HWEIGHT_WHOLE;

View File

@@ -287,7 +287,7 @@ static void exit_tfm(struct crypto_skcipher *tfm)
crypto_free_skcipher(ctx->child);
}
static void free(struct skcipher_instance *inst)
static void free_inst(struct skcipher_instance *inst)
{
crypto_drop_skcipher(skcipher_instance_ctx(inst));
kfree(inst);
@@ -400,12 +400,12 @@ static int create(struct crypto_template *tmpl, struct rtattr **tb)
inst->alg.encrypt = encrypt;
inst->alg.decrypt = decrypt;
inst->free = free;
inst->free = free_inst;
err = skcipher_register_instance(tmpl, inst);
if (err) {
err_free_inst:
free(inst);
free_inst(inst);
}
return err;
}

View File

@@ -322,7 +322,7 @@ static void exit_tfm(struct crypto_skcipher *tfm)
crypto_free_cipher(ctx->tweak);
}
static void free(struct skcipher_instance *inst)
static void free_inst(struct skcipher_instance *inst)
{
crypto_drop_skcipher(skcipher_instance_ctx(inst));
kfree(inst);
@@ -434,12 +434,12 @@ static int create(struct crypto_template *tmpl, struct rtattr **tb)
inst->alg.encrypt = encrypt;
inst->alg.decrypt = decrypt;
inst->free = free;
inst->free = free_inst;
err = skcipher_register_instance(tmpl, inst);
if (err) {
err_free_inst:
free(inst);
free_inst(inst);
}
return err;
}

View File

@@ -645,6 +645,7 @@ static void amba_device_initialize(struct amba_device *dev, const char *name)
dev->dev.release = amba_device_release;
dev->dev.bus = &amba_bustype;
dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
dev->dev.dma_parms = &dev->dma_parms;
dev->res.name = dev_name(&dev->dev);
}

View File

@@ -256,7 +256,8 @@ static int try_to_bring_up_master(struct master *master,
ret = master->ops->bind(master->dev);
if (ret < 0) {
devres_release_group(master->dev, NULL);
dev_info(master->dev, "master bind failed: %d\n", ret);
if (ret != -EPROBE_DEFER)
dev_info(master->dev, "master bind failed: %d\n", ret);
return ret;
}
@@ -611,8 +612,9 @@ static int component_bind(struct component *component, struct master *master,
devres_release_group(component->dev, NULL);
devres_release_group(master->dev, NULL);
dev_err(master->dev, "failed to bind %s (ops %ps): %d\n",
dev_name(component->dev), component->ops, ret);
if (ret != -EPROBE_DEFER)
dev_err(master->dev, "failed to bind %s (ops %ps): %d\n",
dev_name(component->dev), component->ops, ret);
}
return ret;

View File

@@ -2370,6 +2370,11 @@ u32 fw_devlink_get_flags(void)
return fw_devlink_flags;
}
static bool fw_devlink_is_permissive(void)
{
return fw_devlink_flags == DL_FLAG_SYNC_STATE_ONLY;
}
/**
* device_add - add device to device hierarchy.
* @dev: device.
@@ -2524,7 +2529,7 @@ int device_add(struct device *dev)
if (fw_devlink_flags && is_fwnode_dev &&
fwnode_has_op(dev->fwnode, add_links)) {
fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
if (fw_ret == -ENODEV)
if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
device_link_wait_for_mandatory_supplier(dev);
else if (fw_ret)
device_link_wait_for_optional_supplier(dev);

View File

@@ -224,17 +224,9 @@ static int deferred_devs_show(struct seq_file *s, void *data)
}
DEFINE_SHOW_ATTRIBUTE(deferred_devs);
#ifdef CONFIG_MODULES
/*
* In the case of modules, set the default probe timeout to
* 30 seconds to give userland some time to load needed modules
*/
int driver_deferred_probe_timeout = 30;
#else
/* In the case of !modules, no probe timeout needed */
int driver_deferred_probe_timeout = -1;
#endif
int driver_deferred_probe_timeout;
EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
static int __init deferred_probe_timeout_setup(char *str)
{
@@ -266,8 +258,8 @@ int driver_deferred_probe_check_state(struct device *dev)
return -ENODEV;
}
if (!driver_deferred_probe_timeout) {
dev_WARN(dev, "deferred probe timeout, ignoring dependency");
if (!driver_deferred_probe_timeout && initcalls_done) {
dev_warn(dev, "deferred probe timeout, ignoring dependency");
return -ETIMEDOUT;
}
@@ -284,6 +276,7 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
dev_info(private->device, "deferred probe pending");
wake_up(&probe_timeout_waitqueue);
}
static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
@@ -658,6 +651,9 @@ int driver_probe_done(void)
*/
void wait_for_device_probe(void)
{
/* wait for probe timeout */
wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
/* wait for the deferred probe workqueue to finish */
flush_work(&deferred_probe_work);

View File

@@ -380,6 +380,8 @@ struct platform_object {
*/
static void setup_pdev_dma_masks(struct platform_device *pdev)
{
pdev->dev.dma_parms = &pdev->dma_parms;
if (!pdev->dev.coherent_dma_mask)
pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
if (!pdev->dev.dma_mask) {

View File

@@ -33,6 +33,15 @@ struct virtio_blk_vq {
} ____cacheline_aligned_in_smp;
struct virtio_blk {
/*
* This mutex must be held by anything that may run after
* virtblk_remove() sets vblk->vdev to NULL.
*
* blk-mq, virtqueue processing, and sysfs attribute code paths are
* shut down before vblk->vdev is set to NULL and therefore do not need
* to hold this mutex.
*/
struct mutex vdev_mutex;
struct virtio_device *vdev;
/* The disk structure for the kernel. */
@@ -44,6 +53,13 @@ struct virtio_blk {
/* Process context for config space updates */
struct work_struct config_work;
/*
* Tracks references from block_device_operations open/release and
* virtio_driver probe/remove so this object can be freed once no
* longer in use.
*/
refcount_t refs;
/* What host tells us, plus 2 for header & tailer. */
unsigned int sg_elems;
@@ -295,10 +311,55 @@ out:
return err;
}
static void virtblk_get(struct virtio_blk *vblk)
{
refcount_inc(&vblk->refs);
}
static void virtblk_put(struct virtio_blk *vblk)
{
if (refcount_dec_and_test(&vblk->refs)) {
ida_simple_remove(&vd_index_ida, vblk->index);
mutex_destroy(&vblk->vdev_mutex);
kfree(vblk);
}
}
static int virtblk_open(struct block_device *bd, fmode_t mode)
{
struct virtio_blk *vblk = bd->bd_disk->private_data;
int ret = 0;
mutex_lock(&vblk->vdev_mutex);
if (vblk->vdev)
virtblk_get(vblk);
else
ret = -ENXIO;
mutex_unlock(&vblk->vdev_mutex);
return ret;
}
static void virtblk_release(struct gendisk *disk, fmode_t mode)
{
struct virtio_blk *vblk = disk->private_data;
virtblk_put(vblk);
}
/* We provide getgeo only to please some old bootloader/partitioning tools */
static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
{
struct virtio_blk *vblk = bd->bd_disk->private_data;
int ret = 0;
mutex_lock(&vblk->vdev_mutex);
if (!vblk->vdev) {
ret = -ENXIO;
goto out;
}
/* see if the host passed in geometry config */
if (virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_GEOMETRY)) {
@@ -314,11 +375,15 @@ static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
geo->sectors = 1 << 5;
geo->cylinders = get_capacity(bd->bd_disk) >> 11;
}
return 0;
out:
mutex_unlock(&vblk->vdev_mutex);
return ret;
}
static const struct block_device_operations virtblk_fops = {
.owner = THIS_MODULE,
.open = virtblk_open,
.release = virtblk_release,
.getgeo = virtblk_getgeo,
};
@@ -655,6 +720,10 @@ static int virtblk_probe(struct virtio_device *vdev)
goto out_free_index;
}
/* This reference is dropped in virtblk_remove(). */
refcount_set(&vblk->refs, 1);
mutex_init(&vblk->vdev_mutex);
vblk->vdev = vdev;
vblk->sg_elems = sg_elems;
@@ -820,8 +889,6 @@ out:
static void virtblk_remove(struct virtio_device *vdev)
{
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
int refc;
/* Make sure no work handler is accessing the device. */
flush_work(&vblk->config_work);
@@ -831,18 +898,21 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_mq_free_tag_set(&vblk->tag_set);
mutex_lock(&vblk->vdev_mutex);
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
vblk->vdev = NULL;
put_disk(vblk->disk);
vdev->config->del_vqs(vdev);
kfree(vblk->vqs);
kfree(vblk);
/* Only free device id if we don't have any users */
if (refc == 1)
ida_simple_remove(&vd_index_ida, index);
mutex_unlock(&vblk->vdev_mutex);
virtblk_put(vblk);
}
#ifdef CONFIG_PM_SLEEP

View File

@@ -812,10 +812,9 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,
if (!mhi_cntrl)
return -EINVAL;
if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put)
return -EINVAL;
if (!mhi_cntrl->status_cb || !mhi_cntrl->link_status)
if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||
!mhi_cntrl->status_cb || !mhi_cntrl->read_reg ||
!mhi_cntrl->write_reg)
return -EINVAL;
ret = parse_config(mhi_cntrl, config);

View File

@@ -11,9 +11,6 @@
extern struct bus_type mhi_bus_type;
/* MHI MMIO register mapping */
#define PCI_INVALID_READ(val) (val == U32_MAX)
#define MHIREGLEN (0x0)
#define MHIREGLEN_MHIREGLEN_MASK (0xFFFFFFFF)
#define MHIREGLEN_MHIREGLEN_SHIFT (0)

View File

@@ -18,16 +18,7 @@
int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
void __iomem *base, u32 offset, u32 *out)
{
u32 tmp = readl(base + offset);
/* If there is any unexpected value, query the link status */
if (PCI_INVALID_READ(tmp) &&
mhi_cntrl->link_status(mhi_cntrl))
return -EIO;
*out = tmp;
return 0;
return mhi_cntrl->read_reg(mhi_cntrl, base + offset, out);
}
int __must_check mhi_read_reg_field(struct mhi_controller *mhi_cntrl,
@@ -49,7 +40,7 @@ int __must_check mhi_read_reg_field(struct mhi_controller *mhi_cntrl,
void mhi_write_reg(struct mhi_controller *mhi_cntrl, void __iomem *base,
u32 offset, u32 val)
{
writel(val, base + offset);
mhi_cntrl->write_reg(mhi_cntrl, base + offset, val);
}
void mhi_write_reg_field(struct mhi_controller *mhi_cntrl, void __iomem *base,
@@ -294,7 +285,7 @@ void mhi_create_devices(struct mhi_controller *mhi_cntrl)
!(mhi_chan->ee_mask & BIT(mhi_cntrl->ee)))
continue;
mhi_dev = mhi_alloc_device(mhi_cntrl);
if (!mhi_dev)
if (IS_ERR(mhi_dev))
return;
mhi_dev->dev_type = MHI_DEVICE_XFER;
@@ -336,7 +327,8 @@ void mhi_create_devices(struct mhi_controller *mhi_cntrl)
/* Channel name is same for both UL and DL */
mhi_dev->chan_name = mhi_chan->name;
dev_set_name(&mhi_dev->dev, "%04x_%s", mhi_chan->chan,
dev_set_name(&mhi_dev->dev, "%s_%s",
dev_name(mhi_cntrl->cntrl_dev),
mhi_dev->chan_name);
/* Init wakeup source if available */

View File

@@ -902,7 +902,11 @@ int mhi_sync_power_up(struct mhi_controller *mhi_cntrl)
MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
msecs_to_jiffies(mhi_cntrl->timeout_ms));
return (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
ret = (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -ETIMEDOUT;
if (ret)
mhi_power_down(mhi_cntrl, false);
return ret;
}
EXPORT_SYMBOL(mhi_sync_power_up);

View File

@@ -673,41 +673,14 @@ int chcr_ktls_cpl_set_tcb_rpl(struct adapter *adap, unsigned char *input)
return 0;
}
/*
* chcr_write_cpl_set_tcb_ulp: update tcb values.
* TCB is responsible to create tcp headers, so all the related values
* should be correctly updated.
* @tx_info - driver specific tls info.
* @q - tx queue on which packet is going out.
* @tid - TCB identifier.
* @pos - current index where should we start writing.
* @word - TCB word.
* @mask - TCB word related mask.
* @val - TCB word related value.
* @reply - set 1 if looking for TP response.
* return - next position to write.
*/
static void *chcr_write_cpl_set_tcb_ulp(struct chcr_ktls_info *tx_info,
struct sge_eth_txq *q, u32 tid,
void *pos, u16 word, u64 mask,
static void *__chcr_write_cpl_set_tcb_ulp(struct chcr_ktls_info *tx_info,
u32 tid, void *pos, u16 word, u64 mask,
u64 val, u32 reply)
{
struct cpl_set_tcb_field_core *cpl;
struct ulptx_idata *idata;
struct ulp_txpkt *txpkt;
void *save_pos = NULL;
u8 buf[48] = {0};
int left;
left = (void *)q->q.stat - pos;
if (unlikely(left < CHCR_SET_TCB_FIELD_LEN)) {
if (!left) {
pos = q->q.desc;
} else {
save_pos = pos;
pos = buf;
}
}
/* ULP_TXPKT */
txpkt = pos;
txpkt->cmd_dest = htonl(ULPTX_CMD_V(ULP_TX_PKT) | ULP_TXPKT_DEST_V(0));
@@ -732,18 +705,54 @@ static void *chcr_write_cpl_set_tcb_ulp(struct chcr_ktls_info *tx_info,
idata = (struct ulptx_idata *)(cpl + 1);
idata->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
idata->len = htonl(0);
pos = idata + 1;
if (save_pos) {
pos = chcr_copy_to_txd(buf, &q->q, save_pos,
CHCR_SET_TCB_FIELD_LEN);
} else {
/* check again if we are at the end of the queue */
if (left == CHCR_SET_TCB_FIELD_LEN)
return pos;
}
/*
* chcr_write_cpl_set_tcb_ulp: update tcb values.
* TCB is responsible to create tcp headers, so all the related values
* should be correctly updated.
* @tx_info - driver specific tls info.
* @q - tx queue on which packet is going out.
* @tid - TCB identifier.
* @pos - current index where should we start writing.
* @word - TCB word.
* @mask - TCB word related mask.
* @val - TCB word related value.
* @reply - set 1 if looking for TP response.
* return - next position to write.
*/
static void *chcr_write_cpl_set_tcb_ulp(struct chcr_ktls_info *tx_info,
struct sge_eth_txq *q, u32 tid,
void *pos, u16 word, u64 mask,
u64 val, u32 reply)
{
int left = (void *)q->q.stat - pos;
if (unlikely(left < CHCR_SET_TCB_FIELD_LEN)) {
if (!left) {
pos = q->q.desc;
else
pos = idata + 1;
} else {
u8 buf[48] = {0};
__chcr_write_cpl_set_tcb_ulp(tx_info, tid, buf, word,
mask, val, reply);
return chcr_copy_to_txd(buf, &q->q, pos,
CHCR_SET_TCB_FIELD_LEN);
}
}
pos = __chcr_write_cpl_set_tcb_ulp(tx_info, tid, pos, word,
mask, val, reply);
/* check again if we are at the end of the queue */
if (left == CHCR_SET_TCB_FIELD_LEN)
pos = q->q.desc;
return pos;
}

View File

@@ -16,7 +16,7 @@
int efi_tpm_final_log_size;
EXPORT_SYMBOL(efi_tpm_final_log_size);
static int tpm2_calc_event_log_size(void *data, int count, void *size_info)
static int __init tpm2_calc_event_log_size(void *data, int count, void *size_info)
{
struct tcg_pcr_event2_head *header;
int event_size, size = 0;

View File

@@ -3372,15 +3372,12 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
}
}
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
amdgpu_amdkfd_suspend(adev, !fbcon);
amdgpu_ras_suspend(adev);
r = amdgpu_device_ip_suspend_phase1(adev);
amdgpu_amdkfd_suspend(adev, !fbcon);
/* evict vram memory */
amdgpu_bo_evict_vram(adev);

View File

@@ -2008,17 +2008,22 @@ void amdgpu_dm_update_connector_after_detect(
dc_sink_retain(aconnector->dc_sink);
if (sink->dc_edid.length == 0) {
aconnector->edid = NULL;
drm_dp_cec_unset_edid(&aconnector->dm_dp_aux.aux);
if (aconnector->dc_link->aux_mode) {
drm_dp_cec_unset_edid(
&aconnector->dm_dp_aux.aux);
}
} else {
aconnector->edid =
(struct edid *) sink->dc_edid.raw_edid;
(struct edid *)sink->dc_edid.raw_edid;
drm_connector_update_edid_property(connector,
aconnector->edid);
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,
aconnector->edid);
aconnector->edid);
if (aconnector->dc_link->aux_mode)
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,
aconnector->edid);
}
amdgpu_dm_update_freesync_caps(connector, aconnector->edid);
update_connector_ext_caps(aconnector);
} else {

View File

@@ -834,11 +834,10 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
static void wait_for_no_pipes_pending(struct dc *dc, struct dc_state *context)
{
int i;
int count = 0;
struct pipe_ctx *pipe;
PERF_TRACE();
for (i = 0; i < MAX_PIPES; i++) {
pipe = &context->res_ctx.pipe_ctx[i];
int count = 0;
struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
if (!pipe->plane_state)
continue;

View File

@@ -3068,25 +3068,32 @@ validate_out:
return out;
}
bool dcn20_validate_bandwidth(struct dc *dc, struct dc_state *context,
bool fast_validate)
/*
* This must be noinline to ensure anything that deals with FP registers
* is contained within this call; previously our compiling with hard-float
* would result in fp instructions being emitted outside of the boundaries
* of the DC_FP_START/END macros, which makes sense as the compiler has no
* idea about what is wrapped and what is not
*
* This is largely just a workaround to avoid breakage introduced with 5.6,
* ideally all fp-using code should be moved into its own file, only that
* should be compiled with hard-float, and all code exported from there
* should be strictly wrapped with DC_FP_START/END
*/
static noinline bool dcn20_validate_bandwidth_fp(struct dc *dc,
struct dc_state *context, bool fast_validate)
{
bool voltage_supported = false;
bool full_pstate_supported = false;
bool dummy_pstate_supported = false;
double p_state_latency_us;
DC_FP_START();
p_state_latency_us = context->bw_ctx.dml.soc.dram_clock_change_latency_us;
context->bw_ctx.dml.soc.disable_dram_clock_change_vactive_support =
dc->debug.disable_dram_clock_change_vactive_support;
if (fast_validate) {
voltage_supported = dcn20_validate_bandwidth_internal(dc, context, true);
DC_FP_END();
return voltage_supported;
return dcn20_validate_bandwidth_internal(dc, context, true);
}
// Best case, we support full UCLK switch latency
@@ -3115,7 +3122,15 @@ bool dcn20_validate_bandwidth(struct dc *dc, struct dc_state *context,
restore_dml_state:
context->bw_ctx.dml.soc.dram_clock_change_latency_us = p_state_latency_us;
return voltage_supported;
}
bool dcn20_validate_bandwidth(struct dc *dc, struct dc_state *context,
bool fast_validate)
{
bool voltage_supported = false;
DC_FP_START();
voltage_supported = dcn20_validate_bandwidth_fp(dc, context, fast_validate);
DC_FP_END();
return voltage_supported;
}

View File

@@ -1200,7 +1200,7 @@ static void dml_rq_dlg_get_dlg_params(
min_hratio_fact_l = 1.0;
min_hratio_fact_c = 1.0;
if (htaps_l <= 1)
if (hratio_l <= 1)
min_hratio_fact_l = 2.0;
else if (htaps_l <= 6) {
if ((hratio_l * 2.0) > 4.0)
@@ -1216,7 +1216,7 @@ static void dml_rq_dlg_get_dlg_params(
hscale_pixel_rate_l = min_hratio_fact_l * dppclk_freq_in_mhz;
if (htaps_c <= 1)
if (hratio_c <= 1)
min_hratio_fact_c = 2.0;
else if (htaps_c <= 6) {
if ((hratio_c * 2.0) > 4.0)
@@ -1522,8 +1522,8 @@ static void dml_rq_dlg_get_dlg_params(
disp_dlg_regs->refcyc_per_vm_group_vblank = get_refcyc_per_vm_group_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
disp_dlg_regs->refcyc_per_vm_group_flip = get_refcyc_per_vm_group_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
disp_dlg_regs->refcyc_per_vm_req_vblank = get_refcyc_per_vm_req_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
disp_dlg_regs->refcyc_per_vm_req_flip = get_refcyc_per_vm_req_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
disp_dlg_regs->refcyc_per_vm_req_vblank = get_refcyc_per_vm_req_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz * dml_pow(2, 10);
disp_dlg_regs->refcyc_per_vm_req_flip = get_refcyc_per_vm_req_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz * dml_pow(2, 10);
// Clamp to max for now
if (disp_dlg_regs->refcyc_per_vm_group_vblank >= (unsigned int)dml_pow(2, 23))

View File

@@ -108,7 +108,7 @@
#define ASSERT(expr) ASSERT_CRITICAL(expr)
#else
#define ASSERT(expr) WARN_ON(!(expr))
#define ASSERT(expr) WARN_ON_ONCE(!(expr))
#endif
#define BREAK_TO_DEBUGGER() ASSERT(0)

View File

@@ -241,8 +241,12 @@ static int drm_hdcp_request_srm(struct drm_device *drm_dev,
ret = request_firmware_direct(&fw, (const char *)fw_name,
drm_dev->dev);
if (ret < 0)
if (ret < 0) {
*revoked_ksv_cnt = 0;
*revoked_ksv_list = NULL;
ret = 0;
goto exit;
}
if (fw->size && fw->data)
ret = drm_hdcp_srm_update(fw->data, fw->size, revoked_ksv_list,
@@ -287,6 +291,8 @@ int drm_hdcp_check_ksvs_revoked(struct drm_device *drm_dev, u8 *ksvs,
ret = drm_hdcp_request_srm(drm_dev, &revoked_ksv_list,
&revoked_ksv_cnt);
if (ret)
return ret;
/* revoked_ksv_cnt will be zero when above function failed */
for (i = 0; i < revoked_ksv_cnt; i++)

View File

@@ -843,6 +843,7 @@ static const struct of_device_id ingenic_drm_of_match[] = {
{ .compatible = "ingenic,jz4770-lcd", .data = &jz4770_soc_info },
{ /* sentinel */ },
};
MODULE_DEVICE_TABLE(of, ingenic_drm_of_match);
static struct platform_driver ingenic_drm_driver = {
.driver = {

View File

@@ -717,7 +717,7 @@ static void sun6i_dsi_encoder_enable(struct drm_encoder *encoder)
struct drm_display_mode *mode = &encoder->crtc->state->adjusted_mode;
struct sun6i_dsi *dsi = encoder_to_sun6i_dsi(encoder);
struct mipi_dsi_device *device = dsi->device;
union phy_configure_opts opts = { 0 };
union phy_configure_opts opts = { };
struct phy_configure_opts_mipi_dphy *cfg = &opts.mipi_dphy;
u16 delay;
int err;

View File

@@ -221,6 +221,7 @@ struct virtio_gpu_fpriv {
/* virtio_ioctl.c */
#define DRM_VIRTIO_NUM_IOCTLS 10
extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
/* virtio_kms.c */
int virtio_gpu_init(struct drm_device *dev);

View File

@@ -39,6 +39,9 @@ int virtio_gpu_gem_create(struct drm_file *file,
int ret;
u32 handle;
if (vgdev->has_virgl_3d)
virtio_gpu_create_context(dev, file);
ret = virtio_gpu_object_create(vgdev, params, &obj, NULL);
if (ret < 0)
return ret;

View File

@@ -34,8 +34,7 @@
#include "virtgpu_drv.h"
static void virtio_gpu_create_context(struct drm_device *dev,
struct drm_file *file)
void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
{
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;

View File

@@ -1155,6 +1155,7 @@ config HID_ALPS
config HID_MCP2221
tristate "Microchip MCP2221 HID USB-to-I2C/SMbus host support"
depends on USB_HID && I2C
depends on GPIOLIB
---help---
Provides I2C and SMBUS host adapter functionality over USB-HID
through MCP2221 device.

View File

@@ -802,6 +802,7 @@ static int alps_probe(struct hid_device *hdev, const struct hid_device_id *id)
break;
case HID_DEVICE_ID_ALPS_U1_DUAL:
case HID_DEVICE_ID_ALPS_U1:
case HID_DEVICE_ID_ALPS_U1_UNICORN_LEGACY:
data->dev_type = U1;
break;
default:

View File

@@ -79,10 +79,10 @@
#define HID_DEVICE_ID_ALPS_U1_DUAL_PTP 0x121F
#define HID_DEVICE_ID_ALPS_U1_DUAL_3BTN_PTP 0x1220
#define HID_DEVICE_ID_ALPS_U1 0x1215
#define HID_DEVICE_ID_ALPS_U1_UNICORN_LEGACY 0x121E
#define HID_DEVICE_ID_ALPS_T4_BTNLESS 0x120C
#define HID_DEVICE_ID_ALPS_1222 0x1222
#define USB_VENDOR_ID_AMI 0x046b
#define USB_DEVICE_ID_AMI_VIRT_KEYBOARD_AND_MOUSE 0xff10
@@ -385,6 +385,7 @@
#define USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_7349 0x7349
#define USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_73F7 0x73f7
#define USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_A001 0xa001
#define USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_C002 0xc002
#define USB_VENDOR_ID_ELAN 0x04f3
#define USB_DEVICE_ID_TOSHIBA_CLICK_L9W 0x0401
@@ -759,6 +760,7 @@
#define USB_DEVICE_ID_LOGITECH_RUMBLEPAD2 0xc218
#define USB_DEVICE_ID_LOGITECH_RUMBLEPAD2_2 0xc219
#define USB_DEVICE_ID_LOGITECH_G15_LCD 0xc222
#define USB_DEVICE_ID_LOGITECH_G11 0xc225
#define USB_DEVICE_ID_LOGITECH_G15_V2_LCD 0xc227
#define USB_DEVICE_ID_LOGITECH_G510 0xc22d
#define USB_DEVICE_ID_LOGITECH_G510_USB_AUDIO 0xc22e
@@ -1097,6 +1099,9 @@
#define USB_DEVICE_ID_SYMBOL_SCANNER_2 0x1300
#define USB_DEVICE_ID_SYMBOL_SCANNER_3 0x1200
#define I2C_VENDOR_ID_SYNAPTICS 0x06cb
#define I2C_PRODUCT_ID_SYNAPTICS_SYNA2393 0x7a13
#define USB_VENDOR_ID_SYNAPTICS 0x06cb
#define USB_DEVICE_ID_SYNAPTICS_TP 0x0001
#define USB_DEVICE_ID_SYNAPTICS_INT_TP 0x0002
@@ -1111,6 +1116,7 @@
#define USB_DEVICE_ID_SYNAPTICS_LTS2 0x1d10
#define USB_DEVICE_ID_SYNAPTICS_HD 0x0ac3
#define USB_DEVICE_ID_SYNAPTICS_QUAD_HD 0x1ac3
#define USB_DEVICE_ID_SYNAPTICS_DELL_K12A 0x2819
#define USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5_012 0x2968
#define USB_DEVICE_ID_SYNAPTICS_TP_V103 0x5710
#define USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5 0x81a7

View File

@@ -872,6 +872,10 @@ error_hw_stop:
}
static const struct hid_device_id lg_g15_devices[] = {
/* The G11 is a G15 without the LCD, treat it as a G15 */
{ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
USB_DEVICE_ID_LOGITECH_G11),
.driver_data = LG_G15 },
{ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
USB_DEVICE_ID_LOGITECH_G15_LCD),
.driver_data = LG_G15 },

View File

@@ -1922,6 +1922,9 @@ static const struct hid_device_id mt_devices[] = {
{ .driver_data = MT_CLS_EGALAX_SERIAL,
MT_USB_DEVICE(USB_VENDOR_ID_DWAV,
USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_A001) },
{ .driver_data = MT_CLS_EGALAX,
MT_USB_DEVICE(USB_VENDOR_ID_DWAV,
USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_C002) },
/* Elitegroup panel */
{ .driver_data = MT_CLS_SERIAL,

View File

@@ -163,6 +163,7 @@ static const struct hid_device_id hid_quirks[] = {
{ HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_LTS2), HID_QUIRK_NO_INIT_REPORTS },
{ HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_QUAD_HD), HID_QUIRK_NO_INIT_REPORTS },
{ HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_TP_V103), HID_QUIRK_NO_INIT_REPORTS },
{ HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_DELL_K12A), HID_QUIRK_NO_INIT_REPORTS },
{ HID_USB_DEVICE(USB_VENDOR_ID_TOPMAX, USB_DEVICE_ID_TOPMAX_COBRAPAD), HID_QUIRK_BADPAD },
{ HID_USB_DEVICE(USB_VENDOR_ID_TOUCHPACK, USB_DEVICE_ID_TOUCHPACK_RTS), HID_QUIRK_MULTI_INPUT },
{ HID_USB_DEVICE(USB_VENDOR_ID_TPV, USB_DEVICE_ID_TPV_OPTICAL_TOUCHSCREEN_8882), HID_QUIRK_NOGET },

View File

@@ -177,6 +177,8 @@ static const struct i2c_hid_quirks {
I2C_HID_QUIRK_BOGUS_IRQ },
{ USB_VENDOR_ID_ALPS_JP, HID_ANY_ID,
I2C_HID_QUIRK_RESET_ON_RESUME },
{ I2C_VENDOR_ID_SYNAPTICS, I2C_PRODUCT_ID_SYNAPTICS_SYNA2393,
I2C_HID_QUIRK_RESET_ON_RESUME },
{ USB_VENDOR_ID_ITE, I2C_DEVICE_ID_ITE_LENOVO_LEGION_Y720,
I2C_HID_QUIRK_BAD_INPUT_SIZE },
{ 0, 0 }

View File

@@ -682,16 +682,21 @@ static int usbhid_open(struct hid_device *hid)
struct usbhid_device *usbhid = hid->driver_data;
int res;
mutex_lock(&usbhid->mutex);
set_bit(HID_OPENED, &usbhid->iofl);
if (hid->quirks & HID_QUIRK_ALWAYS_POLL)
return 0;
if (hid->quirks & HID_QUIRK_ALWAYS_POLL) {
res = 0;
goto Done;
}
res = usb_autopm_get_interface(usbhid->intf);
/* the device must be awake to reliably request remote wakeup */
if (res < 0) {
clear_bit(HID_OPENED, &usbhid->iofl);
return -EIO;
res = -EIO;
goto Done;
}
usbhid->intf->needs_remote_wakeup = 1;
@@ -725,6 +730,9 @@ static int usbhid_open(struct hid_device *hid)
msleep(50);
clear_bit(HID_RESUME_RUNNING, &usbhid->iofl);
Done:
mutex_unlock(&usbhid->mutex);
return res;
}
@@ -732,6 +740,8 @@ static void usbhid_close(struct hid_device *hid)
{
struct usbhid_device *usbhid = hid->driver_data;
mutex_lock(&usbhid->mutex);
/*
* Make sure we don't restart data acquisition due to
* a resumption we no longer care about by avoiding racing
@@ -743,12 +753,13 @@ static void usbhid_close(struct hid_device *hid)
clear_bit(HID_IN_POLLING, &usbhid->iofl);
spin_unlock_irq(&usbhid->lock);
if (hid->quirks & HID_QUIRK_ALWAYS_POLL)
return;
if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
hid_cancel_delayed_stuff(usbhid);
usb_kill_urb(usbhid->urbin);
usbhid->intf->needs_remote_wakeup = 0;
}
hid_cancel_delayed_stuff(usbhid);
usb_kill_urb(usbhid->urbin);
usbhid->intf->needs_remote_wakeup = 0;
mutex_unlock(&usbhid->mutex);
}
/*
@@ -1057,6 +1068,8 @@ static int usbhid_start(struct hid_device *hid)
unsigned int n, insize = 0;
int ret;
mutex_lock(&usbhid->mutex);
clear_bit(HID_DISCONNECTED, &usbhid->iofl);
usbhid->bufsize = HID_MIN_BUFFER_SIZE;
@@ -1177,6 +1190,8 @@ static int usbhid_start(struct hid_device *hid)
usbhid_set_leds(hid);
device_set_wakeup_enable(&dev->dev, 1);
}
mutex_unlock(&usbhid->mutex);
return 0;
fail:
@@ -1187,6 +1202,7 @@ fail:
usbhid->urbout = NULL;
usbhid->urbctrl = NULL;
hid_free_buffers(dev, hid);
mutex_unlock(&usbhid->mutex);
return ret;
}
@@ -1202,6 +1218,8 @@ static void usbhid_stop(struct hid_device *hid)
usbhid->intf->needs_remote_wakeup = 0;
}
mutex_lock(&usbhid->mutex);
clear_bit(HID_STARTED, &usbhid->iofl);
spin_lock_irq(&usbhid->lock); /* Sync with error and led handlers */
set_bit(HID_DISCONNECTED, &usbhid->iofl);
@@ -1222,6 +1240,8 @@ static void usbhid_stop(struct hid_device *hid)
usbhid->urbout = NULL;
hid_free_buffers(hid_to_usb_dev(hid), hid);
mutex_unlock(&usbhid->mutex);
}
static int usbhid_power(struct hid_device *hid, int lvl)
@@ -1382,6 +1402,7 @@ static int usbhid_probe(struct usb_interface *intf, const struct usb_device_id *
INIT_WORK(&usbhid->reset_work, hid_reset);
timer_setup(&usbhid->io_retry, hid_retry_timeout, 0);
spin_lock_init(&usbhid->lock);
mutex_init(&usbhid->mutex);
ret = hid_add_device(hid);
if (ret) {

View File

@@ -80,6 +80,7 @@ struct usbhid_device {
dma_addr_t outbuf_dma; /* Output buffer dma */
unsigned long last_out; /* record of last output for timeouts */
struct mutex mutex; /* start/stop/open/close */
spinlock_t lock; /* fifo spinlock */
unsigned long iofl; /* I/O flags (CTRL_RUNNING, OUT_RUNNING) */
struct timer_list io_retry; /* Retry timer */

View File

@@ -319,9 +319,11 @@ static void wacom_feature_mapping(struct hid_device *hdev,
data[0] = field->report->id;
ret = wacom_get_report(hdev, HID_FEATURE_REPORT,
data, n, WAC_CMD_RETRIES);
if (ret == n) {
if (ret == n && features->type == HID_GENERIC) {
ret = hid_report_raw_event(hdev,
HID_FEATURE_REPORT, data, n, 0);
} else if (ret == 2 && features->type != HID_GENERIC) {
features->touch_max = data[1];
} else {
features->touch_max = 16;
hid_warn(hdev, "wacom_feature_mapping: "

View File

@@ -1427,11 +1427,13 @@ static void wacom_intuos_pro2_bt_pad(struct wacom_wac *wacom)
{
struct input_dev *pad_input = wacom->pad_input;
unsigned char *data = wacom->data;
int nbuttons = wacom->features.numbered_buttons;
int buttons = data[282] | ((data[281] & 0x40) << 2);
int expresskeys = data[282];
int center = (data[281] & 0x40) >> 6;
int ring = data[285] & 0x7F;
bool ringstatus = data[285] & 0x80;
bool prox = buttons || ringstatus;
bool prox = expresskeys || center || ringstatus;
/* Fix touchring data: userspace expects 0 at left and increasing clockwise */
ring = 71 - ring;
@@ -1439,7 +1441,8 @@ static void wacom_intuos_pro2_bt_pad(struct wacom_wac *wacom)
if (ring > 71)
ring -= 72;
wacom_report_numbered_buttons(pad_input, 9, buttons);
wacom_report_numbered_buttons(pad_input, nbuttons,
expresskeys | (center << (nbuttons - 1)));
input_report_abs(pad_input, ABS_WHEEL, ringstatus ? ring : 0);
@@ -2637,9 +2640,25 @@ static void wacom_wac_finger_pre_report(struct hid_device *hdev,
case HID_DG_TIPSWITCH:
hid_data->last_slot_field = equivalent_usage;
break;
case HID_DG_CONTACTCOUNT:
hid_data->cc_report = report->id;
hid_data->cc_index = i;
hid_data->cc_value_index = j;
break;
}
}
}
if (hid_data->cc_report != 0 &&
hid_data->cc_index >= 0) {
struct hid_field *field = report->field[hid_data->cc_index];
int value = field->value[hid_data->cc_value_index];
if (value)
hid_data->num_expected = value;
}
else {
hid_data->num_expected = wacom_wac->features.touch_max;
}
}
static void wacom_wac_finger_report(struct hid_device *hdev,
@@ -2649,7 +2668,6 @@ static void wacom_wac_finger_report(struct hid_device *hdev,
struct wacom_wac *wacom_wac = &wacom->wacom_wac;
struct input_dev *input = wacom_wac->touch_input;
unsigned touch_max = wacom_wac->features.touch_max;
struct hid_data *hid_data = &wacom_wac->hid_data;
/* If more packets of data are expected, give us a chance to
* process them rather than immediately syncing a partial
@@ -2663,7 +2681,6 @@ static void wacom_wac_finger_report(struct hid_device *hdev,
input_sync(input);
wacom_wac->hid_data.num_received = 0;
hid_data->num_expected = 0;
/* keep touch state for pen event */
wacom_wac->shared->touch_down = wacom_wac_finger_count_touches(wacom_wac);
@@ -2738,73 +2755,12 @@ static void wacom_report_events(struct hid_device *hdev,
}
}
static void wacom_set_num_expected(struct hid_device *hdev,
struct hid_report *report,
int collection_index,
struct hid_field *field,
int field_index)
{
struct wacom *wacom = hid_get_drvdata(hdev);
struct wacom_wac *wacom_wac = &wacom->wacom_wac;
struct hid_data *hid_data = &wacom_wac->hid_data;
unsigned int original_collection_level =
hdev->collection[collection_index].level;
bool end_collection = false;
int i;
if (hid_data->num_expected)
return;
// find the contact count value for this segment
for (i = field_index; i < report->maxfield && !end_collection; i++) {
struct hid_field *field = report->field[i];
unsigned int field_level =
hdev->collection[field->usage[0].collection_index].level;
unsigned int j;
if (field_level != original_collection_level)
continue;
for (j = 0; j < field->maxusage; j++) {
struct hid_usage *usage = &field->usage[j];
if (usage->collection_index != collection_index) {
end_collection = true;
break;
}
if (wacom_equivalent_usage(usage->hid) == HID_DG_CONTACTCOUNT) {
hid_data->cc_report = report->id;
hid_data->cc_index = i;
hid_data->cc_value_index = j;
if (hid_data->cc_report != 0 &&
hid_data->cc_index >= 0) {
struct hid_field *field =
report->field[hid_data->cc_index];
int value =
field->value[hid_data->cc_value_index];
if (value)
hid_data->num_expected = value;
}
}
}
}
if (hid_data->cc_report == 0 || hid_data->cc_index < 0)
hid_data->num_expected = wacom_wac->features.touch_max;
}
static int wacom_wac_collection(struct hid_device *hdev, struct hid_report *report,
int collection_index, struct hid_field *field,
int field_index)
{
struct wacom *wacom = hid_get_drvdata(hdev);
if (WACOM_FINGER_FIELD(field))
wacom_set_num_expected(hdev, report, collection_index, field,
field_index);
wacom_report_events(hdev, report, collection_index, field_index);
/*

View File

@@ -78,7 +78,7 @@ static struct qcom_icc_node *sdm845_osm_l3_nodes[] = {
[SLAVE_OSM_L3] = &sdm845_osm_l3,
};
const static struct qcom_icc_desc sdm845_icc_osm_l3 = {
static const struct qcom_icc_desc sdm845_icc_osm_l3 = {
.nodes = sdm845_osm_l3_nodes,
.num_nodes = ARRAY_SIZE(sdm845_osm_l3_nodes),
};
@@ -91,7 +91,7 @@ static struct qcom_icc_node *sc7180_osm_l3_nodes[] = {
[SLAVE_OSM_L3] = &sc7180_osm_l3,
};
const static struct qcom_icc_desc sc7180_icc_osm_l3 = {
static const struct qcom_icc_desc sc7180_icc_osm_l3 = {
.nodes = sc7180_osm_l3_nodes,
.num_nodes = ARRAY_SIZE(sc7180_osm_l3_nodes),
};

View File

@@ -192,7 +192,7 @@ static struct qcom_icc_node *aggre1_noc_nodes[] = {
[SLAVE_ANOC_PCIE_A1NOC_SNOC] = &qns_pcie_a1noc_snoc,
};
const static struct qcom_icc_desc sdm845_aggre1_noc = {
static const struct qcom_icc_desc sdm845_aggre1_noc = {
.nodes = aggre1_noc_nodes,
.num_nodes = ARRAY_SIZE(aggre1_noc_nodes),
.bcms = aggre1_noc_bcms,
@@ -220,7 +220,7 @@ static struct qcom_icc_node *aggre2_noc_nodes[] = {
[SLAVE_SERVICE_A2NOC] = &srvc_aggre2_noc,
};
const static struct qcom_icc_desc sdm845_aggre2_noc = {
static const struct qcom_icc_desc sdm845_aggre2_noc = {
.nodes = aggre2_noc_nodes,
.num_nodes = ARRAY_SIZE(aggre2_noc_nodes),
.bcms = aggre2_noc_bcms,
@@ -281,7 +281,7 @@ static struct qcom_icc_node *config_noc_nodes[] = {
[SLAVE_SERVICE_CNOC] = &srvc_cnoc,
};
const static struct qcom_icc_desc sdm845_config_noc = {
static const struct qcom_icc_desc sdm845_config_noc = {
.nodes = config_noc_nodes,
.num_nodes = ARRAY_SIZE(config_noc_nodes),
.bcms = config_noc_bcms,
@@ -297,7 +297,7 @@ static struct qcom_icc_node *dc_noc_nodes[] = {
[SLAVE_MEM_NOC_CFG] = &qhs_memnoc,
};
const static struct qcom_icc_desc sdm845_dc_noc = {
static const struct qcom_icc_desc sdm845_dc_noc = {
.nodes = dc_noc_nodes,
.num_nodes = ARRAY_SIZE(dc_noc_nodes),
.bcms = dc_noc_bcms,
@@ -315,7 +315,7 @@ static struct qcom_icc_node *gladiator_noc_nodes[] = {
[SLAVE_SERVICE_GNOC] = &srvc_gnoc,
};
const static struct qcom_icc_desc sdm845_gladiator_noc = {
static const struct qcom_icc_desc sdm845_gladiator_noc = {
.nodes = gladiator_noc_nodes,
.num_nodes = ARRAY_SIZE(gladiator_noc_nodes),
.bcms = gladiator_noc_bcms,
@@ -350,7 +350,7 @@ static struct qcom_icc_node *mem_noc_nodes[] = {
[SLAVE_EBI1] = &ebi,
};
const static struct qcom_icc_desc sdm845_mem_noc = {
static const struct qcom_icc_desc sdm845_mem_noc = {
.nodes = mem_noc_nodes,
.num_nodes = ARRAY_SIZE(mem_noc_nodes),
.bcms = mem_noc_bcms,
@@ -384,7 +384,7 @@ static struct qcom_icc_node *mmss_noc_nodes[] = {
[SLAVE_CAMNOC_UNCOMP] = &qns_camnoc_uncomp,
};
const static struct qcom_icc_desc sdm845_mmss_noc = {
static const struct qcom_icc_desc sdm845_mmss_noc = {
.nodes = mmss_noc_nodes,
.num_nodes = ARRAY_SIZE(mmss_noc_nodes),
.bcms = mmss_noc_bcms,
@@ -430,7 +430,7 @@ static struct qcom_icc_node *system_noc_nodes[] = {
[SLAVE_TCU] = &xs_sys_tcu_cfg,
};
const static struct qcom_icc_desc sdm845_system_noc = {
static const struct qcom_icc_desc sdm845_system_noc = {
.nodes = system_noc_nodes,
.num_nodes = ARRAY_SIZE(system_noc_nodes),
.bcms = system_noc_bcms,

View File

@@ -101,6 +101,8 @@ struct kmem_cache *amd_iommu_irq_cache;
static void update_domain(struct protection_domain *domain);
static int protection_domain_init(struct protection_domain *domain);
static void detach_device(struct device *dev);
static void update_and_flush_device_table(struct protection_domain *domain,
struct domain_pgtable *pgtable);
/****************************************************************************
*
@@ -151,6 +153,26 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
}
static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
struct domain_pgtable *pgtable)
{
u64 pt_root = atomic64_read(&domain->pt_root);
pgtable->root = (u64 *)(pt_root & PAGE_MASK);
pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
}
static u64 amd_iommu_domain_encode_pgtable(u64 *root, int mode)
{
u64 pt_root;
/* lowest 3 bits encode pgtable mode */
pt_root = mode & 7;
pt_root |= (u64)root;
return pt_root;
}
static struct iommu_dev_data *alloc_dev_data(u16 devid)
{
struct iommu_dev_data *dev_data;
@@ -1397,13 +1419,18 @@ static struct page *free_sub_pt(unsigned long root, int mode,
static void free_pagetable(struct protection_domain *domain)
{
unsigned long root = (unsigned long)domain->pt_root;
struct domain_pgtable pgtable;
struct page *freelist = NULL;
unsigned long root;
BUG_ON(domain->mode < PAGE_MODE_NONE ||
domain->mode > PAGE_MODE_6_LEVEL);
amd_iommu_domain_get_pgtable(domain, &pgtable);
atomic64_set(&domain->pt_root, 0);
freelist = free_sub_pt(root, domain->mode, freelist);
BUG_ON(pgtable.mode < PAGE_MODE_NONE ||
pgtable.mode > PAGE_MODE_6_LEVEL);
root = (unsigned long)pgtable.root;
freelist = free_sub_pt(root, pgtable.mode, freelist);
free_page_list(freelist);
}
@@ -1417,24 +1444,39 @@ static bool increase_address_space(struct protection_domain *domain,
unsigned long address,
gfp_t gfp)
{
struct domain_pgtable pgtable;
unsigned long flags;
bool ret = false;
u64 *pte;
bool ret = true;
u64 *pte, root;
spin_lock_irqsave(&domain->lock, flags);
if (address <= PM_LEVEL_SIZE(domain->mode) ||
WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (address <= PM_LEVEL_SIZE(pgtable.mode))
goto out;
ret = false;
if (WARN_ON_ONCE(pgtable.mode == PAGE_MODE_6_LEVEL))
goto out;
pte = (void *)get_zeroed_page(gfp);
if (!pte)
goto out;
*pte = PM_LEVEL_PDE(domain->mode,
iommu_virt_to_phys(domain->pt_root));
domain->pt_root = pte;
domain->mode += 1;
*pte = PM_LEVEL_PDE(pgtable.mode, iommu_virt_to_phys(pgtable.root));
pgtable.root = pte;
pgtable.mode += 1;
update_and_flush_device_table(domain, &pgtable);
domain_flush_complete(domain);
/*
* Device Table needs to be updated and flushed before the new root can
* be published.
*/
root = amd_iommu_domain_encode_pgtable(pte, pgtable.mode);
atomic64_set(&domain->pt_root, root);
ret = true;
@@ -1451,16 +1493,29 @@ static u64 *alloc_pte(struct protection_domain *domain,
gfp_t gfp,
bool *updated)
{
struct domain_pgtable pgtable;
int level, end_lvl;
u64 *pte, *page;
BUG_ON(!is_power_of_2(page_size));
while (address > PM_LEVEL_SIZE(domain->mode))
*updated = increase_address_space(domain, address, gfp) || *updated;
amd_iommu_domain_get_pgtable(domain, &pgtable);
level = domain->mode - 1;
pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
while (address > PM_LEVEL_SIZE(pgtable.mode)) {
/*
* Return an error if there is no memory to update the
* page-table.
*/
if (!increase_address_space(domain, address, gfp))
return NULL;
/* Read new values to check if update was successful */
amd_iommu_domain_get_pgtable(domain, &pgtable);
}
level = pgtable.mode - 1;
pte = &pgtable.root[PM_LEVEL_INDEX(level, address)];
address = PAGE_SIZE_ALIGN(address, page_size);
end_lvl = PAGE_SIZE_LEVEL(page_size);
@@ -1536,16 +1591,19 @@ static u64 *fetch_pte(struct protection_domain *domain,
unsigned long address,
unsigned long *page_size)
{
struct domain_pgtable pgtable;
int level;
u64 *pte;
*page_size = 0;
if (address > PM_LEVEL_SIZE(domain->mode))
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (address > PM_LEVEL_SIZE(pgtable.mode))
return NULL;
level = domain->mode - 1;
pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
level = pgtable.mode - 1;
pte = &pgtable.root[PM_LEVEL_INDEX(level, address)];
*page_size = PTE_LEVEL_PAGE_SIZE(level);
while (level > 0) {
@@ -1660,7 +1718,13 @@ out:
unsigned long flags;
spin_lock_irqsave(&dom->lock, flags);
update_domain(dom);
/*
* Flush domain TLB(s) and wait for completion. Any Device-Table
* Updates and flushing already happened in
* increase_address_space().
*/
domain_flush_tlb_pde(dom);
domain_flush_complete(dom);
spin_unlock_irqrestore(&dom->lock, flags);
}
@@ -1806,6 +1870,7 @@ static void dma_ops_domain_free(struct protection_domain *domain)
static struct protection_domain *dma_ops_domain_alloc(void)
{
struct protection_domain *domain;
u64 *pt_root, root;
domain = kzalloc(sizeof(struct protection_domain), GFP_KERNEL);
if (!domain)
@@ -1814,12 +1879,14 @@ static struct protection_domain *dma_ops_domain_alloc(void)
if (protection_domain_init(domain))
goto free_domain;
domain->mode = PAGE_MODE_3_LEVEL;
domain->pt_root = (void *)get_zeroed_page(GFP_KERNEL);
domain->flags = PD_DMA_OPS_MASK;
if (!domain->pt_root)
pt_root = (void *)get_zeroed_page(GFP_KERNEL);
if (!pt_root)
goto free_domain;
root = amd_iommu_domain_encode_pgtable(pt_root, PAGE_MODE_3_LEVEL);
atomic64_set(&domain->pt_root, root);
domain->flags = PD_DMA_OPS_MASK;
if (iommu_get_dma_cookie(&domain->domain) == -ENOMEM)
goto free_domain;
@@ -1841,16 +1908,17 @@ static bool dma_ops_domain(struct protection_domain *domain)
}
static void set_dte_entry(u16 devid, struct protection_domain *domain,
struct domain_pgtable *pgtable,
bool ats, bool ppr)
{
u64 pte_root = 0;
u64 flags = 0;
u32 old_domid;
if (domain->mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->pt_root);
if (pgtable->mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(pgtable->root);
pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
pte_root |= (pgtable->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
@@ -1923,6 +1991,7 @@ static void clear_dte_entry(u16 devid)
static void do_attach(struct iommu_dev_data *dev_data,
struct protection_domain *domain)
{
struct domain_pgtable pgtable;
struct amd_iommu *iommu;
bool ats;
@@ -1938,7 +2007,9 @@ static void do_attach(struct iommu_dev_data *dev_data,
domain->dev_cnt += 1;
/* Update device table */
set_dte_entry(dev_data->devid, domain, ats, dev_data->iommu_v2);
amd_iommu_domain_get_pgtable(domain, &pgtable);
set_dte_entry(dev_data->devid, domain, &pgtable,
ats, dev_data->iommu_v2);
clone_aliases(dev_data->pdev);
device_flush_dte(dev_data);
@@ -2249,23 +2320,36 @@ static int amd_iommu_domain_get_attr(struct iommu_domain *domain,
*
*****************************************************************************/
static void update_device_table(struct protection_domain *domain)
static void update_device_table(struct protection_domain *domain,
struct domain_pgtable *pgtable)
{
struct iommu_dev_data *dev_data;
list_for_each_entry(dev_data, &domain->dev_list, list) {
set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled,
dev_data->iommu_v2);
set_dte_entry(dev_data->devid, domain, pgtable,
dev_data->ats.enabled, dev_data->iommu_v2);
clone_aliases(dev_data->pdev);
}
}
static void update_and_flush_device_table(struct protection_domain *domain,
struct domain_pgtable *pgtable)
{
update_device_table(domain, pgtable);
domain_flush_devices(domain);
}
static void update_domain(struct protection_domain *domain)
{
update_device_table(domain);
struct domain_pgtable pgtable;
domain_flush_devices(domain);
/* Update device table */
amd_iommu_domain_get_pgtable(domain, &pgtable);
update_and_flush_device_table(domain, &pgtable);
/* Flush domain TLB(s) and wait for completion */
domain_flush_tlb_pde(domain);
domain_flush_complete(domain);
}
int __init amd_iommu_init_api(void)
@@ -2375,6 +2459,7 @@ out_err:
static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
{
struct protection_domain *pdomain;
u64 *pt_root, root;
switch (type) {
case IOMMU_DOMAIN_UNMANAGED:
@@ -2382,13 +2467,15 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
if (!pdomain)
return NULL;
pdomain->mode = PAGE_MODE_3_LEVEL;
pdomain->pt_root = (void *)get_zeroed_page(GFP_KERNEL);
if (!pdomain->pt_root) {
pt_root = (void *)get_zeroed_page(GFP_KERNEL);
if (!pt_root) {
protection_domain_free(pdomain);
return NULL;
}
root = amd_iommu_domain_encode_pgtable(pt_root, PAGE_MODE_3_LEVEL);
atomic64_set(&pdomain->pt_root, root);
pdomain->domain.geometry.aperture_start = 0;
pdomain->domain.geometry.aperture_end = ~0ULL;
pdomain->domain.geometry.force_aperture = true;
@@ -2406,7 +2493,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
if (!pdomain)
return NULL;
pdomain->mode = PAGE_MODE_NONE;
atomic64_set(&pdomain->pt_root, PAGE_MODE_NONE);
break;
default:
return NULL;
@@ -2418,6 +2505,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
static void amd_iommu_domain_free(struct iommu_domain *dom)
{
struct protection_domain *domain;
struct domain_pgtable pgtable;
domain = to_pdomain(dom);
@@ -2435,7 +2523,9 @@ static void amd_iommu_domain_free(struct iommu_domain *dom)
dma_ops_domain_free(domain);
break;
default:
if (domain->mode != PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode != PAGE_MODE_NONE)
free_pagetable(domain);
if (domain->flags & PD_IOMMUV2_MASK)
@@ -2518,10 +2608,12 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
gfp_t gfp)
{
struct protection_domain *domain = to_pdomain(dom);
struct domain_pgtable pgtable;
int prot = 0;
int ret;
if (domain->mode == PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode == PAGE_MODE_NONE)
return -EINVAL;
if (iommu_prot & IOMMU_READ)
@@ -2541,8 +2633,10 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
struct iommu_iotlb_gather *gather)
{
struct protection_domain *domain = to_pdomain(dom);
struct domain_pgtable pgtable;
if (domain->mode == PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode == PAGE_MODE_NONE)
return 0;
return iommu_unmap_page(domain, iova, page_size);
@@ -2553,9 +2647,11 @@ static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
{
struct protection_domain *domain = to_pdomain(dom);
unsigned long offset_mask, pte_pgsize;
struct domain_pgtable pgtable;
u64 *pte, __pte;
if (domain->mode == PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode == PAGE_MODE_NONE)
return iova;
pte = fetch_pte(domain, iova, &pte_pgsize);
@@ -2708,16 +2804,26 @@ EXPORT_SYMBOL(amd_iommu_unregister_ppr_notifier);
void amd_iommu_domain_direct_map(struct iommu_domain *dom)
{
struct protection_domain *domain = to_pdomain(dom);
struct domain_pgtable pgtable;
unsigned long flags;
u64 pt_root;
spin_lock_irqsave(&domain->lock, flags);
/* First save pgtable configuration*/
amd_iommu_domain_get_pgtable(domain, &pgtable);
/* Update data structure */
domain->mode = PAGE_MODE_NONE;
pt_root = amd_iommu_domain_encode_pgtable(NULL, PAGE_MODE_NONE);
atomic64_set(&domain->pt_root, pt_root);
/* Make changes visible to IOMMUs */
update_domain(domain);
/* Restore old pgtable in domain->ptroot to free page-table */
pt_root = amd_iommu_domain_encode_pgtable(pgtable.root, pgtable.mode);
atomic64_set(&domain->pt_root, pt_root);
/* Page-table is not visible to IOMMU anymore, so free it */
free_pagetable(domain);
@@ -2908,9 +3014,11 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
static int __set_gcr3(struct protection_domain *domain, int pasid,
unsigned long cr3)
{
struct domain_pgtable pgtable;
u64 *pte;
if (domain->mode != PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode != PAGE_MODE_NONE)
return -EINVAL;
pte = __get_gcr3_pte(domain->gcr3_tbl, domain->glx, pasid, true);
@@ -2924,9 +3032,11 @@ static int __set_gcr3(struct protection_domain *domain, int pasid,
static int __clear_gcr3(struct protection_domain *domain, int pasid)
{
struct domain_pgtable pgtable;
u64 *pte;
if (domain->mode != PAGE_MODE_NONE)
amd_iommu_domain_get_pgtable(domain, &pgtable);
if (pgtable.mode != PAGE_MODE_NONE)
return -EINVAL;
pte = __get_gcr3_pte(domain->gcr3_tbl, domain->glx, pasid, false);

View File

@@ -468,8 +468,7 @@ struct protection_domain {
iommu core code */
spinlock_t lock; /* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
int mode; /* paging mode (0-6 levels) */
u64 *pt_root; /* page table root pointer */
atomic64_t pt_root; /* pgtable root and pgtable mode */
int glx; /* Number of levels for GCR3 table */
u64 *gcr3_tbl; /* Guest CR3 table */
unsigned long flags; /* flags to find out type of domain */
@@ -477,6 +476,12 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
};
/* For decocded pt_root */
struct domain_pgtable {
int mode;
u64 *root;
};
/*
* Structure where we save information about one hardware AMD IOMMU in the
* system.

View File

@@ -453,7 +453,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
if (!region)
return -ENOMEM;
list_add(&vdev->resv_regions, &region->list);
list_add(&region->list, &vdev->resv_regions);
return 0;
}

View File

@@ -1465,6 +1465,13 @@ static const struct mei_cfg mei_me_pch12_cfg = {
MEI_CFG_DMA_128,
};
/* LBG with quirk for SPS Firmware exclusion */
static const struct mei_cfg mei_me_pch12_sps_cfg = {
MEI_CFG_PCH8_HFS,
MEI_CFG_FW_VER_SUPP,
MEI_CFG_FW_SPS,
};
/* Tiger Lake and newer devices */
static const struct mei_cfg mei_me_pch15_cfg = {
MEI_CFG_PCH8_HFS,
@@ -1487,6 +1494,7 @@ static const struct mei_cfg *const mei_cfg_list[] = {
[MEI_ME_PCH8_CFG] = &mei_me_pch8_cfg,
[MEI_ME_PCH8_SPS_CFG] = &mei_me_pch8_sps_cfg,
[MEI_ME_PCH12_CFG] = &mei_me_pch12_cfg,
[MEI_ME_PCH12_SPS_CFG] = &mei_me_pch12_sps_cfg,
[MEI_ME_PCH15_CFG] = &mei_me_pch15_cfg,
};

View File

@@ -80,6 +80,9 @@ struct mei_me_hw {
* servers platforms with quirk for
* SPS firmware exclusion.
* @MEI_ME_PCH12_CFG: Platform Controller Hub Gen12 and newer
* @MEI_ME_PCH12_SPS_CFG: Platform Controller Hub Gen12 and newer
* servers platforms with quirk for
* SPS firmware exclusion.
* @MEI_ME_PCH15_CFG: Platform Controller Hub Gen15 and newer
* @MEI_ME_NUM_CFG: Upper Sentinel.
*/
@@ -93,6 +96,7 @@ enum mei_cfg_idx {
MEI_ME_PCH8_CFG,
MEI_ME_PCH8_SPS_CFG,
MEI_ME_PCH12_CFG,
MEI_ME_PCH12_SPS_CFG,
MEI_ME_PCH15_CFG,
MEI_ME_NUM_CFG,
};

View File

@@ -70,7 +70,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT_2, MEI_ME_PCH8_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H, MEI_ME_PCH8_SPS_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H_2, MEI_ME_PCH8_SPS_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_LBG, MEI_ME_PCH12_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_LBG, MEI_ME_PCH12_SPS_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_BXT_M, MEI_ME_PCH8_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_APL_I, MEI_ME_PCH8_CFG)},

View File

@@ -1483,7 +1483,7 @@ static void __exit most_exit(void)
ida_destroy(&mdev_id);
}
module_init(most_init);
subsys_initcall(most_init);
module_exit(most_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Christian Gromm <christian.gromm@microchip.com>");

View File

@@ -24,8 +24,8 @@ config NET_DSA_MV88E6XXX_PTP
bool "PTP support for Marvell 88E6xxx"
default n
depends on NET_DSA_MV88E6XXX_GLOBAL2
depends on PTP_1588_CLOCK
imply NETWORK_PHY_TIMESTAMPING
imply PTP_1588_CLOCK
help
Say Y to enable PTP hardware timestamping on Marvell 88E6xxx switch
chips that support it.

Some files were not shown because too many files have changed in this diff Show More