Compare commits

...

858 Commits

Author SHA1 Message Date
Linus Torvalds
1e28eed176 Linux 5.12-rc3 2021-03-14 14:41:02 -07:00
Alexey Dobriyan
c995f12ad8 prctl: fix PR_SET_MM_AUXV kernel stack leak
Doing a

	prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);

will copy 1 byte from userspace to (quite big) on-stack array
and then stash everything to mm->saved_auxv.
AT_NULL terminator will be inserted at the very end.

/proc/*/auxv handler will find that AT_NULL terminator
and copy original stack contents to userspace.

This devious scheme requires CAP_SYS_RESOURCE.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-14 14:33:27 -07:00
Linus Torvalds
70404fe303 Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of irqchip updates:

   - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct

   - Add a missing DT compatible string for the Ingenic driver

   - Remove the pointless debugfs_file pointer from struct irqdomain"

* tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ingenic: Add support for the JZ4760
  dt-bindings/irq: Add compatible string for the JZ4760B
  irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
  ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
  irqdomain: Remove debugfs_file from struct irq_domain
2021-03-14 13:33:33 -07:00
Linus Torvalds
802b31c0dd Merge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix in for hrtimers to prevent an interrupt storm caused by
  the lack of reevaluation of the timers which expire in softirq context
  under certain circumstances, e.g. when the clock was set"

* tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
2021-03-14 13:29:38 -07:00
Linus Torvalds
c72cbc9361 Merge tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "A set of scheduler updates:

   - Prevent a NULL pointer dereference in the migration_stop_cpu()
     mechanims

   - Prevent self concurrency of affine_move_task()

   - Small fixes and cleanups related to task migration/affinity setting

   - Ensure that sync_runqueues_membarrier_state() is invoked on the
     current CPU when it is in the cpu mask"

* tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: fix missing local execution of ipi_sync_rq_state()
  sched: Simplify set_affinity_pending refcounts
  sched: Fix affine_move_task() self-concurrency
  sched: Optimize migration_cpu_stop()
  sched: Collate affine_move_task() stoppers
  sched: Simplify migration_cpu_stop()
  sched: Fix migration_cpu_stop() requeueing
2021-03-14 13:27:06 -07:00
Linus Torvalds
19469d2ada Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
 "A single objtool fix to handle the PUSHF/POPF validation correctly for
  the paravirt changes which modified arch_local_irq_restore not to use
  popf"

* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool,x86: Fix uaccess PUSHF/POPF validation
2021-03-14 13:15:55 -07:00
Linus Torvalds
fa509ff879 Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - A fix for the static_call mechanism so it handles unaligned
     addresses correctly.

   - Make u64_stats_init() a macro so every instance gets a seperate
     lockdep key.

   - Make seqcount_latch_init() a macro as well to preserve the static
     variable which is used for the lockdep key"

* tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  seqlock,lockdep: Fix seqcount_latch_init()
  u64_stats,lockdep: Fix u64_stats_init() vs lockdep
  static_call: Fix the module key fixup
2021-03-14 13:03:21 -07:00
Linus Torvalds
75013c6c52 Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Make sure PMU internal buffers are flushed for per-CPU events too and
   properly handle PID/TID for large PEBS.

 - Handle the case properly when there's no PMU and therefore return an
   empty list of perf MSRs for VMX to switch instead of reading random
   garbage from the stack.

* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
  perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
  perf/core: Flush PMU internal buffers for per-CPU events
2021-03-14 12:57:17 -07:00
Linus Torvalds
836d7f0572 Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
 "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
  added v5.10, but failed to take the SetVirtualAddressMap() RT service
  into account"

* tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
2021-03-14 12:54:56 -07:00
Linus Torvalds
0a7c10df49 Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - A couple of SEV-ES fixes and robustifications: verify usermode stack
   pointer in NMI is not coming from the syscall gap, correctly track
   IRQ states in the #VC handler and access user insn bytes atomically
   in same handler as latter cannot sleep.

 - Balance 32-bit fast syscall exit path to do the proper work on exit
   and thus not confuse audit and ptrace frameworks.

 - Two fixes for the ORC unwinder going "off the rails" into KASAN
   redzones and when ORC data is missing.

* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Use __copy_from_user_inatomic()
  x86/sev-es: Correctly track IRQ states in runtime #VC handler
  x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
  x86/sev-es: Introduce ip_within_syscall_gap() helper
  x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
  x86/unwind/orc: Silence warnings caused by missing ORC data
  x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
2021-03-14 12:48:10 -07:00
Linus Torvalds
c3c7579f5e Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.12:

   - Fix wrong instruction encoding for lis in ppc_function_entry(),
     which could potentially lead to missed kprobes.

   - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
     non-volatile GPRs immediately after exec.

   - Clean up a missed SRR specifier in the recent interrupt rework.

   - Don't treat unrecoverable_exception() as an interrupt handler, it's
     called from other handlers so shouldn't do the interrupt entry/exit
     accounting itself.

   - Fix build errors caused by missing declarations for
     [en/dis]able_kernel_vsx().

  Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
  Olsa, Naveen N. Rao, and Nicholas Piggin"

* tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/traps: unrecoverable_exception() is not an interrupt handler
  powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
  powerpc/64s/exception: Clean up a missed SRR specifier
  powerpc: Fix inverted SET_FULL_REGS bitop
  powerpc/64s: Use symbolic macros for function entry encoding
  powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
2021-03-14 12:37:43 -07:00
Linus Torvalds
9d0c8e793f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
2021-03-14 12:35:02 -07:00
Linus Torvalds
50eb842fe5 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "28 patches.

  Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
  highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
  zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
  ia64"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
  zram: fix broken page writeback
  zram: fix return value on writeback_store
  mm/memcg: set memcg when splitting page
  mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
  ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
  ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
  mm/userfaultfd: fix memory corruption due to writeprotect
  kasan: fix KASAN_STACK dependency for HW_TAGS
  kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
  mm/madvise: replace ptrace attach requirement for process_madvise
  include/linux/sched/mm.h: use rcu_dereference in in_vfork()
  kfence: fix reports if constant function prefixes exist
  kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
  kfence: fix printk format for ptrdiff_t
  linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
  MAINTAINERS: exclude uapi directories in API/ABI section
  binfmt_misc: fix possible deadlock in bm_register_write
  mm/highmem.c: fix zero_user_segments() with start > end
  hugetlb: do early cow when page pinned on src mm
  mm: use is_cow_mapping() across tree where proper
  ...
2021-03-14 12:23:34 -07:00
Thomas Gleixner
b470ebc9e0 Merge tag 'irqchip-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:

  - More compatible strings for the Ingenic irqchip (introducing the
    JZ4760B SoC)
  - Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
  - Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
    Kconfig, now relying on the architecture to get it right
  - Drop the debugfs_file field from struct irq_domain, now that
    debugfs can track things on its own
2021-03-14 16:34:35 +01:00
Linus Torvalds
88fe49249c Merge tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are some small misc/char driver fixes to resolve some reported
  problems:

   - habanalabs driver fixes

   - Acrn build fixes (reported many times)

   - pvpanic module table export fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc/pvpanic: Export module FDT device table
  misc: fastrpc: restrict user apps from sending kernel RPC messages
  virt: acrn: Correct type casting of argument of copy_from_user()
  virt: acrn: Use EPOLLIN instead of POLLIN
  virt: acrn: Use vfs_poll() instead of f_op->poll()
  virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
  cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
  habanalabs: fix debugfs address translation
  habanalabs: Disable file operations after device is removed
  habanalabs: Call put_pid() when releasing control device
  drivers: habanalabs: remove unused dentry pointer for debugfs files
  habanalabs: mark hl_eq_inc_ptr() as static
2021-03-13 12:38:44 -08:00
Linus Torvalds
be61af330e Merge tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are some small staging driver fixes for reported problems. They
  include:

   - wfx header file cleanup patch reverted as it could cause problems

   - comedi driver endian fixes

   - buffer overflow problems for staging wifi drivers

   - build dependency issue for rtl8192e driver

  All have been in linux-next for a while with no reported problems"

* tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
  Revert "staging: wfx: remove unused included header files"
  staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
  staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
  staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
  staging: comedi: pcl726: Use 16-bit 0 for interrupt data
  staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
  staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
  staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
  staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
  staging: comedi: pcl818: Fix endian problem for AI command data
  staging: comedi: pcl711: Fix endian problem for AI command data
  staging: comedi: me4000: Fix endian problem for AI command data
  staging: comedi: dmm32at: Fix endian problem for AI command data
  staging: comedi: das800: Fix endian problem for AI command data
  staging: comedi: das6402: Fix endian problem for AI command data
  staging: comedi: adv_pci1710: Fix endian problem for AI command data
  staging: comedi: addi_apci_1500: Fix endian problem for command sample
  staging: comedi: addi_apci_1032: Fix endian problem for COS sample
  staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
  staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
  ...
2021-03-13 12:36:53 -08:00
Linus Torvalds
cc14086f22 Merge tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are some small tty and serial driver fixes to resolve some
  reported problems:

   - led tty trigger fixes based on review and were acked by the led
     maintainer

   - revert a max310x serial driver patch as it was causing problems

   - revert a pty change as it was also causing problems

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "drivers:tty:pty: Fix a race causing data loss on close"
  Revert "serial: max310x: rework RX interrupt handling"
  leds: trigger/tty: Use led_set_brightness_sync() from workqueue
  leds: trigger: Fix error path to not unlock the unlocked mutex
2021-03-13 12:34:29 -08:00
Linus Torvalds
5c7bdbf882 Merge tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch
  of reported issues:

   - usbip fixups for issues found by syzbot

   - xhci driver fixes and quirk additions

   - gadget driver fixes

   - dwc3 QCOM driver fix

   - usb-serial new ids and fixes

   - usblp fix for a long-time issue

   - cdc-acm quirk addition

   - other tiny fixes for reported problems

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
  xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
  usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
  xhci: Improve detection of device initiated wake signal.
  usb: xhci: do not perform Soft Retry for some xHCI hosts
  usbip: fix vudc usbip_sockfd_store races leading to gpf
  usbip: fix vhci_hcd attach_store() races leading to gpf
  usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
  usbip: fix vudc to check for stream socket
  usbip: fix vhci_hcd to check for stream socket
  usbip: fix stub_dev to check for stream socket
  usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
  USB: usblp: fix a hang in poll() if disconnected
  USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
  usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
  usb: dwc3: qcom: Honor wakeup enabled/disabled state
  usb: gadget: f_uac1: stop playback on function disable
  usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
  USB: gadget: u_ether: Fix a configfs return code
  usb: dwc3: qcom: add ACPI device id for sc8180x
  Goodix Fingerprint device is not a modem
  ...
2021-03-13 12:32:57 -08:00
Linus Torvalds
420623430a Merge tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
 "Fix an urgent regression introduced by commit baa2c7c971 ("block:
  set .bi_max_vecs as actual allocated vector number"), which could
  cause unexpected hung since linux 5.12-rc1.

  Resolve it by avoiding using bio->bi_max_vecs completely"

* tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix bio->bi_max_vecs behavior change
2021-03-13 12:26:22 -08:00
Linus Torvalds
e83bad7f77 Merge tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - avoid 'make image_name' invoking syncconfig

 - fix a couple of bugs in scripts/dummy-tools

 - fix LLD_VENDOR and locale issues in scripts/ld-version.sh

 - rebuild GCC plugins when the compiler is upgraded

 - allow LTO to be enabled with KASAN_HW_TAGS

 - allow LTO to be enabled without LLVM=1

* tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: fix ld-version.sh to not be affected by locale
  kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
  kbuild: remove LLVM=1 test from HAS_LTO_CLANG
  kbuild: remove unneeded -O option to dtc
  kbuild: dummy-tools: adjust to scripts/cc-version.sh
  kbuild: Allow LTO to be selected with KASAN_HW_TAGS
  kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
  kbuild: rebuild GCC plugins when the compiler is upgraded
  kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
  kbuild: dummy-tools: fix inverted tests for gcc
  kbuild: add image_name to no-sync-config-targets
2021-03-13 12:18:59 -08:00
Minchan Kim
2766f18216 zram: fix broken page writeback
commit 0d8359620d ("zram: support page writeback") introduced two
problems.  It overwrites writeback_store's return value as kstrtol's
return value, which makes return value zero so user could see zero as
return value of write syscall even though it wrote data successfully.

It also breaks index value in the loop in that it doesn't increase the
index any longer.  It means it can write only first starting block index
so user couldn't write all idle pages in the zram so lose memory saving
chance.

This patch fixes those issues.

Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
Fixes: 0d8359620d9b("zram: support page writeback")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Amos Bianchi <amosbianchi@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Minchan Kim
57e0076e65 zram: fix return value on writeback_store
writeback_store's return value is overwritten by submit_bio_wait's return
value.  Thus, writeback_store will return zero since there was no IO
error.  In the end, write syscall from userspace will see the zero as
return value, which could make the process stall to keep trying the write
until it will succeed.

Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Zhou Guanghui
e1baddf847 mm/memcg: set memcg when splitting page
As described in the split_page() comment, for the non-compound high order
page, the sub-pages must be freed individually.  If the memcg of the first
page is valid, the tail pages cannot be uncharged when be freed.

For example, when alloc_pages_exact is used to allocate 1MB continuous
physical memory, 2MB is charged(kmemcg is enabled and __GFP_ACCOUNT is
set).  When make_alloc_exact free the unused 1MB and free_pages_exact free
the applied 1MB, actually, only 4KB(one page) is uncharged.

Therefore, the memcg of the tail page needs to be set when splitting a
page.

Michel:

There are at least two explicit users of __GFP_ACCOUNT with
alloc_exact_pages added recently.  See 7efe8ef274 ("KVM: arm64:
Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873
("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
just a theoretical issue.

Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Zhou Guanghui
be6c8982e4 mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
in page number argument.

In this way, the interface name is more common and can be used by
potential users.  In addition, the complete info(memcg and flag) of the
memcg needs to be set to the tail pages.

Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Sergei Trofimovich
61bf318eac ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.

The bug is in mismatch between get/set errors:

static inline long syscall_get_error(struct task_struct *task,
                                     struct pt_regs *regs)
{
        return regs->r10 == -1 ? regs->r8:0;
}

static inline long syscall_get_return_value(struct task_struct *task,
                                            struct pt_regs *regs)
{
        return regs->r8;
}

static inline void syscall_set_return_value(struct task_struct *task,
                                            struct pt_regs *regs,
                                            int error, long val)
{
        if (error) {
                /* error < 0, but ia64 uses > 0 return value */
                regs->r8 = -error;
                regs->r10 = -1;
        } else {
                regs->r8 = val;
                regs->r10 = 0;
        }
}

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

Link: https://lkml.kernel.org/r/20210221002554.333076-2-slyfox@gentoo.org
Link: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Sergei Trofimovich
0ceb1ace4a ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via
glibc's syscall() wrapper.

ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.

The difference is in stack layout:

1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses userspace stack frame: may be locals (glibc provides
   one), in{0..7} == out{0..8}.

Both work fine in syscall handling cde itself.

But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses unwind mechanism to
re-extract syscall arguments but it does not account for locals.

The change always skips locals registers. It should not change `eps`
path as kernel's handler already enforces locals=0 and fixes `break`.

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org
Link: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Nadav Amit
6ce64428d6 mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0				cpu1			cpu2
----				----			----
							[ Writable PTE
							  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

cpu0		cpu1		cpu2		cpu3
----		----		----		----
						[ Writable PTE
				  		  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
		userfaultfd_writeprotect()
		[ write-unprotect ]
		[ deferred TLB flush]
				[ page-fault ]
				wp_page_copy()
				 cow_user_page()
				 [ copy page ]
				 ...		[ write to page ]
				set_pte_at_notify()

This race exists since commit 292924b260 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Andrey Konovalov
d9b571c885 kasan: fix KASAN_STACK dependency for HW_TAGS
There's a runtime failure when running HW_TAGS-enabled kernel built with
GCC on hardware that doesn't support MTE.  GCC-built kernels always have
CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
supported by HW_TAGS.  Having that config enabled causes KASAN to issue
MTE-only instructions to unpoison kernel stacks, which causes the failure.

Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.

(The commit that introduced CONFIG_KASAN_HW_TAGS specified proper
 dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)

Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.1615215441.git.andreyknvl@google.com
Fixes: 6a63a63ff1 ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Andrey Konovalov
f9d79e8dce kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
after debug_pagealloc_unmap_pages(). This causes a crash when
debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
unmapped page.

This patch puts kasan_free_nondeferred_pages() before
debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
the page unavailable.

Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
Fixes: 94ab5b61ee ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
Suren Baghdasaryan
96cfe2c0fd mm/madvise: replace ptrace attach requirement for process_madvise
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process.  It effectively removes the security boundary between the two
processes (in one direction).  Granting ptrace attach capability even to a
system process is considered dangerous since it creates an attack surface.
This severely limits the usage of this API.

The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data is
physically located (and therefore, how fast it can be accessed).  What we
want is the ability for one process to influence another process in order
to optimize performance across the entire system while leaving the
security boundary intact.

Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE.  PTRACE_MODE_READ to prevent leaking ASLR metadata and
CAP_SYS_NICE for influencing process performance.

Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@vger.kernel.org>	[5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Matthew Wilcox (Oracle)
149fc78735 include/linux/sched/mm.h: use rcu_dereference in in_vfork()
Fix a sparse warning by using rcu_dereference().  Technically this is a
bug and a sufficiently aggressive compiler could reload the `real_parent'
pointer outside the protection of the rcu lock (and access freed memory),
but I think it's pretty unlikely to happen.

Link: https://lkml.kernel.org/r/20210221194207.1351703-1-willy@infradead.org
Fixes: b18dc5f291 ("mm, oom: skip vforked tasks from being selected")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Marco Elver
0aa41cae92 kfence: fix reports if constant function prefixes exist
Some architectures prefix all functions with a constant string ('.' on
ppc64).  Add ARCH_FUNC_PREFIX, which may optionally be defined in
<asm/kfence.h>, so that get_stack_skipnr() can work properly.

Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f058@csgroup.eu
Link: https://lkml.kernel.org/r/20210304144000.1148590-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Marco Elver
df3ae2c994 kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
cache_alloc_debugcheck_after() performs checks on an object, including
adjusting the returned pointer.  None of this should apply to KFENCE
objects.  While for non-bulk allocations, the checks are skipped when we
allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after()
is called via cache_alloc_debugcheck_after_bulk().

Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.

Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Marco Elver
702b16d724 kfence: fix printk format for ptrdiff_t
Use %td for ptrdiff_t.

Link: https://lkml.kernel.org/r/3abbe4c9-16ad-c168-a90f-087978ccd8f7@csgroup.eu
Link: https://lkml.kernel.org/r/20210303121157.3430807-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Arnd Bergmann
97e4910232 linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
Separating compiler-clang.h from compiler-gcc.h inadventently dropped the
definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling
back to the open-coded version and hoping that the compiler detects it.

Since all versions of clang support the __builtin_bswap interfaces, add
back the flags and have the headers pick these up automatically.

This results in a 4% improvement of compilation speed for arm defconfig.

Note: it might also be worth revisiting which architectures set
CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other, today this is
set on six architectures (arm32, csky, mips, powerpc, s390, x86), while
another ten architectures define custom helpers (alpha, arc, ia64, m68k,
mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64, h8300,
hexagon, microblaze, nds32, openrisc, riscv) just get the unoptimized
version and rely on the compiler to detect it.

A long time ago, the compiler builtins were architecture specific, but
nowadays, all compilers that are able to build the kernel have correct
implementations of them, though some may not be as optimized as the inline
asm versions.

The patch that dropped the optimization landed in v4.19, so as discussed
it would be fairly safe to backport this revert to stable kernels to the
4.19/5.4/5.10 stable kernels, but there is a remaining risk for
regressions, and it has no known side-effects besides compile speed.

Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/
Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Guo Ren <guoren@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Vlastimil Babka
f0b15b6081 MAINTAINERS: exclude uapi directories in API/ABI section
Commit 7b4693e644 ("MAINTAINERS: add uapi directories to API/ABI
section") added include/uapi/ and arch/*/include/uapi/ so that patches
modifying them CC linux-api.  However that was already done in the past
and resulted in too much noise and thus later removed, as explained in
b14fd334ff ("MAINTAINERS: trim the file triggers for ABI/API")

To prevent another round of addition and removal in the future, change the
entries to X: (explicit exclusion) for documentation purposes, although
they are not subdirectories of broader included directories, as there is
apparently no defined way to add plain comments in subsystem sections.

Link: https://lkml.kernel.org/r/20210301100255.25229-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Lior Ribak
e7850f4d84 binfmt_misc: fix possible deadlock in bm_register_write
There is a deadlock in bm_register_write:

First, in the begining of the function, a lock is taken on the binfmt_misc
root inode with inode_lock(d_inode(root)).

Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call
open_exec on the user-provided interpreter.

open_exec will call a path lookup, and if the path lookup process includes
the root of binfmt_misc, it will try to take a shared lock on its inode
again, but it is already locked, and the code will get stuck in a deadlock

To reproduce the bug:
$ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register

backtrace of where the lock occurs (#5):
0  schedule () at ./arch/x86/include/asm/current.h:15
1  0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992
2  0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213
3  __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222
4  down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355
5  0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783
6  open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
7  path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
8  0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
9  0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913
10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948
11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682
12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF
", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF
", count=19) at fs/read_write.c:658
14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120

To solve the issue, the open_exec call is moved to before the write
lock is taken by bm_register_write

Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a60 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak <liorribak@gmail.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
OGAWA Hirofumi
184cee516f mm/highmem.c: fix zero_user_segments() with start > end
zero_user_segments() is used from __block_write_begin_int(), for example
like the following

	zero_user_segments(page, 4096, 1024, 512, 918)

But new the zero_user_segments() implementation for for HIGHMEM +
TRANSPARENT_HUGEPAGE doesn't handle "start > end" case correctly, and hits
BUG_ON().  (we can fix __block_write_begin_int() instead though, it is the
old and multiple usage)

Also it calls kmap_atomic() unnecessarily while start == end == 0.

Link: https://lkml.kernel.org/r/87v9ab60r4.fsf@mail.parknet.co.jp
Fixes: 0060ef3b4e ("mm: support THPs in zero_user_segments")
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Peter Xu
4eae4efa2c hugetlb: do early cow when page pinned on src mm
This is the last missing piece of the COW-during-fork effort when there're
pinned pages found.  One can reference 70e806e4e6 ("mm: Do early cow for
pinned pages during fork() for ptes", 2020-09-27) for more information,
since we do similar things here rather than pte this time, but just for
hugetlb.

Note that after Jason's recent work on 57efa1fe59 ("mm/gup: prevent
gup_fast from racing with COW during fork", 2020-12-15) which is safer and
easier to understand, we're safe now within the whole copy_page_range()
against gup-fast, we don't need the wr-protect trick that proposed in
70e806e4e6 anymore.

Link: https://lkml.kernel.org/r/20210217233547.93892-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Gal Pressman <galpress@amazon.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Wei Zhang <wzam@amazon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Peter Xu
ca6eb14d64 mm: use is_cow_mapping() across tree where proper
After is_cow_mapping() is exported in mm.h, replace some manual checks
elsewhere throughout the tree but start to use the new helper.

Link: https://lkml.kernel.org/r/20210217233547.93892-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Gal Pressman <galpress@amazon.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Wei Zhang <wzam@amazon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Peter Xu
97a7e4733b mm: introduce page_needs_cow_for_dma() for deciding whether cow
We've got quite a few places (pte, pmd, pud) that explicitly checked
against whether we should break the cow right now during fork().  It's
easier to provide a helper, especially before we work the same thing on
hugetlbfs.

Since we'll reference is_cow_mapping() in mm.h, move it there too.
Actually it suites mm.h more since internal.h is mm/ only, but mm.h is
exported to the whole kernel.  With that we should expect another patch to
use is_cow_mapping() whenever we can across the kernel since we do use it
quite a lot but it's always done with raw code against VM_* flags.

Link: https://lkml.kernel.org/r/20210217233547.93892-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Gal Pressman <galpress@amazon.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Wei Zhang <wzam@amazon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Peter Xu
ca7e0457ef hugetlb: break earlier in add_reservation_in_range() when we can
All the regions maintained in hugetlb reserved map is inclusive on "from"
but exclusive on "to".  We can break earlier even if rg->from==t because
it already means no possible intersection.

This does not need a Fixes in all cases because when it happens
(rg->from==t) we'll not break out of the loop while we should, however the
next thing we'd do is still add the last file_region we'd need and quit
the loop in the next round.  So this change is not a bugfix (since the old
code should still run okay iiuc), but we'd better still touch it up to
make it logically sane.

Link: https://lkml.kernel.org/r/20210217233547.93892-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Gal Pressman <galpress@amazon.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Wei Zhang <wzam@amazon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Peter Xu
2103cf9c3f hugetlb: dedup the code to add a new file_region
Patch series "mm/hugetlb: Early cow on fork, and a few cleanups", v5.

As reported by Gal [1], we still miss the code clip to handle early cow
for hugetlb case, which is true.  Again, it still feels odd to fork()
after using a few huge pages, especially if they're privately mapped to
me..  However I do agree with Gal and Jason in that we should still have
that since that'll complete the early cow on fork effort at least, and
it'll still fix issues where buffers are not well under control and not
easy to apply MADV_DONTFORK.

The first two patches (1-2) are some cleanups I noticed when reading into
the hugetlb reserve map code.  I think it's good to have but they're not
necessary for fixing the fork issue.

The last two patches (3-4) are the real fix.

I tested this with a fork() after some vfio-pci assignment, so I'm pretty
sure the page copy path could trigger well (page will be accounted right
after the fork()), but I didn't do data check since the card I assigned is
some random nic.

  https://github.com/xzpeter/linux/tree/fork-cow-pin-huge

[1] https://lore.kernel.org/lkml/27564187-4a08-f187-5a84-3df50009f6ca@amazon.com/

Introduce hugetlb_resv_map_add() helper to add a new file_region rather
than duplication the similar code twice in add_reservation_in_range().

Link: https://lkml.kernel.org/r/20210217233547.93892-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210217233547.93892-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Gal Pressman <galpress@amazon.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Wei Zhang <wzam@amazon.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Fenghua Yu
82e69a121b mm/fork: clear PASID for new mm
When a new mm is created, its PASID should be cleared, i.e.  the PASID is
initialized to its init state 0 on both ARM and X86.

This patch was part of the series introducing mm->pasid, but got lost
along the way [1].  It still makes sense to have it, because each address
space has a different PASID.  And the IOMMU code in
iommu_sva_alloc_pasid() expects the pasid field of a new mm struct to be
cleared.

[1] https://lore.kernel.org/linux-iommu/YDgh53AcQHT+T3L0@otcwcpicx3.sc.intel.com/

Link: https://lkml.kernel.org/r/20210302103837.2562625-1-jean-philippe@linaro.org
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Mike Rapoport
0740a50b9b mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized using init_unavailable_mem() function
that iterates through PFNs in holes in memblock.memory and if there is a
struct page corresponding to a PFN, the fields of this page are set to
default values and it is marked as Reserved.

init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.

Before commit 73a6e474cb ("mm: memmap_init: iterate over memblock
regions rather that check each PFN") the holes inside a zone were
re-initialized during memmap_init() and got their zone/node links right.
However, after that commit nothing updates the struct pages representing
such holes.

On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:

	# grep -A1 E820 /proc/iomem
	7a17b000-7a216fff : Unknown E820 type
	7a217000-7bffffff : System RAM

unset zone link in struct page will trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

in set_pfnblock_flags_mask() when called with a struct page from a range
other than E820_TYPE_RAM because there are pages in the range of
ZONE_DMA32 but the unset zone link in struct page makes them appear as a
part of ZONE_DMA.

Interleave initialization of the unavailable pages with the normal
initialization of memory map, so that zone and node information will be
properly set on struct pages that are not backed by the actual memory.

With this change the pages for holes inside a zone will get proper
zone/node links and the pages that are not spanned by any node will get
links to the adjacent zone/node.  The holes between nodes will be
prepended to the zone/node above the hole and the trailing pages in the
last section that will be appended to the zone/node below.

[akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64]

Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
Fixes: 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Qian Cai <cai@lca.pw>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Łukasz Majczak <lma@semihalf.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Masahiro Yamada
ea29b20a82 init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM
I read the commit log of the following two:

- bc083a64b6 ("init/Kconfig: make COMPILE_TEST depend on !UML")
- 334ef6ed06 ("init/Kconfig: make COMPILE_TEST depend on !S390")

Both are talking about HAS_IOMEM dependency missing in many drivers.

So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.

This does not change the behavior of UML. UML still cannot enable
COMPILE_TEST because it does not provide HAS_IOMEM.

The current dependency for S390 is too strong. Under the condition of
CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST.

I also removed the meaningless 'default n'.

Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KP Singh <kpsingh@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Arnd Bergmann
cbf78d8507 stop_machine: mark helpers __always_inline
With clang-13, some functions only get partially inlined, with a
specialized version referring to a global variable.  This triggers a
harmless build-time check for the intel-rng driver:

WARNING: modpost: drivers/char/hw_random/intel-rng.o(.text+0xe): Section mismatch in reference from the function stop_machine() to the function .init.text:intel_rng_hw_init()
The function stop_machine() references
the function __init intel_rng_hw_init().
This is often because stop_machine lacks a __init
annotation or the annotation of intel_rng_hw_init is wrong.

In this instance, an easy workaround is to force the stop_machine()
function to be inline, along with related interfaces that did not show the
same behavior at the moment, but theoretically could.

The combination of the two patches listed below triggers the behavior in
clang-13, but individually these commits are correct.

Link: https://lkml.kernel.org/r/20210225130153.1956990-1-arnd@kernel.org
Fixes: fe5595c074 ("stop_machine: Provide stop_machine_cpuslocked()")
Fixes: ee527cd3a2 ("Use stop_machine_run in the Intel RNG driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Arnd Bergmann
34dc2efb39 memblock: fix section mismatch warning
The inlining logic in clang-13 is rewritten to often not inline some
functions that were inlined by all earlier compilers.

In case of the memblock interfaces, this exposed a harmless bug of a
missing __init annotation:

WARNING: modpost: vmlinux.o(.text+0x507c0a): Section mismatch in reference from the function memblock_bottom_up() to the variable .meminit.data:memblock
The function memblock_bottom_up() references
the variable __meminitdata memblock.
This is often because memblock_bottom_up lacks a __meminitdata
annotation or the annotation of memblock is wrong.

Interestingly, these annotations were present originally, but got removed
with the explanation that the __init annotation prevents the function from
getting inlined.  I checked this again and found that while this is the
case with clang, gcc (version 7 through 10, did not test others) does
inline the functions regardless.

As the previous change was apparently intended to help the clang builds,
reverting it to help the newer clang versions seems appropriate as well.
gcc builds don't seem to care either way.

Link: https://lkml.kernel.org/r/20210225133808.2188581-1-arnd@kernel.org
Fixes: 5bdba520c1 ("mm: memblock: drop __init from memblock functions to make it inline")
Reference: 2cfb3665e8 ("include/linux/memblock.h: add __init to memblock_set_bottom_up()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Aslan Bakirov <aslan@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:29 -08:00
Masahiro Yamada
bcbcf50f52 kbuild: fix ld-version.sh to not be affected by locale
ld-version.sh checks the output from $(LD) --version, but it has a
problem on some locales.

For example, in Italian:

  $ LC_MESSAGES=it_IT.UTF-8 ld --version | head -n 1
  ld di GNU (GNU Binutils for Debian) 2.35.2

This makes ld-version.sh fail because it expects "GNU ld" for the
BFD linker case.

Add LC_ALL=C to override the user's locale.

BTW, setting LC_MESSAGES=C (or LANG=C) is not enough because it is
ineffective if LC_ALL is set on the user's environment.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212105
Reported-by: Marco Scardovi
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Recensito-da: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2021-03-13 11:12:13 +09:00
Linus Torvalds
f296bfd5cd Merge tag 'nfs-for-5.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
 "These are mostly fixes for issues discovered at the recent NFS
  bakeathon:

   - Fix PNFS_FLEXFILE_LAYOUT kconfig so it is possible to build
     into the kernel

   - Correct size calculationn for create reply length

   - Set memalloc_nofs_save() for sync tasks to prevent deadlocks

   - Don't revalidate directory permissions on lookup failure

   - Don't clear inode cache when lookup fails

   - Change functions to use nfs_set_cache_invalid() for proper
     delegation handling

   - Fix return value of _nfs4_get_security_label()

   - Return an error when attempting to remove system.nfs4_acl"

* tag 'nfs-for-5.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs: we don't support removing system.nfs4_acl
  NFSv4.2: fix return value of _nfs4_get_security_label()
  NFS: Fix open coded versions of nfs_set_cache_invalid() in NFSv4
  NFS: Fix open coded versions of nfs_set_cache_invalid()
  NFS: Clean up function nfs_mark_dir_for_revalidate()
  NFS: Don't gratuitously clear the inode cache when lookup failed
  NFS: Don't revalidate the directory permissions on a lookup failure
  SUNRPC: Set memalloc_nofs_save() for sync tasks
  NFS: Correct size calculation for create reply length
  nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
2021-03-12 14:19:35 -08:00
Linus Torvalds
b6b8aa27a3 Merge branch 'for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fix from Eric Biederman:
 "Removing the ambiguity broke userspace so this reverts the change"

* 'for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Revert 95ebabde38 ("capabilities: Don't allow writing ambiguous v3 file capabilities")
2021-03-12 13:58:04 -08:00
Linus Torvalds
9afc116379 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Ten updates: one non code maintainer update for vmw_pvscsi, five code
  updates for ibmvfc and four for UFS.

  All are either trivial patches or bug fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: MAINTAINERS: Update maintainer
  scsi: ufs: Convert sysfs sprintf/snprintf family to sysfs_emit
  scsi: ufs: Remove redundant checks of !hba in suspend/resume callbacks
  scsi: ufs: ufs-qcom: Disable interrupt in reset path
  scsi: ufs: Minor adjustments to error handling
  scsi: ibmvfc: Reinitialize sub-CRQs and perform channel enquiry after LPM
  scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup
  scsi: ibmvfc: Treat H_CLOSED as success during sub-CRQ registration
  scsi: ibmvfc: Fix invalid sub-CRQ handles after hard reset
  scsi: ibmvfc: Simplify handling of sub-CRQ initialization
2021-03-12 13:37:18 -08:00
Eric W. Biederman
3b0c2d3eaa Revert 95ebabde38 ("capabilities: Don't allow writing ambiguous v3 file capabilities")
It turns out that there are in fact userspace implementations that
care and this recent change caused a regression.

https://github.com/containers/buildah/issues/3071

As the motivation for the original change was future development,
and the impact is existing real world code just revert this change
and allow the ambiguity in v3 file caps.

Cc: stable@vger.kernel.org
Fixes: 95ebabde38 ("capabilities: Don't allow writing ambiguous v3 file capabilities")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-03-12 15:27:14 -06:00
Linus Torvalds
ce307084c9 Merge tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Mostly just random fixes all over the map.

  The only odd-one-out change is finally getting the rename of
  BIO_MAX_PAGES to BIO_MAX_VECS done. This should've been done with the
  multipage bvec change, but it's been left.

  Do it now to avoid hassles around changes piling up for the next merge
  window.

  Summary:

   - NVMe pull request:
      - one more quirk (Dmitry Monakhov)
      - fix max_zone_append_sectors initialization (Chaitanya Kulkarni)
      - nvme-fc reset/create race fix (James Smart)
      - fix status code on aborts/resets (Hannes Reinecke)
      - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni)
      - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)

   - Follow-up NVMe error fix for NULL 'id' (Christoph)

   - Fixup for the bd_size_lock being IRQ safe, now that the offending
     driver has been dropped (Damien).

   - rsxx probe failure error return (Jia-Ju)

   - umem probe failure error return (Wei)

   - s390/dasd unbind fixes (Stefan)

   - blk-cgroup stats summing fix (Xunlei)

   - zone reset handling fix (Damien)

   - Rename BIO_MAX_PAGES to BIO_MAX_VECS (Christoph)

   - Suppress uevent trigger for hidden devices (Daniel)

   - Fix handling of discard on busy device (Jan)

   - Fix stale cache issue with zone reset (Shin'ichiro)"

* tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-block:
  nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
  block: Discard page cache of zone reset target range
  block: Suppress uevent for hidden device when removed
  block: rename BIO_MAX_PAGES to BIO_MAX_VECS
  nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
  nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
  nvme-core: check ctrl css before setting up zns
  nvme-fc: fix racing controller reset and create association
  nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
  nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
  nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
  nvme: simplify error logic in nvme_validate_ns()
  nvme: set max_zone_append_sectors nvme_revalidate_zones
  block: rsxx: fix error return code of rsxx_pci_probe()
  block: Fix REQ_OP_ZONE_RESET_ALL handling
  umem: fix error return code in mm_pci_probe()
  blk-cgroup: Fix the recursive blkg rwstat
  s390/dasd: fix hanging IO request during DASD driver unbind
  s390/dasd: fix hanging DASD driver unbind
  block: Try to handle busy underlying device on discard
2021-03-12 13:25:49 -08:00
Linus Torvalds
9278be92f2 Merge tag 'io_uring-5.12-2021-03-12' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Not quite as small this week as I had hoped, but at least this should
  be the end of it. All the little known issues have been ironed out -
  most of it little stuff, but cancelations being the bigger part. Only
  minor tweaks and/or regular fixes expected beyond this point.

   - Fix the creds tracking for async (io-wq and SQPOLL)

   - Various SQPOLL fixes related to parking, sharing, forking, IOPOLL,
     completions, and life times. Much simpler now.

   - Make IO threads unfreezable by default, on account of a bug report
     that had them spinning on resume. Honestly not quite sure why
     thawing leaves us with a perpetual signal pending (causing the
     spin), but for now make them unfreezable like there were in 5.11
     and prior.

   - Move personality_idr to xarray, solving a use-after-free related to
     removing an entry from the iterator callback. Buffer idr needs the
     same treatment.

   - Re-org around and task vs context tracking, enabling the fixing of
     cancelations, and then cancelation fixes on top.

   - Various little bits of cleanups and hardening, and removal of now
     dead parts"

* tag 'io_uring-5.12-2021-03-12' of git://git.kernel.dk/linux-block: (34 commits)
  io_uring: fix OP_ASYNC_CANCEL across tasks
  io_uring: cancel sqpoll via task_work
  io_uring: prevent racy sqd->thread checks
  io_uring: remove useless ->startup completion
  io_uring: cancel deferred requests in try_cancel
  io_uring: perform IOPOLL reaping if canceler is thread itself
  io_uring: force creation of separate context for ATTACH_WQ and non-threads
  io_uring: remove indirect ctx into sqo injection
  io_uring: fix invalid ctx->sq_thread_idle
  kernel: make IO threads unfreezable by default
  io_uring: always wait for sqd exited when stopping SQPOLL thread
  io_uring: remove unneeded variable 'ret'
  io_uring: move all io_kiocb init early in io_init_req()
  io-wq: fix ref leak for req in case of exit cancelations
  io_uring: fix complete_post races for linked req
  io_uring: add io_disarm_next() helper
  io_uring: fix io_sq_offload_create error handling
  io-wq: remove unused 'user' member of io_wq
  io_uring: Convert personality_idr to XArray
  io_uring: clean R_DISABLED startup mess
  ...
2021-03-12 13:13:57 -08:00
Linus Torvalds
261410082d Merge tag 'devprop-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework fixes from Rafael Wysocki:
 "Prevent software nodes from being registered before their parents and
  fix a recent mistake causing already registered software nodes to be
  registered again in some cases (Heikki Krogerus)"

* tag 'devprop-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  software node: Fix device_add_software_node()
  software node: Fix node registration
2021-03-12 13:09:29 -08:00
Linus Torvalds
3077f0279e Merge tag 'pm-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix an operating performance point (OPP) reference counting
  issue and three issues in ARM cpufreq drivers.

  Specifics:

   - Add a flag to mark OPPs that are not referenced by he OPP core any
     more to prevent OPPs from being freed prematurely by mistake (Beata
     Michalska).

   - Add ARM Vexpress platforms to the cpufreq-dt-platdev blacklist
     since the actual scaling of them is handled elsewhere (Sudeep
     Holla).

   - Fix a function return value check and a possible use-after-free in
     the qcom-hw cpufreq driver (Shawn Guo, Wei Yongjun)"

* tag 'pm-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  opp: Don't drop extra references to OPPs accidentally
  cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
  cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
  cpufreq: qcom-hw: fix dereferencing freed memory 'data'
2021-03-12 12:28:03 -08:00
Christoph Hellwig
f4f9fc29e5 nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
ns can be NULL at this point, and my move of the check from
the original patch by Chaitanya broke this.

Fixes: 0ec84df495 ("nvme-core: check ctrl css before setting up zns")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 13:17:45 -07:00
Linus Torvalds
344178334b Merge tag 'sound-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "No surprise here, only a collection of device-specific fixes for
  USB-audio and HD-audio at this time"

* tag 'sound-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: Cancel pending works before suspend
  ALSA: hda: Avoid spurious unsol event handling during S3/S4
  ALSA: hda: Flush pending unsolicited events before suspend
  ALSA: usb-audio: fix use after free in usb_audio_disconnect
  ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
  ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
  ALSA: hda: Drop the BATCH workaround for AMD controllers
  ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
  ALSA: usb-audio: Apply the control quirk to Plantronics headsets
  ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
  ALSA: hda: ignore invalid NHLT table
  ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
  ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
2021-03-12 12:01:26 -08:00
Linus Torvalds
568099a703 Merge tag 'mmc-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
 "MMC core:

   - Fix partition switch time for eMMC

  MMC host:

   - mmci: Enforce R1B response to fix busy detection for
     the stm32 variants

   - cqhci: Fix crash when removing mmc module/card"

* tag 'mmc-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: cqhci: Fix random crash when remove mmc module/card
  mmc: core: Fix partition switch time for eMMC
  mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
2021-03-12 11:58:33 -08:00
Linus Torvalds
270c0551ab Merge tag 'regulator-fix-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "A small collection fo driver specific fixes that have arrived since
  the merge window"

* tag 'regulator-fix-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: mt6315: Fix off-by-one for .n_voltages
  regulator: rt4831: Fix return value check in rt4831_regulator_probe()
  regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
  regulator: qcom-rpmh: Use correct buck for S1C regulator
  regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
  regulator: pca9450: Fix return value when failing to get sd-vsel GPIO
  regulator: mt6315: Return REGULATOR_MODE_INVALID for invalid mode
2021-03-12 11:55:58 -08:00
Linus Torvalds
8d9d53de51 Merge tag 'configfs-for-5.12' of git://git.infradead.org/users/hch/configfs
Pull configfs fix from Christoph Hellwig:

 - fix a use-after-free in __configfs_open_file (Daiyue Zhang)

* tag 'configfs-for-5.12' of git://git.infradead.org/users/hch/configfs:
  configfs: fix a use-after-free in __configfs_open_file
2021-03-12 11:48:14 -08:00
Linus Torvalds
b77b5fdd05 Merge tag 'gfs2-v5.12-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
 "Various gfs2 fixes"

* tag 'gfs2-v5.12-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: bypass log flush if the journal is not live
  gfs2: bypass signal_our_withdraw if no journal
  gfs2: fix use-after-free in trans_drain
  gfs2: make function gfs2_make_fs_ro() to void type
2021-03-12 11:46:09 -08:00
Linus Torvalds
17f8fc198a Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "We've got a smattering of changes all over the place which we've
  acrued since -rc1. To my knowledge, there aren't any pending issues at
  the moment, but there's still plenty of time for something else to
  crop up...

  Summary:

   - Fix booting a 52-bit-VA-aware kernel on Qualcomm Amberwing

   - Fix pfn_valid() not to reject all ZONE_DEVICE memory

   - Fix memory tagging setup for hotplugged memory regions

   - Fix KASAN tagging in page_alloc() when DEBUG_VIRTUAL is enabled

   - Fix accidental truncation of CPU PMU event counters

   - Fix error code initialisation when failing probe of DMC620 PMU

   - Fix return value initialisation for sve-ptrace selftest

   - Drop broken support for CMDLINE_EXTEND"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  perf/arm_dmc620_pmu: Fix error return code in dmc620_pmu_device_probe()
  arm64: mm: remove unused __cpu_uses_extended_idmap[_level()]
  arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
  arm64: perf: Fix 64-bit event counter read truncation
  arm64/mm: Fix __enable_mmu() for new TGRAN range values
  kselftest: arm64: Fix exit code of sve-ptrace
  arm64: mte: Map hotplugged memory as Normal Tagged
  arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
  arm64/mm: Reorganize pfn_valid()
  arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER
  arm64/mm: Drop redundant ARCH_WANT_HUGE_PMD_SHARE
  arm64: Drop support for CMDLINE_EXTEND
  arm64: cpufeatures: Fix handling of CONFIG_CMDLINE for idreg overrides
2021-03-12 11:39:53 -08:00
Linus Torvalds
6bf8819fed Merge tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
 "Two fix series and a single cleanup:

   - a small cleanup patch to remove unneeded symbol exports

   - a series to cleanup Xen grant handling (avoiding allocations in
     some cases, and using common defines for "invalid" values)

   - a series to address a race issue in Xen event channel handling"

* tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Xen/gntdev: don't needlessly use kvcalloc()
  Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF}
  Xen/gntdev: don't needlessly allocate k{,un}map_ops[]
  Xen: drop exports of {set,clear}_foreign_p2m_mapping()
  xen/events: avoid handling the same event on two cpus at the same time
  xen/events: don't unmask an event channel when an eoi is pending
  xen/events: reset affinity of 2-level event when tearing it down
2021-03-12 11:34:36 -08:00
Wanpeng Li
35737d2db2 KVM: LAPIC: Advancing the timer expiration on guest initiated write
Advancing the timer expiration should only be necessary on guest initiated
writes. When we cancel the timer and clear .pending during state restore,
clear expired_tscdeadline as well.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:52 -05:00
Sean Christopherson
8df9f1af2e KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
removal.  The concurrent write protections provided by REMOVED_SPTE are
not needed, there are no backing page side effects to record, and MMIO
SPTEs can be left as is since they are protected by the memslot
generation, not by ensuring that the MMIO SPTE is unreachable (which
is racy with respect to lockless walks regardless of zapping behavior).

Skipping !PRESENT drastically reduces the number of updates needed to
tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that
didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
be skipped.

Avoiding the write itself is likely close to a wash, but avoiding
__handle_changed_spte() is a clear-cut win as that involves saving and
restoring all non-volatile GPRs (it's a subtly big function), as well as
several conditional branches before bailing out.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210310003029.1250571-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:52 -05:00
Wanpeng Li
d7eb79c629 KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                88
On-line CPU(s) list:   0-63
Off-line CPU(s) list:  64-87

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro
rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall

# echo 1 > /sys/devices/system/cpu/cpu76/online
-bash: echo: write error: Cannot allocate memory

The per-cpu vsyscall pvclock data pointer assigns either an element of the
static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory
hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if
kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in
kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall
and sometimes you end up with vsyscall disabled if the host does something
strange. This patch fixes it by allocating this dynamically memory
unconditionally even if vsyscall is disabled.

Fixes: 6a1cac56f4 ("x86/kvm: Use __bss_decrypted attribute in shared variables")
Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org#v4.19-rc5+
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:16 -05:00
Muhammad Usama Anjum
6fcd9cbc6a kvm: x86: annotate RCU pointers
This patch adds the annotation to fix the following sparse errors:
arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map *
arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map *
arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter *
arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map *
arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *

Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305191123.GA497469@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:17:41 -05:00
Rafael J. Wysocki
7180323227 Merge branch 'pm-opp'
* pm-opp:
  opp: Don't drop extra references to OPPs accidentally
2021-03-12 18:47:22 +01:00
Rafael J. Wysocki
bee7359f65 Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull an operating performance points (OPP) framework fix for 5.12
from Viresh Kumar:

"Fix OPP refcount issue noticed by Beata."

* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Don't drop extra references to OPPs accidentally
2021-03-12 18:45:44 +01:00
Pavel Begunkov
58f9937383 io_uring: fix OP_ASYNC_CANCEL across tasks
IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task.
If it fails go over tctx_list and try it out for every single tctx.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 09:42:56 -07:00
Pavel Begunkov
521d6a737a io_uring: cancel sqpoll via task_work
1) The first problem is io_uring_cancel_sqpoll() ->
io_uring_cancel_task_requests() basically doing park(); park(); and so
hanging.

2) Another one is more subtle, when the master task is doing cancellations,
but SQPOLL task submits in-between the end of the cancellation but
before finish() requests taking a ref to the ctx, and so eternally
locking it up.

3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
same io_uring_cancel_sqpoll() from the owner task, they race for
tctx->wait events. And there probably more of them.

Instead do SQPOLL cancellations from within SQPOLL task context via
task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
park()/unpark() during cancellation, which is ugly, subtle and anyway
doesn't allow to do io_run_task_work() properly.

io_uring_cancel_sqpoll() is called only from SQPOLL task context and
under sqd locking, so all parking is removed from there. And so,
io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
SQPOLL task, and that spare us from some headache.

Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
which is not used anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 09:42:55 -07:00
Pavel Begunkov
26984fbf3a io_uring: prevent racy sqd->thread checks
SQPOLL thread to which we're trying to attach may be going away, it's
not nice but a more serious problem is if io_sq_offload_create() sees
sqd->thread==NULL, and tries to init it with a new thread. There are
tons of ways it can be exploited or fail.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 09:42:53 -07:00
Marc Zyngier
262b003d05 KVM: arm64: Fix exclusive limit for IPA size
When registering a memslot, we check the size and location of that
memslot against the IPA size to ensure that we can provide guest
access to the whole of the memory.

Unfortunately, this check rejects memslot that end-up at the exact
limit of the addressing capability for a given IPA size. For example,
it refuses the creation of a 2GB memslot at 0x8000000 with a 32bit
IPA space.

Fix it by relaxing the check to accept a memslot reaching the
limit of the IPA space.

Fixes: c3058d5da2 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org
2021-03-12 15:43:22 +00:00
Marc Zyngier
7d717558dd KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM/arm64 has forever used a 40bit default IPA space, partially
due to its 32bit heritage (where the only choice is 40bit).

However, there are implementations in the wild that have a *cough*
much smaller *cough* IPA space, which leads to a misprogramming of
VTCR_EL2, and a guest that is stuck on its first memory access
if userspace dares to ask for the default IPA setting (which most
VMMs do).

Instead, blundly reject the creation of such VM, as we can't
satisfy the requirements from userspace (with a one-off warning).
Also clarify the boot warning, and document that the VM creation
will fail when an unsupported IPA size is provided.

Although this is an ABI change, it doesn't really change much
for userspace:

- the guest couldn't run before this change, but no error was
  returned. At least userspace knows what is happening.

- a memory slot that was accepted because it did fit the default
  IPA space now doesn't even get a chance to be registered.

The other thing that is left doing is to convince userspace to
actually use the IPA space setting instead of relying on the
antiquated default.

Fixes: 233a7cb235 ("kvm: arm64: Allow tuning the physical address size for VM")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
2021-03-12 15:42:57 +00:00
Bob Peterson
0efc4976e3 gfs2: bypass log flush if the journal is not live
Patch fe3e397668 ("gfs2: Rework the log space allocation logic")
changed gfs2_log_flush to reserve a set of journal blocks in case no
transaction is active.  However, gfs2_log_flush also gets called in
cases where we don't have an active journal, for example, for spectator
mounts.  In that case, trying to reserve blocks would sleep forever, but
we want gfs2_log_flush to be a no-op instead.

Fixes: fe3e397668 ("gfs2: Rework the log space allocation logic")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-12 15:52:48 +01:00
Pavel Begunkov
0df8ea602b io_uring: remove useless ->startup completion
We always do complete(&sqd->startup) almost right after sqd->thread
creation, either in the success path or in io_sq_thread_finish(). It's
specifically created not started for us to be able to set some stuff
like sqd->thread and io_uring_alloc_task_context() before following
right after wake_up_new_task().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 07:23:01 -07:00
Pavel Begunkov
e1915f76a8 io_uring: cancel deferred requests in try_cancel
As io_uring_cancel_files() and others let SQO to run between
io_uring_try_cancel_requests(), SQO may generate new deferred requests,
so it's safer to try to cancel them in it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 07:23:00 -07:00
Jens Axboe
d4b64fd702 Merge tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme into block-5.12
Pull NVMe fixes from Christoph:

"nvme fixes for 5.12:

 - one more quirk (Dmitry Monakhov)
 - fix max_zone_append_sectors initialization (Chaitanya Kulkarni)
 - nvme-fc reset/create race fix (James Smart)
 - fix status code on aborts/resets (Hannes Reinecke)
 - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni)
 - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)"

* tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme:
  nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
  nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
  nvme-core: check ctrl css before setting up zns
  nvme-fc: fix racing controller reset and create association
  nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
  nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
  nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
  nvme: simplify error logic in nvme_validate_ns()
  nvme: set max_zone_append_sectors nvme_revalidate_zones
2021-03-12 07:21:15 -07:00
Bob Peterson
d5bf630f35 gfs2: bypass signal_our_withdraw if no journal
Before this patch, function signal_our_withdraw referenced the journal
inode immediately. But corrupt file systems may have some invalid
journals, in which case our attempt to read it in will withdraw and the
resulting signal_our_withdraw would dereference the NULL value.

This patch adds a check to signal_our_withdraw so that if the journal
has not yet been initialized, it simply returns and does the old-style
withdraw.

Thanks, Andy Price, for his analysis.

Reported-by: syzbot+50a8a9cf8127f2c6f5df@syzkaller.appspotmail.com
Fixes: 601ef0d52e ("gfs2: Force withdraw to replay journals and wait for it to finish")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-12 14:55:23 +01:00
Wei Yongjun
c8e3866836 perf/arm_dmc620_pmu: Fix error return code in dmc620_pmu_device_probe()
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 53c218da22 ("driver/perf: Add PMU driver for the ARM DMC-620 memory controller")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210312080421.277562-1-weiyongjun1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-12 11:30:31 +00:00
Peter Zijlstra
ba08abca66 objtool,x86: Fix uaccess PUSHF/POPF validation
Commit ab234a260b ("x86/pv: Rework arch_local_irq_restore() to not
use popf") replaced "push %reg; popf" with something like: "test
$0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry
that commit ea24213d80 ("objtool: Add UACCESS validation") relies
on.

The result is:

  drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted

Meanwhile, commit c9c324dc22 ("objtool: Support stack layout changes
in alternatives") makes that we can actually use stack-ops in
alternatives, which means we can revert 1ff865e343 ("x86,smap: Fix
smap_{save,restore}() alternatives").

That in turn means we can limit the PUSHF/POPF handling of
ea24213d80 to those instructions that are in alternatives.

Fixes: ab234a260b ("x86/pv: Rework arch_local_irq_restore() to not use popf")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net
2021-03-12 09:15:49 +01:00
Beata Michalska
606a5d4227 opp: Don't drop extra references to OPPs accidentally
We are required to call dev_pm_opp_put() from outside of the
opp_table->lock as debugfs removal needs to happen lock-less to avoid
circular dependency issues.

commit cf1fac943c ("opp: Reduce the size of critical section in
_opp_kref_release()") tried to fix that introducing a new routine
_opp_get_next() which keeps returning OPPs that can be freed by the
callers and this routine shall be called without holding the
opp_table->lock.

Though the commit overlooked the fact that the OPPs can be referenced by
other users as well and this routine will end up dropping references
which were taken by other users and hence freeing the OPPs prematurely.

In effect, other users of the OPPs will end up having invalid pointers
at hand. We didn't see any crash reports earlier as the exact situation
never happened, though it is certainly possible.

We need a way to mark which OPPs are no longer referenced by the OPP
core, so we don't drop extra references to them accidentally.

This commit adds another OPP flag, "removed", which is used to track
this. And now we should never end up dropping extra references to the
OPPs.

Cc: v5.11+ <stable@vger.kernel.org> # v5.11+
Fixes: cf1fac943c ("opp: Reduce the size of critical section in _opp_kref_release()")
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
[ Viresh: Almost rewrote entire patch, added new "removed" field,
	  rewrote commit log and added the correct Fixes tag. ]
Co-developed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-03-12 09:26:52 +05:30
Linus Torvalds
f78d76e72a Merge tag 'drm-fixes-2021-03-12-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular fixes for rc3. The i915 pull was based on the rc1 tag so I
  just cherry-picked the single fix from there to avoid it. The misc and
  amd trees seem to be on okay bases.

  It's a bunch of fixes across the tree, amdgpu has most of them a few
  ttm fixes around qxl, and nouveau.

  core:
   - Clear holes when converting compat ioctl's between 32-bits and
     64-bits.

  docs:
   - Use gitlab for drm bugzilla now.

  ttm:
   - Fix ttm page pool accounting.

  fbdev:
   - Fix oops in drm_fbdev_cleanup()

  shmem:
   - Assorted fixes for shmem helpers.

  qxl:
   - unpin qxl bos created as pinned when freeing them, and make ttm
     only warn once on this behavior.
   - Zero head.surface_id correctly in qxl.

  atyfb:
   - Use LCD management for atyfb on PPC_MAC.

  meson:
   - Shutdown kms poll helper in meson correctly.

  nouveau:
   - fix regression in bo syncing

  i915:
   - Wedge the GPU if command parser setup fails

  amdgpu:
   - Fix aux backlight control
   - Add a backlight override parameter
   - Various display fixes
   - PCIe DPM fix for vega
   - Polaris watermark fixes
   - Additional S0ix fix

  radeon:
   - Fix GEM regression
   - Fix AGP dependency handling"

* tag 'drm-fixes-2021-03-12-1' of git://anongit.freedesktop.org/drm/drm: (33 commits)
  drm/nouveau: fix dma syncing for loops (v2)
  drm/i915: Wedge the GPU if command parser setup fails
  drm/compat: Clear bounce structures
  drm/shmem-helpers: vunmap: Don't put pages for dma-buf
  drm: meson_drv add shutdown function
  drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
  drm/shmem-helper: Check for purged buffers in fault handler
  qxl: Fix uninitialised struct field head.surface_id
  drm/ttm: Fix TTM page pool accounting
  drm/ttm: soften TTM warnings
  drm: Use USB controller's DMA mask when importing dmabufs
  MAINTAINERS: update drm bug reporting URL
  fbdev: atyfb: use LCD management functions for PPC_PMAC also
  fbdev: atyfb: always declare aty_{ld,st}_lcd()
  drm/qxl: fix lockdep issue in qxl_alloc_release_reserved
  drm/qxl: unpin release objects
  drm/fb-helper: only unmap if buffer not null
  drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m
  drm/radeon: fix AGP dependency
  drm/radeon: also init GEM funcs in radeon_gem_prime_import_sg_table
  ...
2021-03-11 17:38:49 -08:00
Dave Airlie
4042160c2e drm/nouveau: fix dma syncing for loops (v2)
The index variable should only be increased in one place.

Noticed this while trying to track down another oops.

v2: use while loop.

Fixes: f295c8cfec ("drm/nouveau: fix dma syncing warning with debugging on.")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210311043527.5376-1-airlied@gmail.com
2021-03-12 11:21:47 +10:00
Tvrtko Ursulin
a829f033e9 drm/i915: Wedge the GPU if command parser setup fails
Commit 311a50e76a ("drm/i915: Add support for mandatory cmdparsing")
introduced mandatory command parsing but setup failures were not
translated into wedging the GPU which was probably the intent.

Possible errors come in two categories. Either the sanity check on
internal tables has failed, which should be caught in CI unless an
affected platform would be missed in testing; or memory allocation failure
happened during driver load, which should be extremely unlikely but for
correctness should still be handled.

v2:
 * Tidy coding style. (Chris)

[airlied: cherry-picked to avoid rc1 base]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 311a50e76a ("drm/i915: Add support for mandatory cmdparsing")
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210302114213.1102223-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 5a1a659762d35a6dc51047c9127c011303c77b7f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-03-12 11:20:50 +10:00
Dave Airlie
fb198483ed Merge tag 'amd-drm-fixes-5.12-2021-03-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.12-2021-03-10:

amdgpu:
- Fix aux backlight control
- Add a backlight override parameter
- Various display fixes
- PCIe DPM fix for vega
- Polaris watermark fixes
- Additional S0ix fix

radeon:
- Fix GEM regression
- Fix AGP dependency handling

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210310221141.3974-1-alexander.deucher@amd.com
2021-03-12 11:20:02 +10:00
Dave Airlie
e0da968623 Merge tag 'drm-misc-fixes-2021-03-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for rc3, rebased on rc2:
- Fix oops in drm_fbdev_cleanup()
- unpin qxl bos created as pinned when freeing them,
  and make ttm only warn once on this behavior.
- Use LCD management for atyfb on PPC_MAC.
- Use gitlab for drm bugzilla now.
- Fix ttm page pool accounting.
- Zero head.surface_id correctly in qxl.
- Assorted fixes for shmem helpers.
- Shutdown kms poll helper in meson correctly.
- Clear holes when converting compat ioctl's between 32-bits and 64-bits.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4606f08e-d0e8-c543-5e96-cee2fd728a41@linux.intel.com
2021-03-12 10:30:36 +10:00
Christophe Leroy
0b736881c8 powerpc/traps: unrecoverable_exception() is not an interrupt handler
unrecoverable_exception() is called from interrupt handlers or
after an interrupt handler has failed.

Make it a standard function to avoid doubling the actions
performed on interrupt entry (e.g.: user time accounting).

Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae96c59fa2cb7f24a8929c58cfa2c909cb8ff1f1.1615291471.git.christophe.leroy@csgroup.eu
2021-03-12 11:02:12 +11:00
Shin'ichiro Kawasaki
e511350590 block: Discard page cache of zone reset target range
When zone reset ioctl and data read race for a same zone on zoned block
devices, the data read leaves stale page cache even though the zone
reset ioctl zero clears all the zone data on the device. To avoid
non-zero data read from the stale page cache after zone reset, discard
page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce
the helper function blkdev_truncate_zone_range() to discard the page
cache. Ensure the page cache discarded by calling the helper function
before and after zone reset in same manner as fallocate does.

This patch can be applied back to the stable kernel version v5.10.y.
Rework is needed for older stable kernels.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 3ed05a987e ("blk-zoned: implement ioctls")
Cc: <stable@vger.kernel.org> # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 11:49:25 -07:00
Daniel Wagner
9ec491447b block: Suppress uevent for hidden device when removed
register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN
but enables uevents at the end again in order to announce disk after
possible partitions are created.

When the device is removed the uevents are still on and user land sees
'remove' messages for devices which were never 'add'ed to the system.

  KERNEL[95481.571887] remove   /devices/virtual/nvme-fabrics/ctl/nvme5/nvme0c5n1 (block)

Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the
uevents at all.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20210311151917.136091-1-dwagner@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 11:48:25 -07:00
Linus Torvalds
28806e4d9b Merge tag 'media/v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "A couple of fixes:

   - fix a build issue with CEC

   - fix a deadlock at usbtv driver

   - fix some null pointer address issues at vsp1 driver

   - fix a wrong bitmap setting at rkisp1 driver"

* tag 'media/v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: rkisp1: params: fix wrong bits settings
  media: v4l: vsp1: Fix uif null pointer access
  media: v4l: vsp1: Fix bru null pointer access
  media: usbtv: Fix deadlock on suspend
  media: rc: compile rc-cec.c into rc-core
2021-03-11 10:39:18 -08:00
J. Bruce Fields
4f8be1f53b nfs: we don't support removing system.nfs4_acl
The NFSv4 protocol doesn't have any notion of reomoving an attribute, so
removexattr(path,"system.nfs4_acl") doesn't make sense.

There's no documented return value.  Arguably it could be EOPNOTSUPP but
I'm a little worried an application might take that to mean that we
don't support ACLs or xattrs.  How about EINVAL?

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-11 13:17:42 -05:00
Jens Axboe
d052d1d685 io_uring: perform IOPOLL reaping if canceler is thread itself
We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread,
but if it's the thread itself invoking cancelations, then we still need
to perform it or no one will.

Fixes: 9936c7c2bc ("io_uring: deduplicate core cancellations sequence")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 10:49:20 -07:00
Jens Axboe
5c2469e0a2 io_uring: force creation of separate context for ATTACH_WQ and non-threads
Earlier kernels had SQPOLL threads that could share across anything, as
we grabbed the context we needed on a per-ring basis. This is no longer
the case, so only allow attaching directly if we're in the same thread
group. That is the common use case. For non-group tasks, just setup a
new context and thread as we would've done if sharing wasn't set. This
isn't 100% ideal in terms of CPU utilization for the forked and share
case, but hopefully that isn't much of a concern. If it is, there are
plans in motion for how to improve that. Most importantly, we want to
avoid app side regressions where sharing worked before and now doesn't.
With this patch, functionality is equivalent to previous kernels that
supported IORING_SETUP_ATTACH_WQ with SQPOLL.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 10:17:56 -07:00
Christoph Hellwig
a8affc03a9 block: rename BIO_MAX_PAGES to BIO_MAX_VECS
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed.  Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 07:47:48 -07:00
Axel Lin
d450293c55 regulator: mt6315: Fix off-by-one for .n_voltages
The valid selector is 0 ~ 0xbf, so the .n_voltages should be 0xc0.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20210311020558.579597-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-11 13:23:21 +00:00
Ard Biesheuvel
30b2675761 arm64: mm: remove unused __cpu_uses_extended_idmap[_level()]
These routines lost all existing users during the latest merge window so
we can remove them. This avoids the need to fix them in the context of
fixing a regression related to the ID map on 52-bit VA kernels.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210310171515.416643-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-11 13:04:28 +00:00
Ard Biesheuvel
7ba8f2b2d6 arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
52-bit VA kernels can run on hardware that is only 48-bit capable, but
configure the ID map as 52-bit by default. This was not a problem until
recently, because the special T0SZ value for a 52-bit VA space was never
programmed into the TCR register anwyay, and because a 52-bit ID map
happens to use the same number of translation levels as a 48-bit one.

This behavior was changed by commit 1401bef703 ("arm64: mm: Always update
TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ
value for a 52-bit VA to be programmed into TCR_EL1. While some hardware
simply ignores this, Mark reports that Amberwing systems choke on this,
resulting in a broken boot. But even before that commit, the unsupported
idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly
as well.

Given that we already have to deal with address spaces being either 48-bit
or 52-bit in size, the cleanest approach seems to be to simply default to
a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the
kernel in DRAM requires it. This is guaranteed not to happen unless the
system is actually 52-bit VA capable.

Fixes: 90ec95cda9 ("arm64: mm: Introduce VA_BITS_MIN")
Reported-by: Mark Salter <msalter@redhat.com>
Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-11 13:04:28 +00:00
Mathias Nyman
d26c00e727 xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
If port terminations are detected in suspend, but link never reaches U0
then xHCI may have an internal uncleared wake state that will cause an
immediate wake after suspend.

This wake state is normally cleared when driver clears the PORT_CSC bit,
which is set after a device is enabled and in U0.

Write 1 to clear PORT_CSC for ports that don't have anything connected
when suspending. This makes sure any pending internal wake states in
xHCI are cleared.

Cc: stable@vger.kernel.org
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210311115353.2137560-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 13:03:06 +01:00
Forest Crossman
b71c669ad8 usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
I've confirmed that both the ASMedia ASM1042A and ASM3242 have the same
problem as the ASM1142 and ASM2142/ASM3142, where they lose some of the
upper bits of 64-bit DMA addresses. As with the other chips, this can
cause problems on systems where the upper bits matter, and adding the
XHCI_NO_64BIT_SUPPORT quirk completely fixes the issue.

Cc: stable@vger.kernel.org
Signed-off-by: Forest Crossman <cyrozap@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210311115353.2137560-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 13:03:06 +01:00
Mathias Nyman
253f588c70 xhci: Improve detection of device initiated wake signal.
A xHC USB 3 port might miss the first wake signal from a USB 3 device
if the port LFPS reveiver isn't enabled fast enough after xHC resume.

xHC host will anyway be resumed by a PME# signal, but will go back to
suspend if no port activity is seen.
The device resends the U3 LFPS wake signal after a 100ms delay, but
by then host is already suspended, starting all over from the
beginning of this issue.

USB 3 specs say U3 wake LFPS signal is sent for max 10ms, then device
needs to delay 100ms before resending the wake.

Don't suspend immediately if port activity isn't detected in resume.
Instead add a retry. If there is no port activity then delay for 120ms,
and re-check for port activity.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210311115353.2137560-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 13:03:06 +01:00
Stanislaw Gruszka
a4a251f8c2 usb: xhci: do not perform Soft Retry for some xHCI hosts
On some systems rt2800usb and mt7601u devices are unable to operate since
commit f8f80be501 ("xhci: Use soft retry to recover faster from
transaction errors")

Seems that some xHCI controllers can not perform Soft Retry correctly,
affecting those devices.

To avoid the problem add xhci->quirks flag that restore pre soft retry
xhci behaviour for affected xHCI controllers. Currently those are
AMD_PROMONTORYA_4 and AMD_PROMONTORYA_2, since it was confirmed
by the users: on those xHCI hosts issue happen and is gone after
disabling Soft Retry.

[minor commit message rewording for checkpatch -Mathias]

Fixes: f8f80be501 ("xhci: Use soft retry to recover faster from transaction errors")
Cc: <stable@vger.kernel.org> # 4.20+
Reported-by: Bernhard <bernhard.gebetsberger@gmx.at>
Tested-by: Bernhard <bernhard.gebetsberger@gmx.at>
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202541
Link: https://lore.kernel.org/r/20210311115353.2137560-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 13:03:06 +01:00
Daiyue Zhang
14fbbc8297 configfs: fix a use-after-free in __configfs_open_file
Commit b0841eefd9 ("configfs: provide exclusion between IO and removals")
uses ->frag_dead to mark the fragment state, thus no bothering with extra
refcount on config_item when opening a file. The configfs_get_config_item
was removed in __configfs_open_file, but not with config_item_put. So the
refcount on config_item will lost its balance, causing use-after-free
issues in some occasions like this:

Test:
1. Mount configfs on /config with read-only items:
drwxrwx--- 289 root   root            0 2021-04-01 11:55 /config
drwxr-xr-x   2 root   root            0 2021-04-01 11:54 /config/a
--w--w--w-   1 root   root         4096 2021-04-01 11:53 /config/a/1.txt
......

2. Then run:
for file in /config
do
echo $file
grep -R 'key' $file
done

3. __configfs_open_file will be called in parallel, the first one
got called will do:
if (file->f_mode & FMODE_READ) {
	if (!(inode->i_mode & S_IRUGO))
		goto out_put_module;
			config_item_put(buffer->item);
				kref_put()
					package_details_release()
						kfree()

the other one will run into use-after-free issues like this:
BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0
Read of size 8 at addr fffffff155f02480 by task grep/13096
CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G        W       4.14.116-kasan #1
TGID: 13096 Comm: grep
Call trace:
dump_stack+0x118/0x160
kasan_report+0x22c/0x294
__asan_load8+0x80/0x88
__configfs_open_file+0x1bc/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48

Allocated by task 2138:
kasan_kmalloc+0xe0/0x1ac
kmem_cache_alloc_trace+0x334/0x394
packages_make_item+0x4c/0x180
configfs_mkdir+0x358/0x740
vfs_mkdir2+0x1bc/0x2e8
SyS_mkdirat+0x154/0x23c
el0_svc_naked+0x34/0x38

Freed by task 13096:
kasan_slab_free+0xb8/0x194
kfree+0x13c/0x910
package_details_release+0x524/0x56c
kref_put+0xc4/0x104
config_item_put+0x24/0x34
__configfs_open_file+0x35c/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
el0_svc_naked+0x34/0x38

To fix this issue, remove the config_item_put in
__configfs_open_file to balance the refcount of config_item.

Fixes: b0841eefd9 ("configfs: provide exclusion between IO and removals")
Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com>
Signed-off-by: Yi Chen <chenyi77@huawei.com>
Signed-off-by: Ge Qiu <qiuge@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 12:13:48 +01:00
Dmitry Monakhov
abbb5f5929 nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
This adds a quirk for Samsung PM1725a drive which fixes timeouts and
I/O errors due to the fact that the controller does not properly
handle the Write Zeroes command, dmesg log:

nvme nvme0: I/O 528 QID 10 timeout, aborting
nvme nvme0: I/O 529 QID 10 timeout, aborting
nvme nvme0: I/O 530 QID 10 timeout, aborting
nvme nvme0: I/O 531 QID 10 timeout, aborting
nvme nvme0: I/O 532 QID 10 timeout, aborting
nvme nvme0: I/O 533 QID 10 timeout, aborting
nvme nvme0: I/O 534 QID 10 timeout, aborting
nvme nvme0: I/O 535 QID 10 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 528 QID 10 timeout, reset controller
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
nvme nvme0: Device not ready; aborting reset, CSTS=0x3
nvme nvme0: Device not ready; aborting reset, CSTS=0x3
nvme nvme0: Removing after probe failure status: -19
nvme0n1: detected capacity change from 6251233968 to 0
blk_update_request: I/O error, dev nvme0n1, sector 32776 op 0x1:(WRITE) flags 0x3000 phys_seg 6 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113319936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 1, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319680 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 2, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319424 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 3, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319168 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 4, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318912 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 5, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318656 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 6, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318400 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113318144 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113317888 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:54 +01:00
Lv Yunlong
abec6561fc nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
In nvmet_rdma_write_data_done, rsp is recoverd by wc->wr_cqe and freed by
nvmet_rdma_release_rsp(). But after that, pr_info() used the freed
chunk's member object and could leak the freed chunk address with
wc->wr_cqe by computing the offset.

Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:35 +01:00
Chaitanya Kulkarni
0ec84df495 nvme-core: check ctrl css before setting up zns
Ensure multiple Command Sets are supported before starting to setup a
ZNS namespace.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: move the check around a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:35 +01:00
James Smart
f20ef34d71 nvme-fc: fix racing controller reset and create association
Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in
interrupt context results in a possible race condition. A controller
reset results in errored io completions, which schedules error
work. The change of error work to a work element allows it to fire
after the ctrl state transition to NVME_CTRL_CONNECTING, causing
any outstanding io (used to initialize the controller) to fail and
cause problems for connect_work.

Add a state check to only schedule error work if not in the RESETTING
state.

Fixes: 19fce0470f ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:35 +01:00
Hannes Reinecke
ae3afe6308 nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
When a command has been aborted we should return NVME_SC_HOST_ABORTED_CMD
to be consistent with the other transports.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:34 +01:00
Hannes Reinecke
3c7aafbc8d nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
nvme_fc_terminate_exchange() is being called when exchanges are
being deleted, and as such we should be setting the NVME_REQ_CANCELLED
flag to have identical behaviour on all transports.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:34 +01:00
Hannes Reinecke
d358938198 nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
NVME_REQ_CANCELLED is translated into -EINTR in nvme_submit_sync_cmd(),
so we should be setting this flags during nvme_cancel_request() to
ensure that the callers to nvme_submit_sync_cmd() will get the correct
error code when the controller is reset.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:34 +01:00
Hannes Reinecke
d95c1f4179 nvme: simplify error logic in nvme_validate_ns()
We only should remove namespaces when we get fatal error back from
the device or when the namespace IDs have changed.
So instead of painfully masking out error numbers which might indicate
that the error should be ignored we could use an NVME status code
to indicated when the namespace should be removed.
That simplifies the final logic and makes it less error-prone.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:34 +01:00
Chaitanya Kulkarni
e6ad55988b nvme: set max_zone_append_sectors nvme_revalidate_zones
The chunk_sectors value affects max_zone_append_sectors.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:34 +01:00
Dafna Hirschfeld
2025a48cfd media: rkisp1: params: fix wrong bits settings
The histogram mode is set using 'rkisp1_params_set_bits'.
Only the bits of the mode should be the value argument for
that function. Otherwise bits outside the mode mask are
turned on which is not what was intended.

Fixes: bae1155cf5 ("media: staging: rkisp1: add output device for parameters")
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-03-11 11:40:29 +01:00
Biju Das
6732f31393 media: v4l: vsp1: Fix uif null pointer access
RZ/G2L SoC has no UIF. This patch fixes null pointer access, when UIF
module is not used.

Fixes: 5e824f989e6e8("media: v4l: vsp1: Integrate DISCOM in display pipeline")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-03-11 11:40:29 +01:00
Biju Das
ac8d82f586 media: v4l: vsp1: Fix bru null pointer access
RZ/G2L SoC has only BRS. This patch fixes null pointer access,when only
BRS is enabled.

Fixes: cbb7fa49c7466("media: v4l: vsp1: Rename BRU to BRx")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-03-11 11:40:28 +01:00
Maxim Mikityanskiy
8a7e27fd5c media: usbtv: Fix deadlock on suspend
usbtv doesn't support power management, so on system suspend the
.disconnect callback of the driver is called. The teardown sequence
includes a call to snd_card_free. Its implementation waits until the
refcount of the sound card device drops to zero, however, if its file is
open, snd_card_file_add takes a reference, which can't be dropped during
the suspend, because the userspace processes are already frozen at this
point. snd_card_free waits for completion forever, leading to a hang on
suspend.

This commit fixes this deadlock condition by replacing snd_card_free
with snd_card_free_when_closed, that doesn't wait until all references
are released, allowing suspend to progress.

Fixes: 63ddf68de5 ("[media] usbtv: add audio support")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-03-11 11:40:28 +01:00
Hans Verkuil
f09f9f93af media: rc: compile rc-cec.c into rc-core
The rc-cec keymap is unusual in that it can't be built as a module,
instead it is registered directly in rc-main.c if CONFIG_MEDIA_CEC_RC
is set. This is because it can be called from drm_dp_cec_set_edid() via
cec_register_adapter() in an asynchronous context, and it is not
allowed to use request_module() to load rc-cec.ko in that case. Trying to
do so results in a 'WARN_ON_ONCE(wait && current_is_async())'.

Since this keymap is only used if CONFIG_MEDIA_CEC_RC is set, we
just compile this keymap into the rc-core module and never as a
separate module.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 2c6d1fffa1 (drm: add support for DisplayPort CEC-Tunneling-over-AUX)
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-03-11 11:40:28 +01:00
Daniel Vetter
de066e1163 drm/compat: Clear bounce structures
Some of them have gaps, or fields we don't clear. Native ioctl code
does full copies plus zero-extends on size mismatch, so nothing can
leak. But compat is more hand-rolled so need to be careful.

None of these matter for performance, so just memset.

Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
are security holes anyway.

Acked-by: Maxime Ripard <mripard@kernel.org>
Reported-by: syzbot+620cf21140fc7e772a5d@syzkaller.appspotmail.com # vblank ioctl
Cc: syzbot+620cf21140fc7e772a5d@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-daniel.vetter@ffwll.ch
(cherry picked from commit e926c474eb)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Noralf Trønnes
64e194e278 drm/shmem-helpers: vunmap: Don't put pages for dma-buf
dma-buf importing was reworked in commit 7d2cd72a9a
("drm/shmem-helpers: Simplify dma-buf importing"). Before that commit
drm_gem_shmem_prime_import_sg_table() did set ->pages_use_count=1 and
drm_gem_shmem_vunmap_locked() could call drm_gem_shmem_put_pages()
unconditionally. Now without the use count set, put pages is called also
on dma-bufs. Fix this by only putting pages if it's not imported.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Fixes: 7d2cd72a9a ("drm/shmem-helpers: Simplify dma-buf importing")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210219122203.51130-1-noralf@tronnes.org
(cherry picked from commit cdea72518a)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Artem Lapkin
fa0c16caf3 drm: meson_drv add shutdown function
Problem: random stucks on reboot stage about 1/20 stuck/reboots
// debug kernel log
[    4.496660] reboot: kernel restart prepare CMD:(null)
[    4.498114] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown begin
[    4.503949] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown domain 0:VPU...
...STUCK...

Solution: add shutdown function to meson_drm driver
// debug kernel log
[    5.231896] reboot: kernel restart prepare CMD:(null)
[    5.246135] [drm:meson_drv_shutdown]
...
[    5.259271] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown begin
[    5.274688] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown domain 0:VPU...
[    5.338331] reboot: Restarting system
[    5.358293] psci: PSCI_0_2_FN_SYSTEM_RESET reboot_mode:0 cmd:(null)
bl31 reboot reason: 0xd
bl31 reboot reason: 0x0
system cmd  1.
...REBOOT...

Tested: on VIM1 VIM2 VIM3 VIM3L khadas sbcs - 1000+ successful reboots
and Odroid boards, WeTek Play2 (GXBB)

Fixes: bbbe775ec5 ("drm: Add support for Amlogic Meson Graphic Controller")
Signed-off-by: Artem Lapkin <art@khadas.com>
Tested-by: Christian Hewitt <christianshewitt@gmail.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210302042202.3728113-1-art@khadas.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Neil Roberts
11d5a4745e drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
When mmapping the shmem, it would previously adjust the pgoff in the
vm_area_struct to remove the fake offset that is added to be able to
identify the buffer. This patch removes the adjustment and makes the
fault handler use the vm_fault address to calculate the page offset
instead. Although using this address is apparently discouraged, several
DRM drivers seem to be doing it anyway.

The problem with removing the pgoff is that it prevents
drm_vma_node_unmap from working because that searches the mapping tree
by address. That doesn't work because all of the mappings are at offset
0. drm_vma_node_unmap is being used by the shmem helpers when purging
the buffer.

This fixes a bug in Panfrost which is using drm_gem_shmem_purge. Without
this the mapping for the purged buffer can still be accessed which might
mean it would access random pages from other buffers

v2: Don't check whether the unsigned page_offset is less than 0.

Cc: stable@vger.kernel.org
Fixes: 17acb9f35e ("drm/shmem: Add madvise state and purge helpers")
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210223155125.199577-3-nroberts@igalia.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Neil Roberts
d611b4a090 drm/shmem-helper: Check for purged buffers in fault handler
When a buffer is madvised as not needed and then purged, any attempts to
access the buffer from user-space should cause a bus fault. This patch
adds a check for that.

Cc: stable@vger.kernel.org
Fixes: 17acb9f35e ("drm/shmem: Add madvise state and purge helpers")
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210223155125.199577-2-nroberts@igalia.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Colin Ian King
738acd49eb qxl: Fix uninitialised struct field head.surface_id
The surface_id struct field in head is not being initialized and
static analysis warns that this is being passed through to
dev->monitors_config->heads[i] on an assignment. Clear up this
warning by initializing it to zero.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: a6d3c4d798 ("qxl: hook monitors_config updates into crtc, not encoder.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210304094928.2280722-1-colin.king@canonical.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Anthony DeRossi
ca63d76fd2 drm/ttm: Fix TTM page pool accounting
Freed pages are not subtracted from the allocated_pages counter in
ttm_pool_type_fini(), causing a leak in the count on device removal.
The next shrinker invocation loops forever trying to free pages that are
no longer in the pool:

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu:  3-....: (9998 ticks this GP) idle=54e/1/0x4000000000000000 softirq=434857/434857 fqs=2237
    (t=10001 jiffies g=2194533 q=49211)
  NMI backtrace for cpu 3
  CPU: 3 PID: 1034 Comm: kswapd0 Tainted: P           O      5.11.0-com #1
  Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1405 11/19/2019
  Call Trace:
   <IRQ>
   ...
   </IRQ>
   sysvec_apic_timer_interrupt+0x77/0x80
   asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:mutex_unlock+0x16/0x20
  Code: e7 48 8b 70 10 e8 7a 53 77 ff eb aa e8 43 6c ff ff 0f 1f 00 65 48 8b 14 25 00 6d 01 00 31 c9 48 89 d0 f0 48 0f b1 0f 48 39 c2 <74> 05 e9 e3 fe ff ff c3 66 90 48 8b 47 20 48 85 c0 74 0f 8b 50 10
  RSP: 0018:ffffbdb840797be8 EFLAGS: 00000246
  RAX: ffff9ff445a41c00 RBX: ffffffffc02a9ef8 RCX: 0000000000000000
  RDX: ffff9ff445a41c00 RSI: ffffbdb840797c78 RDI: ffffffffc02a9ac0
  RBP: 0000000000000080 R08: 0000000000000000 R09: ffffbdb840797c80
  R10: 0000000000000000 R11: fffffffffffffff5 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02a9a60
   ttm_pool_shrink+0x7d/0x90 [ttm]
   ttm_pool_shrinker_scan+0x5/0x20 [ttm]
   do_shrink_slab+0x13a/0x1a0
...

debugfs shows the incorrect total:

  $ cat /sys/kernel/debug/dri/0/ttm_page_pool
            --- 0--- --- 1--- --- 2--- --- 3--- --- 4--- --- 5--- --- 6--- --- 7--- --- 8--- --- 9--- ---10---
  wc      :        0        0        0        0        0        0        0        0        0        0        0
  uc      :        0        0        0        0        0        0        0        0        0        0        0
  wc 32   :        0        0        0        0        0        0        0        0        0        0        0
  uc 32   :        0        0        0        0        0        0        0        0        0        0        0
  DMA uc  :        0        0        0        0        0        0        0        0        0        0        0
  DMA wc  :        0        0        0        0        0        0        0        0        0        0        0
  DMA     :        0        0        0        0        0        0        0        0        0        0        0

  total   :     3029 of  8244261

Using ttm_pool_type_take() to remove pages from the pool before freeing
them correctly accounts for the freed pages.

Fixes: d099fc8f54 ("drm/ttm: new TT backend allocation pool v3")
Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210303011723.22512-1-ajderossi@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Christian König
d228f8d874 drm/ttm: soften TTM warnings
QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.

Make sure we warn only once until the QXL driver is fixed.

Signed-off-by: Christian König <christian.koenig@amd.com>
References: https://lore.kernel.org/lkml/YD+eYcMMcdlXB8PY@alley/
Link: https://patchwork.freedesktop.org/patch/422834/
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Thomas Zimmermann
659ab7a49c drm: Use USB controller's DMA mask when importing dmabufs
USB devices cannot perform DMA and hence have no dma_mask set in their
device structure. Therefore importing dmabuf into a USB-based driver
fails, which breaks joining and mirroring of display in X11.

For USB devices, pick the associated USB controller as attachment device.
This allows the DRM import helpers to perform the DMA setup. If the DMA
controller does not support DMA transfers, we're out of luck and cannot
import. Our current USB-based DRM drivers don't use DMA, so the actual
DMA device is not important.

Tested by joining/mirroring displays of udl and radeon under Gnome/X11.

v8:
	* release dmadev if device initialization fails (Noralf)
	* fix commit description (Noralf)
v7:
	* fix use-before-init bug in gm12u320 (Dan)
v6:
	* implement workaround in DRM drivers and hold reference to
	  DMA device while USB device is in use
	* remove dev_is_usb() (Greg)
	* collapse USB helper into usb_intf_get_dma_device() (Alan)
	* integrate Daniel's TODO statement (Daniel)
	* fix typos (Greg)
v5:
	* provide a helper for USB interfaces (Alan)
	* add FIXME item to documentation and TODO list (Daniel)
v4:
	* implement workaround with USB helper functions (Greg)
	* use struct usb_device->bus->sysdev as DMA device (Takashi)
v3:
	* drop gem_create_object
	* use DMA mask of USB controller, if any (Daniel, Christian, Noralf)
v2:
	* move fix to importer side (Christian, Daniel)
	* update SHMEM and CMA helpers for new PRIME callbacks

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 6eb0233ec2 ("usb: don't inherity DMA properties for USB devices")
Tested-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210303133229.3288-1-tzimmermann@suse.de
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:33 +01:00
Pavel Turinský
301469c121 MAINTAINERS: update drm bug reporting URL
The original bugzilla seems to be read-only now, linking to the gitlab
for new bugs.

Signed-off-by: Pavel Turinský <ledoian@kam.mff.cuni.cz>
Cc: trivial@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210228163658.54962-1-ledoian@kam.mff.cuni.cz
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:32 +01:00
Randy Dunlap
b266409310 fbdev: atyfb: use LCD management functions for PPC_PMAC also
Include PPC_PMAC in the configs that use aty_ld_lcd() and
aty_st_lcd() implementations so that the PM code may work
correctly for PPC_PMAC.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210226173008.18236-1-rdunlap@infradead.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:32 +01:00
Randy Dunlap
39a3898abf fbdev: atyfb: always declare aty_{ld,st}_lcd()
The previously added stubs for aty_{ld,}st_lcd() make it
so that these functions are used regardless of the config
options that were guarding them, so remove the #ifdef/#endif
lines and make their declarations always visible.
This fixes build warnings that were reported by clang:

   drivers/video/fbdev/aty/atyfb_base.c:180:6: warning: no previous prototype for function 'aty_st_lcd' [-Wmissing-prototypes]
   void aty_st_lcd(int index, u32 val, const struct atyfb_par *par)
        ^
   drivers/video/fbdev/aty/atyfb_base.c:180:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void aty_st_lcd(int index, u32 val, const struct atyfb_par *par)

   drivers/video/fbdev/aty/atyfb_base.c:183:5: warning: no previous prototype for function 'aty_ld_lcd' [-Wmissing-prototypes]
   u32 aty_ld_lcd(int index, const struct atyfb_par *par)
       ^
   drivers/video/fbdev/aty/atyfb_base.c:183:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   u32 aty_ld_lcd(int index, const struct atyfb_par *par)

They should not be marked as static since they are used in
mach64_ct.c.

Fixes: bfa5782b9c ("fbdev: atyfb: add stubs for aty_{ld,st}_lcd()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210224215528.822-1-rdunlap@infradead.org
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:32 +01:00
Gerd Hoffmann
e998d3c8cb drm/qxl: fix lockdep issue in qxl_alloc_release_reserved
Call qxl_bo_unpin (which does a reservation) without holding the
release_mutex lock.  Fixes lockdep (correctly) warning on a possible
deadlock.

Fixes: e8dd3506dc ("drm/qxl: unpin release objects")
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20210217123213.2199186-5-kraxel@redhat.com
(cherry picked from commit 19089b760e)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:11:22 +01:00
Gerd Hoffmann
e8dd3506dc drm/qxl: unpin release objects
Balances the qxl_create_bo(..., pinned=true, ...);
call in qxl_release_bo_alloc().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20210204145712.1531203-5-kraxel@redhat.com
(cherry picked from commit 65ffea3c6e)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:08:14 +01:00
Tong Zhang
874a52f9b6 drm/fb-helper: only unmap if buffer not null
drm_fbdev_cleanup() can be called when fb_helper->buffer is null, hence
fb_helper->buffer should be checked before calling
drm_client_buffer_vunmap(). This buffer is also checked in
drm_client_framebuffer_delete(), so we should also do the same thing for
drm_client_buffer_vunmap().

[  199.128742] RIP: 0010:drm_client_buffer_vunmap+0xd/0x20
[  199.129031] Code: 43 18 48 8b 53 20 49 89 45 00 49 89 55 08 5b 44 89 e0 41 5c 41 5d 41 5e 5d
c3 0f 1f 00 53 48 89 fb 48 8d 7f 10 e8 73 7d a1 ff <48> 8b 7b 10 48 8d 73 18 5b e9 75 53 fc ff 0
f 1f 44 00 00 48 b8 00
[  199.130041] RSP: 0018:ffff888103f3fc88 EFLAGS: 00010282
[  199.130329] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8214d46d
[  199.130733] RDX: 1ffffffff079c6b9 RSI: 0000000000000246 RDI: ffffffff83ce35c8
[  199.131119] RBP: ffff888103d25458 R08: 0000000000000001 R09: fffffbfff0791761
[  199.131505] R10: ffffffff83c8bb07 R11: fffffbfff0791760 R12: 0000000000000000
[  199.131891] R13: ffff888103d25468 R14: ffff888103d25418 R15: ffff888103f18120
[  199.132277] FS:  00007f36fdcbb6a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
[  199.132721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  199.133033] CR2: 0000000000000010 CR3: 0000000103d26000 CR4: 00000000000006f0
[  199.133420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  199.133807] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  199.134195] Call Trace:
[  199.134333]  drm_fbdev_cleanup+0x179/0x1a0
[  199.134562]  drm_fbdev_client_unregister+0x2b/0x40
[  199.134828]  drm_client_dev_unregister+0xa8/0x180
[  199.135088]  drm_dev_unregister+0x61/0x110
[  199.135315]  mgag200_pci_remove+0x38/0x52 [mgag200]
[  199.135586]  pci_device_remove+0x62/0xe0
[  199.135806]  device_release_driver_internal+0x148/0x270
[  199.136094]  driver_detach+0x76/0xe0
[  199.136294]  bus_remove_driver+0x7e/0x100
[  199.136521]  pci_unregister_driver+0x28/0xf0
[  199.136759]  __x64_sys_delete_module+0x268/0x300
[  199.137016]  ? __ia32_sys_delete_module+0x300/0x300
[  199.137285]  ? call_rcu+0x3e4/0x580
[  199.137481]  ? fpregs_assert_state_consistent+0x4d/0x60
[  199.137767]  ? exit_to_user_mode_prepare+0x2f/0x130
[  199.138037]  do_syscall_64+0x33/0x40
[  199.138237]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  199.138517] RIP: 0033:0x7f36fdc3dcf7

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Fixes: 763aea17bf ("drm/fb-helper: Unmap client buffer during shutdown")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210228044625.171151-1-ztong0001@gmail.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-03-11 11:08:14 +01:00
Masahiro Yamada
285a65f1a1 kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
This is a remnant of commit 78046fabe6 ("kbuild: determine the output
format of DTC by the target suffix").

The parameter "yaml" is meaningless because cmd_dtc no loner takes $(2).

Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-03-11 18:22:48 +09:00
Greg Kroah-Hartman
a949b9eab0 Merge tag 'usb-serial-5.12-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:

USB-serial fixes for 5.12-rc3

Here's a fix for a long-standing memory leak after probe failure in
io_edgeport and a fix for a NULL-deref on disconnect in the new xr
driver.

Included are also some new device ids.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.12-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: io_edgeport: fix memory leak in edge_startup
  USB: serial: ch341: add new Product ID
  USB: serial: xr: fix NULL-deref on disconnect
  USB: serial: cp210x: add some more GE USB IDs
  USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
2021-03-11 09:53:34 +01:00
Masahiro Yamada
4c273d23c4 kbuild: remove LLVM=1 test from HAS_LTO_CLANG
As Documentation/kbuild/llvm.rst notes, LLVM=1 switches the default of
tools, but you can still override CC, LD, etc. individually. This LLVM=1
check is unneeded because each tool is already checked separately.

"make CC=clang LD=ld.lld NM=llvm-nm AR=llvm-ar LLVM_IAS=1 menuconfig"
should be able to enable Clang LTO.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2021-03-11 14:52:55 +09:00
Masahiro Yamada
64bfc99429 kbuild: remove unneeded -O option to dtc
This piece of code converts the target suffix to the dtc -O option:

    *.dtb      ->  -O dtb
    *.dt.yaml  ->  -O yaml

Commit ce88c9c794 ("kbuild: Add support to build overlays (%.dtbo)")
added the third case:

    *.dtbo     ->  -O dtbo

This works thanks to commit 163f0469bf2e ("dtc: Allow overlays to have
.dtbo extension") in the upstream DTC, which has already been pulled in
the kernel.

However, I think it is a bit odd because "dtbo" is not a format name.
At least, it does not show up in the help message of dtc.

$ scripts/dtc/dtc --help
  [ snip ]
  -O, --out-format <arg>
        Output formats are:
                dts - device tree source text
                dtb - device tree blob
                yaml - device tree encoded as YAML
                asm - assembler source

So, I am not a big fan of the second hunk of that change:

        } else if (streq(outform, "dtbo")) {
                dt_to_blob(outf, dti, outversion);

Anyway, we did not need to do this in Makefile in the first place.

guess_type_by_name() had already understood ".yaml" before commit
4f0e3a57d6 ("kbuild: Add support for DT binding schema checks"),
and now does ".dtbo" as well.

Makefile does not need to duplicate the same logic. Let's leave it
to dtc.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
2021-03-11 14:52:54 +09:00
Masahiro Yamada
f9bc754be4 kbuild: dummy-tools: adjust to scripts/cc-version.sh
Commit aec6c60a01 ("kbuild: check the minimum compiler version in
Kconfig") changed how the script detects the compiler version.

Get 'make CROSS_COMPILE=scripts/dummy-tools/' back working again.

Fixes: aec6c60a01 ("kbuild: check the minimum compiler version in Kconfig")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
2021-03-11 14:52:54 +09:00
Sami Tolvanen
bf3c255150 kbuild: Allow LTO to be selected with KASAN_HW_TAGS
While LTO with KASAN is normally not useful, hardware tag-based KASAN
can be used also in production kernels with ARM64_MTE. Therefore, allow
KASAN_HW_TAGS to be selected together with HAS_LTO_CLANG.

Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-03-11 14:40:50 +09:00
Jiri Slaby
2eab791f94 kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
ppc64le checks for -mprofile-kernel to define MPROFILE_KERNEL Kconfig.
Kconfig calls arch/powerpc/tools/gcc-check-mprofile-kernel.sh for that
purpose. This script performs two checks:
1) build with -mprofile-kernel should contain "_mcount"
2) build with -mprofile-kernel with a function marked as "notrace"
   should not produce "_mcount"

So support this in dummy-tools' gcc, so that we have MPROFILE_KERNEL
always true.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-03-11 14:40:50 +09:00
Masahiro Yamada
ce6ed1c4c9 kbuild: rebuild GCC plugins when the compiler is upgraded
Linus reported a build error due to the GCC plugin incompatibility
when the compiler is upgraded. [1]

GCC plugins are tied to a particular GCC version. So, they must be
rebuilt when the compiler is upgraded.

This seems to be a long-standing flaw since the initial support of
GCC plugins.

Extend commit 8b59cd81dc ("kbuild: ensure full rebuild when the
compiler is updated"), so that GCC plugins are covered by the
compiler upgrade detection.

[1]: https://lore.kernel.org/lkml/CAHk-=wieoN5ttOy7SnsGwZv+Fni3R6m-Ut=oxih6bbZ28G+4dw@mail.gmail.com/

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
2021-03-11 14:40:50 +09:00
Jan Beulich
f1d20d8643 Xen/gntdev: don't needlessly use kvcalloc()
Requesting zeroed memory when all of it will be overwritten subsequently
by all ones is a waste of processing bandwidth. In fact, rather than
recording zeroed ->grants[], fill that array too with more appropriate
"invalid" indicators.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/9a726be2-4893-8ffe-0ef1-b70dd1c229b1@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Jan Beulich
bce21a2b48 Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF}
It's not helpful if every driver has to cook its own. Generalize
xenbus'es INVALID_GRANT_HANDLE and pcifront's INVALID_GRANT_REF (which
shouldn't have expanded to zero to begin with). Use the constants in
p2m.c and gntdev.c right away, and update field types where necessary so
they would match with the constants' types (albeit without touching
struct ioctl_gntdev_grant_ref's ref field, as that's part of the public
interface of the kernel and would require introducing a dependency on
Xen's grant_table.h public header).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/db7c38a5-0d75-d5d1-19de-e5fe9f0b9c48@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Jan Beulich
36caa3fedf Xen/gntdev: don't needlessly allocate k{,un}map_ops[]
They're needed only in the not-auto-translate (i.e. PV) case; there's no
point in allocating memory that's never going to get accessed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/180d50cb-5531-8952-4bf0-d65c554638ed@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Jan Beulich
0f9b05b9a0 Xen: drop exports of {set,clear}_foreign_p2m_mapping()
They're only used internally, and the layering violation they contain
(x86) or imply (Arm) of calling HYPERVISOR_grant_table_op() strongly
advise against any (uncontrolled) use from a module. The functions also
never had users except the ones from drivers/xen/grant-table.c forever
since their introduction in 3.15.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/746a5cd6-1446-eda4-8b23-03c1cac30b8d@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Juergen Gross
b6622798bc xen/events: avoid handling the same event on two cpus at the same time
When changing the cpu affinity of an event it can happen today that
(with some unlucky timing) the same event will be handled on the old
and the new cpu at the same time.

Avoid that by adding an "event active" flag to the per-event data and
call the handler only if this flag isn't set.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Link: https://lore.kernel.org/r/20210306161833.4552-4-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Juergen Gross
25da4618af xen/events: don't unmask an event channel when an eoi is pending
An event channel should be kept masked when an eoi is pending for it.
When being migrated to another cpu it might be unmasked, though.

In order to avoid this keep three different flags for each event channel
to be able to distinguish "normal" masking/unmasking from eoi related
masking/unmasking and temporary masking. The event channel should only
be able to generate an interrupt if all flags are cleared.

Cc: stable@vger.kernel.org
Fixes: 54c9de8989 ("xen/events: add a new "late EOI" evtchn framework")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Link: https://lore.kernel.org/r/20210306161833.4552-3-jgross@suse.com

[boris -- corrected Fixed tag format]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:37:29 -06:00
Alex Deucher
a5cb3c1a36 drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m
Need to check the module variant as well.

Acked-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:23:26 -05:00
Christian König
cba2afb65c drm/radeon: fix AGP dependency
When AGP is compiled as module radeon must be compiled as module as
well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:22:45 -05:00
Christian König
a25955ba12 drm/radeon: also init GEM funcs in radeon_gem_prime_import_sg_table
Otherwise we will run into a NULL ptr deref.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=212137
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.11.x
2021-03-10 16:22:03 -05:00
Evan Quan
48123d068f drm/amd/pm: correct the watermark settings for Polaris
The "/ 10" should be applied to the right-hand operand instead of
the left-hand one.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Noticed-by: Georgios Toptsidis <gtoptsid@gmail.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:21:21 -05:00
Kenneth Feng
50ceb1fe7a drm/amd/pm: bug fix for pcie dpm
Currently the pcie dpm has two problems.
1. Only the high dpm level speed/width can be overrided
if the requested values are out of the pcie capability.
2. The high dpm level is always overrided though sometimes
it's not necesarry.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:20:34 -05:00
Nirmoy Das
521f04f9e3 drm/amdgpu: fb BO should be ttm_bo_type_device
FB BO should not be ttm_bo_type_kernel type and
amdgpufb_create_pinned_object() pins the FB BO anyway.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:19:47 -05:00
Zhan Liu
eda29602f1 drm/amdgpu/display: Use wm_table.entries for dcn301 calculate_wm
[Why]
For DGPU Navi, the wm_table.nv_entries are used. These entires are not
populated for DCN301 Vangogh APU, but instead wm_table.entries are.

[How]
Use DCN21 Renoir style wm calculations.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:19:27 -05:00
Dillon Varone
d2c9128595 drm/amd/display: Enabled pipe harvesting in dcn30
[Why & How]
Ported logic from dcn21 for reading in pipe fusing to dcn30.
Supported configurations are 1 and 6 pipes. Invalid fusing
will revert to 1 pipe being enabled.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:17:27 -05:00
Sung Lee
b0075d114c drm/amd/display: Revert dram_clock_change_latency for DCN2.1
[WHY & HOW]
Using values provided by DF for latency may cause hangs in
multi display configurations. Revert change to previous value.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Haonan Wang <Haonan.Wang2@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:16:27 -05:00
Linus Torvalds
a74e6a014c Merge tag 's390-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:

 - fix various user space visible copy_to_user() instances which return
   the number of bytes left to copy instead of -EFAULT

 - make TMPFS_INODE64 available again for s390 and alpha, now that both
   architectures have been switched to 64-bit ino_t (see commit
   96c0a6a72d: "s390,alpha: switch to 64-bit ino_t")

 - make sure to release a shared hypervisor resource within the zcore
   device driver also on restart and power down; also remove unneeded
   surrounding debugfs_create return value checks

 - for the new hardware counter set device driver rename the uapi header
   file to be a bit more generic; also remove 60 second read limit which
   is not really necessary and without the limit the interface can be
   easier tested

 - some small cleanups, the largest being to convert all long long in
   our time and idle code to longs

 - update defconfigs

* tag 's390-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: remove IBM_PARTITION and CONFIGFS_FS from zfcpdump defconfig
  s390: update defconfigs
  s390,alpha: make TMPFS_INODE64 available again
  s390/cio: return -EFAULT if copy_to_user() fails
  s390/tty3270: avoid comma separated statements
  s390/cpumf: remove unneeded semicolon
  s390/crypto: return -EFAULT if copy_to_user() fails
  s390/cio: return -EFAULT if copy_to_user() fails
  s390/cpumf: rename header file to hwctrset.h
  s390/zcore: release dump save area on restart or power down
  s390/zcore: no need to check return value of debugfs_create functions
  s390/cpumf: remove 60 seconds read limit
  s390/topology: remove always false if check
  s390/time,idle: get rid of unsigned long long
2021-03-10 13:15:16 -08:00
Qingqing Zhuo
7afa0033d6 drm/amd/display: Enable pflip interrupt upon pipe enable
[Why]
pflip interrupt would not be enabled promptly if a pipe is disabled
and re-enabled, causing flip_done timeout error during DP
compliance tests

[How]
Enable pflip interrupt upon pipe enablement

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 16:14:36 -05:00
Holger Hoffstätte
680174cfd1 drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp()
After fixing nested FPU contexts caused by 41401ac677 we're still seeing
complaints about spurious kernel_fpu_end(). As it turns out this was
already fixed for dcn20 in commit f41ed88cbd ("drm/amdgpu/display:
use GFP_ATOMIC in dcn20_validate_bandwidth_internal") but never moved
forward to dcn21.

Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:13:13 -05:00
Holger Hoffstätte
15e8b95d5f drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()
Commit 41401ac677 added FPU wrappers to dcn21_validate_bandwidth(),
which was correct. Unfortunately a nested function alredy contained
DC_FP_START()/DC_FP_END() calls, which results in nested FPU context
enter/exit and complaints by kernel_fpu_begin_mask().
This can be observed e.g. with 5.10.20, which backported 41401ac677
and now emits the following warning on boot:

WARNING: CPU: 6 PID: 858 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xa5/0xc0
Call Trace:
 dcn21_calculate_wm+0x47/0xa90 [amdgpu]
 dcn21_validate_bandwidth_fp+0x15d/0x2b0 [amdgpu]
 dcn21_validate_bandwidth+0x29/0x40 [amdgpu]
 dc_validate_global_state+0x3c7/0x4c0 [amdgpu]

The warning is emitted due to the additional DC_FP_START/END calls in
patch_bounding_box(), which is inlined into dcn21_calculate_wm(),
its only caller. Removing the calls brings the code in line with
dcn20 and makes the warning disappear.

Fixes: 41401ac677 ("drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()")
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:12:44 -05:00
Takashi Iwai
7a46f05e5e drm/amd/display: Add a backlight module option
There seem devices that don't work with the aux channel backlight
control.  For allowing such users to test with the other backlight
control method, provide a new module option, aux_backlight, to specify
enabling or disabling the aux backport support explicitly.  As
default, the aux support is detected by the hardware capability.

v2: make the backlight option generic in case we add future
backlight types (Alex)

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1180749
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1438
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:12:15 -05:00
Alex Deucher
0ad3e64eb4 drm/amdgpu/display: handle aux backlight in backlight_get_brightness
Need to fetch it via aux.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:11:44 -05:00
Alex Deucher
dfd8b7fbd9 drm/amdgpu/display: don't assert in set backlight function
It just spams the logs.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:11:23 -05:00
Alex Deucher
a2f8d98869 drm/amdgpu/display: simplify backlight setting
Avoid the extra wrapper function.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-10 16:11:01 -05:00
Shuah Khan
46613c9dfa usbip: fix vudc usbip_sockfd_store races leading to gpf
usbip_sockfd_store() is invoked when user requests attach (import)
detach (unimport) usb gadget device from usbip host. vhci_hcd sends
import request and usbip_sockfd_store() exports the device if it is
free for export.

Export and unexport are governed by local state and shared state
- Shared state (usbip device status, sockfd) - sockfd and Device
  status are used to determine if stub should be brought up or shut
  down. Device status is shared between host and client.
- Local state (tcp_socket, rx and tx thread task_struct ptrs)
  A valid tcp_socket controls rx and tx thread operations while the
  device is in exported state.
- While the device is exported, device status is marked used and socket,
  sockfd, and thread pointers are valid.

Export sequence (stub-up) includes validating the socket and creating
receive (rx) and transmit (tx) threads to talk to the client to provide
access to the exported device. rx and tx threads depends on local and
shared state to be correct and in sync.

Unexport (stub-down) sequence shuts the socket down and stops the rx and
tx threads. Stub-down sequence relies on local and shared states to be
in sync.

There are races in updating the local and shared status in the current
stub-up sequence resulting in crashes. These stem from starting rx and
tx threads before local and global state is updated correctly to be in
sync.

1. Doesn't handle kthread_create() error and saves invalid ptr in local
   state that drives rx and tx threads.
2. Updates tcp_socket and sockfd,  starts stub_rx and stub_tx threads
   before updating usbip_device status to SDEV_ST_USED. This opens up a
   race condition between the threads and usbip_sockfd_store() stub up
   and down handling.

Fix the above problems:
- Stop using kthread_get_run() macro to create/start threads.
- Create threads and get task struct reference.
- Add kthread_create() failure handling and bail out.
- Hold usbip_device lock to update local and shared states after
  creating rx and tx threads.
- Update usbip_device status to SDEV_ST_USED.
- Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx
- Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx,
  and status) is complete.

Credit goes to syzbot and Tetsuo Handa for finding and root-causing the
kthread_get_run() improper error handling problem and others. This is a
hard problem to find and debug since the races aren't seen in a normal
case. Fuzzing forces the race window to be small enough for the
kthread_get_run() error path bug and starting threads before updating the
local and shared state bug in the stub-up sequence.

Fixes: 9720b4bc76 ("staging/usbip: convert to kthread")
Cc: stable@vger.kernel.org
Reported-by: syzbot <syzbot+a93fba6d384346a761e3@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+bf1a360e305ee719e364@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+95ce4b142579611ef0a9@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/b1c08b983ffa185449c9f0f7d1021dc8c8454b60.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:34 +01:00
Shuah Khan
718ad9693e usbip: fix vhci_hcd attach_store() races leading to gpf
attach_store() is invoked when user requests import (attach) a device
from usbip host.

Attach and detach are governed by local state and shared state
- Shared state (usbip device status) - Device status is used to manage
  the attach and detach operations on import-able devices.
- Local state (tcp_socket, rx and tx thread task_struct ptrs)
  A valid tcp_socket controls rx and tx thread operations while the
  device is in exported state.
- Device has to be in the right state to be attached and detached.

Attach sequence includes validating the socket and creating receive (rx)
and transmit (tx) threads to talk to the host to get access to the
imported device. rx and tx threads depends on local and shared state to
be correct and in sync.

Detach sequence shuts the socket down and stops the rx and tx threads.
Detach sequence relies on local and shared states to be in sync.

There are races in updating the local and shared status in the current
attach sequence resulting in crashes. These stem from starting rx and
tx threads before local and global state is updated correctly to be in
sync.

1. Doesn't handle kthread_create() error and saves invalid ptr in local
   state that drives rx and tx threads.
2. Updates tcp_socket and sockfd,  starts stub_rx and stub_tx threads
   before updating usbip_device status to VDEV_ST_NOTASSIGNED. This opens
   up a race condition between the threads, port connect, and detach
   handling.

Fix the above problems:
- Stop using kthread_get_run() macro to create/start threads.
- Create threads and get task struct reference.
- Add kthread_create() failure handling and bail out.
- Hold vhci and usbip_device locks to update local and shared states after
  creating rx and tx threads.
- Update usbip_device status to VDEV_ST_NOTASSIGNED.
- Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx
- Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx,
  and status) is complete.

Credit goes to syzbot and Tetsuo Handa for finding and root-causing the
kthread_get_run() improper error handling problem and others. This is
hard problem to find and debug since the races aren't seen in a normal
case. Fuzzing forces the race window to be small enough for the
kthread_get_run() error path bug and starting threads before updating the
local and shared state bug in the attach sequence.
- Update usbip_device tcp_rx and tcp_tx pointers holding vhci and
  usbip_device locks.

Tested with syzbot reproducer:
- https://syzkaller.appspot.com/text?tag=ReproC&x=14801034d00000

Fixes: 9720b4bc76 ("staging/usbip: convert to kthread")
Cc: stable@vger.kernel.org
Reported-by: syzbot <syzbot+a93fba6d384346a761e3@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+bf1a360e305ee719e364@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+95ce4b142579611ef0a9@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/bb434bd5d7a64fbec38b5ecfb838a6baef6eb12b.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:34 +01:00
Shuah Khan
9380afd6df usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip_sockfd_store() is invoked when user requests attach (import)
detach (unimport) usb device from usbip host. vhci_hcd sends import
request and usbip_sockfd_store() exports the device if it is free
for export.

Export and unexport are governed by local state and shared state
- Shared state (usbip device status, sockfd) - sockfd and Device
  status are used to determine if stub should be brought up or shut
  down.
- Local state (tcp_socket, rx and tx thread task_struct ptrs)
  A valid tcp_socket controls rx and tx thread operations while the
  device is in exported state.
- While the device is exported, device status is marked used and socket,
  sockfd, and thread pointers are valid.

Export sequence (stub-up) includes validating the socket and creating
receive (rx) and transmit (tx) threads to talk to the client to provide
access to the exported device. rx and tx threads depends on local and
shared state to be correct and in sync.

Unexport (stub-down) sequence shuts the socket down and stops the rx and
tx threads. Stub-down sequence relies on local and shared states to be
in sync.

There are races in updating the local and shared status in the current
stub-up sequence resulting in crashes. These stem from starting rx and
tx threads before local and global state is updated correctly to be in
sync.

1. Doesn't handle kthread_create() error and saves invalid ptr in local
   state that drives rx and tx threads.
2. Updates tcp_socket and sockfd,  starts stub_rx and stub_tx threads
   before updating usbip_device status to SDEV_ST_USED. This opens up a
   race condition between the threads and usbip_sockfd_store() stub up
   and down handling.

Fix the above problems:
- Stop using kthread_get_run() macro to create/start threads.
- Create threads and get task struct reference.
- Add kthread_create() failure handling and bail out.
- Hold usbip_device lock to update local and shared states after
  creating rx and tx threads.
- Update usbip_device status to SDEV_ST_USED.
- Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx
- Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx,
  and status) is complete.

Credit goes to syzbot and Tetsuo Handa for finding and root-causing the
kthread_get_run() improper error handling problem and others. This is a
hard problem to find and debug since the races aren't seen in a normal
case. Fuzzing forces the race window to be small enough for the
kthread_get_run() error path bug and starting threads before updating the
local and shared state bug in the stub-up sequence.

Tested with syzbot reproducer:
- https://syzkaller.appspot.com/text?tag=ReproC&x=14801034d00000

Fixes: 9720b4bc76 ("staging/usbip: convert to kthread")
Cc: stable@vger.kernel.org
Reported-by: syzbot <syzbot+a93fba6d384346a761e3@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+bf1a360e305ee719e364@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+95ce4b142579611ef0a9@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/268a0668144d5ff36ec7d87fdfa90faf583b7ccc.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:34 +01:00
Shuah Khan
6801854be9 usbip: fix vudc to check for stream socket
Fix usbip_sockfd_store() to validate the passed in file descriptor is
a stream socket. If the file descriptor passed was a SOCK_DGRAM socket,
sock_recvmsg() can't detect end of stream.

Cc: stable@vger.kernel.org
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/387a670316002324113ac7ea1e8b53f4085d0c95.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:34 +01:00
Shuah Khan
f55a057169 usbip: fix vhci_hcd to check for stream socket
Fix attach_store() to validate the passed in file descriptor is a
stream socket. If the file descriptor passed was a SOCK_DGRAM socket,
sock_recvmsg() can't detect end of stream.

Cc: stable@vger.kernel.org
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/52712aa308915bda02cece1589e04ee8b401d1f3.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:33 +01:00
Shuah Khan
47ccc8fc2c usbip: fix stub_dev to check for stream socket
Fix usbip_sockfd_store() to validate the passed in file descriptor is
a stream socket. If the file descriptor passed was a SOCK_DGRAM socket,
sock_recvmsg() can't detect end of stream.

Cc: stable@vger.kernel.org
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/e942d2bd03afb8e8552bd2a5d84e18d17670d521.1615171203.git.skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 19:32:33 +01:00
Linus Torvalds
9b1ea29bc0 Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
This reverts commit 8ff60eb052.

The kernel test robot reports a huge performance regression due to the
commit, and the reason seems fairly straightforward: when there is
contention on the page list (which is what causes acquire_slab() to
fail), we do _not_ want to just loop and try again, because that will
transfer the contention to the 'n->list_lock' spinlock we hold, and
just make things even worse.

This is admittedly likely a problem only on big machines - the kernel
test robot report comes from a 96-thread dual socket Intel Xeon Gold
6252 setup, but the regression there really is quite noticeable:

   -47.9% regression of stress-ng.rawpkt.ops_per_sec

and the commit that was marked as being fixed (7ced371971: "slub:
Acquire_slab() avoid loop") actually did the loop exit early very
intentionally (the hint being that "avoid loop" part of that commit
message), exactly to avoid this issue.

The correct thing to do may be to pick some kind of reasonable middle
ground: instead of breaking out of the loop on the very first sign of
contention, or trying over and over and over again, the right thing may
be to re-try _once_, and then give up on the second failure (or pick
your favorite value for "once"..).

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
Cc: Jann Horn <jannh@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-10 10:18:04 -08:00
Linus Torvalds
d3110f256d Merge tag 'for-linus-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull detached mounts fix from Christian Brauner:
 "Creating a series of detached mounts, attaching them to the
  filesystem, and unmounting them can be used to trigger an integer
  overflow in ns->mounts causing the kernel to block any new mounts in
  count_mounts() and returning ENOSPC because it falsely assumes that
  the maximum number of mounts in the mount namespace has been reached,
  i.e. it thinks it can't fit the new mounts into the mount namespace
  anymore.

  Without this fix heavy use of the new mount API with move_mount() will
  cause the host to become unuseable and thus blocks some xfstest
  patches I want to resend.

  Depending on the number of mounts in your system, this can be
  reproduced on any kernel that supportes open_tree() and move_mount().

  A reproducer has been sent for inclusion with xfstests. It takes care
  to do this in another mount namespace, not in the host's mount
  namespace so there shouldn't be any risk in running it but if one did
  run it on the host it would require a reboot in order to be able to
  mount again. See

      https://lore.kernel.org/fstests/20210309121041.753359-1-christian.brauner@ubuntu.com

  The root cause of this is that detached mounts aren't handled
  correctly when source and target mount are identical and reside on a
  shared mount causing a broken mount tree where the detached source
  itself is propagated which propagation prevents for regular
  bind-mounts and new mounts.

  This ultimately leads to a miscalculation of the number of mounts in
  the mount namespace.

  Detached mounts created via 'open_tree(fd, path, OPEN_TREE_CLONE)' are
  essentially like an unattached bind-mount. They can then later on be
  attached to the filesystem via move_mount() which calls into
  attach_recursive_mount().

  Part of attaching it to the filesystem is making sure that mounts get
  correctly propagated in case the destination mountpoint is MS_SHARED,
  i.e. is a shared mountpoint. This is done by calling into
  propagate_mnt() which walks the list of peers calling propagate_one()
  on each mount in this list making sure it receives the propagation
  event. The propagate_one() function thereby skips both new mounts and
  bind mounts to not propagate them "into themselves". Both are
  identified by checking whether the mount is already attached to any
  mount namespace in mnt->mnt_ns. The is what the IS_MNT_NEW() helper is
  responsible for.

  However, detached mounts have an anonymous mount namespace attached to
  them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't
  realize they need to be skipped causing the mount to propagate "into
  itself" breaking the mount table and causing a disconnect between the
  number of mounts recorded as being beneath or reachable from the
  target mountpoint and the number of mounts actually recorded/counted
  in ns->mounts ultimately causing an overflow which in turn prevents
  any new mounts via the ENOSPC issue.

  So teach propagation to handle detached mounts by making it aware of
  them. I've been tracking this issue down for the last couple of days
  and then verifying that the fix is correct by unmounting everything in
  my current mount table leaving only /proc and /sys mounted and running
  the reproducer above overnight verifying the number of mounts counted
  in ns->mounts. With this fix the counts are correct and the ENOSPC
  issue can't be reproduced.

  This change will only have an effect on mounts created with the new
  mount API since detached mounts cannot be created with the old mount
  API so regressions are extremely unlikely.

  Here's an illustration:

    #### mount():
    ubuntu@f1-vm:~$ sudo mount --bind /mnt/ /mnt/
    ubuntu@f1-vm:~$ findmnt  | grep -i mnt
    ├─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime

    #### open_tree(OPEN_TREE_CLONE) + move_mount() with bug:
    ubuntu@f1-vm:~$ sudo ./mount-new /mnt/ /mnt/
    ubuntu@f1-vm:~$ findmnt  | grep -i mnt
    ├─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime
    │ └─/mnt                              /dev/sda2[/mnt] ext4       rw,relatime

    #### open_tree(OPEN_TREE_CLONE) + move_mount() with the fix:
    ubuntu@f1-vm:~$ sudo ./mount-new /mnt /mnt
    ubuntu@f1-vm:~$ findmnt | grep -i mnt
    └─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime"

* tag 'for-linus-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  mount: fix mounting of detached mounts onto targets that reside on shared mounts
2021-03-10 10:01:35 -08:00
Linus Torvalds
d0df9aabef Merge tag '5.12-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Six cifs/smb3 fixes, three of them for stable, including some
  important mulitchannel crediting fixes, and a fix for statfs error
  handling"

* tag '5.12-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not send close in compound create+close requests
  cifs: return proper error code in statfs(2)
  cifs: change noisy error message to FYI
  cifs: print MIDs in decimal notation
  cifs: ask for more credit on async read/write code paths
  cifs: fix credit accounting for extra channel
2021-03-10 09:55:06 -08:00
Shile Zhang
65527a51c6 misc/pvpanic: Export module FDT device table
Export the module FDT device table to ensure the FDT compatible strings
are listed in the module alias. This help the pvpanic driver can be
loaded on boot automatically not only the ACPI device, but also the FDT
device.

Fixes: 46f934c9a1 ("misc/pvpanic: add support to get pvpanic device info FDT")
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210218123116.207751-1-shile.zhang@linux.alibaba.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 17:07:58 +01:00
Dmitry Baryshkov
20c40794eb misc: fastrpc: restrict user apps from sending kernel RPC messages
Verify that user applications are not using the kernel RPC message
handle to restrict them from directly attaching to guest OS on the
remote subsystem. This is a port of CVE-2019-2308 fix.

Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Jonathan Marek <jonathan@marek.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210212192658.3476137-1-dmitry.baryshkov@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 17:01:29 +01:00
Shuo Liu
1201d68f47 virt: acrn: Correct type casting of argument of copy_from_user()
hsm.c:336:50: warning: incorrect type in argument 2 (different address spaces)
hsm.c:336:50:    expected void const [noderef] __user *from
hsm.c:336:50:    got void *

This patch fixes above sparse warning.

Fixes: 3d679d5aec ("virt: acrn: Introduce interfaces to query C-states and P-states allowed by hypervisor")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210310153708.17451-1-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 16:59:50 +01:00
Sean Christopherson
c8e2fe13d1 x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case.  Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop.  Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df94 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
2021-03-10 16:45:09 +01:00
Jia-Ju Bai
df66617bfe block: rsxx: fix error return code of rsxx_pci_probe()
When create_singlethread_workqueue returns NULL to card->event_wq, no
error return code of rsxx_pci_probe() is assigned.

To fix this bug, st is assigned with -ENOMEM in this case.

Fixes: 8722ff8cdb ("block: IBM RamSan 70/80 device driver")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20210310033017.4023-1-baijiaju1990@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 08:25:37 -07:00
Damien Le Moal
faa44c69da block: Fix REQ_OP_ZONE_RESET_ALL handling
Similarly to a single zone reset operation (REQ_OP_ZONE_RESET), execute
REQ_OP_ZONE_RESET_ALL operations with REQ_SYNC set.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:45:47 -07:00
Pavel Begunkov
7d41e8543d io_uring: remove indirect ctx into sqo injection
We use ->ctx_new_list to notify sqo about new ctx pending, then sqo
should stop and splice it to its sqd->ctx_list, paired with
->sq_thread_comp.

The last one is broken because nobody reinitialises it, and trying to
fix it would only add more complexity and bugs. And the first isn't
really needed as is done under park(), that protects from races well.
Add ctx into sqd->ctx_list directly (under park()), it's much simpler
and allows to kill both, ctx_new_list and sq_thread_comp.

note: apparently there is no real problem at the moment, because
sq_thread_comp is used only by io_sq_thread_finish() followed by
parking, where list_del(&ctx->sqd_list) removes it well regardless
whether it's in the new or the active list.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:30:32 -07:00
Pavel Begunkov
78d7f6ba82 io_uring: fix invalid ctx->sq_thread_idle
We have to set ctx->sq_thread_idle before adding a ring to an SQ task,
otherwise sqd races for seeing zero and accounting it as such.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:29:59 -07:00
Jens Axboe
e22bc9b481 kernel: make IO threads unfreezable by default
The io-wq threads were already marked as no-freeze, but the manager was
not. On resume, we perpetually have signal_pending() being true, and
hence the manager will loop and spin 100% of the time.

Just mark the tasks created by create_io_thread() as PF_NOFREEZE by
default, and remove any knowledge of it in io-wq and io_uring.

Reported-by: Kevin Locke <kevin@kevinlocke.name>
Tested-by: Kevin Locke <kevin@kevinlocke.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:43 -07:00
Jens Axboe
e8f98f2454 io_uring: always wait for sqd exited when stopping SQPOLL thread
We have a tiny race where io_put_sq_data() calls io_sq_thead_stop()
and finds the thread gone, but the thread has indeed not fully
exited or called complete() yet. Close it up by always having
io_sq_thread_stop() wait on completion of the exit event.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:43 -07:00
Yang Li
5199328a0d io_uring: remove unneeded variable 'ret'
Fix the following coccicheck warning:
./fs/io_uring.c:8984:5-8: Unneeded variable: "ret". Return "0" on line
8998

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/1615271441-33649-1-git-send-email-yang.lee@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:43 -07:00
Jens Axboe
93e68e036c io_uring: move all io_kiocb init early in io_init_req()
If we hit an error path in the function, make sure that the io_kiocb is
fully initialized at that point so that freeing the request always sees
a valid state.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:43 -07:00
yangerkun
70e3512509 io-wq: fix ref leak for req in case of exit cancelations
do_work such as io_wq_submit_work that cancel the work may leave a ref of
req as 1 if we have links. Fix it by call io_run_cancel.

Fixes: 4fb6ac3262 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210309030410.3294078-1-yangerkun@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Pavel Begunkov
7a612350a9 io_uring: fix complete_post races for linked req
Calling io_queue_next() after spin_unlock in io_req_complete_post()
races with the other side extracting and reusing this request. Hand
coded parts of io_req_find_next() considering that io_disarm_next()
and io_req_task_queue() have (and safe) to be called with
completion_lock held.

It already does io_commit_cqring() and io_cqring_ev_posted(), so just
reuse it for post io_disarm_next().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5672a62f3150ee7c55849f40c0037655c4f2840f.1615250156.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Pavel Begunkov
33cc89a9fc io_uring: add io_disarm_next() helper
A preparation patch placing all preparations before extracting a next
request into a separate helper io_disarm_next().

Also, don't spuriously do ev_posted in a rare case where REQ_F_FAIL_LINK
is set but there are no requests linked (i.e. after cancelling a linked
timeout or setting IOSQE_IO_LINK on a last request of a submission
batch).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/44ecff68d6b47e1c4e6b891bdde1ddc08cfc3590.1615250156.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Pavel Begunkov
97a73a0f9f io_uring: fix io_sq_offload_create error handling
Don't set IO_SQ_THREAD_SHOULD_STOP when io_sq_offload_create() has
failed on io_uring_alloc_task_context() but leave everything to
io_sq_thread_finish(), because currently io_sq_thread_finish()
hangs on trying to park it. That's great it stalls there, because
otherwise the following io_sq_thread_stop() would be skipped on
IO_SQ_THREAD_SHOULD_STOP check and the sqo would race for sqd with
freeing ctx.

A simple error injection gives something like this.

[  245.463955] INFO: task sqpoll-test-hang:523 blocked for more than 122 seconds.
[  245.463983] Call Trace:
[  245.463990]  __schedule+0x36b/0x950
[  245.464005]  schedule+0x68/0xe0
[  245.464013]  schedule_timeout+0x209/0x2a0
[  245.464032]  wait_for_completion+0x8b/0xf0
[  245.464043]  io_sq_thread_finish+0x44/0x1a0
[  245.464049]  io_uring_setup+0x9ea/0xc80
[  245.464058]  __x64_sys_io_uring_setup+0x16/0x20
[  245.464064]  do_syscall_64+0x38/0x50
[  245.464073]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Jens Axboe
cc20e3fec6 io-wq: remove unused 'user' member of io_wq
Previous patches killed the last user of this, now it's just a dead member
in the struct. Get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Matthew Wilcox (Oracle)
61cf93700f io_uring: Convert personality_idr to XArray
You can't call idr_remove() from within a idr_for_each() callback,
but you can call xa_erase() from an xa_for_each() loop, so switch the
entire personality_idr from the IDR to the XArray.  This manifests as a
use-after-free as idr_for_each() attempts to walk the rest of the node
after removing the last entry from it.

Fixes: 071698e13a ("io_uring: allow registering credentials")
Cc: stable@vger.kernel.org # 5.6+
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Pavel: rebased (creds load was moved into io_init_req())]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ccff36e1375f2b0ebf73d957f037b43becc0dde.1615212806.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Pavel Begunkov
0298ef969a io_uring: clean R_DISABLED startup mess
There are enough of problems with IORING_SETUP_R_DISABLED, including the
burden of checking and kicking off the SQO task all over the codebase --
for exit/cancel/etc.

Rework it, always start the thread but don't do submit unless the flag
is gone, that's much easier.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Pavel Begunkov
f458dd8441 io_uring: fix unrelated ctx reqs cancellation
io-wq now is per-task, so cancellations now should match against
request's ctx.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:42 -07:00
Jens Axboe
05962f95f9 io_uring: SQPOLL parking fixes
We keep running into weird dependency issues between the sqd lock and
the parking state. Disentangle the SQPOLL thread from the last bits of
the kthread parking inheritance, and just replace the parking state,
and two associated locks, with a single rw mutex. The SQPOLL thread
keeps the mutex for read all the time, except if someone has marked us
needing to park. Then we drop/re-acquire and try again.

This greatly simplifies the parking state machine (by just getting rid
of it), and makes it a lot more obvious how it works - if you need to
modify the ctx list, then you simply park the thread which will grab
the lock for writing.

Fold in fix from Hillf Danton on not setting STOP on a fatal signal.

Fixes: e54945ae94 ("io_uring: SQPOLL stop error handling fixes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10 07:28:22 -07:00
Heikki Krogerus
2a92c90f2e software node: Fix device_add_software_node()
The function device_add_software_node() was meant to
register the node supplied to it, but only if that node
wasn't already registered. Right now the function attempts
to always register the node. That will cause a failure with
nodes that are already registered.

Fixing that by incrementing the reference count of the nodes
that have already been registered, and only registering the
new nodes. Also, clarifying the behaviour in the function
documentation.

Fixes: e68d0119e3 ("software node: Introduce device_add_software_node()")
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-10 15:25:02 +01:00
Heikki Krogerus
8891123f9c software node: Fix node registration
Software node can not be registered before its parent.

Fixes: 80488a6b1d ("software node: Add support for static node descriptors")
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-10 15:19:58 +01:00
Juergen Gross
9e77d96b8e xen/events: reset affinity of 2-level event when tearing it down
When creating a new event channel with 2-level events the affinity
needs to be reset initially in order to avoid using an old affinity
from earlier usage of the event channel port. So when tearing an event
channel down reset all affinity bits.

The same applies to the affinity when onlining a vcpu: all old
affinity settings for this vcpu must be reset. As percpu events get
initialized before the percpu event channel hook is called,
resetting of the affinities happens after offlining a vcpu (this is
working, as initial percpu memory is zeroed out).

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Link: https://lore.kernel.org/r/20210306161833.4552-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 07:25:18 -06:00
Wei Yongjun
e9e7fce038 regulator: rt4831: Fix return value check in rt4831_regulator_probe()
In case of error, the function dev_get_regmap() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.

Fixes: 9351ab8b0c ("regulator: rt4831: Adds support for Richtek RT4831 DSV regulator")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210305034930.3236099-1-weiyongjun1@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:27 +00:00
Frieder Schrempf
98b94b6e38 regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
The driver uses the DVS registers PCA9450_REG_BUCKxOUT_DVS0 to set the
voltage for the buck regulators 1, 2 and 3. This has no effect as the
PRESET_EN bit is set by default and therefore the preset values are used
instead, which are set to 850 mV.

To fix this we clear the PRESET_EN bit at time of initialization.

Fixes: 0935ff5f1f ("regulator: pca9450: add pca9450 pmic driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Link: https://lore.kernel.org/r/20210222115229.166620-1-frieder.schrempf@kontron.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:26 +00:00
satya priya
dfe03bca8d regulator: qcom-rpmh: Use correct buck for S1C regulator
Use correct buck, that is, pmic5_hfsmps515 for S1C regulator
of PM8350C PMIC.

Signed-off-by: satya priya <skakit@codeaurora.org>
Link: https://lore.kernel.org/r/1614155592-14060-7-git-send-email-skakit@codeaurora.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:25 +00:00
satya priya
e610e072c8 regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
Correct the REGULATOR_LINEAR_RANGE and n_voltges for
pmic5_hfsmps515 buck.

Signed-off-by: satya priya <skakit@codeaurora.org>
Link: https://lore.kernel.org/r/1614155592-14060-4-git-send-email-skakit@codeaurora.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:24 +00:00
Frieder Schrempf
5fe5f17dc5 regulator: pca9450: Fix return value when failing to get sd-vsel GPIO
This fixes the return value of pca9450_i2c_probe() to use the correct
error code when getting the sd-vsel GPIO fails.

Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Link: https://lore.kernel.org/r/20210222150809.208942-1-frieder.schrempf@kontron.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:23 +00:00
Axel Lin
fbc102fb4c regulator: mt6315: Return REGULATOR_MODE_INVALID for invalid mode
-EINVAL is not a valid return value for .of_map_mode, return
REGULATOR_MODE_INVALID instead.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20210215034813.45510-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-03-10 12:22:22 +00:00
Takashi Iwai
eea46a0879 ALSA: hda/hdmi: Cancel pending works before suspend
The per_pin->work might be still floating at the suspend, and this may
hit the access to the hardware at an unexpected timing.  Cancel the
work properly at the suspend callback for avoiding the buggy access.

Note that the bug doesn't trigger easily in the recent kernels since
the work is queued only when the repoll count is set, and usually it's
only at the resume callback, but it's still possible to hit in
theory.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1182377
Reported-and-tested-by: Abhishek Sahu <abhsahu@nvidia.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210310112809.9215-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-10 12:52:01 +01:00
Takashi Iwai
5ff9dde42e ALSA: hda: Avoid spurious unsol event handling during S3/S4
When HD-audio bus receives unsolicited events during its system
suspend/resume (S3 and S4) phase, the controller driver may still try
to process events although the codec chips are already (or yet)
powered down.  This might screw up the codec communication, resulting
in CORB/RIRB errors.  Such events should be rather skipped, as the
codec chip status such as the jack status will be fully refreshed at
the system resume time.

Since we're tracking the system suspend/resume state in codec
power.power_state field, let's add the check in the common unsol event
handler entry point to filter out such events.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1182377
Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
Cc: <stable@vger.kernel.org> # 183ab39eb0: ALSA: hda: Initialize power_state
Link: https://lore.kernel.org/r/20210310112809.9215-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-10 12:51:49 +01:00
Takashi Iwai
13661fc484 ALSA: hda: Flush pending unsolicited events before suspend
The HD-audio controller driver processes the unsolicited events via
its work asynchronously, and this might be pending when the system
goes to suspend.  When a lengthy event handling like ELD byte reads is
running, this might trigger unexpected accesses among suspend/resume
procedure, typically seen with Nvidia driver that still requires the
handling via unsolicited event verbs for ELD updates.

This patch adds the flush of unsol_work to assure that pending events
are processed before going into suspend.

Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1182377
Reported-and-tested-by: Abhishek Sahu <abhsahu@nvidia.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210310112809.9215-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-10 12:51:32 +01:00
Rob Herring
7bb8bc6eb5 arm64: perf: Fix 64-bit event counter read truncation
Commit 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection") changed
armv8pmu_read_evcntr() to return a u32 instead of u64. The result is
silent truncation of the event counter when using 64-bit counters. Given
the offending commit appears to have passed thru several folks, it seems
likely this was a bad rebase after v8.5 PMU 64-bit counters landed.

Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection")
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210310004412.1450128-1-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-10 11:08:23 +00:00
James Morse
26f55386f9 arm64/mm: Fix __enable_mmu() for new TGRAN range values
As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K pages sizes in particular) instead of just enabled or disabled
values. This changes __enable_mmu() function to handle complete acceptable
range of values (depending on whether the field is signed or unsigned) now
represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
also fix similar situations in EFI stub and KVM as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1615355590-21102-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-10 11:01:57 +00:00
Mark Brown
07e644885b kselftest: arm64: Fix exit code of sve-ptrace
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210309190304.39169-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-10 10:58:11 +00:00
Catalin Marinas
d15dfd3138 arm64: mte: Map hotplugged memory as Normal Tagged
In a system supporting MTE, the linear map must allow reading/writing
allocation tags by setting the memory type as Normal Tagged. Currently,
this is only handled for memory present at boot. Hotplugged memory uses
Normal non-Tagged memory.

Introduce pgprot_mhp() for hotplugged memory and use it in
add_memory_resource(). The arm64 code maps pgprot_mhp() to
pgprot_tagged().

Note that ZONE_DEVICE memory should not be mapped as Tagged and
therefore setting the memory type in arch_add_memory() is not feasible.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 0178dc7613 ("arm64: mte: Use Normal Tagged attributes for the linear map")
Reported-by: Patrick Daly <pdaly@codeaurora.org>
Tested-by: Patrick Daly <pdaly@codeaurora.org>
Link: https://lore.kernel.org/r/1614745263-27827-1-git-send-email-pdaly@codeaurora.org
Cc: <stable@vger.kernel.org> # 5.10.x
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20210309122601.5543-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-10 10:56:46 +00:00
Peter Zijlstra
4817a52b30 seqlock,lockdep: Fix seqcount_latch_init()
seqcount_init() must be a macro in order to preserve the static
variable that is used for the lockdep key. Don't then wrap it in an
inline function, which destroys that.

Luckily there aren't many users of this function, but fix it before it
becomes a problem.

Fixes: 80793c3471 ("seqlock: Introduce seqcount_latch_t")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YEeFEbNUVkZaXDp4@hirez.programming.kicks-ass.net
2021-03-10 09:51:45 +01:00
Peter Zijlstra
d5b0e0677b u64_stats,lockdep: Fix u64_stats_init() vs lockdep
Jakub reported that:

    static struct net_device *rtl8139_init_board(struct pci_dev *pdev)
    {
	    ...
	    u64_stats_init(&tp->rx_stats.syncp);
	    u64_stats_init(&tp->tx_stats.syncp);
	    ...
    }

results in lockdep getting confused between the RX and TX stats lock.
This is because u64_stats_init() is an inline calling seqcount_init(),
which is a macro using a static variable to generate a lockdep class.

By wrapping that in an inline, we negate the effect of the macro and
fold the static key variable, hence the confusion.

Fix by also making u64_stats_init() a macro for the case where it
matters, leaving the other case an inline for argument validation
etc.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Debugged-by: "Ahmed S. Darwish" <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Erhard F." <erhard_f@mailbox.org>
Link: https://lkml.kernel.org/r/YEXicy6+9MksdLZh@hirez.programming.kicks-ass.net
2021-03-10 09:51:45 +01:00
Yejune Deng
a758b7c4c6 virt: acrn: Use EPOLLIN instead of POLLIN
This fixes the following sparse warning:
"sparse warnings: (new ones prefixed by >>)"
>> drivers/virt/acrn/irqfd.c:163:13: sparse: sparse: restricted __poll_t
 degrades to integer

Fixes: dcf9625f2a ("virt: acrn: Use vfs_poll() instead of f_op->poll()")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Link: https://lore.kernel.org/r/20210310074901.7486-1-yejune.deng@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:50:23 +01:00
Serge Semin
1cffb1c664 usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
of_get_child_by_name() increments the reference counter of the OF node it
managed to find. So after the code is done using the device node, the
refcount must be decremented. Add missing of_node_put() invocation then
to the dwc3_qcom_of_register_core() method, since DWC3 OF node is being
used only there.

Fixes: a4333c3a6b ("usb: dwc3: Add Qualcomm DWC3 glue driver")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20210212205521.14280-1-Sergey.Semin@baikalelectronics.ru
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:13 +01:00
Pete Zaitcev
9de2c43acf USB: usblp: fix a hang in poll() if disconnected
Apparently an application that opens a device and calls select()
on it, will hang if the decice is disconnected. It's a little
surprising that we had this bug for 15 years, but apparently
nobody ever uses select() with a printer: only write() and read(),
and those work fine. Well, you can also select() with a timeout.

The fix is modeled after devio.c. A few other drivers check the
condition first, then do not add the wait queue in case the
device is disconnected. We doubt that's completely race-free.
So, this patch adds the process first, then locks properly
and checks for the disconnect.

Reviewed-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210303221053.1cf3313e@suzdal.zaitcev.lan
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:13 +01:00
Wei Yongjun
414c20df7d USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
In case of error, the function devm_platform_ioremap_resource()
returns ERR_PTR() and never returns NULL. The NULL test in the
return value check should be replaced with IS_ERR().

Fixes: 188db4435a ("usb: gadget: s3c: use platform resources")
Cc: stable <stable@vger.kernel.org>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210305034927.3232386-1-weiyongjun1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:13 +01:00
Yoshihiro Shimoda
b1d25e6ee5 usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
According to the datasheet, this controller has a restriction
which "set an endpoint number so that combinations of the DIR bit and
the EPNUM bits do not overlap.". However, since the udc core driver is
possible to assign a bulk pipe as an interrupt endpoint, an endpoint
number may not match the pipe number. After that, when user rebinds
another gadget driver, this driver broke the restriction because
the driver didn't clear any configuration in usb_ep_disable().

Example:
 # modprobe g_ncm
 Then, EP3 = pipe 3, EP4 = pipe 4, EP5 = pipe 6
 # rmmod g_ncm
 # modprobe g_hid
 Then, EP3 = pipe 6, EP4 = pipe 7.
 So, pipe 3 and pipe 6 are set as EP3.

So, clear PIPECFG register in usbhs_pipe_free().

Fixes: dfb87b8bfe ("usb: renesas_usbhs: gadget: fix re-enabling pipe without re-connecting")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/1615168538-26101-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:13 +01:00
Matthias Kaehlcke
2664deb093 usb: dwc3: qcom: Honor wakeup enabled/disabled state
The dwc3-qcom currently enables wakeup interrupts unconditionally
when suspending, however this should not be done when wakeup is
disabled (e.g. through the sysfs attribute power/wakeup). Only
enable wakeup interrupts when device_may_wakeup() returns true.

Fixes: a4333c3a6b ("usb: dwc3: Add Qualcomm DWC3 glue driver")
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210302103659.v2.1.I44954d9e1169f2cf5c44e6454d357c75ddfa99a2@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Ruslan Bilovol
cc2ac63d4c usb: gadget: f_uac1: stop playback on function disable
There is missing playback stop/cleanup in case of
gadget's ->disable callback that happens on
events like USB host resetting or gadget disconnection

Fixes: 0591bc2360 ("usb: gadget: add f_uac1 variant based on a new u_audio api")
Cc: <stable@vger.kernel.org> # 4.13+
Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Link: https://lore.kernel.org/r/1614599375-8803-3-git-send-email-ruslan.bilovol@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Ruslan Bilovol
789ea77310 usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
As per UAC2 Audio Data Formats spec (2.3.1.1 USB Packets),
if the sampling rate is a constant, the allowable variation
of number of audio slots per virtual frame is +/- 1 audio slot.

It means that endpoint should be able to accept/send +1 audio
slot.

Previous endpoint max_packet_size calculation code
was adding sometimes +1 audio slot due to DIV_ROUND_UP
behaviour which was rounding up to closest integer.
However this doesn't work if the numbers are divisible.

It had no any impact with Linux hosts which ignore
this issue, but in case of more strict Windows it
caused rejected enumeration

Thus always add +1 audio slot to endpoint's max packet size

Fixes: 913e4a90b6 ("usb: gadget: f_uac2: finalize wMaxPacketSize according to bandwidth")
Cc: Peter Chen <peter.chen@freescale.com>
Cc: <stable@vger.kernel.org> #v4.3+
Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Link: https://lore.kernel.org/r/1614599375-8803-2-git-send-email-ruslan.bilovol@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Dan Carpenter
650bf52208 USB: gadget: u_ether: Fix a configfs return code
If the string is invalid, this should return -EINVAL instead of 0.

Fixes: 73517cf49b ("usb: gadget: add RNDIS configfs options for class/subclass/protocol")
Cc: stable <stable@vger.kernel.org>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YCqZ3P53yyIg5cn7@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Shawn Guo
1edbff9c80 usb: dwc3: qcom: add ACPI device id for sc8180x
It enables USB Host support for sc8180x ACPI boot, both the standalone
one and the one behind URS (USB Role Switch).  And they share the
the same dwc3_acpi_pdata with sdm845.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210301075745.20544-1-shawn.guo@linaro.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Yorick de Wid
4d8654e81d Goodix Fingerprint device is not a modem
The CDC ACM driver is false matching the Goodix Fingerprint device
against the USB_CDC_ACM_PROTO_AT_V25TER.

The Goodix Fingerprint device is a biometrics sensor that should be
handled in user-space. libfprint has some support for Goodix
fingerprint sensors, although not for this particular one. It is
possible that the vendor allocates a PID per OEM (Lenovo, Dell etc).
If this happens to be the case then more devices from the same vendor
could potentially match the ACM modem module table.

Signed-off-by: Yorick de Wid <ydewid@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210213144901.53199-1-ydewid@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:36:12 +01:00
Greg Kroah-Hartman
c776b77a27 Revert "drivers:tty:pty: Fix a race causing data loss on close"
This reverts commit 33d4ae9885.

Pierre-Louis writes:

	Our SOF/audio CI shows an across-the-board regression when we try v5.12-rc1,
	specifically on pause/resume tests with an interactive terminal running 'aplay
	-i' commands managed by expect to simulate the user pressing the space bar to
	pause/unpause. It turns out the processes are not longer killed and the audio
	devices remain busy (see publicly available test results listed below).

	git bisect points to commit 33d4ae9885 ("drivers:tty:pty: Fix a race
	causing data loss on close"). Reverting the patch fixes the issue on all test
	devices.

	Further analysis with Corey Minyard points to a problem where a slave tty will
	not get a SIGHUP when the master is closed.

So revert this for now:

Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/00154592-c5ee-aaba-956e-b265473b53bc@linux.intel.com
Cc: Corey Minyard <cminyard@mvista.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mark Brown <broonie@kernel.org>,
Fixes: 33d4ae9885 ("drivers:tty:pty: Fix a race causing data loss on close")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:27:56 +01:00
Alexander Shiyan
2334de198f Revert "serial: max310x: rework RX interrupt handling"
This reverts commit fce3c5c1a2.

FIFO is triggered 4 intervals after receiving a byte, it's good
when we don't care about the time of reception, but are only
interested in the presence of any activity on the line.
Unfortunately, this method is not suitable for all tasks,
for example, the RS-485 protocol will not work properly,
since the state machine must track the request-response time
and after the timeout expires, a decision is made that the device
on the line is not responding.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Link: https://lore.kernel.org/r/20210217080608.31192-1-shc_work@mail.ru
Fixes: fce3c5c1a2 ("serial: max310x: rework RX interrupt handling")
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:27:56 +01:00
Uwe Kleine-König
ba8a86e4da leds: trigger/tty: Use led_set_brightness_sync() from workqueue
led_set_brightness() involves scheduling a workqueue. As here the led's
brightness setting is done in context of the trigger's workqueue this is
unjustified overhead and it's more sensible to use
led_set_brightness_sync().

Fixes: fd4a641ac8 ("leds: trigger: implement a tty trigger")
Reported-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210219133307.4840-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:27:56 +01:00
Uwe Kleine-König
1487e7bae8 leds: trigger: Fix error path to not unlock the unlocked mutex
ttyname is allocated before the mutex is taken, so it must not be
unlocked in the error path.

Fixes: fd4a641ac8 ("leds: trigger: implement a tty trigger")
Reported-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210219133307.4840-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:27:55 +01:00
Greg Kroah-Hartman
16d7586dcc Revert "staging: wfx: remove unused included header files"
This reverts commit 314fd52f01.

It turns .h files into non-stand-alone when building, which might cause
problems in the long-run.

Reported-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: Muhammad Usama Anjum <musamaanjum@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Dan Carpenter
74b6b20df8 staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
This code has a check to prevent read overflow but it needs another
check to prevent writing beyond the end of the ->ssid[] array.

Fixes: a2c60d42d9 ("staging: r8188eu: Add files for new driver - part 16")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YEHymwsnHewzoam7@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Dan Carpenter
d4ac640322 staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc ("staging: r8188eu: Add source files for new driver - part 1")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Dan Carpenter
87107518d7 staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
We need to cap len at IW_ESSID_MAX_SIZE (32) to avoid memory corruption.
This can be controlled by the user via the ioctl.

Fixes: 5f53d8ca3d ("Staging: add rtl8192SU wireless usb driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YEHoAWMOSZBUw91F@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Ian Abbott
fd3ce6557d staging: comedi: pcl726: Use 16-bit 0 for interrupt data
The pcl726 driver has an "interrupt" subdevice that supports Comedi
asynchronous commands, placing a value in the Comedi buffer for each
interrupt.  The subdevice uses Comedi's 16-bit sample format but the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit integer `&s->state`.  On bigendian machines, this
will copy 2 bytes from the wrong end of the 32-bit integer.  This isn't
really a problem since `s->state` will always be 0 for this subdevice,
but clean it up by using a 16-bit variable initialized to 0 to pass the
value.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-15-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Ian Abbott
5ff1c08c28 staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
The ni_65xx driver has an "interrupt" subdevice that supports Comedi
asynchronous commands, placing a value in the Comedi buffer for each
interrupt.  The subdevice uses Comedi's 16-bit sample format but the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit integer `&s->state`.  On bigendian machines, this
will copy 2 bytes from the wrong end of the 32-bit integer.  This isn't
really a problem since `s->state` will always be 0 for this subdevice,
but clean it up by using a 16-bit variable initialized to 0 to pass the
value.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-14-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:30 +01:00
Ian Abbott
33444638ae staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
The ni_6527 driver has an "interrupt" subdevice that supports Comedi
asynchronous commands, placing a value in the Comedi buffer for each
interrupt.  The subdevice uses Comedi's 16-bit sample format but the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit integer `&s->state`.  On bigendian machines, this
will copy 2 bytes from the wrong end of the 32-bit integer.  This isn't
really a problem since `s->state` will always be 0 for this subdevice,
but clean it up by using a 16-bit variable initialized to 0 to pass the
value.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-13-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
a1acdbc554 staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
The comedi_parport driver has an "interrupt" subdevice that supports
Comedi asynchronous commands, placing a value in the Comedi buffer for
each interrupt.  The subdevice uses Comedi's 16-bit sample format but
the interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit integer `&s->state`.  On bigendian machines, this
will copy 2 bytes from the wrong end of the 32-bit integer.  This isn't
really a problem since `s->state` will always be 0 for this subdevice,
but clean it up by using a 16-bit variable initialized to 0 to pass the
value.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-12-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
8536749d49 staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
The Amplicon PC36AT/PCI236 common driver has an "interrupt" subdevice
that supports Comedi asynchronous commands, placing a value in the
Comedi buffer for each interrupt.  The subdevice uses Comedi's 16-bit
sample format but the interrupt handler is calling
`comedi_buf_write_samples()` with the address of a 32-bit integer
`&s->state`.  On bigendian machines, this will copy 2 bytes from the
wrong end of the 32-bit integer.  This isn't really a problem since
`s->state` will always be 0 for this subdevice, but clean it up by using
a 16-bit variable initialized to 0 to pass the value.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-11-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
148e34fd33 staging: comedi: pcl818: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
parameter.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the parameter
holding the sample value to `unsigned short`.

[Note: the bug was introduced in commit edf4537bcb ("staging: comedi:
pcl818: use comedi_buf_write_samples()") but the patch applies better to
commit d615416de6 ("staging: comedi: pcl818: introduce
pcl818_ai_write_sample()").]

Fixes: d615416de6 ("staging: comedi: pcl818: introduce pcl818_ai_write_sample()")
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-10-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
a084303a64 staging: comedi: pcl711: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variable
holding the sample value to `unsigned short`.

Fixes: 1f44c034de ("staging: comedi: pcl711: use comedi_buf_write_samples()")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-9-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
b39dfcced3 staging: comedi: me4000: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the calls to
`comedi_buf_write_samples()` are passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variable
holding the sample value to `unsigned short`.

Fixes: de88924f67 ("staging: comedi: me4000: use comedi_buf_write_samples()")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-8-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
54999c0d94 staging: comedi: dmm32at: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variable
holding the sample value to `unsigned short`.

[Note: the bug was introduced in commit 1700529b24 ("staging: comedi:
dmm32at: use comedi_buf_write_samples()") but the patch applies better
to the later (but in the same kernel release) commit 0c0eadadcb
("staging: comedi: dmm32at: introduce dmm32_ai_get_sample()").]

Fixes: 0c0eadadcb ("staging: comedi: dmm32at: introduce dmm32_ai_get_sample()")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-7-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
459b1e8c8f staging: comedi: das800: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variable
holding the sample value to `unsigned short`.

Fixes: ad9eb43c93 ("staging: comedi: das800: use comedi_buf_write_samples()")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-6-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
1c0f20b787 staging: comedi: das6402: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variable
holding the sample value to `unsigned short`.

Fixes: d1d24cb65e ("staging: comedi: das6402: read analog input samples in interrupt handler")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-5-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
b2e78630f7 staging: comedi: adv_pci1710: Fix endian problem for AI command data
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format.  However, the calls to
`comedi_buf_write_samples()` are passing the address of a 32-bit integer
variable.  On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value.  Fix it by changing the type of the variables
holding the sample value to `unsigned short`.  The type of the `val`
parameter of `pci1710_ai_read_sample()` is changed to `unsigned short *`
accordingly.  The type of the `val` variable in `pci1710_ai_insn_read()`
is also changed to `unsigned short` since its address is passed to
`pci1710_ai_read_sample()`.

Fixes: a9c3a015c1 ("staging: comedi: adv_pci1710: use comedi_buf_write_samples()")
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-4-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
ac0bbf55ed staging: comedi: addi_apci_1500: Fix endian problem for command sample
The digital input subdevice supports Comedi asynchronous commands that
read interrupt status information.  This uses 16-bit Comedi samples (of
which only the bottom 8 bits contain status information).  However, the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit variable `unsigned int status`.  On a bigendian
machine, this will copy 2 bytes from the wrong end of the variable.  Fix
it by changing the type of the variable to `unsigned short`.

Fixes: a8c66b684e ("staging: comedi: addi_apci_1500: rewrite the subdevice support functions")
Cc: <stable@vger.kernel.org> #4.0+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-3-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Ian Abbott
25317f428a staging: comedi: addi_apci_1032: Fix endian problem for COS sample
The Change-Of-State (COS) subdevice supports Comedi asynchronous
commands to read 16-bit change-of-state values.  However, the interrupt
handler is calling `comedi_buf_write_samples()` with the address of a
32-bit integer `&s->state`.  On bigendian architectures, it will copy 2
bytes from the wrong end of the 32-bit integer.  Fix it by transferring
the value via a 16-bit integer.

Fixes: 6bb45f2b0c ("staging: comedi: addi_apci_1032: use comedi_buf_write_samples()")
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-2-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:29 +01:00
Dan Carpenter
e163b9823a staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
The user can specify a "req->essid_len" of up to 255 but if it's
over IW_ESSID_MAX_SIZE (32) that can lead to memory corruption.

Fixes: 13a9930d15 ("staging: ks7010: add driver from Nanonote extra-repository")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YD4fS8+HmM/Qmrw6@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:28 +01:00
Lee Gibson
b93c1e3981 staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
Function r8712_sitesurvey_cmd calls memcpy without checking the length.
A user could control that length and trigger a buffer overflow.
Fix by checking the length is within the maximum allowed size.

Signed-off-by: Lee Gibson <leegib@gmail.com>
Link: https://lore.kernel.org/r/20210301132648.420296-1-leegib@gmail.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:28 +01:00
Lee Gibson
8687bf9ef9 staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan
Function _rtl92e_wx_set_scan calls memcpy without checking the length.
A user could control that length and trigger a buffer overflow.
Fix by checking the length is within the maximum allowed size.

Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Gibson <leegib@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210226145157.424065-1-leegib@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:28 +01:00
Dan Carpenter
d660f4f42c staging: rtl8712: unterminated string leads to read overflow
The memdup_user() function does not necessarily return a NUL terminated
string so this can lead to a read overflow.  Switch from memdup_user()
to strndup_user() to fix this bug.

Fixes: c6dc001f2a ("staging: r8712u: Merging Realtek's latest (v2.6.6). Various fixes.")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YDYSR+1rj26NRhvb@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:28 +01:00
Julian Braha
7c36194558 staging: rtl8192e: fix kconfig dependency on CRYPTO
When RTLLIB_CRYPTO_TKIP is enabled and CRYPTO is disabled,
Kbuild gives the following warning:

WARNING: unmet direct dependencies detected for CRYPTO_MICHAEL_MIC
  Depends on [n]: CRYPTO [=n]
  Selected by [m]:
  - RTLLIB_CRYPTO_TKIP [=m] && STAGING [=y] && RTLLIB [=m]

WARNING: unmet direct dependencies detected for CRYPTO_LIB_ARC4
  Depends on [n]: CRYPTO [=n]
  Selected by [m]:
  - RTLLIB_CRYPTO_TKIP [=m] && STAGING [=y] && RTLLIB [=m]
  - RTLLIB_CRYPTO_WEP [=m] && STAGING [=y] && RTLLIB [=m]

This is because RTLLIB_CRYPTO_TKIP selects CRYPTO_MICHAEL_MIC and
CRYPTO_LIB_ARC4, without depending on or selecting CRYPTO,
despite those config options being subordinate to CRYPTO.

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Link: https://lore.kernel.org/r/20210222180607.399753-1-julianbraha@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:23:28 +01:00
Yejune Deng
dcf9625f2a virt: acrn: Use vfs_poll() instead of f_op->poll()
Use a more advanced function vfs_poll() in acrn_irqfd_assign().
At the same time, modify the definition of events.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210221133306.33530-1-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:16:09 +01:00
Shuo Liu
e54b788869 virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
Without cpu hotplug support, vCPU cannot be removed from a Service VM.
Don't expose remove_cpu sysfs when CONFIG_HOTPLUG_CPU disabled.

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Qais Yousef <qais.yousef@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210221134339.57851-2-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:16:09 +01:00
Shuo Liu
51f2403035 cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
279dcf693a ("virt: acrn: Introduce an interface for Service VM to
control vCPU") introduced {add,remove}_cpu() usage and it hit below
error with !CONFIG_SMP:

../drivers/virt/acrn/hsm.c: In function ‘remove_cpu_store’:
../drivers/virt/acrn/hsm.c:389:3: error: implicit declaration of function ‘remove_cpu’; [-Werror=implicit-function-declaration]
   remove_cpu(cpu);

../drivers/virt/acrn/hsm.c:402:2: error: implicit declaration of function ‘add_cpu’; [-Werror=implicit-function-declaration]
   add_cpu(cpu);

Add add_cpu() function prototypes with !CONFIG_SMP and remove_cpu() with
!CONFIG_HOTPLUG_CPU for such usage.

Fixes: 279dcf693a ("virt: acrn: Introduce an interface for Service VM to control vCPU")
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Qais Yousef <qais.yousef@arm.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210221134339.57851-1-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10 09:16:09 +01:00
farah kassabri
15097e9338 habanalabs: fix debugfs address translation
when user uses virtual addresses to access dram through debugfs,
driver translate this address to physical and use it
for the access through the pcie bar.
in case dram page size is different than the dmmu
page size, we need to have special treatment
for adding the page offset to the actual address, which
is to use the dram page size mask to fetch the page offset
from the virtual address, instead of the dmmu last hop shift.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10 09:16:09 +01:00
Tomer Tayar
ffd123fe83 habanalabs: Disable file operations after device is removed
A device can be removed from the PCI subsystem while a process holds the
file descriptor opened.
In such a case, the driver attempts to kill the process, but as it is
still possible that the process will be alive after this step, the
device removal will complete, and we will end up with a process object
that points to a device object which was already released.

To prevent the usage of this released device object, disable the
following file operations for this process object, and avoid the cleanup
steps when the file descriptor is eventually closed.
The latter is just a best effort, as memory leak will occur.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10 09:16:09 +01:00
Tomer Tayar
27ac5aada0 habanalabs: Call put_pid() when releasing control device
The refcount of the "hl_fpriv" structure is not used for the control
device, and thus hl_hpriv_put() is not called when releasing this
device.
This results with no call to put_pid(), so add it explicitly in
hl_device_release_ctrl().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10 09:16:09 +01:00
Greg Kroah-Hartman
bd0c48e53d drivers: habanalabs: remove unused dentry pointer for debugfs files
The dentry for the created debugfs file was being saved, but never used
anywhere.  As the pointer isn't needed for anything, and the debugfs
files are being properly removed by removing the parent directory,
remove the saved pointer as well, saving a tiny bit of memory and logic.

Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10 09:16:08 +01:00
Oded Gabbay
16db6b532f habanalabs: mark hl_eq_inc_ptr() as static
hl_eq_inc_ptr() is not called from anywhere outside irq.c so mark
it as static

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10 09:16:08 +01:00
Linus Torvalds
05a59d7979 Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...
2021-03-09 17:15:56 -08:00
Linus Torvalds
6a30bedfdf Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "Fix opcode filtering for exceptions, and clean up defconfig"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc: sparc64_defconfig: remove duplicate CONFIGs
  sparc64: Fix opcode filtering in handling of no fault loads
2021-03-09 17:08:41 -08:00
Corentin Labbe
69264b4a43 sparc: sparc64_defconfig: remove duplicate CONFIGs
After my patch there is CONFIG_ATA defined twice.
Remove the duplicate one.
Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
test with NFS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a57cdeb369 ("sparc: sparc64_defconfig: add necessary configs for qemu")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:22:40 -08:00
Rob Gardner
e5e8b80d35 sparc64: Fix opcode filtering in handling of no fault loads
is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.

The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.

The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.

The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
     if ((asi & 0xf2) == ASI_PNF)

An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.

In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:21:10 -08:00
Christophe Leroy
bd73758803 powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
Add stub instances of enable_kernel_vsx() and disable_kernel_vsx()
when CONFIG_VSX is not set, to avoid following build failure.

  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o
  In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services_types.h:29,
                   from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:37,
                   from drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:27:
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c: In function 'dcn_bw_apply_registry_override':
  ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:64:3: error: implicit declaration of function 'enable_kernel_vsx'; did you mean 'enable_kernel_fp'? [-Werror=implicit-function-declaration]
     64 |   enable_kernel_vsx(); \
        |   ^~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:640:2: note: in expansion of macro 'DC_FP_START'
    640 |  DC_FP_START();
        |  ^~~~~~~~~~~
  ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:75:3: error: implicit declaration of function 'disable_kernel_vsx'; did you mean 'disable_kernel_fp'? [-Werror=implicit-function-declaration]
     75 |   disable_kernel_vsx(); \
        |   ^~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:676:2: note: in expansion of macro 'DC_FP_END'
    676 |  DC_FP_END();
        |  ^~~~~~~~~
  cc1: some warnings being treated as errors
  make[5]: *** [drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o] Error 1

This works because the caller is checking if VSX is available using
cpu_has_feature():

  #define DC_FP_START() { \
  	if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \
  		preempt_disable(); \
  		enable_kernel_vsx(); \
  	} else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \
  		preempt_disable(); \
  		enable_kernel_altivec(); \
  	} else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \
  		preempt_disable(); \
  		enable_kernel_fp(); \
  	} \

When CONFIG_VSX is not selected, cpu_has_feature(CPU_FTR_VSX_COMP)
constant folds to 'false' so the call to enable_kernel_vsx() is
discarded and the build succeeds.

Fixes: 16a9dea110 ("amdgpu: Enable initial DCN support on POWER")
Cc: stable@vger.kernel.org # v5.6+
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Incorporate some discussion comments into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8d7d285a027e9d21f5ff7f850fa71a2655b0c4af.1615279170.git.christophe.leroy@csgroup.eu
2021-03-10 11:15:00 +11:00
David S. Miller
8515455720 Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2021-03-09

please apply the following patch series to netdev's net tree.

This brings one fix for a memleak in an error path of the setup code.
Also several fixes for dealing with pending TX buffers - two for old
bugs in their completion handling, and one recent regression in a
teardown path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
7eefda7f35 s390/qeth: fix notification for pending buffers during teardown
The cited commit reworked the state machine for pending TX buffers.
In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
uses NEED_QAOB for buffers that get parked while waiting for their QAOB
completion.

But it missed to adjust the check in qeth_tx_complete_buf(). So if
qeth_tx_complete_pending_bufs() is called during teardown to drain
the parked TX buffers, we no longer raise a notification for af_iucv.

Instead of updating the checked state, just move this code into
qeth_tx_complete_pending_bufs() itself. This also gets rid of the
special-case in the common TX completion path.

Fixes: 8908f36d20 ("s390/qeth: fix af_iucv notification race")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
3e83d467a0 s390/qeth: schedule TX NAPI on QAOB completion
When a QAOB notifies us that a pending TX buffer has been delivered, the
actual TX completion processing by qeth_tx_complete_pending_bufs()
is done within the context of a TX NAPI instance. We shouldn't rely on
this instance being scheduled by some other TX event, but just do it
ourselves.

qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
instance. To avoid touching the TX queue's NAPI instance
before/after it is (un-)registered, reorder the code in qeth_open()
and qeth_stop() accordingly.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
c20383ad16 s390/qeth: improve completion of pending TX buffers
The current design attaches a pending TX buffer to a custom
single-linked list, which is anchored at the buffer's slot on the
TX ring. The buffer is then checked for final completion whenever
this slot is processed during a subsequent TX NAPI poll cycle.

But if there's insufficient traffic on the ring, we might never make
enough progress to get back to this ring slot and discover the pending
buffer's final TX completion. In particular if this missing TX
completion blocks the application from sending further traffic.

So convert the custom single-linked list code to a per-queue list_head,
and scan this list on every TX NAPI cycle.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
e7a36d27f6 s390/qeth: fix memory leak after failed TX Buffer allocation
When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
back an Output Queue, the 'out_freeoutqbufs' path will free all
previously allocated buffers for this queue. But it misses to free the
half-finished queue struct itself.

Move the buffer allocation into qeth_alloc_output_queue(), and deal with
such errors internally.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:53 -08:00
David S. Miller
b005c9ef5a Merge branch 'virtio_net-infinite-loop'
Balazs Nemeth says:

====================
net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto

These patches prevent an infinite loop for gso packets with a protocol
from virtio net hdr that doesn't match the protocol in the packet.
Note that packets coming from a device without
header_ops->parse_protocol being implemented will not be caught by
the check in virtio_net_hdr_to_skb, but the infinite loop will still
be prevented by the check in the gso layer.

Changes from v2 to v3:
  - Remove unused *eth.
  - Use MPLS_HLEN to also check if the MPLS header length is a multiple
    of four.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
Balazs Nemeth
d348ede32e net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
Balazs Nemeth
924a9bc362 net: check if protocol extracted by virtio_net_hdr_set_proto is correct
For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.

An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.

Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.

Fixes: 9274124f02 ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
George McCollister
286a8624d7 net: dsa: xrs700x: check if partner is same as port in hsr join
Don't assign dp to partner if it's the same port that xrs700x_hsr_join
was called with. The partner port is supposed to be the other port in
the HSR/PRP redundant pair not the same port. This fixes an issue
observed in testing where forwarding between redundant HSR ports on this
switch didn't work depending on the order the ports were added to the
hsr device.

Fixes: bd62e6f5e6 ("net: dsa: xrs700x: add HSR offloading support")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:03:37 -08:00
Daniel Axtens
c080a17330 powerpc/64s/exception: Clean up a missed SRR specifier
Nick's patch cleaning up the SRR specifiers in exception-64s.S missed
a single instance of EXC_HV_OR_STD. Clean that up.

Caught by clang's integrated assembler.

Fixes: 3f7fbd97d0 ("powerpc/64s/exception: Clean up SRR specifiers")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225031006.1204774-2-dja@axtens.net
2021-03-10 07:59:31 +11:00
Nicholas Piggin
73ac798818 powerpc: Fix inverted SET_FULL_REGS bitop
This bit operation was inverted and set the low bit rather than
cleared it, breaking the ability to ptrace non-volatile GPRs after
exec. Fix.

Only affects 64e and 32-bit.

Fixes: feb9df3462 ("powerpc/64s: Always has full regs, so remove remnant checks")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210308085530.3191843-1-npiggin@gmail.com
2021-03-10 07:59:30 +11:00
Michael Ellerman
7aed41cff3 powerpc/64s: Use symbolic macros for function entry encoding
In ppc_function_entry() we look for a specific set of instructions by
masking the instructions and comparing with a known value. Currently
those known values are just literal hex values, and we recently
discovered one of them was wrong.

Instead construct the values using the existing constants we have for
defining various fields of instructions.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210309071544.515303-1-mpe@ellerman.id.au
2021-03-10 07:58:04 +11:00
Wei Yongjun
eeb05595d2 umem: fix error return code in mm_pci_probe()
Fix to return negative error code -ENOMEM from the blk_alloc_queue()
and dma_alloc_coherent() error handling cases instead of 0, as done
elsewhere in this function.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210308123501.2573816-1-weiyongjun1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-09 13:56:05 -07:00
Linus Torvalds
4b3d9f9cf1 Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
 "A bunch of fixes for the GPIO subsystem. We have two regressions in
  the core code spotted right after the merge window, a series of fixes
  for ACPI GPIO and a subsequent fix for a related regression in
  gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
  the gpio-line-names to remedy a regression in stm32mp151.

  Summary:

   - fix two regressions in core GPIO subsystem code: one NULL-pointer
     dereference and one list corruption

   - read GPIO line names from fwnode instead of using the generic
     device properties to fix a regression on stm32mp151

   - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
     handling on Intel Galileo

   - update .gitignore in GPIO selftests"

* tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Read "gpio-line-names" from a firmware node
  gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
  gpiolib: acpi: Allow to find GpioInt() resource by name and index
  gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
  gpiolib: acpi: Add missing IRQF_ONESHOT
  gpio: fix gpio-device list corruption
  gpio: fix NULL-deref-on-deregistration regression
  selftests: gpio: update .gitignore
2021-03-09 12:02:18 -08:00
Linus Torvalds
9c39198a65 Merge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:

 - fixes for boot breakage because of misaligned FDTs

 - fix for overwritten exception handlers

 - enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
   performance

* tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Reserve exception base early to prevent corruption
  MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
  crypto: mips/poly1305 - enable for all MIPS processors
  MIPS: boot/compressed: Copy DTB to aligned address
2021-03-09 11:56:41 -08:00
Bernhard Rosenkränzer
1f09af0625 kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
If LLD was built with -DLLD_VENDOR="xyz", ld.lld --version output
will prefix LLD_VENDOR. Since LLD_VENDOR can contain spaces, the
LLD identifier isn't guaranteed to be $2 either.

Adjust the version checker to handle such versions of lld.

Link: https://lore.kernel.org/lkml/20210302221211.1620858-1-bero@lindev.ch/
Signed-off-by: Bernhard Rosenkränzer <bero@lindev.ch>
[masahiro yamada: refactor the code]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
2021-03-10 04:18:28 +09:00
Jiri Slaby
b3d9fc1436 kbuild: dummy-tools: fix inverted tests for gcc
There is a test in Kconfig which takes inverted value of a compiler
check:
* config CC_HAS_INT128
        def_bool !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0)

This results in CC_HAS_INT128 not being in super-config generated by
dummy-tools. So take this into account in the gcc script.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-03-10 04:17:47 +09:00
Masahiro Yamada
993bdde945 kbuild: add image_name to no-sync-config-targets
'make image_name' needs include/config/auto.conf to show the correct
output because KBUILD_IMAGE depends on CONFIG options, but should not
attempt to resync the configuration.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-03-10 04:17:47 +09:00
Xie He
f7d9d48545 net: lapbether: Remove netif_start_queue / netif_stop_queue
For the devices in this driver, the default qdisc is "noqueue",
because their "tx_queue_len" is 0.

In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
"noqueue" qdisc are specially handled. Packets are transmitted without
being queued after a "dev->flags & IFF_UP" check. However, it's possible
that even if this check succeeds, "ops->ndo_stop" may still have already
been called. This is because in "__dev_close_many", "ops->ndo_stop" is
called before clearing the "IFF_UP" flag.

If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
"__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
checks "netif_xmit_stopped" and finds that the queue is already stopped.
In this case, it will complain that:
"Virtual device ... asks to queue packet!"

To prevent "__dev_queue_xmit" from generating this complaint, we should
not call "netif_stop_queue" in "ops->ndo_stop".

We also don't need to call "netif_start_queue" in "ops->ndo_open",
because after a netdev is allocated and registered, the
"__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
to call "netif_start_queue" to clear it.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 11:03:09 -08:00
Ondrej Mosnacek
53cb245454 NFSv4.2: fix return value of _nfs4_get_security_label()
An xattr 'get' handler is expected to return the length of the value on
success, yet _nfs4_get_security_label() (and consequently also
nfs4_xattr_get_nfs4_label(), which is used as an xattr handler) returns
just 0 on success.

Fix this by returning label.len instead, which contains the length of
the result.

Fixes: aa9c266962 ("NFS: Client implementation of Labeled-NFS")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-09 13:30:39 -05:00
Marc Zyngier
01dc9262ff KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
It recently became apparent that the ARMv8 architecture has interesting
rules regarding attributes being used when fetching instructions
if the MMU is off at Stage-1.

In this situation, the CPU is allowed to fetch from the PoC and
allocate into the I-cache (unless the memory is mapped with
the XN attribute at Stage-2).

If we transpose this to vcpus sharing a single physical CPU,
it is possible for a vcpu running with its MMU off to influence
another vcpu running with its MMU on, as the latter is expected to
fetch from the PoU (and self-patching code doesn't flush below that
level).

In order to solve this, reuse the vcpu-private TLB invalidation
code to apply the same policy to the I-cache, nuking it every time
the vcpu runs on a physical CPU that ran another vcpu of the same
VM in the past.

This involve renaming __kvm_tlb_flush_local_vmid() to
__kvm_flush_cpu_context(), and inserting a local i-cache invalidation
there.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org
2021-03-09 17:58:56 +00:00
Andrey Konovalov
86c83365ab arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
When CONFIG_DEBUG_VIRTUAL is enabled, the default page_to_virt() macro
implementation from include/linux/mm.h is used. That definition doesn't
account for KASAN tags, which leads to no tags on page_alloc allocations.

Provide an arm64-specific definition for page_to_virt() when
CONFIG_DEBUG_VIRTUAL is enabled that takes care of KASAN tags.

Fixes: 2813b9c029 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/4b55b35202706223d3118230701c6a59749d9b72.1615219501.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-09 13:19:21 +00:00
Joerg Roedel
bffe30dd9f x86/sev-es: Use __copy_from_user_inatomic()
The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().

Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7bc ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org
2021-03-09 12:37:54 +01:00
Joerg Roedel
62441a1fb5 x86/sev-es: Correctly track IRQ states in runtime #VC handler
Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to
correctly track the IRQ state during its execution.

Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-5-joro@8bytes.org
2021-03-09 12:33:46 +01:00
Joerg Roedel
545ac14c16 x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.

But the check for this condition also needs to look if the regs->sp
value is trusted, meaning it was not set by user-space. Extend the check
to not use regs->sp when the NMI interrupted user-space code or the
SYSCALL gap.

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-3-joro@8bytes.org
2021-03-09 12:26:26 +01:00
Naveen N. Rao
cea15316ce powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
'lis r2,N' is 'addis r2,0,N' and the instruction encoding in the macro
LIS_R2 is incorrect (it currently maps to 'addis r0,r2,N'). Fix the
same.

Fixes: c71b7eff42 ("powerpc: Add ABIv2 support to ppc_function_entry")
Cc: stable@vger.kernel.org # v3.16+
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210304020411.16796-1-naveen.n.rao@linux.vnet.ibm.com
2021-03-09 22:13:26 +11:00
Thomas Bogendoerfer
bd67b711bf MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb396454 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb396454 ("memblock: do not start bottom-up allocations with kernel_end")
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-09 11:22:59 +01:00
Frank Li
f06391c45e mmc: cqhci: Fix random crash when remove mmc module/card
[ 6684.493350] Unable to handle kernel paging request at virtual address ffff800011c5b0f0
[ 6684.498531] mmc0: card 0001 removed
[ 6684.501556] Mem abort info:
[ 6684.509681]   ESR = 0x96000047
[ 6684.512786]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 6684.518394]   SET = 0, FnV = 0
[ 6684.521707]   EA = 0, S1PTW = 0
[ 6684.524998] Data abort info:
[ 6684.528236]   ISV = 0, ISS = 0x00000047
[ 6684.532986]   CM = 0, WnR = 1
[ 6684.536129] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081b22000
[ 6684.543923] [ffff800011c5b0f0] pgd=00000000bffff003, p4d=00000000bffff003, pud=00000000bfffe003, pmd=00000000900e1003, pte=0000000000000000
[ 6684.557915] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 6684.564240] Modules linked in: sdhci_esdhc_imx(-) sdhci_pltfm sdhci cqhci mmc_block mmc_core fsl_jr_uio caam_jr caamkeyblob_desc caamhash_desc caamalg_desc crypto_engine rng_core authenc libdes crct10dif_ce flexcan can_dev caam error [last unloaded: mmc_core]
[ 6684.587281] CPU: 0 PID: 79138 Comm: kworker/0:3H Not tainted 5.10.9-01410-g3ba33182767b-dirty #10
[ 6684.596160] Hardware name: Freescale i.MX8DXL EVK (DT)
[ 6684.601320] Workqueue: kblockd blk_mq_run_work_fn

[ 6684.606094] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[ 6684.612286] pc : cqhci_request+0x148/0x4e8 [cqhci]
^GMessage from syslogd@  at Thu Jan  1 01:51:24 1970 ...[ 6684.617085] lr : cqhci_request+0x314/0x4e8 [cqhci]
[ 6684.626734] sp : ffff80001243b9f0
[ 6684.630049] x29: ffff80001243b9f0 x28: ffff00002c3dd000
[ 6684.635367] x27: 0000000000000001 x26: 0000000000000001
[ 6684.640690] x25: ffff00002c451000 x24: 000000000000000f
[ 6684.646007] x23: ffff000017e71c80 x22: ffff00002c451000
[ 6684.651326] x21: ffff00002c0f3550 x20: ffff00002c0f3550
[ 6684.656651] x19: ffff000017d46880 x18: ffff00002cea1500
[ 6684.661977] x17: 0000000000000000 x16: 0000000000000000
[ 6684.667294] x15: 000001ee628e3ed1 x14: 0000000000000278
[ 6684.672610] x13: 0000000000000001 x12: 0000000000000001
[ 6684.677927] x11: 0000000000000000 x10: 0000000000000000
[ 6684.683243] x9 : 000000000000002b x8 : 0000000000001000
[ 6684.688560] x7 : 0000000000000010 x6 : ffff00002c0f3678
[ 6684.693886] x5 : 000000000000000f x4 : ffff800011c5b000
[ 6684.699211] x3 : 000000000002d988 x2 : 0000000000000008
[ 6684.704537] x1 : 00000000000000f0 x0 : 0002d9880008102f
[ 6684.709854] Call trace:
[ 6684.712313]  cqhci_request+0x148/0x4e8 [cqhci]
[ 6684.716803]  mmc_cqe_start_req+0x58/0x68 [mmc_core]
[ 6684.721698]  mmc_blk_mq_issue_rq+0x460/0x810 [mmc_block]
[ 6684.727018]  mmc_mq_queue_rq+0x118/0x2b0 [mmc_block]

The problem occurs when cqhci_request() get called after cqhci_disable() as
it leads to access of allocated memory that has already been freed. Let's
fix the problem by calling cqhci_disable() a bit later in the remove path.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Diagnosed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20210303174248.542175-1-Frank.Li@nxp.com
Fixes: f690f4409d ("mmc: mmc: Enable CQE's")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-03-09 10:00:52 +01:00
Adrian Hunter
66fbacccba mmc: core: Fix partition switch time for eMMC
Avoid the following warning by always defining partition switch time:

 [    3.209874] mmc1: unspecified timeout for CMD6 - use generic
 [    3.222780] ------------[ cut here ]------------
 [    3.233363] WARNING: CPU: 1 PID: 111 at drivers/mmc/core/mmc_ops.c:575 __mmc_switch+0x200/0x204

Reported-by: Paul Fertser <fercerpav@gmail.com>
Fixes: 1c447116d0 ("mmc: mmc: Fix partition switch timeout for some eMMCs")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/168bbfd6-0c5b-5ace-ab41-402e7937c46e@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-03-09 10:00:52 +01:00
Yann Gautier
774514bf97 mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
An issue has been observed on STM32MP157C-EV1 board, with an erase command
with secure erase argument, ending up waiting for ~4 hours before timeout.

The requested busy timeout from the mmc core ends up with 14784000ms (~4
hours), but the supported host->max_busy_timeout is 86767ms, which leads to
that the core switch to use an R1 response in favor of the R1B and polls
for busy with the host->card_busy() ops. In this case the polling doesn't
work as expected, as we never detects that the card stops signaling busy,
which leads to the following message:

 mmc1: Card stuck being busy! __mmc_poll_for_busy

The problem boils done to that the stm32 variants can't use R1 responses in
favor of R1B responses, as it leads to an internal state machine in the
controller to get stuck. To continue to process requests, it would need to
be reset.

To fix this problem, let's set MMC_CAP_NEED_RSP_BUSY for the stm32 variant,
which prevent the mmc core from switching to R1 responses. Additionally,
let's cap the cmd->busy_timeout to the host->max_busy_timeout, thus rely on
86767ms to be sufficient (~66 seconds was need for this test case).

Fixes: 94fe2580a2 ("mmc: core: Enable erase/discard/trim support for all mmc hosts")
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20210225145454.12780-1-yann.gautier@foss.st.com
Cc: stable@vger.kernel.org
[Ulf: Simplified the code and extended the commit message]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-03-09 10:00:52 +01:00
Sami Tolvanen
dbaee836d6 KVM: arm64: Don't use cbz/adr with external symbols
allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
linker errors:

  ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
  relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
  [-1048576, 1048575]; references hyp_panic
  >>> defined in vmlinux.o

  ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0):
  relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in
  [-1048576, 1048575]; references hyp_panic
  >>> defined in vmlinux.o

This is because with LTO, the compiler ends up placing hyp_panic()
more than 1MB away from __guest_enter(). Use an unconditional branch
and adr_l instead to fix the issue.

Link: https://github.com/ClangBuiltLinux/linux/issues/1317
Reported-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210305202124.3768527-1-samitolvanen@google.com
2021-03-09 08:48:24 +00:00
Paul Cercueil
5fbecd2389 irqchip/ingenic: Add support for the JZ4760
Add support for the interrupt controller found in the JZ4760 SoC, which
works exactly like the one in the JZ4770.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210307172014.73481-2-paul@crapouillou.net
2021-03-09 08:45:17 +00:00
Paul Cercueil
673433e7c2 dt-bindings/irq: Add compatible string for the JZ4760B
Add the ingenic,jz4760b-intc compatible string with a fallback to the
ingenic,jz4760-intc compatible string.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210307172014.73481-1-paul@crapouillou.net
2021-03-09 08:45:11 +00:00
Pavel Skripkin
cfdc67acc7 USB: serial: io_edgeport: fix memory leak in edge_startup
sysbot found memory leak in edge_startup().
The problem was that when an error was received from the usb_submit_urb(),
nothing was cleaned up.

Reported-by: syzbot+59f777bdcbdd7eea5305@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Fixes: 6e8cf7751f ("USB: add EPIC support to the io_edgeport driver")
Cc: stable@vger.kernel.org	# 2.6.21: c5c0c55598
Signed-off-by: Johan Hovold <johan@kernel.org>
2021-03-09 09:05:37 +01:00
Niv Sardi
5563b3b642 USB: serial: ch341: add new Product ID
Add PID for CH340 that's found on cheap programmers.

The driver works flawlessly as soon as the new PID (0x9986) is added to it.
These look like ANU232MI but ship with a ch341 inside. They have no special
identifiers (mine only has the string "DB9D20130716" printed on the PCB and
nothing identifiable on the packaging. The merchant i bought it from
doesn't sell these anymore).

the lsusb -v output is:
Bus 001 Device 009: ID 9986:7523
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass          255 Vendor Specific Class
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x9986
  idProduct          0x7523
  bcdDevice            2.54
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0027
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower               96mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      1
      bInterfaceProtocol      2
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0020  1x 32 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0020  1x 32 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval               1

Signed-off-by: Niv Sardi <xaiki@evilgiggle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
2021-03-09 09:05:36 +01:00
Johan Hovold
f164f5d8a7 USB: serial: xr: fix NULL-deref on disconnect
Claiming the sibling control interface is a bit more involved and
specifically requires adding support to USB-serial core for managing
either interface being unbound first, something which could otherwise
lead to a NULL-pointer dereference.

Similarly, additional infrastructure is also needed to handle suspend
properly.

Since the driver currently isn't actually using the control interface,
we can defer this for now by simply not claiming the control interface.

Fixes: c2d405aa86 ("USB: serial: add MaxLinear/Exar USB to Serial driver")
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Tested-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2021-03-09 09:05:36 +01:00
Sebastian Reichel
42213a0190 USB: serial: cp210x: add some more GE USB IDs
GE CS1000 has some more custom USB IDs for CP2102N; add them
to the driver to have working auto-probing.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
2021-03-09 09:05:36 +01:00
Karan Singhal
ca667a3320 USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
IDs of nLight Air Adapter, Acuity Brands, Inc.:
vid: 10c4
pid: 88d8

Signed-off-by: Karan Singhal <karan.singhal@acuitybrands.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
2021-03-09 09:05:36 +01:00
Pavel Skripkin
c5aa956eae ALSA: usb-audio: fix use after free in usb_audio_disconnect
The problem was in wrong "if" placement. chip->quirk_type is freed
in snd_card_free_when_closed(), but inside if statement it's accesed.

Fixes: 9799110825 ("ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/16da19126ff461e5e64a9aec648cce28fb8ed73e.1615242183.git.paskripkin@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-09 07:40:46 +01:00
Pavel Skripkin
30dea07180 ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
syzbot reported null pointer dereference in usb_audio_probe.
The problem was in case, when quirk == NULL. It's not an
error condition, so quirk must be checked before dereferencing.

Call Trace:
 usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396
 really_probe+0x291/0xe60 drivers/base/dd.c:554
 driver_probe_device+0x26b/0x3d0 drivers/base/dd.c:740
 __device_attach_driver+0x1d1/0x290 drivers/base/dd.c:846
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:431
 __device_attach+0x228/0x4a0 drivers/base/dd.c:914
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:491
 device_add+0xbdb/0x1db0 drivers/base/core.c:3242
 usb_set_configuration+0x113f/0x1910 drivers/usb/core/message.c:2164
 usb_generic_driver_probe+0xba/0x100 drivers/usb/core/generic.c:238
 usb_probe_device+0xd9/0x2c0 drivers/usb/core/driver.c:293
 really_probe+0x291/0xe60 drivers/base/dd.c:554
 driver_probe_device+0x26b/0x3d0 drivers/base/dd.c:740
 __device_attach_driver+0x1d1/0x290 drivers/base/dd.c:846
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:431
 __device_attach+0x228/0x4a0 drivers/base/dd.c:914
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:491
 device_add+0xbdb/0x1db0 drivers/base/core.c:3242
 usb_new_device.cold+0x721/0x1058 drivers/usb/core/hub.c:2555
 hub_port_connect drivers/usb/core/hub.c:5223 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5363 [inline]
 port_event drivers/usb/core/hub.c:5509 [inline]
 hub_event+0x2357/0x4320 drivers/usb/core/hub.c:5591
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Reported-by: syzbot+719da9b149a931f5143f@syzkaller.appspotmail.com
Fixes: 9799110825 ("ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/f1ebad6e721412843bd1b12584444c0a63c6b2fb.1615242183.git.paskripkin@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-09 07:40:09 +01:00
Simeon Simeonoff
f15c5c11ab ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
The new AE-5 Plus model has a different Subsystem ID compared to the
non-plus model. Adding the new id to the list of quirks.

Signed-off-by: Simeon Simeonoff <sim.simeonoff@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/998cafbe10b648f724ee33570553f2d780a38963.camel@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-09 07:34:37 +01:00
Linus Torvalds
987a08741d Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Just some more random bits from Al, including a conversion over to
  generic extables"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc32: take ->thread.flags out
  sparc32: get rid of fake_swapper_regs
  sparc64: get rid of fake_swapper_regs
  sparc32: switch to generic extables
  sparc32: switch copy_user.S away from range exception table entries
  sparc32: get rid of range exception table entries in checksum_32.S
  sparc32: switch __bzero() away from range exception table entries
  sparc32: kill lookup_fault()
  sparc32: don't bother with lookup_fault() in __bzero()
2021-03-08 22:01:58 -08:00
Paulo Alcantara
04ad69c342 cifs: do not send close in compound create+close requests
In case of interrupted syscalls, prevent sending CLOSE commands for
compound CREATE+CLOSE requests by introducing an
CIFS_CP_CREATE_CLOSE_OP flag to indicate lower layers that it should
not send a CLOSE command to the MIDs corresponding the compound
CREATE+CLOSE request.

A simple reproducer:

    #!/bin/bash

    mount //server/share /mnt -o username=foo,password=***
    tc qdisc add dev eth0 root netem delay 450ms
    stat -f /mnt &>/dev/null & pid=$!
    sleep 0.01
    kill $pid
    tc qdisc del dev eth0 root
    umount /mnt

Before patch:

    ...
    6 0.256893470 192.168.122.2 → 192.168.122.15 SMB2 402 Create Request File: ;GetInfo Request FS_INFO/FileFsFullSizeInformation;Close Request
    7 0.257144491 192.168.122.15 → 192.168.122.2 SMB2 498 Create Response File: ;GetInfo Response;Close Response
    9 0.260798209 192.168.122.2 → 192.168.122.15 SMB2 146 Close Request File:
   10 0.260841089 192.168.122.15 → 192.168.122.2 SMB2 130 Close Response, Error: STATUS_FILE_CLOSED

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:23:22 -06:00
Paulo Alcantara
14302ee330 cifs: return proper error code in statfs(2)
In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
use its return value rather than always returning 0.  Instead, use rc
variable as it is properly set to 0 in case there is no
server->ops->queryfs.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:23:11 -06:00
Paulo Alcantara
e3d100eae4 cifs: change noisy error message to FYI
A customer has reported that their dmesg were being flooded by

  CIFS: VFS: \\server Cancelling wait for mid xxx cmd: a
  CIFS: VFS: \\server Cancelling wait for mid yyy cmd: b
  CIFS: VFS: \\server Cancelling wait for mid zzz cmd: c

because some processes that were performing statfs(2) on the share had
been interrupted due to their automount setup when certain users
logged in and out.

Change it to FYI as they should be mostly informative rather than
error messages.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:22:52 -06:00
Paulo Alcantara
bf1bc694b6 cifs: print MIDs in decimal notation
The MIDs are mostly printed as decimal, so let's make it consistent.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:22:31 -06:00
Tong Zhang
4416e98594 atm: idt77252: fix null-ptr-dereference
this one is similar to the phy_data allocation fix in uPD98402, the
driver allocate the idt77105_priv and store to dev_data but later
dereference using dev->dev_data, which will cause null-ptr-dereference.

fix this issue by changing dev_data to phy_data so that PRIV(dev) can
work correctly.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Tong Zhang
3153724fc0 atm: uPD98402: fix incorrect allocation
dev->dev_data is set in zatm.c, calling zatm_start() will overwrite this
dev->dev_data in uPD98402_start() and a subsequent PRIV(dev)->lock
(i.e dev->phy_data->lock) will result in a null-ptr-dereference.

I believe this is a typo and what it actually want to do is to allocate
phy_data instead of dev_data.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Tong Zhang
1019d7923d atm: fix a typo in the struct description
phy_data means private PHY data not date

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Jia-Ju Bai
179d0ba0c4 net: qrtr: fix error return code of qrtr_sendmsg()
When sock_alloc_send_skb() returns NULL to skb, no error return code of
qrtr_sendmsg() is assigned.
To fix this bug, rc is assigned with -ENOMEM in this case.

Fixes: 194ccc8829 ("net: qrtr: Support decoding incoming v2 packets")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:02:53 -08:00
Davide Caratti
27ab92d999 mptcp: fix length of ADD_ADDR with port sub-option
in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1.

this can be verified running tcpdump on the kselftests artifacts:

 unpatched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr
 reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0
 IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]]

 patched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
 reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
 IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0

Fixes: 22fb85ffae ("mptcp: add port support for ADD_ADDR suboption writing")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:02:03 -08:00
Trond Myklebust
b6f80a2ebb NFS: Fix open coded versions of nfs_set_cache_invalid() in NFSv4
nfs_set_cache_invalid() has code to handle delegations, and other
optimisations, so let's use it when appropriate.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 16:32:11 -05:00
Trond Myklebust
ac46b3d768 NFS: Fix open coded versions of nfs_set_cache_invalid()
nfs_set_cache_invalid() has code to handle delegations, and other
optimisations, so let's use it when appropriate.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 16:13:55 -05:00
Trond Myklebust
fd6d3feed0 NFS: Clean up function nfs_mark_dir_for_revalidate()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 16:01:02 -05:00
Trond Myklebust
47397915ed NFS: Don't gratuitously clear the inode cache when lookup failed
The fact that the lookup revalidation failed, does not mean that the
inode contents have changed.

Fixes: 5ceb9d7fda ("NFS: Refactor nfs_lookup_revalidate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 15:57:17 -05:00
Trond Myklebust
82e7ca1334 NFS: Don't revalidate the directory permissions on a lookup failure
There should be no reason to expect the directory permissions to change
just because the directory contents changed or a negative lookup timed
out. So let's avoid doing a full call to nfs_mark_for_revalidate() in
that case.
Furthermore, if this is a negative dentry, and we haven't actually done
a new lookup, then we have no reason yet to believe the directory has
changed at all. So let's remove the gratuitous directory inode
invalidation altogether when called from
nfs_lookup_revalidate_negative().

Reported-by: Geert Jansen <gerardu@amazon.com>
Fixes: 5ceb9d7fda ("NFS: Refactor nfs_lookup_revalidate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 15:48:02 -05:00
Benjamin Coddington
f0940f4b32 SUNRPC: Set memalloc_nofs_save() for sync tasks
We could recurse into NFS doing memory reclaim while sending a sync task,
which might result in a deadlock.  Set memalloc_nofs_save for sync task
execution.

Fixes: a1231fda7e ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 15:32:16 -05:00
Marc Zyngier
a79f7051cc irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
Implementing CONFIG_GENERIC_IRQ_MULTI_HANDLER is a decision that is
made at the architecture level, and shouldn't involve the irqchip
at all (we even provide a fallback helper when the option isn't
selected).

Drop all instances of such selection from non-arch code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210217142800.2547737-1-maz@kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-08 20:18:41 +00:00
Marc Zyngier
3e895f4cbd ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
ep93xx currently relies of CONFIG_ARM_VIC to select
GENERIC_IRQ_MULTI_HANDLER. Given that this is logically a platform
architecture property, add the selection of GENERIC_IRQ_MULTI_HANDLER
at the platform level.

Further patches will remove the selection from the irqchip side.

Reported-by: Marc Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-08 20:18:32 +00:00
Greg Kroah-Hartman
69dd4503a7 irqdomain: Remove debugfs_file from struct irq_domain
There's no need to keep around a dentry pointer to a simple file that
debugfs itself can look up when we need to remove it from the system.
So simplify the code by deleting the variable and cleaning up the logic
around the debugfs file.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/YCvYV53ZdzQSWY6w@kroah.com
2021-03-08 20:12:08 +00:00
Jia-Ju Bai
2055a99da8 net: bonding: fix error return code of bond_neigh_init()
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.

Fixes: 9e99bfefdb ("bonding: fix bond_neigh_init()")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:05:36 -08:00
Vladimir Oltean
29d98f54a4 net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
The txtime is passed to the driver in skb->skb_mstamp_ns, which is
actually in a union with skb->tstamp (the place where software
timestamps are kept).

Since commit b50a5c70ff ("net: allow simultaneous SW and HW transmit
timestamping"), __sock_recv_timestamp has some logic for making sure
that the two calls to skb_tstamp_tx:

skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)

and

skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver

will both do the right thing and in a race-free manner, meaning that
skb_tx_timestamp will deliver a cmsg with the software timestamp only,
and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
with the hardware timestamp only.

Why are races even possible? Well, because although the software timestamp
skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
lives in skb_shinfo(skb), an area which is shared between skbs and their
clones. And skb_tstamp_tx works by cloning the packets when timestamping
them, therefore attempting to perform hardware timestamping on an skb's
clone will also change the hardware timestamp of the original skb. And
the original skb might have been yet again cloned for software
timestamping, at an earlier stage.

So the logic in __sock_recv_timestamp can't be as simple as saying
"does this skb have a hardware timestamp? if yes I'll send the hardware
timestamp to the socket, otherwise I'll send the software timestamp",
precisely because the hardware timestamp is shared.
Instead, it's quite the other way around: __sock_recv_timestamp says
"does this skb have a software timestamp? if yes, I'll send the software
timestamp, otherwise the hardware one". This works because the software
timestamp is not shared with clones.

But that means we have a problem when we attempt hardware timestamping
with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
will say "oh, yeah, this must be some sort of odd clone" and will not
deliver the hardware timestamp to the socket. And this is exactly what
is happening when we have txtime enabled on the socket: as mentioned,
that is put in a union with skb->tstamp, so it is quite easy to mistake
it.

Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
before taking the hardware timestamp. It's of no use to us now (we're
already on the TX confirmation path).

Fixes: 0d08c9ec7d ("enetc: add support time specific departure base on the qos etf")
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:03:42 -08:00
Alex Marginean
1b2395dfff net: enetc: set MAC RX FIFO to recommended value
On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high
and may lead to RX lock-up under traffic at a rate higher than 6 Gbps.
Set it to 1 instead, as recommended by the hardware design team and by
later versions of the ENETC block guide.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Jason Liu <jason.hui.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:03:42 -08:00
Paul Cercueil
2e26962236 net: davicom: Use platform_get_irq_optional()
The second IRQ line really is optional, so use
platform_get_irq_optional() to obtain it.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Paul Cercueil
cf9e60aa69 net: davicom: Fix regulator not turned off on driver removal
We must disable the regulator that was enabled in the probe function.

Fixes: 7994fe55a4 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Paul Cercueil
ac88c531a5 net: davicom: Fix regulator not turned off on failed probe
When the probe fails or requests to be defered, we must disable the
regulator that was previously enabled.

Fixes: 7994fe55a4 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Vladimir Oltean
03cbb87054 net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
Tobias reports that after the blamed patch, VLAN objects being added to
a bridge device are being added to all slave ports instead (swp2, swp3).

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br0
bridge vlan add dev br0 vid 100 self

This is because the fix was too broad: we made dsa_port_offloads_netdev
say "yes, I offload the br0 bridge" for all slave ports, but we didn't
add the checks whether the switchdev object was in fact meant for the
physical port or for the bridge itself. So we are reacting on events in
a way in which we shouldn't.

The reason why the fix was too broad is because the question itself,
"does this DSA port offload this netdev", was too broad in the first
place. The solution is to disambiguate the question and separate it into
two different functions, one to be called for each switchdev attribute /
object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).

In the case of VLAN objects on the bridge interface, this solves the
problem because we know that VLAN objects are per bridge port and not
per bridge. And when orig_dev is equal to the net_bridge, we offload it
as a bridge, but not as a bridge port; that's how we are able to skip
reacting on those events. Note that this is compatible with future plans
to have explicit offloading of VLAN objects on the bridge interface as a
bridge port (in DSA, this signifies that we should add that VLAN towards
the CPU port).

Fixes: 99b8202b17 ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:59:00 -08:00
Jia-Ju Bai
62765d3955 net: wan: fix error return code of uhdlc_init()
When priv->rx_skbuff or priv->tx_skbuff is NULL, no error return code of
uhdlc_init() is assigned.
To fix this bug, ret is assigned with -ENOMEM in these cases.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:56:40 -08:00
Jia-Ju Bai
143c253f42 net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
When hns_assemble_skb() returns NULL to skb, no error return code of
hns_nic_clear_all_rx_fetch() is assigned.
To fix this bug, ret is assigned with -ENOMEM in this case.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:56:00 -08:00
Grant Grundler
4d8c79b7e9 net: usb: log errors to dmesg/syslog
Errors in protocol should be logged when the driver aborts operations.
If the driver can carry on and "humor" the device, then emitting
the message as debug output level is fine.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:54:32 -08:00
Grant Grundler
492bbe7f8a net: usb: cdc_ncm: emit dev_err on error paths
Several error paths in bind/probe code will only emit
output using dev_dbg. But if we are going to fail the
bind/probe, emit related output with "err" priority.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:54:05 -08:00
Bhaskar Chowdhury
a4813dc7ba net: ethernet: chelsio: inline_crypto: Mundane typos fixed throughout the file chcr_ktls.c
Mundane typos fixes throughout the file.

s/establised/established/
s/availbale/available/
s/vaues/values/
s/Incase/In case/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:52:46 -08:00
Philipp Zabel
bf9279cd63 net: dsa: bcm_sf2: simplify optional reset handling
As of commit bb475230b8 ("reset: make optional functions really
optional"), the reset framework API calls use NULL pointers to describe
optional, non-present reset controls.

This allows to unconditionally return errors from
devm_reset_control_get_optional_exclusive.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:51:36 -08:00
Frank Sorenson
ad3dbe35c8 NFS: Correct size calculation for create reply length
CREATE requests return a post_op_fh3, rather than nfs_fh3. The
post_op_fh3 includes an extra word to indicate 'handle_follows'.

Without that additional word, create fails when full 64-byte
filehandles are in use.

Add NFS3_post_op_fh_sz, and correct the size calculation for
NFS3_createres_sz.

Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 13:40:12 -05:00
Anshuman Khandual
093bbe211e arm64/mm: Reorganize pfn_valid()
There are multiple instances of pfn_to_section_nr() and __pfn_to_section()
when CONFIG_SPARSEMEM is enabled. This can be optimized if memory section
is fetched earlier. This replaces the open coded PFN and ADDR conversion
with PFN_PHYS() and PHYS_PFN() helpers. While there, also add a comment.
This does not cause any functional change.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1614921898-4099-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 18:04:00 +00:00
Anshuman Khandual
eeb0753ba2 arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
pfn_valid() validates a pfn but basically it checks for a valid struct page
backing for that pfn. It should always return positive for memory ranges
backed with struct page mapping. But currently pfn_valid() fails for all
ZONE_DEVICE based memory types even though they have struct page mapping.

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as its always going to be positive and that will be an optimization
for the normal hotplug memory. Like ZONE_DEVICE based memory, all normal
hotplugged memory too will not have SECTION_IS_EARLY set for their sections

Skipping memblock_is_map_memory() for all non early memory sections would
fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
performance for normal hotplug memory as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: David Hildenbrand <david@redhat.com>
Fixes: 73b20c84d4 ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1614921898-4099-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 18:04:00 +00:00
Bjørn Mork
6654111c89 MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
The devicetree specification requires 8-byte alignment in
memory. This is now enforced by libfdt since commit 79edff1206
("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
which included the upstream commit 5e735860c478 ("libfdt: Check for
8-byte address alignment in fdt_ro_probe_()").

This broke the MIPS raw appended DTBs which would be appended to
the image immediately following the initramfs section.  This ends
with a 32bit size, resulting in a 4-byte alignment of the DTB.

Fix by padding with zeroes to 8-bytes when MIPS_RAW_APPENDED_DTB
is defined.

Fixes: 79edff1206 ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-08 18:36:08 +01:00
Timo Rothenpieler
a0590473c5 nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
This follows what was done in 8c2fabc654.
With the default being m, it's impossible to build the module into the
kernel.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 11:47:47 -05:00
Takashi Iwai
0a0be72f93 Merge branch 'for-linus-5.12-rc1' into for-linus 2021-03-08 17:09:34 +01:00
Takashi Iwai
28e96c1693 ALSA: hda: Drop the BATCH workaround for AMD controllers
The commit c02f77d32d ("ALSA: hda - Workaround for crackled sound on
AMD controller (1022:1457)") introduced a few workarounds for the
recent AMD HD-audio controller, and one of them is the forced BATCH
PCM mode so that PulseAudio avoids the timer-based scheduling.  This
was thought to cover for some badly working applications, but this
actually worsens for more others.  In total, this wasn't a good idea
to enforce it.

This is a partial revert of the commit above for dropping the PCM
BATCH enforcement part to recover from the regression again.

Fixes: c02f77d32d ("ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457)")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=195303
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210308160726.22930-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-08 17:09:25 +01:00
Rafael J. Wysocki
b7dea0cb3d Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq fixes for 5.12 from Viresh Kumar:

"- Two patches for qcom-hw driver to fix dereferencing and return
   value check.

- Add vexpress to cpufreq-dt blacklist."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
  cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
  cpufreq: qcom-hw: fix dereferencing freed memory 'data'
2021-03-08 16:48:52 +01:00
Christian Brauner
ee2e3f5062 mount: fix mounting of detached mounts onto targets that reside on shared mounts
Creating a series of detached mounts, attaching them to the filesystem,
and unmounting them can be used to trigger an integer overflow in
ns->mounts causing the kernel to block any new mounts in count_mounts()
and returning ENOSPC because it falsely assumes that the maximum number
of mounts in the mount namespace has been reached, i.e. it thinks it
can't fit the new mounts into the mount namespace anymore.

Depending on the number of mounts in your system, this can be reproduced
on any kernel that supportes open_tree() and move_mount() by compiling
and running the following program:

  /* SPDX-License-Identifier: LGPL-2.1+ */

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <getopt.h>
  #include <limits.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mount.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* open_tree() */
  #ifndef OPEN_TREE_CLONE
  #define OPEN_TREE_CLONE 1
  #endif

  #ifndef OPEN_TREE_CLOEXEC
  #define OPEN_TREE_CLOEXEC O_CLOEXEC
  #endif

  #ifndef __NR_open_tree
          #if defined __alpha__
                  #define __NR_open_tree 538
          #elif defined _MIPS_SIM
                  #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                          #define __NR_open_tree 4428
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                          #define __NR_open_tree 6428
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                          #define __NR_open_tree 5428
                  #endif
          #elif defined __ia64__
                  #define __NR_open_tree (428 + 1024)
          #else
                  #define __NR_open_tree 428
          #endif
  #endif

  /* move_mount() */
  #ifndef MOVE_MOUNT_F_EMPTY_PATH
  #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
  #endif

  #ifndef __NR_move_mount
          #if defined __alpha__
                  #define __NR_move_mount 539
          #elif defined _MIPS_SIM
                  #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                          #define __NR_move_mount 4429
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                          #define __NR_move_mount 6429
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                          #define __NR_move_mount 5429
                  #endif
          #elif defined __ia64__
                  #define __NR_move_mount (428 + 1024)
          #else
                  #define __NR_move_mount 429
          #endif
  #endif

  static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
  {
          return syscall(__NR_open_tree, dfd, filename, flags);
  }

  static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
                                   const char *to_pathname, unsigned int flags)
  {
          return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
  }

  static bool is_shared_mountpoint(const char *path)
  {
          bool shared = false;
          FILE *f = NULL;
          char *line = NULL;
          int i;
          size_t len = 0;

          f = fopen("/proc/self/mountinfo", "re");
          if (!f)
                  return 0;

          while (getline(&line, &len, f) > 0) {
                  char *slider1, *slider2;

                  for (slider1 = line, i = 0; slider1 && i < 4; i++)
                          slider1 = strchr(slider1 + 1, ' ');

                  if (!slider1)
                          continue;

                  slider2 = strchr(slider1 + 1, ' ');
                  if (!slider2)
                          continue;

                  *slider2 = '\0';
                  if (strcmp(slider1 + 1, path) == 0) {
                          /* This is the path. Is it shared? */
                          slider1 = strchr(slider2 + 1, ' ');
                          if (slider1 && strstr(slider1, "shared:")) {
                                  shared = true;
                                  break;
                          }
                  }
          }
          fclose(f);
          free(line);

          return shared;
  }

  static void usage(void)
  {
          const char *text = "mount-new [--recursive] <base-dir>\n";
          fprintf(stderr, "%s", text);
          _exit(EXIT_SUCCESS);
  }

  #define exit_usage(format, ...)                              \
          ({                                                   \
                  fprintf(stderr, format "\n", ##__VA_ARGS__); \
                  usage();                                     \
          })

  #define exit_log(format, ...)                                \
          ({                                                   \
                  fprintf(stderr, format "\n", ##__VA_ARGS__); \
                  exit(EXIT_FAILURE);                          \
          })

  static const struct option longopts[] = {
          {"help",        no_argument,            0,      'a'},
          { NULL,         no_argument,            0,       0 },
  };

  int main(int argc, char *argv[])
  {
          int exit_code = EXIT_SUCCESS, index = 0;
          int dfd, fd_tree, new_argc, ret;
          char *base_dir;
          char *const *new_argv;
          char target[PATH_MAX];

          while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
                  switch (ret) {
                  case 'a':
                          /* fallthrough */
                  default:
                          usage();
                  }
          }

          new_argv = &argv[optind];
          new_argc = argc - optind;
          if (new_argc < 1)
                  exit_usage("Missing base directory\n");
          base_dir = new_argv[0];

          if (*base_dir != '/')
                  exit_log("Please specify an absolute path");

          /* Ensure that target is a shared mountpoint. */
          if (!is_shared_mountpoint(base_dir))
                  exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir);

          dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
          if (dfd < 0)
                  exit_log("%m - Failed to open base directory \"%s\"", base_dir);

          ret = mkdirat(dfd, "detached-move-mount", 0755);
          if (ret < 0)
                  exit_log("%m - Failed to create required temporary directories");

          ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir);
          if (ret < 0 || (size_t)ret >= sizeof(target))
                  exit_log("%m - Failed to assemble target path");

          /*
           * Having a mount table with 10000 mounts is already quite excessive
           * and shoult account even for weird test systems.
           */
          for (size_t i = 0; i < 10000; i++) {
                  fd_tree = sys_open_tree(dfd, "detached-move-mount",
                                          OPEN_TREE_CLONE |
                                          OPEN_TREE_CLOEXEC |
                                          AT_EMPTY_PATH);
                  if (fd_tree < 0) {
                          fprintf(stderr, "%m - Failed to open %d(detached-move-mount)", dfd);
                          exit_code = EXIT_FAILURE;
                          break;
                  }

                  ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH);
                  if (ret < 0) {
                          if (errno == ENOSPC)
                                  fprintf(stderr, "%m - Buggy mount counting");
                          else
                                  fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)", dfd);
                          exit_code = EXIT_FAILURE;
                          break;
                  }
                  close(fd_tree);

                  ret = umount2(target, MNT_DETACH);
                  if (ret < 0) {
                          fprintf(stderr, "%m - Failed to unmount %s", target);
                          exit_code = EXIT_FAILURE;
                          break;
                  }
          }

          (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR);
          close(dfd);

          exit(exit_code);
  }

and wait for the kernel to refuse any new mounts by returning ENOSPC.
How many iterations are needed depends on the number of mounts in your
system. Assuming you have something like 50 mounts on a standard system
it should be almost instantaneous.

The root cause of this is that detached mounts aren't handled correctly
when source and target mount are identical and reside on a shared mount
causing a broken mount tree where the detached source itself is
propagated which propagation prevents for regular bind-mounts and new
mounts. This ultimately leads to a miscalculation of the number of
mounts in the mount namespace.

Detached mounts created via
open_tree(fd, path, OPEN_TREE_CLONE)
are essentially like an unattached new mount, or an unattached
bind-mount. They can then later on be attached to the filesystem via
move_mount() which calls into attach_recursive_mount(). Part of
attaching it to the filesystem is making sure that mounts get correctly
propagated in case the destination mountpoint is MS_SHARED, i.e. is a
shared mountpoint. This is done by calling into propagate_mnt() which
walks the list of peers calling propagate_one() on each mount in this
list making sure it receives the propagation event.
The propagate_one() functions thereby skips both new mounts and bind
mounts to not propagate them "into themselves". Both are identified by
checking whether the mount is already attached to any mount namespace in
mnt->mnt_ns. The is what the IS_MNT_NEW() helper is responsible for.

However, detached mounts have an anonymous mount namespace attached to
them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't
realize they need to be skipped causing the mount to propagate "into
itself" breaking the mount table and causing a disconnect between the
number of mounts recorded as being beneath or reachable from the target
mountpoint and the number of mounts actually recorded/counted in
ns->mounts ultimately causing an overflow which in turn prevents any new
mounts via the ENOSPC issue.

So teach propagation to handle detached mounts by making it aware of
them. I've been tracking this issue down for the last couple of days and
then verifying that the fix is correct by
unmounting everything in my current mount table leaving only /proc and
/sys mounted and running the reproducer above overnight verifying the
number of mounts counted in ns->mounts. With this fix the counts are
correct and the ENOSPC issue can't be reproduced.

This change will only have an effect on mounts created with the new
mount API since detached mounts cannot be created with the old mount API
so regressions are extremely unlikely.

Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com
Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-03-08 15:18:43 +01:00
Joerg Roedel
78a81d88f6 x86/sev-es: Introduce ip_within_syscall_gap() helper
Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org
2021-03-08 14:22:17 +01:00
Anshuman Khandual
79cc2ed5a7 arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER
Currently without THP being enabled, MAX_ORDER via FORCE_MAX_ZONEORDER gets
reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K
page size configurations. This is problematic which throws up the following
warning during boot as pageblock_order via HUGETLB_PAGE_ORDER order exceeds
MAX_ORDER.

WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70
Modules linked in:
CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-00005-g0221e3101a1 #237
Hardware name: linux,dummy-virt (DT)
pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
pc : __fragmentation_index+0x58/0x70
lr : fragmentation_index+0x88/0xa8
sp : ffff800016ccfc00
x29: ffff800016ccfc00 x28: 0000000000000000
x27: ffff800011fd4000 x26: 0000000000000002
x25: ffff800016ccfda0 x24: 0000000000000002
x23: 0000000000000640 x22: ffff0005ffcb5b18
x21: 0000000000000002 x20: 000000000000000d
x19: ffff0005ffcb3980 x18: 0000000000000004
x17: 0000000000000001 x16: 0000000000000019
x15: ffff800011ca7fb8 x14: 00000000000002b3
x13: 0000000000000000 x12: 00000000000005e0
x11: 0000000000000003 x10: 0000000000000080
x9 : ffff800011c93948 x8 : 0000000000000000
x7 : 0000000000000000 x6 : 0000000000007000
x5 : 0000000000007944 x4 : 0000000000000032
x3 : 000000000000001c x2 : 000000000000000b
x1 : ffff800016ccfc10 x0 : 000000000000000d
Call trace:
__fragmentation_index+0x58/0x70
compaction_suitable+0x58/0x78
wakeup_kcompactd+0x8c/0xd8
balance_pgdat+0x570/0x5d0
kswapd+0x1e0/0x388
kthread+0x154/0x158
ret_from_fork+0x10/0x30

This solves the problem via keeping FORCE_MAX_ZONEORDER unchanged with or
without THP on 16K and 64K page size configurations, making sure that the
HUGETLB_PAGE_ORDER (and pageblock_order) would never exceed MAX_ORDER.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1614597914-28565-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 13:03:54 +00:00
Anshuman Khandual
07fb6dc327 arm64/mm: Drop redundant ARCH_WANT_HUGE_PMD_SHARE
There is already an ARCH_WANT_HUGE_PMD_SHARE which is being selected for
applicable configurations. Hence just drop the other redundant entry.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1614575192-21307-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 13:02:35 +00:00
Will Deacon
cae118b6ac arm64: Drop support for CMDLINE_EXTEND
The documented behaviour for CMDLINE_EXTEND is that the arguments from
the bootloader are appended to the built-in kernel command line. This
also matches the option parsing behaviour for the EFI stub and early ID
register overrides.

Bizarrely, the fdt behaviour is the other way around: appending the
built-in command line to the bootloader arguments, resulting in a
command-line that doesn't necessarily line-up with the parsing order and
definitely doesn't line-up with the documented behaviour.

As it turns out, there is a proposal [1] to replace CMDLINE_EXTEND with
CMDLINE_PREPEND and CMDLINE_APPEND options which should hopefully make
the intended behaviour much clearer. While we wait for those to land,
drop CMDLINE_EXTEND for now as there appears to be little enthusiasm for
changing the current FDT behaviour.

[1] https://lore.kernel.org/lkml/20190319232448.45964-2-danielwa@cisco.com/

Cc: Max Uvarov <muvarov@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/CAL_JsqJX=TCCs7=gg486r9TN4NYscMTCLNfqJF9crskKPq-bTg@mail.gmail.com
Link: https://lore.kernel.org/r/20210303134927.18975-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 12:56:34 +00:00
Will Deacon
df304c2d0d arm64: cpufeatures: Fix handling of CONFIG_CMDLINE for idreg overrides
The built-in kernel commandline (CONFIG_CMDLINE) can be configured in
three different ways:

  1. CMDLINE_FORCE: Use CONFIG_CMDLINE instead of any bootloader args
  2. CMDLINE_EXTEND: Append the bootloader args to CONFIG_CMDLINE
  3. CMDLINE_FROM_BOOTLOADER: Only use CONFIG_CMDLINE if there aren't
     any bootloader args.

The early cmdline parsing to detect idreg overrides gets (2) and (3)
slightly wrong: in the case of (2) the bootloader args are parsed first
and in the case of (3) the CMDLINE is always parsed.

Fix these issues by moving the bootargs parsing out into a helper
function and following the same logic as that used by the EFI stub.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Fixes: 3320030355 ("arm64: cpufeature: Add an early command-line cpufeature override facility")
Link: https://lore.kernel.org/r/20210303134927.18975-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-08 12:56:34 +00:00
Andy Shevchenko
b41ba2ec54 gpiolib: Read "gpio-line-names" from a firmware node
On STM32MP1, the GPIO banks are subnodes of pin-controller@50002000,
see arch/arm/boot/dts/stm32mp151.dtsi. The driver for
pin-controller@50002000 is in drivers/pinctrl/stm32/pinctrl-stm32.c
and iterates over all of its DT subnodes when registering each GPIO
bank gpiochip. Each gpiochip has:

  - gpio_chip.parent = dev,
    where dev is the device node of the pin controller
  - gpio_chip.of_node = np,
    which is the OF node of the GPIO bank

Therefore, dev_fwnode(chip->parent) != of_fwnode_handle(chip.of_node),
i.e. pin-controller@50002000 != pin-controller@50002000/gpio@5000*000.

The original code behaved correctly, as it extracted the "gpio-line-names"
from of_fwnode_handle(chip.of_node) = pin-controller@50002000/gpio@5000*000.

To achieve the same behaviour, read property from the firmware node.

Fixes: 7cba1a4d5e ("gpiolib: generalize devprop_gpiochip_set_names() for device properties")
Reported-by: Marek Vasut <marex@denx.de>
Reported-by: Roman Guskov <rguskov@dh-electronics.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
eb441337c7 gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
The commit 0ea683931a ("gpio: dwapb: Convert driver to using the
GPIO-lib-based IRQ-chip") indeliberately made a regression on how
IRQ line from GPIO I²C expander is handled. I.e. it reveals that
the quirk for Intel Galileo Gen 2 misses the part of setting IRQ type
which previously was predefined by gpio-dwapb driver. Now, we have to
reorganize the approach to call necessary parts, which can be done via
ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk.

Without this fix and with above mentioned change the kernel hangs
on the first IRQ event with:

    gpio gpiochip3: Persistence not supported for GPIO 1
    irq 32, desc: 62f8fb50, depth: 0, count: 0, unhandled: 0
    ->handle_irq():  41c7b0ab, handle_bad_irq+0x0/0x40
    ->irq_data.chip(): e03f1e72, 0xc2539218
    ->action(): 0ecc7e6f
    ->action->handler(): 8a3db21e, irq_default_primary_handler+0x0/0x10
       IRQ_NOPROBE set
    unexpected IRQ trap at vector 20

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
809390219f gpiolib: acpi: Allow to find GpioInt() resource by name and index
Currently only search by index is supported. However, in some cases
we might need to pass the quirks to the acpi_dev_gpio_irq_get().

For this, split out acpi_dev_gpio_irq_get_by() and replace
acpi_dev_gpio_irq_get() by calling above with NULL for name parameter.

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
62d5247d23 gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
On some systems the ACPI tables has wrong pin number and instead of
having a relative one it provides an absolute one in the global GPIO
number space.

Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk to cope with such cases.

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00
Yang Li
6e5d579173 gpiolib: acpi: Add missing IRQF_ONESHOT
fixed the following coccicheck:
./drivers/gpio/gpiolib-acpi.c:176:7-27: ERROR: Threaded IRQ with no
primary handler requested without IRQF_ONESHOT

Make sure threaded IRQs without a primary handler are always request
with IRQF_ONESHOT

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2021-03-08 11:59:17 +01:00
Johan Hovold
cf25ef6b63 gpio: fix gpio-device list corruption
Make sure to hold the gpio_lock when removing the gpio device from the
gpio_devices list (when dropping the last reference) to avoid corrupting
the list when there are concurrent accesses.

Fixes: ff2b135922 ("gpio: make the gpiochip a real device")
Cc: stable@vger.kernel.org      # 4.6
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-03-08 11:59:17 +01:00
Johan Hovold
a6112998ee gpio: fix NULL-deref-on-deregistration regression
Fix a NULL-pointer deference when deregistering the gpio character
device that was introduced by the recent stub-driver hack. When the new
"driver" is unbound as part of deregistration, driver core clears the
driver-data pointer which is used to retrieve the struct gpio_device in
its release callback.

Fix this by using container_of() in the release callback as should have
been done all along.

Fixes: 4731210c09 ("gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default")
Cc: Saravana Kannan <saravanak@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: syzbot+d27b4c8adbbff70fbfde@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-03-08 11:59:17 +01:00
Bartosz Golaszewski
542104ee0c selftests: gpio: update .gitignore
The executable that we build for GPIO selftests was renamed to
gpio-mockup-cdev. Let's update .gitignore so that we don't show it
as an untracked file.

Fixes: 8bc395a6a2 ("selftests: gpio: rework and simplify test implementation")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Kent Gibson <warthog618@gmail.com>
2021-03-08 11:59:16 +01:00
Maciej W. Rozycki
6c810cf20f crypto: mips/poly1305 - enable for all MIPS processors
The MIPS Poly1305 implementation is generic MIPS code written such as to
support down to the original MIPS I and MIPS III ISA for the 32-bit and
64-bit variant respectively.  Lift the current limitation then to enable
code for MIPSr1 ISA or newer processors only and have it available for
all MIPS processors.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: a11d055e7a ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
Cc: stable@vger.kernel.org # v5.5+
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-08 11:52:17 +01:00
Sudeep Holla
fbb31cb805 cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
Add "arm,vexpress" to cpufreq-dt-platdev blacklist since the actual
scaling is handled by the firmware cpufreq drivers(scpi, scmi and
vexpress-spc).

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-03-08 16:20:07 +05:30
Wei Yongjun
536eb97abe cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
In case of error, the function ioremap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.

Fixes: 67fc209b52 ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-03-08 16:20:07 +05:30
Shawn Guo
02fc409540 cpufreq: qcom-hw: fix dereferencing freed memory 'data'
Commit 67fc209b52 ("cpufreq: qcom-hw: drop devm_xxx() calls from
init/exit hooks") introduces an issue of dereferencing freed memory
'data'.  Fix it.

Fixes: 67fc209b52 ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-03-08 16:20:06 +05:30
Paul Cercueil
7a05293af3 MIPS: boot/compressed: Copy DTB to aligned address
Since 5.12-rc1, the Device Tree blob must now be properly aligned.

Therefore, the decompress routine must be careful to copy the blob at
the next aligned address after the kernel image.

This commit fixes the kernel sometimes not booting with a Device Tree
blob appended to it.

Fixes: 79edff1206 ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-08 11:49:47 +01:00
Alexander Egorenkov
78c7cccaab s390: remove IBM_PARTITION and CONFIGFS_FS from zfcpdump defconfig
Remove by zfcpdump unused CONFIG_IBM_PARTITION and CONFIG_CONFIGFS_FS.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:30 +01:00
Heiko Carstens
d50aa69d36 s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:30 +01:00
Heiko Carstens
d0ed78e178 s390,alpha: make TMPFS_INODE64 available again
Both s390 and alpha have been switched to 64-bit ino_t with
commit 96c0a6a72d ("s390,alpha: switch to 64-bit ino_t").
Therefore enable TMPFS_INODE64 for both architectures again.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Link: https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:30 +01:00
Eric Farman
d9c48a948d s390/cio: return -EFAULT if copy_to_user() fails
Fixes: 120e214e50 ("vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:30 +01:00
Joe Perches
db232eb42c s390/tty3270: avoid comma separated statements
Use semicolons and braces.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/9988babd9cca4ac841961d9f0bbf5e49caa87659.1598331149.git.joe@perches.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:30 +01:00
Jiapeng Chong
1c0a9c7997 s390/cpumf: remove unneeded semicolon
Fix the following coccicheck warnings:

./arch/s390/kernel/perf_cpum_cf.c:272:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/1614233736-87331-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:29 +01:00
Wang Qing
942df4be7a s390/crypto: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return -EFAULT if the copy doesn't complete.

Fixes: e06670c5fe ("s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl")
Signed-off-by: Wang Qing <wangqing@vivo.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/1614600502-16714-1-git-send-email-wangqing@vivo.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:29 +01:00
Wang Qing
51c44babdc s390/cio: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return -EFAULT if the copy doesn't complete.

Fixes: e01bcdd613 ("vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl")
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/1614600093-13992-1-git-send-email-wangqing@vivo.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:29 +01:00
Thomas Richter
46b635b6ab s390/cpumf: rename header file to hwctrset.h
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Hendrick Brueckner <brueckner@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Alexander Egorenkov
dabdfac0e8 s390/zcore: release dump save area on restart or power down
The zFCP/NVMe standalone dumper is supposed to release the dump save area
resource as soon as possible but might fail to do so, for instance, if it
crashes. To avoid this situation, register a reboot notifier and ensure
the dump save area resource is released on reboot or power down.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Alexander Egorenkov
7449ca8731 s390/zcore: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

See commit 7dd541a3fb ("s390: no need to check return value of debugfs_create functions").

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Thomas Richter
c41b20de1a s390/cpumf: remove 60 seconds read limit
Remove the 60 seconds read interval limit. Do not impose any limit
at all and allow read of complete counter sets.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Heiko Carstens
f9d8cbf33e s390/topology: remove always false if check
The cpumask being checked in cpu_group_map() must have at least one
cpu set; therefore remove the check.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:27 +01:00
Heiko Carstens
eba8e1af5a s390/time,idle: get rid of unsigned long long
Get rid of unsigned long long, and use unsigned long instead
everywhere. The usage of unsigned long long is a leftover from
31 bit kernel support.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:27 +01:00
Anna-Maria Behnsen
46eb1701c0 hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
hrtimer_force_reprogram() and hrtimer_interrupt() invokes
__hrtimer_get_next_event() to find the earliest expiry time of hrtimer
bases. __hrtimer_get_next_event() does not update
cpu_base::[softirq_]_expires_next to preserve reprogramming logic. That
needs to be done at the callsites.

hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when
the first expiring timer is a softirq timer and the soft interrupt is not
activated. That's wrong because cpu_base::softirq_expires_next is left
stale when the first expiring timer of all bases is a timer which expires
in hard interrupt context. hrtimer_interrupt() does never update
cpu_base::softirq_expires_next which is wrong too.

That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and
the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting
CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that
timer before the stale cpu_base::softirq_expires_next.

cpu_base::softirq_expires_next is cached to make the check for raising the
soft interrupt fast. In the above case the soft interrupt won't be raised
until clock monotonic reaches the stale cpu_base::softirq_expires_next
value. That's incorrect, but what's worse it that if the softirq timer
becomes the first expiring timer of all clock bases after the hard expiry
timer has been handled the reprogramming of the clockevent from
hrtimer_interrupt() will result in an interrupt storm. That happens because
the reprogramming does not use cpu_base::softirq_expires_next, it uses
__hrtimer_get_next_event() which returns the actual expiry time. Once clock
MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is
raised and the storm subsides.

Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard
bases seperately, update softirq_expires_next and handle the case when a
soft expiring timer is the first of all bases by comparing the expiry times
and updating the required cpu base fields. Split this functionality into a
separate function to be able to use it in hrtimer_interrupt() as well
without copy paste.

Fixes: 5da7016046 ("hrtimer: Implement support for softirq based hrtimers")
Reported-by: Mikael Beckius <mikael.beckius@windriver.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de
2021-03-08 09:37:01 +01:00
Gao Xiang
9f377622a4 erofs: fix bio->bi_max_vecs behavior change
Martin reported an issue that directory read could be hung on the
latest -rc kernel with some certain image. The root cause is that
commit baa2c7c971 ("block: set .bi_max_vecs as actual allocated
vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs
is set as actual allocated vector number rather than the requested
number now.

Let's avoid using .bi_max_vecs completely instead.

Link: https://lore.kernel.org/r/20210306040438.8084-1-hsiangkao@aol.com
Reported-by: Martin DEVERA <devik@eaxlabs.cz>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[ Gao Xiang: note that <= 5.11 kernels are not impacted. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2021-03-08 10:43:32 +08:00
Linus Torvalds
144c79ef33 Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
 "Perf tool fixes:

   - Fix wrong skipping for per-die aggregation in 'perf stat'.

   - Fix race in signal handling on large core count machines, setting
     up signal handlers earlier.

   - Fix -F for branch & mem modes in 'perf report'.

   - Fix the condition checks for max number of NUMA nodes in 'perf
     bench numa'.

   - Fix crash in 'perf diff' error path.

   - Fix filtering of empty build-ids in 'perf archive'.

   - Ensure read cmdlines from libtraceevent are null terminated.

  Recent regressions:

   - Fix control fifo permissions in 'perf daemon'.

   - Fix 'perf daemon' compile error with ASAN.

   - Fix running 'perf daemon' test for non root user.

   - Fix PERF_SAMPLE_WEIGHT_STRUCT 'perf test' failure on non-x86
     arches.

   - Fix event's PMU name parsing related to new drm/i915/gt
     software-gt-awake-time event.

  Fixes from compiler instrumentation:

   - Fix leaks in 'perf test' entries, found using ASAN.

   - Fix use-after-free when 'perf stat -r' option is used.

  Arch specific:

   - Fix bitmap for option om ARM's CS-ETM.

  Documentation:

   - Fix documentation of verbose options.

  Build:

   - Clean 'generated' directory used for creating the syscall table on
     x86.

   - Fix ccache usage in $(CC) when generating arch errno table.

   - Cast (struct timeval).tv_sec when printing, fixing the build with
     MUSL libc.

   - Tighten snprintf() string precision to pass gcc check on some
     32-bit arches.

   - Update UAPI copies from the kernel sources.

   - Fix regression on feature detection 'make clean' target"

* tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (44 commits)
  perf cs-etm: Fix bitmap for option
  perf trace: Fix race in signal handling
  perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
  perf report: Fix -F for branch & mem modes
  perf tests x86: Move insn.h include to make sure it finds stddef.h
  perf test: Support the ins_lat check in the X86 specific test
  perf test: Fix sample-parsing failure on non-x86 platforms
  perf archive: Fix filtering of empty build-ids
  perf daemon: Fix compile error with Asan
  perf stat: Fix use-after-free when -r option is used
  libperf: Add perf_evlist__reset_id_hash()
  perf stat: Fix wrong skipping for per-die aggregation
  tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools' copy of linux/coresight-pmu.h
  tools headers: Update syscall.tbl files to support mount_setattr
  perf test: Fix cpu and thread map leaks in perf_time_to_tsc test
  perf test: Fix cpu map leaks in cpu_map_print test
  perf test: Fix a memory leak in thread_map_remove test
  perf test: Fix a thread map leak in thread_map_synthesize test
  ...
2021-03-07 13:35:49 -08:00
Linus Torvalds
3bb48a8506 Merge branch 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "Two small parisc architecture fixes: fix a linking failure reported by
  the kernel test robot and remove a duplicate include"

* 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  arch/parisc/kernel: remove duplicate include in ptrace
  parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
2021-03-07 13:28:31 -08:00
Linus Torvalds
fbda790430 Merge tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "One non-fix, the conversion of vio_driver->remove() to return void,
  which touches various powerpc specific drivers.

  Fix the privilege checks we do in our perf handling, which could cause
  soft/hard lockups in some configurations.

  Fix a bug with IRQ affinity seen on kdump kernels when CPU 0 is
  offline in the second kernel.

  Fix missed page faults after mprotect(..., PROT_NONE) on 603 (32-bit).

  Fix a bug in our VSX (vector) instruction emulation, which should only
  be seen when doing VSX ops to cache inhibited mappings.

  Three commits fixing various build issues with obscure configurations.

  Thanks to Athira Rajeev, Cédric Le Goater, Christophe Leroy, Christoph
  Plattner, Greg Kurz, Jordan Niethe, Laurent Vivier, Ravi Bangoria,
  Tyrel Datwyler, and Uwe Kleine-König"

* tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/sstep: Fix VSX instruction emulation
  powerpc/perf: Fix handling of privilege level checks in perf interrupt context
  powerpc: Force inlining of mmu_has_feature to fix build failure
  vio: make remove callback return void
  powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit()
  powerpc/603: Fix protection of user pages mapped with PROT_NONE
  powerpc/pseries: Don't enforce MSI affinity with kdump
  powerpc/4xx: Fix build errors from mfdcr()
2021-03-07 13:24:44 -08:00
Linus Torvalds
dac5187078 Merge tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k fix from Geert Uytterhoeven:
 "Fix virt_addr_valid() W=1 compiler warnings.

  This is a single non-critical fix. As the build bots are now testing
  all new code with W=1, these warnings are popping up everywhere,
  confusing people. Hence I think it makes sense to silence it as soon
  as possible"

* tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Fix virt_addr_valid() W=1 compiler warnings
2021-03-07 13:20:50 -08:00
Stefan Metzmacher
041474885e io_uring: kill io_sq_thread_fork() and return -EOWNERDEAD if the sq_thread is gone
This brings the behavior back in line with what 5.11 and earlier did,
and this is no longer needed with the improved handling of creds
not needing to do unshare().

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Stefan Metzmacher
7c30f36a98 io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()
With IORING_SETUP_ATTACH_WQ we should let __io_sq_thread() use the
initial creds from each ctx.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
678eeba481 io-wq: warn on creating manager while exiting
Add a simple warning making sure that nobody tries to create a new
manager while we're under IO_WQ_BIT_EXIT. That can potentially happen
due to racy work submission after final put.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
1b00764f09 io_uring: cancel reqs of all iowq's on ring exit
io_ring_exit_work() have to cancel all requests, including those staying
in io-wq, however it tries only cancellation of current tctx, which is
NULL. If we've got task==NULL, use the ctx-to-tctx map to go over all
tctx/io-wq and try cancellations on them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
b5bb3a24f6 io_uring: warn when ring exit takes too long
We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
monitor whether removal hang or not. Add WARN_ONCE to catch hangs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
baf186c4d3 io_uring: index io_uring->xa by ctx not file
We don't use task file notes anymore, and no need left in indexing
task->io_uring->xa by file, and replace it with ctx. It's better
design-wise, especially since we keep a dangling file, and so have to
keep an eye on not dereferencing it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
eebd2e37e6 io_uring: don't take task ring-file notes
With ->flush() gone we're now leaving all uring file notes until the
task dies/execs, so the ctx will not be freed until all tasks that have
ever submit a request die. It was nicer with flush but not much, we
could have locked as described ctx in many cases.

Now we guarantee that ctx outlives all tctx in a sense that
io_ring_exit_work() waits for all tctxs to drop their corresponding
enties in ->xa, and ctx won't go away until then. Hence, additional
io_uring file reference (a.k.a. task file notes) are not needed anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
d56d938b4b io_uring: do ctx initiated file note removal
Another preparation patch. When full quiesce is done on ctx exit, use
task_work infra to remove corresponding to the ctx io_uring->xa entries.
For that we use the back tctx map. Also use ->in_idle to prevent
removing it while we traversing ->xa on cancellation, just ignore it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
13bf43f5f4 io_uring: introduce ctx to tctx back map
For each pair tcxt-ctx create an object and chain it into ctx, so we
have a way to traverse all tctx that are using current ctx. Preparation
patch, will be used later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Pavel Begunkov
2941267bd3 io_uring: make del_task_file more forgiving
Rework io_uring_del_task_file(), so it accepts an index to delete, and
it's not necessarily have to be in the ->xa. Infer file from xa_erase()
to maintain a single origin of truth.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-07 14:12:43 -07:00
Bob Peterson
1a5a2cfd34 gfs2: fix use-after-free in trans_drain
This patch adds code to function trans_drain to remove drained
bd elements from the ail lists, if queued, before freeing the bd.
If we don't remove the bd from the ail, function ail_drain will
try to reference the bd after it has been freed by trans_drain.

Thanks to Andy Price for his analysis of the problem.

Reported-by: Andy Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-07 17:04:55 +01:00
Yang Li
eb602521f4 gfs2: make function gfs2_make_fs_ro() to void type
It fixes the following warning detected by coccinelle:
./fs/gfs2/super.c:592:5-10: Unneeded variable: "error". Return "0" on
line 628

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-07 17:04:55 +01:00
Ard Biesheuvel
9e9888a0fe efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
The EFI_RT_PROPERTIES_TABLE contains a mask of runtime services that are
available after ExitBootServices(). This mostly does not concern the EFI
stub at all, given that it runs before that. However, there is one call
that is made at runtime, which is the call to SetVirtualAddressMap()
(which is not even callable at boot time to begin with)

So add the missing handling of the RT_PROP table to ensure that we only
call SetVirtualAddressMap() if it is not being advertised as unsupported
by the firmware.

Cc: <stable@vger.kernel.org> # v5.10+
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-03-07 09:31:02 +01:00
David S. Miller
9270bbe258 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix incorrect enum type definition in nfnetlink_cthelper UAPI,
   from Dmitry V. Levin.

2) Remove extra space in deprecated automatic helper assignment
   notice, from Klemen Košir.

3) Drop early socket demux socket after NAT mangling, from
   Florian Westphal. Add a test to exercise this bug.

4) Fix bogus invalid packet report in the conntrack TCP tracker,
   also from Florian.

5) Fix access to xt[NFPROTO_UNSPEC] list with no mutex
   in target/match_revfn(), from Vasily Averin.

6) Disallow updates on the table ownership flag.

7) Fix double hook unregistration of tables with owner.

8) Remove bogus check on the table owner in __nft_release_tables().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-06 17:02:40 -08:00
Suzuki K Poulose
6fc5baf547 perf cs-etm: Fix bitmap for option
When set option with macros ETM_OPT_CTXTID and ETM_OPT_TS, it wrongly
takes these two values (14 and 28 prespectively) as bit masks, but
actually both are the offset for bits.  But this doesn't lead to
further failure due to the AND logic operation will be always true for
ETM_OPT_CTXTID / ETM_OPT_TS.

This patch defines new independent macros (rather than using the
"config" bits) for requesting the "contextid" and "timestamp" for
cs_etm_set_option().

Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Cc: Al Grant <al.grant@arm.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210206150833.42120-5-leo.yan@linaro.org
[ Extract the change as a separate patch for easier review ]
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:33 -03:00
Michael Petlan
86a19008af perf trace: Fix race in signal handling
Since a lot of stuff happens before the SIGINT signal handler is registered
(scanning /proc/*, etc.), on bigger systems, such as Cavium Sabre CN99xx,
it may happen that first interrupt signal is lost and perf isn't correctly
terminated.

The reproduction code might look like the following:

    perf trace -a &
    PERF_PID=$!
    sleep 4
    kill -INT $PERF_PID

The issue has been found on a CN99xx machine with RHEL-8 and the patch fixes
it by registering the signal handlers earlier in the init stage.

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/lkml/YEJnaMzH2ctp3PPx@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Arnaldo Carvalho de Melo
77d02bd00c perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Noticed on a debian:experimental mips and mipsel cross build build
environment:

  perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1
  mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
  perfbuilder@ec265a086e9b:~$

    CC       /tmp/build/perf/util/map.o
  util/map.c: In function 'map__new':
  util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
    109 |    "%s/platforms/%s/arch-%s/usr/lib/%s",
        |     ^~
  In file included from /usr/mips-linux-gnu/include/stdio.h:867,
                   from util/symbol.h:11,
                   from util/map.c:2:
  /usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
     67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     68 |        __bos (__s), __fmt, __va_arg_pack ());
        |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Since we have the lenghts for what lands in that place, use it to give
the compiler more info and make it happy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Ravi Bangoria
6740a4e70e perf report: Fix -F for branch & mem modes
perf report fails to add valid additional fields with -F when
used with branch or mem modes. Fix it.

Before patch:

  $ perf record -b
  $ perf report -b -F +srcline_from --stdio
  Error:
  Invalid --fields key: `srcline_from'

After patch:

  $ perf report -b -F +srcline_from --stdio
  # Samples: 8K of event 'cycles'
  # Event count (approx.): 8784
  ...

Committer notes:

There was an inversion: when looking at branch stack dimensions (keys)
it was checking if the sort mode was 'mem', not 'branch'.

Fixes: aa6b3c9923 ("perf report: Make -F more strict like -s")
Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Arnaldo Carvalho de Melo
c1f272df51 perf tests x86: Move insn.h include to make sure it finds stddef.h
In some versions of alpine Linux the perf build is broken since commit
1d509f2a6e ("x86/insn: Support big endian cross-compiles"):

  In file included from /usr/include/linux/byteorder/little_endian.h:13,
                   from /usr/include/asm/byteorder.h:5,
                   from arch/x86/util/../../../../arch/x86/include/asm/insn.h:10,
                   from arch/x86/util/archinsn.c:2:
  /usr/include/linux/swab.h:161:8: error: unknown type name '__always_inline'
   static __always_inline __u16 __swab16p(const __u16 *p)

So move the inclusion of arch/x86/include/asm/insn.h to later in the
places where linux/stddef.h (that conditionally defines
__always_inline) to workaround this problem on Alpine Linux 3.9 to 3.11,
3.12 onwards works.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Kan Liang
7d9d4c6edb perf test: Support the ins_lat check in the X86 specific test
The ins_lat of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction
latency, which is only available for X86. Add a X86 specific test for
the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT type.

The test__x86_sample_parsing() uses the same way as the
test__sample_parsing() to verify a sample type. Since the ins_lat and
PERF_SAMPLE_WEIGHT_STRUCT are the only X86 specific sample type for now,
the test__x86_sample_parsing() only verify the PERF_SAMPLE_WEIGHT_STRUCT
type. Other sample types are still verified in the generic test.

  $ perf test 77 -v
  77: x86 Sample parsing                                              :
  --- start ---
  test child forked, pid 102370
  test child finished with 0
  ---- end ----
  x86 Sample parsing: Ok

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/1614787285-104151-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Kan Liang
a8146d66ab perf test: Fix sample-parsing failure on non-x86 platforms
Executing 'perf test 27' fails on s390:

  [root@t35lp46 perf]# ./perf test -Fv 27
  27: Sample parsing
  --- start ---
  ---- end ----
  Sample parsing: FAILED!
  [root@t35lp46 perf]#

The commit fbefe9c2f8 ("perf tools: Support arch specific
PERF_SAMPLE_WEIGHT_STRUCT processing") changes the ins_lat to a
model-specific variable only for X86, but perf test still verify the
variable in the generic test.

Remove the ins_lat check in the generic test. The following patch will
add it in the X86 specific test.

Fixes: fbefe9c2f8 ("perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/1614787285-104151-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Nicholas Fraser
ec4d0a7680 perf archive: Fix filtering of empty build-ids
A non-existent build-id used to be treated as all-zero SHA-1 hash.
Build-ids are now variable width. A non-existent build-id is an empty
string and "perf buildid-list" pads this with spaces. This is true even
when using old perf.data files recorded from older versions of perf;
"perf buildid-list" never reports an all-zero hash anymore.

This fixes "perf-archive" to skip missing build-ids by skipping lines
that start with a padding space rather than with zeroes.

Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
Link: https://lore.kernel.org/r/442bffc7-ac5c-0975-b876-a549efce2413@codeweavers.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Namhyung Kim
bd57a9f33a perf daemon: Fix compile error with Asan
I'm seeing a build failure when build with address sanitizer.  It seems
we could write to the name[100] if the var is longer.

  $ make EXTRA_CFLAGS=-fsanitize=address
  ...
    CC       builtin-daemon.o
  In function ‘get_session_name’,
    inlined from ‘session_config’ at builtin-daemon.c:164:6,
    inlined from ‘server_config’ at builtin-daemon.c:223:10:
  builtin-daemon.c:155:11: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
    155 |  *session = 0;
        |  ~~~~~~~~~^~~
  builtin-daemon.c: In function ‘server_config’:
  builtin-daemon.c:162:7: note: at offset 100 to object ‘name’ with size 100 declared here
    162 |  char name[100];
        |       ^~~~

Fixes: c0666261ff ("perf daemon: Add config file support")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210224071438.686677-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Namhyung Kim
513068f2b1 perf stat: Fix use-after-free when -r option is used
I got a segfault when using -r option with event groups.  The option
makes it run the workload multiple times and it will reuse the evlist
and evsel for each run.

While most of resources are allocated and freed properly, the id hash
in the evlist was not and it resulted in the bug.  You can see it with
the address sanitizer like below:

  $ perf stat -r 100 -e '{cycles,instructions}' true
  =================================================================
  ==693052==ERROR: AddressSanitizer: heap-use-after-free on
      address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8
  WRITE of size 8 at 0x6080000003d0 thread T0
    #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644
    #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237
    #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244
    #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285
    #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765
    #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782
    #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895
    #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014
    #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446
    #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9)

Actually the nodes in the hash table are struct perf_stream_id and
they were freed in the previous run.  Fix it by resetting the hash.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Namhyung Kim
e2a99c9a9a libperf: Add perf_evlist__reset_id_hash()
Add the perf_evlist__reset_id_hash() function as an internal function so
that it can be called by perf to reset the hash table.  This is
necessary for 'perf stat' to run the workload multiple times.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Jin Yao
034f7ee130 perf stat: Fix wrong skipping for per-die aggregation
Uncore becomes die-scope on Xeon Cascade Lake-AP and perf has supported
--per-die aggregation yet.

One issue is found in check_per_pkg() for uncore events running on AP
system. On cascade Lake-AP, we have:

S0-D0
S0-D1
S1-D0
S1-D1

But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the mask
bits for S0 and S1 have been set for S0-D0 and S1-D0. It doesn't check
die_id. So the counting for S0-D1 and S1-D1 are set to zero.  That's not
correct.

  root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
     1.001460963 S0-D0           1            1317376 Bytes llc_misses.mem_read
     1.001460963 S0-D1           1             998016 Bytes llc_misses.mem_read
     1.001460963 S1-D0           1             970496 Bytes llc_misses.mem_read
     1.001460963 S1-D1           1            1291264 Bytes llc_misses.mem_read
     2.003488021 S0-D0           1            1082048 Bytes llc_misses.mem_read
     2.003488021 S0-D1           1            1919040 Bytes llc_misses.mem_read
     2.003488021 S1-D0           1             890752 Bytes llc_misses.mem_read
     2.003488021 S1-D1           1            2380800 Bytes llc_misses.mem_read
     3.005613270 S0-D0           1            1126080 Bytes llc_misses.mem_read
     3.005613270 S0-D1           1            2898176 Bytes llc_misses.mem_read
     3.005613270 S1-D0           1             870912 Bytes llc_misses.mem_read
     3.005613270 S1-D1           1            3388608 Bytes llc_misses.mem_read
     4.007627598 S0-D0           1            1124608 Bytes llc_misses.mem_read
     4.007627598 S0-D1           1            3884416 Bytes llc_misses.mem_read
     4.007627598 S1-D0           1             921088 Bytes llc_misses.mem_read
     4.007627598 S1-D1           1            4451840 Bytes llc_misses.mem_read
     5.001479927 S0-D0           1             963328 Bytes llc_misses.mem_read
     5.001479927 S0-D1           1            4831936 Bytes llc_misses.mem_read
     5.001479927 S1-D0           1             895104 Bytes llc_misses.mem_read
     5.001479927 S1-D1           1            5496640 Bytes llc_misses.mem_read

From above output, we can see S0-D1 and S1-D1 don't report the interval
values, they are continued to grow. That's because check_per_pkg()
wrongly decides to use zero counts for S0-D1 and S1-D1.

So in check_per_pkg(), we should use hashmap(socket,die) to decide if
the cpu counts needs to skip. Only considering socket is not enough.

Now with this patch,

  root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
     1.001586691 S0-D0           1            1229440 Bytes llc_misses.mem_read
     1.001586691 S0-D1           1             976832 Bytes llc_misses.mem_read
     1.001586691 S1-D0           1             938304 Bytes llc_misses.mem_read
     1.001586691 S1-D1           1            1227328 Bytes llc_misses.mem_read
     2.003776312 S0-D0           1            1586752 Bytes llc_misses.mem_read
     2.003776312 S0-D1           1             875392 Bytes llc_misses.mem_read
     2.003776312 S1-D0           1             855616 Bytes llc_misses.mem_read
     2.003776312 S1-D1           1             949376 Bytes llc_misses.mem_read
     3.006512788 S0-D0           1            1338880 Bytes llc_misses.mem_read
     3.006512788 S0-D1           1             920064 Bytes llc_misses.mem_read
     3.006512788 S1-D0           1             877184 Bytes llc_misses.mem_read
     3.006512788 S1-D1           1            1020736 Bytes llc_misses.mem_read
     4.008895291 S0-D0           1             926592 Bytes llc_misses.mem_read
     4.008895291 S0-D1           1             906368 Bytes llc_misses.mem_read
     4.008895291 S1-D0           1             892224 Bytes llc_misses.mem_read
     4.008895291 S1-D1           1             987712 Bytes llc_misses.mem_read
     5.001590993 S0-D0           1             962624 Bytes llc_misses.mem_read
     5.001590993 S0-D1           1             912512 Bytes llc_misses.mem_read
     5.001590993 S1-D0           1             891200 Bytes llc_misses.mem_read
     5.001590993 S1-D1           1             978432 Bytes llc_misses.mem_read

On no-die system, die_id is 0, actually it's hashmap(socket,0), original behavior
is not changed.

Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Huang <ying.huang@intel.com>
Link: http://lore.kernel.org/lkml/20210128013417.25597-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Arnaldo Carvalho de Melo
33dc525f93 tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources
To pick the changes in:

  fe6b6bc802 ("KVM: VMX: Enable bus lock VM exit")

That makes 'perf kvm-stat' aware of this new BUS_LOCK exit reason, thus
addressing the following perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
  diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h

Cc: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Arnaldo Carvalho de Melo
1a9bcadd00 tools headers cpufeatures: Sync with the kernel sources
To pick the changes from:

  3b9c723ed7 ("KVM: SVM: Add support for SVM instruction address check change")
  b85a0425d8 ("Enumerate AVX Vector Neural Network instructions")
  fb35d30fe5 ("x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov <bp@suse.de>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Arnaldo Carvalho de Melo
6c0afc579a tools headers UAPI: Update tools' copy of linux/coresight-pmu.h
To get the changes in these commits:

  88f11864cf ("coresight: etm-perf: Support PID tracing for kernel at EL2")
  53abf3fe83 ("coresight: etm-perf: Clarify comment on perf options")

This will possibly be used in patches lined up for v5.13.

And silence this perf build warning:

  Warning: Kernel ABI header at 'tools/include/linux/coresight-pmu.h' differs from latest version at 'include/linux/coresight-pmu.h'
  diff -u tools/include/linux/coresight-pmu.h include/linux/coresight-pmu.h

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Arnaldo Carvalho de Melo
743108e104 tools headers: Update syscall.tbl files to support mount_setattr
To pick the changes from:

  9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")

This adds this new syscall to the tables used by tools such as 'perf
trace', so that one can specify it by name and have it filtered, etc.

Addressing these perf build warnings:

  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YD6Wsxr9ByUbab/a@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Namhyung Kim
846580c235 perf test: Fix cpu and thread map leaks in perf_time_to_tsc test
It should release the maps at the end.

  $ perf test -v 71
  71: Convert perf time to TSC                   :
  --- start ---
  test child forked, pid 178744
  mmap size 528384B
  1st event perf time 59207256505278 tsc 13187166645142
  rdtsc          time 59207256542151 tsc 13187166723020
  2nd event perf time 59207256543749 tsc 13187166726393

  =================================================================
  ==178744==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7faf601f9e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55b620cfc00a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55b620cfca2f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55b620cfd1ef in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55b620cfd1ef in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55b6209ef1b2 in test__perf_time_to_tsc tests/perf-time-to-tsc.c:73
    #6 0x55b6209828fb in run_test tests/builtin-test.c:428
    #7 0x55b6209828fb in test_and_print tests/builtin-test.c:458
    #8 0x55b620984a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55b620984a53 in cmd_test tests/builtin-test.c:825
    #10 0x55b6209f0cd4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55b62087aa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55b62087aa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55b62087aa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7faf5fd2fd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Convert perf time to TSC: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Namhyung Kim
690d91f5ec perf test: Fix cpu map leaks in cpu_map_print test
It should be released after printing the map.

  $ perf test -v 52
  52: Print cpu map                              :
  --- start ---
  test child forked, pid 172233

  =================================================================
  ==172233==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 156 byte(s) in 1 object(s) allocated from:
    #0 0x7fc472518e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55e63b378f7a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55e63b37a05c in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:237
    #3 0x55e63b056d16 in cpu_map_print tests/cpumap.c:102
    #4 0x55e63b056d16 in test__cpu_map_print tests/cpumap.c:120
    #5 0x55e63afff8fb in run_test tests/builtin-test.c:428
    #6 0x55e63afff8fb in test_and_print tests/builtin-test.c:458
    #7 0x55e63b001a53 in __cmd_test tests/builtin-test.c:679
    #8 0x55e63b001a53 in cmd_test tests/builtin-test.c:825
    #9 0x55e63b06dc44 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x55e63aef7a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x55e63aef7a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x55e63aef7a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fc47204ed09 in __libc_start_main ../csu/libc-start.c:308
  ...

  SUMMARY: AddressSanitizer: 448 byte(s) leaked in 7 allocation(s).
  test child finished with 1
  ---- end ----
  Print cpu map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:29 -03:00
Namhyung Kim
641b625033 perf test: Fix a memory leak in thread_map_remove test
The str should be freed after creating a thread map.  Also change the
open-coded thread map deletion to a call to perf_thread_map__put().

  $ perf test -v 44
  44: Remove thread map                          :
  --- start ---
  test child forked, pid 165536
  2 threads: 165535, 165536
  1 thread: 165536
  0 thread:

  =================================================================
  ==165536==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 14 byte(s) in 1 object(s) allocated from:
    #0 0x7f54453ffe8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5444f8c6a7 in __vasprintf_internal libio/vasprintf.c:71

  SUMMARY: AddressSanitizer: 14 byte(s) leaked in 1 allocation(s).
  test child finished with 1
  ---- end ----
  Remove thread map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:29 -03:00
Namhyung Kim
4be42882e1 perf test: Fix a thread map leak in thread_map_synthesize test
It missed to call perf_thread_map__put() after using the map.

  $ perf test -v 43
  43: Synthesize thread map                      :
  --- start ---
  test child forked, pid 162640

  =================================================================
  ==162640==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fd48cdaa1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x563e6d5f8d0e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x563e6d3ef69a in thread_map__new_by_pid util/thread_map.c:46
    #3 0x563e6d2cec90 in test__thread_map_synthesize tests/thread-map.c:97
    #4 0x563e6d27d8fb in run_test tests/builtin-test.c:428
    #5 0x563e6d27d8fb in test_and_print tests/builtin-test.c:458
    #6 0x563e6d27fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x563e6d27fa53 in cmd_test tests/builtin-test.c:825
    #8 0x563e6d2ebce4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x563e6d175a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x563e6d175a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x563e6d175a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fd48c8dfd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 8224 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Synthesize thread map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:29 -03:00
Namhyung Kim
953e7b5960 perf test: Fix cpu and thread map leaks in switch_tracking test
The evlist and cpu/thread maps should be released together.
Otherwise the following error was reported by Asan.

  $ perf test -v 35
  35: Track with sched_switch                    :
  --- start ---
  test child forked, pid 159287
  Using CPUID GenuineIntel-6-8E-C
  mmap size 528384B
  1295 events recorded

  =================================================================
  ==159287==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7fa28d9a2e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x5652f5a5affa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x5652f5a5ba1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x5652f5a5c1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x5652f5a5c1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x5652f5723bbf in test__switch_tracking tests/switch-tracking.c:350
    #6 0x5652f56e18fb in run_test tests/builtin-test.c:428
    #7 0x5652f56e18fb in test_and_print tests/builtin-test.c:458
    #8 0x5652f56e3a53 in __cmd_test tests/builtin-test.c:679
    #9 0x5652f56e3a53 in cmd_test tests/builtin-test.c:825
    #10 0x5652f574fcc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x5652f55d9a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x5652f55d9a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x5652f55d9a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7fa28d4d8d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Track with sched_switch: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:29 -03:00
Namhyung Kim
f2c3202ba0 perf test: Fix cpu and thread map leaks in keep_tracking test
The evlist and the cpu/thread maps should be released together.
Otherwise following error was reported by Asan.

  $ perf test -v 28
  28: Use a dummy software event to keep tracking:
  --- start ---
  test child forked, pid 156810
  mmap size 528384B

  =================================================================
  ==156810==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f637d2bce8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55cc6295cffa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55cc6295da1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55cc6295e1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55cc6295e1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55cc626287cf in test__keep_tracking tests/keep-tracking.c:84
    #6 0x55cc625e38fb in run_test tests/builtin-test.c:428
    #7 0x55cc625e38fb in test_and_print tests/builtin-test.c:458
    #8 0x55cc625e5a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55cc625e5a53 in cmd_test tests/builtin-test.c:825
    #10 0x55cc62651cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55cc624dba88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55cc624dba88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55cc624dba88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f637cdf2d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Use a dummy software event to keep tracking: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:28 -03:00
Namhyung Kim
e06c3ca492 perf test: Fix cpu and thread map leaks in code_reading test
The evlist and the cpu/thread maps should be released together.
Otherwise following error was reported by Asan.

Note that this test still has memory leaks in DSOs so it still fails
even after this change.  I'll take a look at that too.

  # perf test -v 26
  26: Object code reading                        :
  --- start ---
  test child forked, pid 154184
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
  Parsing event 'cycles'
  mmap size 528384B
  ...
  =================================================================
  ==154184==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fcb66e77037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55ad9b7e821e in dso__new_id util/dso.c:1256
    #2 0x55ad9b8cfd4a in __machine__addnew_vdso util/vdso.c:132
    #3 0x55ad9b8cfd4a in machine__findnew_vdso util/vdso.c:347
    #4 0x55ad9b845b7e in map__new util/map.c:176
    #5 0x55ad9b8415a2 in machine__process_mmap2_event util/machine.c:1787
    #6 0x55ad9b8fab16 in perf_tool__process_synth_event util/synthetic-events.c:64
    #7 0x55ad9b8fab16 in perf_event__synthesize_mmap_events util/synthetic-events.c:499
    #8 0x55ad9b8fbfdf in __event__synthesize_thread util/synthetic-events.c:741
    #9 0x55ad9b8ff3e3 in perf_event__synthesize_thread_map util/synthetic-events.c:833
    #10 0x55ad9b738585 in do_test_code_reading tests/code-reading.c:608
    #11 0x55ad9b73b25d in test__code_reading tests/code-reading.c:722
    #12 0x55ad9b6f28fb in run_test tests/builtin-test.c:428
    #13 0x55ad9b6f28fb in test_and_print tests/builtin-test.c:458
    #14 0x55ad9b6f4a53 in __cmd_test tests/builtin-test.c:679
    #15 0x55ad9b6f4a53 in cmd_test tests/builtin-test.c:825
    #16 0x55ad9b760cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x55ad9b5eaa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x55ad9b5eaa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x55ad9b5eaa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fcb669acd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Object code reading: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:28 -03:00
Namhyung Kim
97ab7c524f perf test: Fix cpu and thread map leaks in sw_clock_freq test
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

Also change the goto label since it doesn't need to have two.

  # perf test -v 25
  25: Software clock events period values        :
  --- start ---
  test child forked, pid 149154
  mmap size 528384B
  mmap size 528384B

  =================================================================
  ==149154==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fef5cd071f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x56260d5e8b8e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x56260d3df7a9 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x56260d2ac6b2 in __test__sw_clock_freq tests/sw-clock.c:65
    #4 0x56260d26d8fb in run_test tests/builtin-test.c:428
    #5 0x56260d26d8fb in test_and_print tests/builtin-test.c:458
    #6 0x56260d26fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x56260d26fa53 in cmd_test tests/builtin-test.c:825
    #8 0x56260d2dbb64 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x56260d165a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x56260d165a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x56260d165a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fef5c83cd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Software clock events period values      : FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:28 -03:00
Namhyung Kim
83d25ccde5 perf test: Fix cpu and thread map leaks in task_exit test
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

Also change the goto label since it doesn't need to have two.

  # perf test -v 24
  24: Number of exit events of a simple workload :
  --- start ---
  test child forked, pid 145915
  mmap size 528384B

  =================================================================
  ==145915==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fc44e50d1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x561cf50f4d2e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x561cf4eeb949 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x561cf4db7fd2 in test__task_exit tests/task-exit.c:74
    #4 0x561cf4d798fb in run_test tests/builtin-test.c:428
    #5 0x561cf4d798fb in test_and_print tests/builtin-test.c:458
    #6 0x561cf4d7ba53 in __cmd_test tests/builtin-test.c:679
    #7 0x561cf4d7ba53 in cmd_test tests/builtin-test.c:825
    #8 0x561cf4de7d04 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x561cf4c71a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x561cf4c71a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x561cf4c71a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fc44e042d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Number of exit events of a simple workload: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:28 -03:00
Namhyung Kim
09a61c8f86 perf test: Fix a memory leak in attr test
The get_argv_exec_path() returns a dynamic memory so it should be
freed after use.

  $ perf test -v 17
  ...
  ==141682==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 33 byte(s) in 1 object(s) allocated from:
    #0 0x7f09107d2e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f091035f6a7 in __vasprintf_internal libio/vasprintf.c:71

  SUMMARY: AddressSanitizer: 33 byte(s) leaked in 1 allocation(s).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:27 -03:00
Namhyung Kim
b0faef924d perf test: Fix cpu and thread map leaks in basic mmap test
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

  # perf test -v 4
   4: Read samples using the mmap interface      :
  --- start ---
  test child forked, pid 139782
  mmap size 528384B

  =================================================================
  ==139782==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f1f76daee8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x564ba21a0fea in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x564ba21a1a0f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x564ba21a21cf in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x564ba21a21cf in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x564ba1e48298 in test__basic_mmap tests/mmap-basic.c:55
    #6 0x564ba1e278fb in run_test tests/builtin-test.c:428
    #7 0x564ba1e278fb in test_and_print tests/builtin-test.c:458
    #8 0x564ba1e29a53 in __cmd_test tests/builtin-test.c:679
    #9 0x564ba1e29a53 in cmd_test tests/builtin-test.c:825
    #10 0x564ba1e95cb4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x564ba1d1fa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x564ba1d1fa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x564ba1d1fa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f1f768e4d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Read samples using the mmap interface: FAILED!
  failed to open shell test directory: /home/namhyung/libexec/perf-core/tests/shell

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20210301140409.184570-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:27 -03:00
Jiri Olsa
84ea603650 perf tools: Fix event's PMU name parsing
Jin Yao reported parser error for software event:

  # perf stat -e software/r1a/ -a -- sleep 1
  event syntax error: 'software/r1a/'
                       \___ parser error

This happens after commit 8c3b1ba0e7 ("drm/i915/gt: Track the
overall awake/busy time"), where new software-gt-awake-time event's
non-pmu-event-style makes event parser conflict with software PMU.

If we allow PE_PMU_EVENT_PRE to be parsed as PMU name, we fix the
conflict and the following character '/' for PMU or '-' for
non-pmu-event-style event allows parser to decide what even is
specified.

Fixes: 8c3b1ba0e7 ("drm/i915/gt: Track the overall awake/busy time")
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210301122315.63471-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:27 -03:00
Jiri Olsa
36bc511f63 perf daemon: Fix running test for non root user
John reported that the daemon test is not working for non root user.
Changing the tests configurations so it's allowed to run under normal
user.

Fixes: 2291bb915b ("perf tests: Add daemon 'list' command test")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210301122510.64402-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:27 -03:00
Jiri Olsa
31bf4e7cb6 perf daemon: Fix control fifo permissions
Add proper mode for mkfifo calls to get read and write permissions for
user. We can't use O_RDWR in here, changing to standard permission
value.

Fixes: 6a6d1804a1 ("perf daemon: Set control fifo for session")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210301122510.64402-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:26 -03:00
Antonio Terceiro
dacfc08dca perf build: Fix ccache usage in $(CC) when generating arch errno table
This was introduced by commit e4ffd066ff ("perf: Normalize gcc
parameter when generating arch errno table").

Assuming the first word of $(CC) is the actual compiler breaks usage
like CC="ccache gcc": the script ends up calling ccache directly with
gcc arguments, what fails. Instead of getting the first word, just
remove from $(CC) any word that starts with a "-". This maintains the
spirit of the original patch, while not breaking ccache users.

Fixes: e4ffd066ff ("perf: Normalize gcc parameter when generating arch errno table")
Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Zhe <zhe.he@windriver.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210224130046.346977-1-antonio.terceiro@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:26 -03:00
Ian Rogers
b55ff1d145 perf tools: Fix documentation of verbose options
Option doesn't take a value, make sure the man pages agree. For example:

  $ perf evlist --verbose=1
   Error: option `verbose' takes no value

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210226183145.1878782-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:26 -03:00
Ian Rogers
137a525893 perf traceevent: Ensure read cmdlines are null terminated.
Issue detected by address sanitizer.

Fixes: cd4ceb6343 ("perf util: Save pid-cmdline mapping into tracing header")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210226221431.1985458-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:26 -03:00
Athira Rajeev
394e4306b0 perf bench numa: Fix the condition checks for max number of NUMA nodes
In systems having higher node numbers available like node
255, perf numa bench will fail with SIGABORT.

  <<>>
  perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed.
  Aborted (core dumped)
  <<>>

Snippet from 'numactl -H' below on a powerpc system where the highest
node number available is 255:

  available: 6 nodes (0,8,252-255)
  node 0 cpus: <cpu-list>
  node 0 size: 519587 MB
  node 0 free: 516659 MB
  node 8 cpus: <cpu-list>
  node 8 size: 523607 MB
  node 8 free: 486757 MB
  node 252 cpus:
  node 252 size: 0 MB
  node 252 free: 0 MB
  node 253 cpus:
  node 253 size: 0 MB
  node 253 free: 0 MB
  node 254 cpus:
  node 254 size: 0 MB
  node 254 free: 0 MB
  node 255 cpus:
  node 255 size: 0 MB
  node 255 free: 0 MB
  node distances:
  node   0   8  252  253  254  255

Note: <cpu-list> expands to actual cpu list in the original output.
These nodes 252-255 are to represent the memory on GPUs and are valid
nodes.

The perf numa bench init code has a condition check to see if the number
of NUMA nodes (nr_nodes) exceeds MAX_NR_NODES. The value of MAX_NR_NODES
defined in perf code is 64. And the 'nr_nodes' is the value from
numa_max_node() which represents the highest node number available in the
system. In some systems where we could have NUMA node 255, this condition
check fails and results in SIGABORT.

The numa benchmark uses static value of MAX_NR_NODES in the code to
represent size of two NUMA node arrays and node bitmask used for setting
memory policy. Patch adds a fix to dynamically allocate size for the
two arrays and bitmask value based on the node numbers available in the
system. With the fix, perf numa benchmark will work with node configuration
on any system and thus removes the static MAX_NR_NODES value.

Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/1614271802-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:25 -03:00
Dmitry Safonov
ffc52b7ae5 perf diff: Don't crash on freeing errno-session on the error path
__cmd_diff() sets result of perf_session__new() to d->session.

In case of failure, it's errno and perf-diff may crash with:

  failed to open perf.data: Permission denied
  Failed to open perf.data
  Segmentation fault (core dumped)

From the coredump:

0  0x00005569a62b5955 in auxtrace__free (session=0xffffffffffffffff)
    at util/auxtrace.c:2681
1  0x00005569a626b37d in perf_session__delete (session=0xffffffffffffffff)
    at util/session.c:295
2  perf_session__delete (session=0xffffffffffffffff) at util/session.c:291
3  0x00005569a618008a in __cmd_diff () at builtin-diff.c:1239
4  cmd_diff (argc=<optimized out>, argv=<optimized out>) at builtin-diff.c:2011
[..]

Funny enough, it won't always crash. For me it crashes only if failed
file is second in cmd-line: the reason is that cmd_diff() check files for
branch-stacks [in check_file_brstack()] and if the first file doesn't
have brstacks, it doesn't proceed to try open other files from cmd-line.

Check d->session before calling perf_session__delete().

Another solution would be assigning to temporary variable, checking it,
but I find it easier to follow with IS_ERR() check in the same function.
After some time it's still obvious why the check is needed, and with
temp variable it's possible to make the same mistake.

Committer testing:

  $ perf record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
  $ perf diff
  failed to open perf.data.old: No such file or directory
  Failed to open perf.data.old
  $ perf record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
  $ perf diff
  # Event 'cycles:u'
  #
  # Baseline  Delta Abs  Shared Object     Symbol
  # ........  .........  ................  ..........................
  #
       0.92%    +87.66%  [unknown]         [k] 0xffffffff8825de16
      11.39%     +0.04%  ld-2.32.so        [.] __GI___tunables_init
      87.70%             ld-2.32.so        [.] _dl_check_map_versions
  $ sudo chown root:root perf.data
  [sudo] password for acme:
  $ perf diff
  failed to open perf.data: Permission denied
  Failed to open perf.data
  Segmentation fault (core dumped)
  $

After the patch:

  $ perf diff
  failed to open perf.data: Permission denied
  Failed to open perf.data
  $

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dmitry safonov <dima@arista.com>
Link: http://lore.kernel.org/lkml/20210302023533.1572231-1-dima@arista.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:25 -03:00
Andreas Wendleder
2b1919ec83 perf tools: Clean 'generated' directory used for creating the syscall table on x86
Remove generated directory tools/perf/arch/x86/include/generated.

Signed-off-by: Andreas Wendleder <andreas.wendleder@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210301185642.163396-1-gonsolo@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:25 -03:00
Jiri Olsa
762323eb39 perf build: Move feature cleanup under tools/build
Arnaldo reported issue for following build command:

  $ rm -rf /tmp/krava; mkdir /tmp/krava; make O=/tmp/krava clean
    CLEAN    config
  /bin/sh: line 0: cd: /tmp/krava/feature/: No such file or directory
  ../../scripts/Makefile.include:17: *** output directory "/tmp/krava/feature/" does not exist.  Stop.
  make[1]: *** [Makefile.perf:1010: config-clean] Error 2
  make: *** [Makefile:90: clean] Error 2

The problem is that now that we include scripts/Makefile.include
in feature's Makefile (which is fine and needed), we need to ensure
the OUTPUT directory exists, before executing (out of tree) clean
command.

Removing the feature's cleanup from perf Makefile and fixing
feature's cleanup under build Makefile, so it now checks that
there's existing OUTPUT directory before calling the clean.

Fixes: 211a741cd3 ("tools: Factor Clang, LLC and LLVM utils definitions")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13-git
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210224150831.409639-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:24 -03:00
Pierre Gondois
ded2e511a8 perf tools: Cast (struct timeval).tv_sec when printing
The musl-libc [1] defines (struct timeval).tv_sec as a 'long long' for
arm and other architectures. The default build having a '-Wformat' flag,
not casting the field when printing prevents from building perf.

This patch casts the (struct timeval).tv_sec fields to the expected
format.

[1] git://git.musl-libc.org/musl

Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Douglas.raillard@arm.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210224182410.5366-1-Pierre.Gondois@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:24 -03:00
Arnaldo Carvalho de Melo
21b7e35bdf tools headers UAPI: Sync kvm.h headers with the kernel sources
To pick the changes in:

  d9a47edabc ("KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR")
  8d4e7e8083 ("KVM: x86: declare Xen HVM shared info capability and add test case")
  40da8ccd72 ("KVM: x86/xen: Add event channel interrupt vector upcall")

These new IOCTLs are now supported on 'perf trace':

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2021-02-23 09:55:46.229058308 -0300
  +++ after	2021-02-23 09:55:57.509308058 -0300
  @@ -91,6 +91,10 @@
   	[0xc1] = "GET_SUPPORTED_HV_CPUID",
   	[0xc6] = "X86_SET_MSR_FILTER",
   	[0xc7] = "RESET_DIRTY_RINGS",
  +	[0xc8] = "XEN_HVM_GET_ATTR",
  +	[0xc9] = "XEN_HVM_SET_ATTR",
  +	[0xca] = "XEN_VCPU_GET_ATTR",
  +	[0xcb] = "XEN_VCPU_SET_ATTR",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

Addressing this perf build warning:
  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:23 -03:00
Arnaldo Carvalho de Melo
303550a447 tools headers UAPI s390: Sync ptrace.h kernel headers
To pick up the changes from:

  56e62a7370 ("s390: convert to generic entry")

That only adds two new defines, so shouldn't cause problems when
building the BPF selftests.

Silencing this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/ptrace.h' differs from latest version at 'arch/s390/include/uapi/asm/ptrace.h'
  diff -u tools/arch/s390/include/uapi/asm/ptrace.h arch/s390/include/uapi/asm/ptrace.h

Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:23 -03:00
Arnaldo Carvalho de Melo
add76c0113 perf arch powerpc: Sync powerpc syscall.tbl with the kernel sources
To get the changes in:

  fbcee2ebe8 ("powerpc/32: Always save non volatile GPRs at syscall entry")

That shouldn't cause any change in tooling, just silences the following
tools/perf/ build warning:

  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'

Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:23 -03:00
Arnaldo Carvalho de Melo
1e61463cfc tools headers UAPI: Sync openat2.h with the kernel sources
To pick the changes in:

  99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")

That don't result in any change in tooling, only silences this perf
build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/openat2.h' differs from latest version at 'include/uapi/linux/openat2.h'
  diff -u tools/include/uapi/linux/openat2.h include/uapi/linux/openat2.h

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:22 -03:00
Arnaldo Carvalho de Melo
c2446944b3 tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick the changes in:

  8c3b1ba0e7 ("drm/i915/gt: Track the overall awake/busy time")
  348fb0cb0a ("drm/i915/pmu: Deprecate I915_PMU_LAST and optimize state tracking")

That don't result in any change in tooling:

  $ tools/perf/trace/beauty/drm_ioctl.sh > before
  $ cp include/uapi/drm/i915_drm.h tools/include/uapi/drm/i915_drm.h
  $ tools/perf/trace/beauty/drm_ioctl.sh > after
  $ diff -u before after
  $

Only silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:22 -03:00
Arnaldo Carvalho de Melo
3ae0415d0b tools headers UAPI: Update tools's copy of drm.h headers
Picking the changes from:

  0e0dc44800 ("drm/doc: demote old doc-comments in drm.h")

Silencing these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
  diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h

No changes in tooling as these are just C comment documentation changes.

Cc: Simon Ser <contact@emersion.fr>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:22 -03:00
Jens Axboe
003e8dccdb io-wq: always track creds for async issue
If we go async with a request, grab the creds that the task currently has
assigned and make sure that the async side switches to them. This is
handled in the same way that we do for registered personalities.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-06 10:57:01 -07:00
Jens Axboe
886d0137f1 io-wq: fix race in freeing 'wq' and worker access
Ran into a use-after-free on the main io-wq struct, wq. It has a worker
ref and completion event, but the manager itself isn't holding a
reference. This can lead to a race where the manager thinks there are
no workers and exits, but a worker is being added. That leads to the
following trace:

BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425

CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
Call Trace:
 dump_stack+0x90/0xbe
 print_address_description.constprop.0+0x67/0x28d
 ? io_wqe_worker+0x4c0/0x5e0
 kasan_report.cold+0x7b/0xd4
 ? io_wqe_worker+0x4c0/0x5e0
 __asan_load8+0x6d/0xa0
 io_wqe_worker+0x4c0/0x5e0
 ? io_worker_handle_work+0xc00/0xc00
 ? recalc_sigpending+0xe5/0x120
 ? io_worker_handle_work+0xc00/0xc00
 ? io_worker_handle_work+0xc00/0xc00
 ret_from_fork+0x1f/0x30

Allocated by task 3080422:
 kasan_save_stack+0x23/0x60
 __kasan_kmalloc+0x80/0xa0
 kmem_cache_alloc_node_trace+0xa0/0x480
 io_wq_create+0x3b5/0x600
 io_uring_alloc_task_context+0x13c/0x380
 io_uring_add_task_file+0x109/0x140
 __x64_sys_io_uring_enter+0x45f/0x660
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3080422:
 kasan_save_stack+0x23/0x60
 kasan_set_track+0x20/0x40
 kasan_set_free_info+0x24/0x40
 __kasan_slab_free+0xe8/0x120
 kfree+0xa8/0x400
 io_wq_put+0x14a/0x220
 io_wq_put_and_exit+0x9a/0xc0
 io_uring_clean_tctx+0x101/0x140
 __io_uring_files_cancel+0x36e/0x3c0
 do_exit+0x169/0x1340
 __x64_sys_exit+0x34/0x40
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Have the manager itself hold a reference, and now both drop points drop
and complete if we hit zero, and the manager can unconditionally do a
wait_for_completion() instead of having a race between reading the ref
count and waiting if it was non-zero.

Fixes: fb3a1f6c74 ("io-wq: have manager wait for all workers to exit")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-06 10:57:01 -07:00
Aurelien Aptel
88fd98a230 cifs: ask for more credit on async read/write code paths
When doing a large read or write workload we only
very gradually increase the number of credits
which can cause problems with parallelizing large i/o
(I/O ramps up more slowly than it should for large
read/write workloads) especially with multichannel
when the number of credits on the secondary channels
starts out low (e.g. less than about 130) or when
recovering after server throttled back the number
of credit.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-06 11:35:57 -06:00
Aurelien Aptel
a249cc8bc2 cifs: fix credit accounting for extra channel
With multichannel, operations like the queries
from "ls -lR" can cause all credits to be used and
errors to be returned since max_credits was not
being set correctly on the secondary channels and
thus the client was requesting 0 credits incorrectly
in some cases (which can lead to not having
enough credits to perform any operation on that
channel).

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-06 11:35:57 -06:00
Geert Uytterhoeven
a65a802aad m68k: Fix virt_addr_valid() W=1 compiler warnings
If CONFIG_DEBUG_SG=y, and CONFIG_MMU=y:

    include/linux/scatterlist.h: In function ‘sg_set_buf’:
    arch/m68k/include/asm/page_mm.h:174:49: warning: ordered comparison of pointer with null pointer [-Wextra]
      174 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
	  |                                                 ^~

or CONFIG_MMU=n:

    include/linux/scatterlist.h: In function ‘sg_set_buf’:
    arch/m68k/include/asm/page_no.h:33:50: warning: ordered comparison of pointer with null pointer [-Wextra]
       33 | #define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
	  |                                                  ^~

Fix this by doing the comparison in the "unsigned long" instead of the
"void *" domain.

Note that for now this is only seen when compiling btrfs, due to commit
e9aa7c285d ("btrfs: enable W=1 checks for btrfs"), but as people
are doing more W=1 compile testing, it will start to show up elsewhere,
too.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Link: https://lore.kernel.org/r/20210305084122.4118826-1-geert@linux-m68k.org
2021-03-06 14:15:07 +01:00
Andy Lutomirski
5d5675df79 x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work.  This confuses audit and ptrace.  For example:

    $ ./tools/testing/selftests/x86/syscall_arg_fault_32
    ...
    strace: pid 264258: entering, ptrace_syscall_info.op == 2
    ...

This is a minimal fix intended for ease of backporting.  A more
complete cleanup is coming.

Fixes: 0b085e68f4 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
2021-03-06 13:10:06 +01:00
Josh Poimboeuf
b59cc97674 x86/unwind/orc: Silence warnings caused by missing ORC data
The ORC unwinder attempts to fall back to frame pointers when ORC data
is missing for a given instruction.  It sets state->error, but then
tries to keep going as a best-effort type of thing.  That may result in
further warnings if the unwinder gets lost.

Until we have some way to register generated code with the unwinder,
missing ORC will be expected, and occasionally going off the rails will
also be expected.  So don't warn about it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com
2021-03-06 13:09:45 +01:00
Josh Poimboeuf
e504e74cc3 x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
KASAN reserves "redzone" areas between stack frames in order to detect
stack overruns.  A read or write to such an area triggers a KASAN
"stack-out-of-bounds" BUG.

Normally, the ORC unwinder stays in-bounds and doesn't access the
redzone.  But sometimes it can't find ORC metadata for a given
instruction.  This can happen for code which is missing ORC metadata, or
for generated code.  In such cases, the unwinder attempts to fall back
to frame pointers, as a best-effort type thing.

This fallback often works, but when it doesn't, the unwinder can get
confused and go off into the weeds into the KASAN redzone, triggering
the aforementioned KASAN BUG.

But in this case, the unwinder's confusion is actually harmless and
working as designed.  It already has checks in place to prevent
off-stack accesses, but those checks get short-circuited by the KASAN
BUG.  And a BUG is a lot more disruptive than a harmless unwinder
warning.

Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack
accesses.  This finishes the job started by commit 881125bfe6
("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only
partially fixed the issue.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com
2021-03-06 13:09:37 +01:00
Kan Liang
afbef30149 perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
To supply a PID/TID for large PEBS, it requires flushing the PEBS buffer
in a context switch.

For normal LBRs, a context switch can flip the address space and LBR
entries are not tagged with an identifier, we need to wipe the LBR, even
for per-cpu events.

For LBR callstack, save/restore the stack is required during a context
switch.

Set PERF_ATTACH_SCHED_CB for the event with large PEBS & LBR.

Fixes: 9c964efa43 ("perf/x86/intel: Drain the PEBS buffer during context switches")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201130193842.10569-2-kan.liang@linux.intel.com
2021-03-06 12:52:44 +01:00
Kan Liang
a5398bffc0 perf/core: Flush PMU internal buffers for per-CPU events
Sometimes the PMU internal buffers have to be flushed for per-CPU events
during a context switch, e.g., large PEBS. Otherwise, the perf tool may
report samples in locations that do not belong to the process where the
samples are processed in, because PEBS does not tag samples with PID/TID.

The current code only flush the buffers for a per-task event. It doesn't
check a per-CPU event.

Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the
PMU internal buffers have to be flushed for this event during a context
switch.

Add sched_cb_entry and perf_sched_cb_usages back to track the PMU/cpuctx
which is required to be flushed.

Only need to invoke the sched_task() for per-CPU events in this patch.
The per-task events have been handled in perf_event_context_sched_in/out
already.

Fixes: 9c964efa43 ("perf/x86/intel: Drain the PEBS buffer during context switches")
Reported-by: Gabriel Marin <gmx@google.com>
Originally-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201130193842.10569-1-kan.liang@linux.intel.com
2021-03-06 12:52:39 +01:00
Peter Zijlstra
50bf8080a9 static_call: Fix the module key fixup
Provided the target address of a R_X86_64_PC32 relocation is aligned,
the low two bits should be invariant between the relative and absolute
value.

Turns out the address is not aligned and things go sideways, ensure we
transfer the bits in the absolute form when fixing up the key address.

Fixes: 73f44fe19d ("static_call: Allow module use without exposing static_call_key")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20210225220351.GE4746@worktop.programming.kicks-ass.net
2021-03-06 12:49:08 +01:00
Mathieu Desnoyers
ce29ddc47b sched/membarrier: fix missing local execution of ipi_sync_rq_state()
The function sync_runqueues_membarrier_state() should copy the
membarrier state from the @mm received as parameter to each runqueue
currently running tasks using that mm.

However, the use of smp_call_function_many() skips the current runqueue,
which is unintended. Replace by a call to on_each_cpu_mask().

Fixes: 227a4aadc7 ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org # 5.4.x+
Link: https://lore.kernel.org/r/74F1E842-4A84-47BF-B6C2-5407DFDD4A4A@gmail.com
2021-03-06 12:40:21 +01:00
Peter Zijlstra
50caf9c14b sched: Simplify set_affinity_pending refcounts
Now that we have set_affinity_pending::stop_pending to indicate if a
stopper is in progress, and we have the guarantee that if that stopper
exists, it will (eventually) complete our @pending we can simplify the
refcount scheme by no longer counting the stopper thread.

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
2021-03-06 12:40:21 +01:00
Peter Zijlstra
9e81889c76 sched: Fix affine_move_task() self-concurrency
Consider:

   sched_setaffinity(p, X);		sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
2021-03-06 12:40:21 +01:00
Peter Zijlstra
3f1bc119cd sched: Optimize migration_cpu_stop()
When the purpose of migration_cpu_stop() is to migrate the task to
'any' valid CPU, don't migrate the task when it's already running on a
valid CPU.

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
2021-03-06 12:40:21 +01:00
Peter Zijlstra
58b1a45086 sched: Collate affine_move_task() stoppers
The SCA_MIGRATE_ENABLE and task_running() cases are almost identical,
collapse them to avoid further duplication.

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
2021-03-06 12:40:21 +01:00
Peter Zijlstra
c20cf065d4 sched: Simplify migration_cpu_stop()
When affine_move_task() issues a migration_cpu_stop(), the purpose of
that function is to complete that @pending, not any random other
p->migration_pending that might have gotten installed since.

This realization much simplifies migration_cpu_stop() and allows
further necessary steps to fix all this as it provides the guarantee
that @pending's stopper will complete @pending (and not some random
other @pending).

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
2021-03-06 12:40:20 +01:00
Peter Zijlstra
8a6edb5257 sched: Fix migration_cpu_stop() requeueing
When affine_move_task(p) is called on a running task @p, which is not
otherwise already changing affinity, we'll first set
p->migration_pending and then do:

	 stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was
previously running our victim task @p.

If we find that our task is no longer on that runqueue (this can
happen because of a concurrent migration due to load-balance etc.),
then we'll end up at the:

	} else if (dest_cpu < 1 || pending) {

branch. Which we'll take because we set pending earlier. Here we first
check if the task @p has already satisfied the affinity constraints,
if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
onto the CPU that is now hosting our task @p:

	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with
arg->p == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg,
furthermore we can use the exact same pattern as the
SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
completion anyway, no point in adding yet another completion in
stop_one_cpu().

This then gives a clear distinction between the two
migration_cpu_stop() use cases:

  - sched_exec() / migrate_task_to() : arg->pending == NULL
  - affine_move_task() : arg->pending != NULL;

And we can have it ignore p->migration_pending when !arg->pending. Any
stop work from sched_exec() / migrate_task_to() is in addition to stop
works from affine_move_task(), which will be sufficient to issue the
completion.

Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
2021-03-06 12:40:20 +01:00
Takashi Iwai
56b26497bb ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
The mute and mic-mute LEDs on HP ZBook Studio G5 are controlled via
GPIO bits 0x10 and 0x20, respectively, and we need the extra setup for
those.

As the similar code is already present for other HP models but with
different GPIO pins, this patch factors out the common helper code and
applies those GPIO values for each model.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211893
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210306095018.11746-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-06 10:51:14 +01:00
Jia He
357ad203d4 KVM: arm64: Fix range alignment when walking page tables
When walking the page tables at a given level, and if the start
address for the range isn't aligned for that level, we propagate
the misalignment on each iteration at that level.

This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.

Properly aligning the address before the next iteration addresses
this issue.

Cc: stable@vger.kernel.org
Reported-by: Howard Zhang <Howard.Zhang@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: b1e57de62c ("KVM: arm64: Add stand-alone page-table walker infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com
Message-Id: <20210305185254.3730990-9-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00
Marc Zyngier
9739f6ef05 KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
It looks like we have broken firmware out there that wrongly advertises
a GICv2 compatibility interface, despite the CPUs not being able to deal
with it.

To work around this, check that the CPU initialising KVM is actually able
to switch to MMIO instead of system registers, and use that as a
precondition to enable GICv2 compatibility in KVM.

Note that the detection happens on a single CPU. If the firmware is
lying *and* that the CPUs are asymetric, all hope is lost anyway.

Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-8-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00
Marc Zyngier
b9d699e269 KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
As we are about to report a bit more information to the rest of
the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more
explicit __vgic_v3_get_gic_config().

No functional change.

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00
Marc Zyngier
f27647b588 KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
When running under a nesting hypervisor, it isn't guaranteed that
the virtual HW will include a PMU. In which case, let's not try
to access the PMU registers in the world switch, as that'd be
deadly.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210209114844.3278746-3-maz@kernel.org
Message-Id: <20210305185254.3730990-6-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:40 -05:00
Marc Zyngier
6b5b368fcc KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
We currently find out about the presence of a HW PMU (or the handling
of that PMU by perf, which amounts to the same thing) in a fairly
roundabout way, by checking the number of counters available to perf.
That's good enough for now, but we will soon need to find about about
that on paths where perf is out of reach (in the world switch).

Instead, let's turn kvm_arm_support_pmu_v3() into a static key.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210209114844.3278746-2-maz@kernel.org
Message-Id: <20210305185254.3730990-5-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:40 -05:00
Andrew Scull
c4b000c392 KVM: arm64: Fix nVHE hyp panic host context restore
When panicking from the nVHE hyp and restoring the host context, x29 is
expected to hold a pointer to the host context. This wasn't being done
so fix it to make sure there's a valid pointer the host context being
used.

Rather than passing a boolean indicating whether or not the host context
should be restored, instead pass the pointer to the host context. NULL
is passed to indicate that no context should be restored.

Fixes: a2e102e20f ("KVM: arm64: nVHE: Handle hyp panics")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Scull <ascull@google.com>
[maz: partial rewrite to fit 5.12-rc1]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210219122406.1337626-1-ascull@google.com
Message-Id: <20210305185254.3730990-4-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:40 -05:00
Will Deacon
31948332d5 KVM: arm64: Avoid corrupting vCPU context register in guest exit
Commit 7db2153047 ("KVM: arm64: Restore hyp when panicking in guest
context") tracks the currently running vCPU, clearing the pointer to
NULL on exit from a guest.

Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the
kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS
code to go off into the weeds when it saves the DISR assuming that the
CPU context is embedded in a struct vCPU.

Leave x1 alone and use x3 as a temporary register instead when clearing
the vCPU on the guest exit path.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Andrew Scull <ascull@google.com>
Cc: <stable@vger.kernel.org>
Fixes: 7db2153047 ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210226181211.14542-1-will@kernel.org
Message-Id: <20210305185254.3730990-3-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:40 -05:00
Suzuki K Poulose
b96b0c5de6 KVM: arm64: nvhe: Save the SPE context early
The nVHE KVM hyp drains and disables the SPE buffer, before
entering the guest, as the EL1&0 translation regime
is going to be loaded with that of the guest.

But this operation is performed way too late, because :
  - The owning translation regime of the SPE buffer
    is transferred to EL2. (MDCR_EL2_E2PB == 0)
  - The guest Stage1 is loaded.

Thus the flush could use the host EL1 virtual address,
but use the EL2 translations instead of host EL1, for writing
out any cached data.

Fix this by moving the SPE buffer handling early enough.
The restore path is doing the right thing.

Fixes: 014c4c77aa ("KVM: arm64: Improve debug register save/restore flow")
Cc: stable@vger.kernel.org
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poulose@arm.com
Message-Id: <20210305185254.3730990-2-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:39 -05:00
Muhammad Usama Anjum
4691453406 kvm: x86: use NULL instead of using plain integer as pointer
Sparse warnings removed:
warning: Using plain integer as NULL pointer

Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305180816.GA488770@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:39 -05:00
Linus Torvalds
a38fd87484 Linux 5.12-rc2 2021-03-05 17:33:41 -08:00
Linus Torvalds
f3ed4de6cc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "Nothing special here, though Bob's regression fixes for rxe would have
  made it before the rc cycle had there not been such strong winter
  weather!

   - Fix corner cases in the rxe reference counting cleanup that are
     causing regressions in blktests for SRP

   - Two kdoc fixes so W=1 is clean

   - Missing error return in error unwind for mlx5

   - Wrong lock type nesting in IB CM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
  RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
  RDMA/rxe: Fix missed IB reference counting in loopback
  RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
  RDMA/mlx5: Set correct kernel-doc identifier
  IB/mlx5: Add missing error code
  RDMA/rxe: Fix missing kconfig dependency on CRYPTO
  RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
2021-03-05 17:27:59 -08:00
Linus Torvalds
de5bd6c54b Merge tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
 "Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have
  been reported a couple times now by static analyzers, so best to get
  them fixed to reduce the noise. :)

   - Fix coding style issues (Jason Yan)"

* tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: latent_entropy: remove unneeded semicolon
  gcc-plugins: structleak: remove unneeded variable 'ret'
2021-03-05 17:23:03 -08:00
Linus Torvalds
8b24ef4478 Merge tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:

 - Rate-limit ECC warnings (Dmitry Osipenko)

 - Fix error path check for NULL (Tetsuo Handa)

* tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Rate-limit "uncorrectable error in header" message
  pstore: Fix warning in pstore_kill_sb()
2021-03-05 17:21:25 -08:00
Jakub Kicinski
a4dcfbc4ee ethernet: alx: fix order of calls on resume
netif_device_attach() will unpause the queues so we can't call
it before __alx_open(). This went undetected until
commit b0999223f2 ("alx: add ability to allocate and free
alx_napi structures") but now if stack tries to xmit immediately
on resume before __alx_open() we'll crash on the NAPI being null:

 BUG: kernel NULL pointer dereference, address: 0000000000000198
 CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
 RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
 Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0
0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
 RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
 RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
 RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
 RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
 R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
 R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
 FS:  0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
 i915 0000:00:02.0: vblank wait timed out on crtc 0
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
 Call Trace:
  dev_hard_start_xmit+0xc7/0x1e0
  sch_direct_xmit+0x10f/0x310

Cc: <stable@vger.kernel.org> # 4.9+
Fixes: bc2bebe8de ("alx: remove WoL support")
Reported-by: Zbynek Michl <zbynek.michl@gmail.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Zbynek Michl <zbynek.michl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 15:05:33 -08:00
George McCollister
3e21a10fde lan743x: trim all 4 bytes of the FCS; not just 2
Trim all 4 bytes of the received FCS; not just 2 of them. Leaving 2
bytes of the FCS on the frame breaks DSA tailing tag drivers.

Fixes: a8db76d40e ("lan743x: boost performance on cpu archs w/o dma cache snooping")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 14:42:55 -08:00
Linus Torvalds
63dcd69d9b Merge tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
 "Fix DM verity target's optional Forward Error Correction (FEC) for
  Reed-Solomon roots that are unaligned to block size"

* tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm verity: fix FEC for RS roots unaligned to block size
  dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
2021-03-05 13:25:23 -08:00
Michael Braun
d8861bab48 gianfar: fix jumbo packets+napi+rx overrun crash
When using jumbo packets and overrunning rx queue with napi enabled,
the following sequence is observed in gfar_add_rx_frag:

   | lstatus                              |       | skb                   |
t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |

So at t=7 a new packets is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packets starts at t=8. This results in skb->len to exceed size for the LAST
fragment at t=13 and thus a negative fragment size added to the skb.

This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking for computed LAST fragment size, so a
negative sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:13:32 -08:00
Denis Efremov
155b23e6e5 sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
RXMAC_BC_FRM_CNT_COUNT added to mp->rx_bcasts twice in a row
in niu_xmac_interrupt(). Remove the second addition.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Denis Efremov
85554bcd12 net/hamradio/6pack: remove redundant check in sp_encaps()
"len > sp->mtu" checked twice in a row in sp_encaps().
Remove the second check.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Hayes Wang
abbf9a0ef8 r8169: fix r8168fp_adjust_ocp_cmd function
The (0xBAF70000 & 0x00FFF000) << 6 should be (0xf70 << 18).

Fixes: 561535b0f2 ("r8169: fix OCP access on RTL8117")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Xu Wang
0a7e0c3b57 selftest/net/ipsec.c: Remove unneeded semicolon
fix semicolon.cocci warning:
tools/testing/selftests/net/ipsec.c:1788:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Junlin Yang
69cdb7947a ibmvnic: remove excessive irqsave
ibmvnic_remove locks multiple spinlocks while disabling interrupts:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock_irqsave(&adapter->rwi_lock, flags);

As reported by coccinelle, the second _irqsave() overwrites the value
saved in 'flags' by the first _irqsave(),   therefore when the second
_irqrestore() comes,the value in 'flags' is not valid,the value saved
by the first _irqsave() has been lost.
This likely leads to IRQs remaining disabled. So remove the second
_irqsave():
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock(&adapter->rwi_lock);

Generated by: ./scripts/coccinelle/locks/flags.cocci
./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
ERROR: nested lock+irqsave that reuses flags from line 5404.

Fixes: 4a41c421f3 ("ibmvnic: serialize access to work queue on remove")
Signed-off-by: Junlin Yang <yangjunlin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Sergey Nazarov
e233febda6 CIPSO: Fix unaligned memory access in cipso_v4_gentag_hdr
We need to use put_unaligned when writing 32-bit DOI value
in cipso_v4_gentag_hdr to avoid unaligned memory access.

v2: unneeded type cast removed as Ondrej Mosnacek suggested.

Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 13:00:38 -08:00
Linus Torvalds
47454caf45 Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - NVMe fixes:
      - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal
        Terjan)
      - fix a hwmon error return (Daniel Wagner)
      - fix the keep alive timeout initialization (Martin George)
      - ensure the model_number can't be changed on a used subsystem
        (Max Gurtovoy)

 - rsxx missing -EFAULT on copy_to_user() failure (Dan)

 - rsxx remove unused linux.h include (Tian)

 - kill unused RQF_SORTED (Jean)

 - updated outdated BFQ comments (Joseph)

 - revert work-around commit for bd_size_lock, since we removed the
   offending user in this merge window (Damien)

* tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
  rsxx: Return -EFAULT if copy_to_user() fails
  block/bfq: update comments and default value in docs for fifo_expire
  rsxx: remove unused including <linux/version.h>
  block: Drop leftover references to RQF_SORTED
  block: revert "block: fix bd_size_lock use"
2021-03-05 12:59:37 -08:00
Wong Vee Khee
8eb37ab7cc stmmac: intel: Fixes clock registration error seen for multiple interfaces
Issue seen when enumerating multiple Intel mGbE interfaces in EHL.

[    6.898141] intel-eth-pci 0000:00:1d.2: enabling device (0000 -> 0002)
[    6.900971] intel-eth-pci 0000:00:1d.2: Fail to register stmmac-clk
[    6.906434] intel-eth-pci 0000:00:1d.2: User ID: 0x51, Synopsys ID: 0x52

We fix it by making the clock name to be unique following the format
of stmmac-pci_name(pci_dev) so that we can differentiate the clock for
these Intel mGbE interfaces in EHL platform as follow:

  /sys/kernel/debug/clk/stmmac-0000:00:1d.1
  /sys/kernel/debug/clk/stmmac-0000:00:1d.2
  /sys/kernel/debug/clk/stmmac-0000:00:1e.4

Fixes: 58da0cfa6c ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:59:32 -08:00
Ong Boon Leong
9a7b3950c7 net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII
For Intel mGbE controller, MAC VLAN filter delete operation will time-out
if serdes power-down sequence happened first during driver remove() with
below message.

[82294.764958] intel-eth-pci 0000:00:1e.4 eth2: stmmac_dvr_remove: removing driver
[82294.778677] intel-eth-pci 0000:00:1e.4 eth2: Timeout accessing MAC_VLAN_Tag_Filter
[82294.779997] intel-eth-pci 0000:00:1e.4 eth2: failed to kill vid 0081/0
[82294.947053] intel-eth-pci 0000:00:1d.2 eth1: stmmac_dvr_remove: removing driver
[82295.002091] intel-eth-pci 0000:00:1d.1 eth0: stmmac_dvr_remove: removing driver

Therefore, we delay the serdes power-down to be after unregister_netdev()
which triggers the VLAN filter delete.

Fixes: b9663b7ca6 ("net: stmmac: Enable SERDES power up/down sequence")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:59:32 -08:00
Jia-Ju Bai
6650d31f21 net: intel: iavf: fix error return code of iavf_init_get_resources()
When iavf_process_config() fails, no error return code of
iavf_init_get_resources() is assigned.
To fix this bug, err is assigned with the return value of
iavf_process_config(), and then err is checked.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:59:32 -08:00
Jia-Ju Bai
38c26ff304 net: tehuti: fix error return code in bdx_probe()
When bdx_read_mac() fails, no error return code of bdx_probe()
is assigned.
To fix this bug, err is assigned with -EFAULT as error return code.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:59:32 -08:00
Linus Torvalds
f292e8730a Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "A bit of a mix between fallout from the worker change, cleanups and
  reductions now possible from that change, and fixes in general. In
  detail:

   - Fully serialize manager and worker creation, fixing races due to
     that.

   - Clean up some naming that had gone stale.

   - SQPOLL fixes.

   - Fix race condition around task_work rework that went into this
     merge window.

   - Implement unshare. Used for when the original task does unshare(2)
     or setuid/seteuid and friends, drops the original workers and forks
     new ones.

   - Drop the only remaining piece of state shuffling we had left, which
     was cred. Move it into issue instead, and we can drop all of that
     code too.

   - Kill f_op->flush() usage. That was such a nasty hack that we had
     out of necessity, we no longer need it.

   - Following from ->flush() removal, we can also drop various bits of
     ctx state related to SQPOLL and cancelations.

   - Fix an issue with IOPOLL retry, which originally was fallout from a
     filemap change (removing iov_iter_revert()), but uncovered an issue
     with iovec re-import too late.

   - Fix an issue with system suspend.

   - Use xchg() for fallback work, instead of cmpxchg().

   - Properly destroy io-wq on exec.

   - Add create_io_thread() core helper, and use that in io-wq and
     io_uring. This allows us to remove various silly completion events
     related to thread setup.

   - A few error handling fixes.

  This should be the grunt of fixes necessary for the new workers, next
  week should be quieter. We've got a pending series from Pavel on
  cancelations, and how tasks and rings are indexed. Outside of that,
  should just be minor fixes. Even with these fixes, we're still killing
  a net ~80 lines"

* tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
  io_uring: don't restrict issue_flags for io_openat
  io_uring: make SQPOLL thread parking saner
  io-wq: kill hashed waitqueue before manager exits
  io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
  io_uring: don't keep looping for more events if we can't flush overflow
  io_uring: move to using create_io_thread()
  kernel: provide create_io_thread() helper
  io_uring: reliably cancel linked timeouts
  io_uring: cancel-match based on flags
  io-wq: ensure all pending work is canceled on exit
  io_uring: ensure that threads freeze on suspend
  io_uring: remove extra in_idle wake up
  io_uring: inline __io_queue_async_work()
  io_uring: inline io_req_clean_work()
  io_uring: choose right tctx->io_wq for try cancel
  io_uring: fix -EAGAIN retry with IOPOLL
  io-wq: fix error path leak of buffered write hash map
  io_uring: remove sqo_task
  io_uring: kill sqo_dead and sqo submission halting
  io_uring: ignore double poll add on the same waitqueue head
  ...
2021-03-05 12:44:43 -08:00
Kevin(Yudong) Yang
00ff801bb8 net/mlx4_en: update moderation when config reset
This patch fixes a bug that the moderation config will not be
applied when calling mlx4_en_reset_config. For example, when
turning on rx timestamping, mlx4_en_reset_config() will be called,
causing the NIC to forget previous moderation config.

This fix is in phase with a previous fix:
commit 79c54b6bbf ("net/mlx4_en: Fix TX moderation info loss
after set_ringparam is called")

Tested: Before this patch, on a host with NIC using mlx4, run
netserver and stream TCP to the host at full utilization.
$ sar -I SUM 1
                 INTR    intr/s
14:03:56          sum  48758.00

After rx hwtstamp is enabled:
$ sar -I SUM 1
14:10:38          sum 317771.00
We see the moderation is not working properly and issued 7x more
interrupts.

After the patch, and turned on rx hwtstamp, the rate of interrupts
is as expected:
$ sar -I SUM 1
14:52:11          sum  49332.00

Fixes: 79c54b6bbf ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
CC: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:42:31 -08:00
Linus Torvalds
6d47254c06 Merge tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix the usage of device links in the runtime PM core code and
  update the DTPM (Dynamic Thermal Power Management) feature added
  recently.

  Specifics:

   - Make the runtime PM core code avoid attempting to suspend supplier
     devices before updating the PM-runtime status of a consumer to
     'suspended' (Rafael Wysocki).

   - Fix DTPM (Dynamic Thermal Power Management) root node
     initialization and label that feature as EXPERIMENTAL in Kconfig
     (Daniel Lezcano)"

* tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap/drivers/dtpm: Add the experimental label to the option description
  powercap/drivers/dtpm: Fix root node initialization
  PM: runtime: Update device status before letting suppliers suspend
2021-03-05 12:36:33 -08:00
Linus Torvalds
ea6be461cb Merge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
 "Make the empty stubs of some helper functions used when CONFIG_ACPI is
  not set actually match those functions (Andy Shevchenko)"

* tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: bus: Constify is_acpi_node() and friends (part 2)
2021-03-05 12:32:17 -08:00
David S. Miller
638526bb41 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-03-04

The following pull-request contains BPF updates for your *net* tree.

We've added 7 non-merge commits during the last 4 day(s) which contain
a total of 9 files changed, 128 insertions(+), 40 deletions(-).

The main changes are:

1) Fix 32-bit cmpxchg, from Brendan.

2) Fix atomic+fetch logic, from Ilya.

3) Fix usage of bpf_csum_diff in selftests, from Yauheni.
====================
2021-03-05 12:29:36 -08:00
Linus Torvalds
fc2c8d0af0 Merge tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:

 - Fix a sleeping-while-atomic issue in the AMD IOMMU code

 - Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
   driver

 - Fix status code definitions for Intel VT-d

 - Fix IO Page Fault issue in Tegra IOMMU driver

* tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix status code for Allocate/Free PASID command
  iommu: Don't use lazy flush for untrusted device
  iommu/tegra-smmu: Fix mc errors on tegra124-nyan
  iommu/amd: Fix sleeping in atomic in increase_address_space()
2021-03-05 12:26:24 -08:00
Linus Torvalds
f09b04cc64 Merge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "More regression fixes and stabilization.

  Regressions:

   - zoned mode
      - count zone sizes in wider int types
      - fix space accounting for read-only block groups

   - subpage: fix page tail zeroing

  Fixes:

   - fix spurious warning when remounting with free space tree

   - fix warning when creating a directory with smack enabled

   - ioctl checks for qgroup inheritance when creating a snapshot

   - qgroup
      - fix missing unlock on error path in zero range
      - fix amount of released reservation on error
      - fix flushing from unsafe context with open transaction,
        potentially deadlocking

   - minor build warning fixes"

* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: do not account freed region of read-only block group as zone_unusable
  btrfs: zoned: use sector_t for zone sectors
  btrfs: subpage: fix the false data csum mismatch error
  btrfs: fix warning when creating a directory with smack enabled
  btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
  btrfs: export and rename qgroup_reserve_meta
  btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
  btrfs: fix spurious free_space_tree remount warning
  btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
  btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
  btrfs: ref-verify: use 'inline void' keyword ordering
2021-03-05 12:21:14 -08:00
Linus Torvalds
6bf331d5ce Merge tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:

 - Another batch of graph and video-interfaces schema conversions

 - Drop DT header symlink for dropped C6X arch

 - Fix bcm2711-hdmi schema error

* tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: media: Use graph and video-interfaces schemas, round 2
  dts: drop dangling c6x symlink
  dt-bindings: bcm2711-hdmi: Fix broken schema
2021-03-05 12:12:28 -08:00
Linus Torvalds
54663cf398 Merge tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Functional fixes:

   - Fix big endian conversion for arm64 in recordmcount processing

   - Fix timestamp corruption in ring buffer on discarding events

   - Fix memory leak in __create_synth_event()

   - Skip selftests if tracing is disabled as it will cause them to
     fail.

  Non-functional fixes:

   - Fix help text in Kconfig

   - Remove duplicate prototype for trace_empty()

   - Fix stale comment about the trace_event_call flags.

  Self test update:

   - Add more information to the validation output of when a corrupt
     timestamp is found in the ring buffer, and also trigger a warning
     to make sure that tests catch it"

* tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix comment about the trace_event_call flags
  tracing: Skip selftests if tracing is disabled
  tracing: Fix memory leak in __create_synth_event()
  ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
  ring-buffer: Force before_stamp and write_stamp to be different on discard
  tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
  tracing: Remove duplicate declaration from trace.h
  ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
2021-03-05 12:04:59 -08:00
Xunlei Pang
4f44657d74 blk-cgroup: Fix the recursive blkg rwstat
The current blkio.throttle.io_service_bytes_recursive doesn't
work correctly.

As an example, for the following blkcg hierarchy:
 (Made 1GB READ in test1, 512MB READ in test2)
     test
    /    \
 test1   test2

$ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive
8:0 Read 1073684480
$ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448
$ head -n 1 test/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448

Clearly, above data of "test" reflects "test2" not "test1"+"test2".

Do the correct summary in blkg_rwstat_recursive_sum().

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 11:32:15 -07:00
Stefan Haberland
66f669a272 s390/dasd: fix hanging IO request during DASD driver unbind
Prevent that an IO request is build during device shutdown initiated by
a driver unbind. This request will never be able to be processed or
canceled and will hang forever. This will lead also to a hanging unbind.

Fix by checking not only if the device is in READY state but also check
that there is no device offline initiated before building a new IO request.

Fixes: e443343e50 ("s390/dasd: blk-mq conversion")

Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Tested-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 11:30:17 -07:00
Stefan Haberland
7d365bd0bf s390/dasd: fix hanging DASD driver unbind
In case of an unbind of the DASD device driver the function
dasd_generic_remove() is called which shuts down the device.
Among others this functions removes the int_handler from the cdev.
During shutdown the device cancels all outstanding IO requests and waits
for completion of the clear request.
Unfortunately the clear interrupt will never be received when there is no
interrupt handler connected.

Fix by moving the int_handler removal after the call to the state machine
where no request or interrupt is outstanding.

Cc: stable@vger.kernel.org
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Tested-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 11:30:17 -07:00
Jan Kara
56887cffe9 block: Try to handle busy underlying device on discard
Commit 384d87ef2c ("block: Do not discard buffers under a mounted
filesystem") made paths issuing discard or zeroout requests to the
underlying device try to grab block device in exclusive mode. If that
failed we returned EBUSY to userspace. This however caused unexpected
fallout in userspace where e.g. FUSE filesystems issue discard requests
from userspace daemons although the device is open exclusively by the
kernel. Also shrinking of logical volume by LVM issues discard requests
to a device which may be claimed exclusively because there's another LV
on the same PV. So to avoid these userspace regressions, fall back to
invalidate_inode_pages2_range() instead of returning EBUSY to userspace
and return EBUSY only of that call fails as well (meaning that there's
indeed someone using the particular device range we are trying to
discard).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167
Fixes: 384d87ef2c ("block: Do not discard buffers under a mounted filesystem")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 11:29:26 -07:00
Bob Pearson
545c4ab463 RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed.  The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.

Fixes: 899aba891c ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05 14:15:22 -04:00
Bob Pearson
5e4a7ccc96 RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter.  This code is
cleaned up to be clearer and easier to read.

Fixes: 899aba891c ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05 14:15:18 -04:00
Bob Pearson
21e27ac82d RDMA/rxe: Fix missed IB reference counting in loopback
When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.

Fixes: 899aba891c ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05 14:11:02 -04:00
Pavel Begunkov
e45cff5885 io_uring: don't restrict issue_flags for io_openat
45d189c606 ("io_uring: replace force_nonblock with flags") did
something strange for io_openat() slicing all issue_flags but
IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the
flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 09:52:29 -07:00
Jens Axboe
a2b658e4a0 Merge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12
Pull NVMe fixes from Christoph:

"nvme fixes for 5.12:

 - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan)
 - fix a hwmon error return (Daniel Wagner)
 - fix the keep alive timeout initialization (Martin George)
 - ensure the model_number can't be changed on a used subsystem
   (Max Gurtovoy)"

* tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
2021-03-05 09:13:07 -07:00
Jens Axboe
86e0d6766c io_uring: make SQPOLL thread parking saner
We have this weird true/false return from parking, and then some of the
callers decide to look at that. It can lead to unbalanced parks and
sqd locking. Have the callers check the thread status once it's parked.
We know we have the lock at that point, so it's either valid or it's NULL.

Fix race with parking on thread exit. We need to be careful here with
ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.

Rename sqd->completion to sqd->parked to reflect that this is the only
thing this completion event doesn.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 08:44:39 -07:00
Jens Axboe
09ca6c40c2 io-wq: kill hashed waitqueue before manager exits
If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.

Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.

Fixes: e941894eae ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 08:44:09 -07:00
Jens Axboe
b5b0ecb736 io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 08:43:09 -07:00
Jens Axboe
ca0a26511c io_uring: don't keep looping for more events if we can't flush overflow
It doesn't make sense to wait for more events to come in, if we can't
even flush the overflow we already have to the ring. Return -EBUSY for
that condition, just like we do for attempts to submit with overflow
pending.

Cc: stable@vger.kernel.org # 5.11
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 08:43:09 -07:00
Jens Axboe
46fe18b16c io_uring: move to using create_io_thread()
This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-05 08:43:01 -07:00
Rafael J. Wysocki
7bff4c26b6 Merge branch 'powercap'
* powercap:
  powercap/drivers/dtpm: Add the experimental label to the option description
  powercap/drivers/dtpm: Fix root node initialization
2021-03-05 16:19:10 +01:00
Sean Christopherson
99840a7545 KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
Directly connect the 'npt' param to the 'npt_enabled' variable so that
runtime adjustments to npt_enabled are reflected in sysfs.  Move the
!PAE restriction to a runtime check to ensure NPT is forced off if the
host is using 2-level paging, and add a comment explicitly stating why
NPT requires a 64-bit kernel or a kernel with PAE enabled.

Opportunistically switch the param to octal permissions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021637.3768573-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-05 08:33:15 -05:00
Sean Christopherson
beda430177 KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
When posting a deadline timer interrupt, open code the checks guarding
__kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected()
check in kvm_wait_lapic_expire().  The injection check will always fail
since the interrupt has not yet be injected.  Moving the call after
injection would also be wrong as that wouldn't actually delay delivery
of the IRQ if it is indeed sent via posted interrupt.

Fixes: 010fd37fdd ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021808.3769732-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-05 08:30:21 -05:00
Max Gurtovoy
d9f273b758 nvmet: model_number must be immutable once set
In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."

Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.

So allow setting this field only once: using configfs or in the first
identify ctrl operation.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Martin George
32feb6de47 nvme-fabrics: fix kato initialization
Currently kato is initialized to NVME_DEFAULT_KATO for both
discovery & i/o controllers. This is a problem specifically
for non-persistent discovery controllers since it always ends
up with a non-zero kato value. Fix this by initializing kato
to zero instead, and ensuring various controllers are assigned
appropriate kato values as follows:

non-persistent controllers  - kato set to zero
persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                              (or any positive int via nvme-cli)
i/o controllers             - kato set to NVME_DEFAULT_KATO
                              (or any positive int via nvme-cli)

Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Daniel Wagner
78570f8873 nvme-hwmon: Return error code when registration fails
The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().

Fixes: ed7770f662 ("nvme/hwmon: rework to avoid devm allocation")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Pascal Terjan
6e6a6828c5 nvme-pci: add quirks for Lexar 256GB SSD
Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN
quirks for this buggy device.

Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417

Signed-off-by: Pascal Terjan <pterjan@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Zoltán Böszörményi
dc22c1c058 nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
My 2TB SKC2000 showed the exact same symptoms that were provided
in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on
Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed
cold boot to get it back.

According to some sources, the A2000 is simply a rebadged
SKC2000 with a slightly optimized firmware.

Adding the SKC2000 PCI ID to the quirk list with the same workaround
as the A2000 made my laptop survive a 5 hours long Yocto bootstrap
buildfest which reliably triggered the SSD lockup previously.

Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Julian Einwag
5e112d3fb8 nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
The kernel fails to fully detect these SSDs, only the character devices
are present:

[   10.785605] nvme nvme0: pci function 0000:04:00.0
[   10.876787] nvme nvme1: pci function 0000:81:00.0
[   13.198614] nvme nvme0: missing or invalid SUBNQN field.
[   13.198658] nvme nvme1: missing or invalid SUBNQN field.
[   13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[   13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[   13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[   13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[   13.239627] nvme nvme0: Identify Descriptors failed (8194)
[   13.246315] nvme nvme1: Identify Descriptors failed (8194)

Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679
Signed-off-by: Julian Einwag <jeinwag-nvme@marcapo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-05 13:40:55 +01:00
Linus Torvalds
280d542f6f Merge tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "More may show up but this is what I have at this stage: just a single
  nouveau regression fix, and a bunch of amdgpu fixes.

  amdgpu:
   - S0ix fix
   - Handle new NV12 SKU
   - Misc power fixes
   - Display uninitialized value fix
   - PCIE debugfs register access fix

  nouveau:
   - regression fix for gk104"

* tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
  drm/amd/display: fix the return of the uninitialized value in ret
  drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
  drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
  drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
  drm/amdgpu/pm: make unsupported power profile messages debug
  drm/amdgpu:disable VCN for Navi12 SKU
  drm/amdgpu: Only check for S0ix if AMD_PMC is configured
  drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class
2021-03-04 19:06:28 -08:00
Brendan Jackman
39491867ac bpf: Explicitly zero-extend R0 after 32-bit cmpxchg
As pointed out by Ilya and explained in the new comment, there's a
discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
the value from memory into r0, while x86 only does so when r0 and the
value in memory are different. The same issue affects s390.

At first this might sound like pure semantics, but it makes a real
difference when the comparison is 32-bit, since the load will
zero-extend r0/rax.

The fix is to explicitly zero-extend rax after doing such a
CMPXCHG. Since this problem affects multiple archs, this is done in
the verifier by patching in a BPF_ZEXT_REG instruction after every
32-bit cmpxchg. Any archs that don't need such manual zero-extension
can do a look-ahead with insn_is_zext to skip the unnecessary mov.

Note this still goes on top of Ilya's patch:

https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com/T/#u

Differences v5->v6[1]:
 - Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
   and removed 'inline' to match the style of the is_*_function helpers.
 - Fixed up comments in verifier test (thanks for the careful review, Martin!)

Differences v4->v5[1]:
 - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
   for suggesting this.

Differences v3->v4[1]:
 - Moved the optimization against pointless zext into the correct place:
   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.

Differences v2->v3[1]:
 - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
 - Added extra commentary on bpf_jit_needs_zext
 - Added check to avoid adding a pointless zext(r0) if there's already one there.

Difference v1->v2[1]: Now solved centrally in the verifier instead of
  specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!

[1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t

Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 5ffa25502b ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-04 19:06:03 -08:00
Linus Torvalds
44195bd771 Merge tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull iSCSI fixes from Martin Petersen:
 "Three fixes for missed iSCSI verification checks (and make the sysfs
  files use "sysfs_emit()" - that's what it is there for)"

* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
  scsi: iscsi: Verify lengths on passthrough PDUs
  scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
  scsi: iscsi: Restrict sessions and handles to admin capabilities
2021-03-04 18:53:30 -08:00
Dave Airlie
a1f1054124 Merge tag 'amd-drm-fixes-5.12-2021-03-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.12-2021-03-03:

amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210304043255.3792-1-alexander.deucher@amd.com
2021-03-05 11:13:22 +10:00
Dave Airlie
a727df407c Merge branch '00.00-inst' of git://github.com/skeggsb/linux into drm-fixes
A single regression fix here that I noticed while testing a bunch of
boards for something else, not sure where this got lost!  Prevents 3D
driver from initialising on some GPUs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5gmq14BrDmkMncfd=tHVSSaU89BdBEWfs6Jy-aRz03GQ@mail.gmail.com
2021-03-05 11:10:29 +10:00
Chris Leech
f9dbdf97a5 scsi: iscsi: Verify lengths on passthrough PDUs
Open-iSCSI sends passthrough PDUs over netlink, but the kernel should be
verifying that the provided PDU header and data lengths fall within the
netlink message to prevent accessing beyond that in memory.

Cc: stable@vger.kernel.org
Reported-by: Adam Nichols <adam@grimm-co.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 20:09:51 -05:00
Chris Leech
ec98ea7070 scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
As the iSCSI parameters are exported back through sysfs, it should be
enforcing that they never are more than PAGE_SIZE (which should be more
than enough) before accepting updates through netlink.

Change all iSCSI sysfs attributes to use sysfs_emit().

Cc: stable@vger.kernel.org
Reported-by: Adam Nichols <adam@grimm-co.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 20:09:51 -05:00
Lee Duncan
688e8128b7 scsi: iscsi: Restrict sessions and handles to admin capabilities
Protect the iSCSI transport handle, available in sysfs, by requiring
CAP_SYS_ADMIN to read it. Also protect the netlink socket by restricting
reception of messages to ones sent with CAP_SYS_ADMIN. This disables
normal users from being able to end arbitrary iSCSI sessions.

Cc: stable@vger.kernel.org
Reported-by: Adam Nichols <adam@grimm-co.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 20:09:50 -05:00
Paul Moore
ad5d07f4a9 cipso,calipso: resolve a number of problems with the DOI refcounts
The current CIPSO and CALIPSO refcounting scheme for the DOI
definitions is a bit flawed in that we:

1. Don't correctly match gets/puts in netlbl_cipsov4_list().
2. Decrement the refcount on each attempt to remove the DOI from the
   DOI list, only removing it from the list once the refcount drops
   to zero.

This patch fixes these problems by adding the missing "puts" to
netlbl_cipsov4_list() and introduces a more conventional, i.e.
not-buggy, refcounting mechanism to the DOI definitions.  Upon the
addition of a DOI to the DOI list, it is initialized with a refcount
of one, removing a DOI from the list removes it from the list and
drops the refcount by one; "gets" and "puts" behave as expected with
respect to refcounts, increasing and decreasing the DOI's refcount by
one.

Fixes: b1edeb1023 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
Fixes: d7cce01504 ("netlabel: Add support for removing a CALIPSO DOI.")
Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 15:26:57 -08:00
Jens Axboe
cc440e8738 kernel: provide create_io_thread() helper
Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.

Add a kernel_clone_args member, io_thread, which tells copy_process() to
mark the task with PF_IO_WORKER.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 15:45:03 -07:00
Pavel Begunkov
dd59a3d595 io_uring: reliably cancel linked timeouts
Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 15:45:01 -07:00
Pavel Begunkov
b05a1bcd40 io_uring: cancel-match based on flags
Instead of going into request internals, like checking req->file->f_op,
do match them based on REQ_F_INFLIGHT, it's set only when we want it to
be reliably cancelled.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 15:44:50 -07:00
Jiri Wiesner
67eb211487 ibmvnic: always store valid MAC address
The last change to ibmvnic_set_mac(), 8fc3672a8a, meant to prevent
users from setting an invalid MAC address on an ibmvnic interface
that has not been brought up yet. The change also prevented the
requested MAC address from being stored by the adapter object for an
ibmvnic interface when the state of the ibmvnic interface is
VNIC_PROBED - that is after probing has finished but before the
ibmvnic interface is brought up. The MAC address stored by the
adapter object is used and sent to the hypervisor for checking when
an ibmvnic interface is brought up.

The ibmvnic driver ignoring the requested MAC address when in
VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
than one slave to malfunction. The bonding code must be able to
change the MAC address of its slaves before they are brought up
during enslaving. The inability of kernels with 8fc3672a8a to set
the MAC addresses of bonding slaves is observable in the output of
"ip address show". The MAC addresses of the slaves are the same as
the MAC address of the bond on a working system whereas the slaves
retain their original MAC addresses on a system with a malfunctioning
LACP bond.

Fixes: 8fc3672a8a ("ibmvnic: fix ibmvnic_set_mac")
Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:37:38 -08:00
Hillf Danton
863a42b289 netdevsim: init u64 stats for 32bit hardware
Init the u64 stats in order to avoid the lockdep prints on the 32bit
hardware like

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
 Hardware name: ARM-Versatile Express
 Backtrace:
 [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
 [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
 [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
 [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
 [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
 [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
 [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
 [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
 [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
 [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
 [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
 [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
 [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
 [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
 [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
 [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
 [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
 [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
 [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
 [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
 [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
 [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
 [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
 [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
 [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
 [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
 [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
 [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
 [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
 [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
 [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
 [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
 [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
 [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
 [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)

Fixes: 83c9e13aa3 ("netdevsim: add software driver for testing offloads")
Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:36:26 -08:00
David S. Miller
bdda7dfab1 Merge branch 'mptcp-fixes'
Mat Martineau says:

====================
mptcp: Fixes for v5.12

These patches from the MPTCP tree fix a few multipath TCP issues:

Patches 1 and 5 clear some stale pointers when subflows close.

Patches 2, 4, and 9 plug some memory leaks.

Patch 3 fixes a memory accounting error identified by syzkaller.

Patches 6 and 7 fix a race condition that slowed data transmission.

Patch 8 adds missing wakeups when write buffer space is freed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Geliang Tang
9238e900d6 mptcp: free resources when the port number is mismatched
When the port number is mismatched with the announced ones, use
'goto dispose_child' to free the resources instead of using 'goto out'.

This patch also moves the port number checking code in
subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
will fail in dispose_child.

Fixes: 5bc56388c7 ("mptcp: add port number check for MP_JOIN")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Paolo Abeni
417789df4a mptcp: fix missing wakeup
__mptcp_clean_una() can free write memory and should wake-up
user-space processes when needed.

When such function is invoked by the MPTCP receive path, the wakeup
is not needed, as the TCP stack will later trigger subflow_write_space
which will do the wakeup as needed.

Other __mptcp_clean_una() call sites need an additional wakeup check
Let's bundle the relevant code in a new helper and use it.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Fixes: 64b9cea7a0 ("mptcp: fix spurious retransmissions")
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Paolo Abeni
c2e6048fa1 mptcp: fix race in release_cb
If we receive a MPTCP_PUSH_PENDING even from a subflow when
mptcp_release_cb() is serving the previous one, the latter
will be delayed up to the next release_sock(msk).

Address the issue implementing a test/serve loop for such
event.

Additionally rename the push helper to __mptcp_push_pending()
to be more consistent with the existing code.

Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Paolo Abeni
2948d0a1e5 mptcp: factor out __mptcp_retrans helper()
Will simplify the following patch, no functional change
intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Florian Westphal
c8fe62f076 mptcp: reset 'first' and ack_hint on subflow close
Just like with last_snd, we have to NULL 'first' on subflow close.

ack_hint isn't strictly required (its never dereferenced), but better to
clear this explicitly as well instead of making it an exception.

msk->first is dereferenced unconditionally at accept time, but
at that point the ssk is not on the conn_list yet -- this means
worker can't see it when iterating the conn_list.

Reported-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Florian Westphal
17aee05dc8 mptcp: dispose initial struct socket when its subflow is closed
Christoph Paasch reported following crash:
dst_release underflow
WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
Call Trace:
 rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
 rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
 __mkroute_output net/ipv4/route.c:2484 [inline]
...

The worker leaves msk->subflow alone even when it
happened to close the subflow ssk associated with it.

Fixes: 866f26f2a9 ("mptcp: always graft subflow socket to parent")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Paolo Abeni
eaeef1ce55 mptcp: fix memory accounting on allocation error
In case of memory pressure the MPTCP xmit path keeps
at most a single skb in the tx cache, eventually freeing
additional ones.

The associated counter for forward memory is not update
accordingly, and that causes the following splat:

WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
FS:  0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
Call Trace:
 __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
 process_one_work+0x896/0x1170 kernel/workqueue.c:2275
 worker_thread+0x605/0x1350 kernel/workqueue.c:2421
 kthread+0x344/0x410 kernel/kthread.c:292
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296

At close time, as reported by syzkaller/Christoph.

This change address the issue properly updating the fwd
allocated memory counter in the error path.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
Fixes: 724cfd2ee8 ("mptcp: allocate TX skbs in msk context")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Florian Westphal
f07157792c mptcp: put subflow sock on connect error
mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
then adds the subflow to the join list.

Without a sock_put the subflow sk won't be freed in case connect() fails.

unreferenced object 0xffff88810c03b100 (size 3000):
[..]
    sk_prot_alloc.isra.0+0x2f/0x110
    sk_alloc+0x5d/0xc20
    inet6_create+0x2b7/0xd30
    __sock_create+0x17f/0x410
    mptcp_subflow_create_socket+0xff/0x9c0
    __mptcp_subflow_connect+0x1da/0xaf0
    mptcp_pm_nl_work+0x6e0/0x1120
    mptcp_worker+0x508/0x9a0

Fixes: 5b950ff433 ("mptcp: link MPC subflow into msk only after accept")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Florian Westphal
e0be4931f3 mptcp: reset last_snd on subflow close
Send logic caches last active subflow in the msk, so it needs to be
cleared when the cached subflow is closed.

Fixes: d5f49190de ("mptcp: allow picking different xmit subflows")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155
Reported-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:30:13 -08:00
Maximilian Heyne
bfc2560563 net: sched: avoid duplicates in classes dump
This is a follow up of commit ea32746953 ("net: sched: avoid
duplicates in qdisc dump") which has fixed the issue only for the qdisc
dump.

The duplicate printing also occurs when dumping the classes via
  tc class show dev eth0

Fixes: 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:27:47 -08:00
Vishal Bhakta
1112963427 scsi: vmw_pvscsi: MAINTAINERS: Update maintainer
The entries in the source files are removed as well.

Link: https://lore.kernel.org/r/20210226234347.21535-1-vbhakta@vmware.com
Signed-off-by: Vishal Bhakta <vbhakta@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:25 -05:00
Jiapeng Chong
1481b7fe43 scsi: ufs: Convert sysfs sprintf/snprintf family to sysfs_emit
Fix the following coccicheck warning:

./drivers/scsi/ufs/ufshcd.c:1538:8-16: WARNING: use scnprintf or
sprintf.

Link: https://lore.kernel.org/r/1614665298-115183-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:25 -05:00
Can Guo
4d195dc451 scsi: ufs: Remove redundant checks of !hba in suspend/resume callbacks
Runtime and system suspend/resume can only come after hba probe invokes
platform_set_drvdata(pdev, hba), meaning hba cannot be NULL in these PM
callbacks, so remove the checks of !hba.

Link: https://lore.kernel.org/r/1614145010-36079-4-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Nitin Rawat
4a791574a0 scsi: ufs: ufs-qcom: Disable interrupt in reset path
Disable interrupt in reset path to flush pending IRQ handler in order to
avoid possible NoC issues.

Link: https://lore.kernel.org/r/1614145010-36079-3-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Nitin Rawat <nitirawa@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Can Guo
aa53f580e6 scsi: ufs: Minor adjustments to error handling
In error handling prepare stage, after SCSI requests are blocked, do a
down/up_write(clk_scaling_lock) to clean up the queuecommand() path.
Meanwhile, stop eeh_work in case it disturbs error recovery. Moreover,
reset ufshcd_state at the entrance of ufshcd_probe_hba(), since it may be
called multiple times during error recovery.

Link: https://lore.kernel.org/r/1614145010-36079-2-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Tyrel Datwyler
dbdbb81bde scsi: ibmvfc: Reinitialize sub-CRQs and perform channel enquiry after LPM
A live partition migration (LPM) results in a CRQ disconnect similar to a
hard reset. In this LPM case the hypervisor mostly preserves the CRQ
transport such that it simply needs to be reenabled. However, the
capabilities may have changed such as fewer channels, or no channels at
all. Further, its possible that there may be sub-CRQ support, but no
channel support. The CRQ reenable path currently doesn't take any of this
into consideration.

For simplicity release and reinitialize sub-CRQs during reenable, and set
do_enquiry and using_channels with the appropriate values to trigger
channel renegotiation.

Link: https://lore.kernel.org/r/20210302230543.9905-6-tyreld@linux.ibm.com
Fixes: 3034ebe263 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Reviewed-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Tyrel Datwyler
0217a272fe scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup
The H_FREE_SUB_CRQ hypercall can return a retry delay return code that
indicates the call needs to be retried after a specific amount of time
delay. The error path to free a sub-CRQ in case of a failure during channel
registration fails to capture the return code of H_FREE_SUB_CRQ which will
result in the delay loop being skipped in the case of a retry delay return
code.

Store the return code result of the H_FREE_SUB_CRQ call such that the
return code check in the delay loop evaluates a meaningful value. Also, use
the rtas_busy_delay() to check the rc value and delay for the appropriate
amount of time.

Link: https://lore.kernel.org/r/20210302230543.9905-5-tyreld@linux.ibm.com
Fixes: 39e461fddf ("scsi: ibmvfc: Map/request irq and register Sub-CRQ interrupt handler")
Reviewed-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Tyrel Datwyler
2162dc23f8 scsi: ibmvfc: Treat H_CLOSED as success during sub-CRQ registration
A non-zero return code for H_REG_SUB_CRQ is currently treated as a failure
resulting in failing sub-CRQ setup. The case of H_CLOSED should not be
treated as a failure. This return code translates to a successful sub-CRQ
registration by the hypervisor, and is meant to communicate back that there
is currently no partner VIOS CRQ connection established as of yet. This is
a common occurrence during a disconnect where the client adapter can
possibly come back up prior to the partner adapter.

For non-zero return code from H_REG_SUB_CRQ treat a H_CLOSED as success so
that sub-CRQs are successfully setup.

Link: https://lore.kernel.org/r/20210302230543.9905-4-tyreld@linux.ibm.com
Fixes: 3034ebe263 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Reviewed-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:24 -05:00
Tyrel Datwyler
5cf529649b scsi: ibmvfc: Fix invalid sub-CRQ handles after hard reset
A hard reset results in a complete transport disconnect such that the CRQ
connection with the partner VIOS is broken. This has the side effect of
also invalidating the associated sub-CRQs. The current code assumes that
the sub-CRQs are perserved resulting in a protocol violation after trying
to reconnect them with the VIOS. This introduces an infinite loop such that
the VIOS forces a disconnect after each subsequent attempt to re-register
with invalid handles.

Avoid the aforementioned issue by releasing the sub-CRQs prior to CRQ
disconnect, and driving a reinitialization of the sub-CRQs once a new CRQ
is registered with the hypervisor.

Link: https://lore.kernel.org/r/20210302230543.9905-3-tyreld@linux.ibm.com
Fixes: 3034ebe263 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Reviewed-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:21:07 -05:00
Daniele Palmas
6c59cff38e net: usb: qmi_wwan: allow qmimux add/del with master up
There's no reason for preventing the creation and removal
of qmimux network interfaces when the underlying interface
is up.

This makes qmi_wwan mux implementation more similar to the
rmnet one, simplifying userspace management of the same
logical interfaces.

Fixes: c6adf77953 ("net: usb: qmi_wwan: add qmap mux protocol support")
Reported-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:20:21 -08:00
Tyrel Datwyler
443cc4b45c scsi: ibmvfc: Simplify handling of sub-CRQ initialization
If ibmvfc_init_sub_crqs() fails ibmvfc_probe() simply parrots registration
failure reported elsewhere, and futher vhost->scsi_scrq.scrq == NULL is
indication enough to the driver that it has no sub-CRQs available. The
mq_enabled check can also be moved into ibmvfc_init_sub_crqs() such that
each caller doesn't have to gate the call with a mq_enabled check. Finally,
in the case of sub-CRQ setup failure setting do_enquiry can be turned off
to putting the driver into single queue fallback mode.

The aforementioned changes also simplify the next patch in the series that
fixes a hard reset issue, by tying a sub-CRQ setup failure and do_enquiry
logic into ibmvfc_init_sub_crqs().

Link: https://lore.kernel.org/r/20210302230543.9905-2-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:20:04 -05:00
Vladimir Oltean
6a5166e07c net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled
In the blamed patch I managed to introduce a bug while moving code
around: the same logic is applied to the ucast_egress_floods and
bcast_egress_floods variables both on the "if" and the "else" branches.

This is clearly an unintended change compared to how the code used to be
prior to that bugfix, so restore it.

Fixes: 7f7ccdea8c ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:19:01 -08:00
Vladimir Oltean
053d8ad10d net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10
When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
read before resetting the switch so it can be reprogrammed afterwards.
This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
therefore that last branch is dead code.

Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
remove the check for SPEED_10, let it fall into the default case.

Fixes: ffe10e679c ("net: dsa: sja1105: Add support for the SGMII port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:19:01 -08:00
Vladimir Oltean
f1becbed41 net: mscc: ocelot: properly reject destination IP keys in VCAP IS1
An attempt is made to warn the user about the fact that VCAP IS1 cannot
offload keys matching on destination IP (at least given the current half
key format), but sadly that warning fails miserably in practice, due to
the fact that it operates on an uninitialized "match" variable. We must
first decode the keys from the flow rule.

Fixes: 75944fda1d ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:16:24 -08:00
David S. Miller
87e5e09427 Merge branch 'nexthop-blackhole'
Ido Schimmel says:

====================
nexthop: Do not flush blackhole nexthops when loopback goes down

Patch #1 prevents blackhole nexthops from being flushed when the
loopback device goes down given that as far as user space is concerned,
these nexthops do not have a nexthop device.

Patch #2 adds a test case.

There are no regressions in fib_nexthops.sh with this change:

 # ./fib_nexthops.sh
 ...
 Tests passed: 165
 Tests failed:   0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:04:49 -08:00
Ido Schimmel
3a1099d314 selftests: fib_nexthops: Test blackhole nexthops when loopback goes down
Test that blackhole nexthops are not flushed when the loopback device
goes down.

Output without previous patch:

 # ./fib_nexthops.sh -t basic

 Basic functional tests
 ----------------------
 TEST: List with nothing defined                                     [ OK ]
 TEST: Nexthop get on non-existent id                                [ OK ]
 TEST: Nexthop with no device or gateway                             [ OK ]
 TEST: Nexthop with down device                                      [ OK ]
 TEST: Nexthop with device that is linkdown                          [ OK ]
 TEST: Nexthop with device only                                      [ OK ]
 TEST: Nexthop with duplicate id                                     [ OK ]
 TEST: Blackhole nexthop                                             [ OK ]
 TEST: Blackhole nexthop with other attributes                       [ OK ]
 TEST: Blackhole nexthop with loopback device down                   [FAIL]
 TEST: Create group                                                  [ OK ]
 TEST: Create group with blackhole nexthop                           [FAIL]
 TEST: Create multipath group where 1 path is a blackhole            [ OK ]
 TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
 TEST: Create group with non-existent nexthop                        [ OK ]
 TEST: Create group with same nexthop multiple times                 [ OK ]
 TEST: Replace nexthop with nexthop group                            [ OK ]
 TEST: Replace nexthop group with nexthop                            [ OK ]
 TEST: Nexthop group and device                                      [ OK ]
 TEST: Test proto flush                                              [ OK ]
 TEST: Nexthop group and blackhole                                   [ OK ]

 Tests passed:  19
 Tests failed:   2

Output with previous patch:

 # ./fib_nexthops.sh -t basic

 Basic functional tests
 ----------------------
 TEST: List with nothing defined                                     [ OK ]
 TEST: Nexthop get on non-existent id                                [ OK ]
 TEST: Nexthop with no device or gateway                             [ OK ]
 TEST: Nexthop with down device                                      [ OK ]
 TEST: Nexthop with device that is linkdown                          [ OK ]
 TEST: Nexthop with device only                                      [ OK ]
 TEST: Nexthop with duplicate id                                     [ OK ]
 TEST: Blackhole nexthop                                             [ OK ]
 TEST: Blackhole nexthop with other attributes                       [ OK ]
 TEST: Blackhole nexthop with loopback device down                   [ OK ]
 TEST: Create group                                                  [ OK ]
 TEST: Create group with blackhole nexthop                           [ OK ]
 TEST: Create multipath group where 1 path is a blackhole            [ OK ]
 TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
 TEST: Create group with non-existent nexthop                        [ OK ]
 TEST: Create group with same nexthop multiple times                 [ OK ]
 TEST: Replace nexthop with nexthop group                            [ OK ]
 TEST: Replace nexthop group with nexthop                            [ OK ]
 TEST: Nexthop group and device                                      [ OK ]
 TEST: Test proto flush                                              [ OK ]
 TEST: Nexthop group and blackhole                                   [ OK ]

 Tests passed:  21
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:04:49 -08:00
Ido Schimmel
76c03bf8e2 nexthop: Do not flush blackhole nexthops when loopback goes down
As far as user space is concerned, blackhole nexthops do not have a
nexthop device and therefore should not be affected by the
administrative or carrier state of any netdev.

However, when the loopback netdev goes down all the blackhole nexthops
are flushed. This happens because internally the kernel associates
blackhole nexthops with the loopback netdev.

This behavior is both confusing to those not familiar with kernel
internals and also diverges from the legacy API where blackhole IPv4
routes are not flushed when the loopback netdev goes down:

 # ip route add blackhole 198.51.100.0/24
 # ip link set dev lo down
 # ip route show 198.51.100.0/24
 blackhole 198.51.100.0/24

Blackhole IPv6 routes are flushed, but at least user space knows that
they are associated with the loopback netdev:

 # ip -6 route show 2001:db8:1::/64
 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium

Fix this by only flushing blackhole nexthops when the loopback netdev is
unregistered.

Fixes: ab84be7e54 ("net: Initial nexthop code")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Donald Sharp <sharpd@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:04:49 -08:00
Drew Fustini
d93ef30164 net: sctp: trivial: fix typo in comment
Fix typo of 'overflow' for comment in sctp_tsnmap_check().

Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Drew Fustini <drew@beagleboard.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 13:48:32 -08:00
David S. Miller
e216674a5b Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-03-03

This series contains updates to ixgbe and ixgbevf drivers.

Bartosz Golaszewski does not error on -ENODEV from ixgbe_mii_bus_init()
as this is valid for some devices with a shared bus for ixgbe.

Antony Antony adds a check to fail for non transport mode SA with
offload as this is not supported for ixgbe and ixgbevf.

Dinghao Liu fixes a memory leak on failure to program a perfect filter
for ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 13:47:42 -08:00
Linus Torvalds
3cb60ee632 Merge tag 'tpmdd-next-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes Jarkko Sakkinen:
 "Three fixes for rc2"

* tag 'tpmdd-next-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Remove unintentional dump_stack() call
  tpm, tpm_tis: Decorate tpm_tis_gen_interrupt() with request_locality()
  tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality()
2021-03-04 12:51:57 -08:00
Milan Broz
df7b59ba92 dm verity: fix FEC for RS roots unaligned to block size
Optional Forward Error Correction (FEC) code in dm-verity uses
Reed-Solomon code and should support roots from 2 to 24.

The error correction parity bytes (of roots lengths per RS block) are
stored on a separate device in sequence without any padding.

Currently, to access FEC device, the dm-verity-fec code uses dm-bufio
client with block size set to verity data block (usually 4096 or 512
bytes).

Because this block size is not divisible by some (most!) of the roots
supported lengths, data repair cannot work for partially stored parity
bytes.

This fix changes FEC device dm-bufio block size to "roots << SECTOR_SHIFT"
where we can be sure that the full parity data is always available.
(There cannot be partial FEC blocks because parity must cover whole
sectors.)

Because the optional FEC starting offset could be unaligned to this
new block size, we have to use dm_bufio_set_sector_offset() to
configure it.

The problem is easily reproduced using veritysetup, e.g. for roots=13:

  # create verity device with RS FEC
  dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
  veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=13 | awk '/^Root hash/{ print $3 }' >roothash

  # create an erasure that should be always repairable with this roots setting
  dd if=/dev/zero of=data.img conv=notrunc bs=1 count=8 seek=4088 status=none

  # try to read it through dm-verity
  veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=13 $(cat roothash)
  dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
  # wait for possible recursive recovery in kernel
  udevadm settle
  veritysetup close test

With this fix, errors are properly repaired.
  device-mapper: verity-fec: 7:1: FEC 0: corrected 8 errors
  ...

Without it, FEC code usually ends on unrecoverable failure in RS decoder:
  device-mapper: verity-fec: 7:1: FEC 0: failed to correct: -74
  ...

This problem is present in all kernels since the FEC code's
introduction (kernel 4.5).

It is thought that this problem is not visible in Android ecosystem
because it always uses a default RS roots=2.

Depends-on: a14e5ec66a ("dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size")
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Tested-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-04 15:08:18 -05:00
Mikulas Patocka
a14e5ec66a dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
dm_bufio_get_device_size returns the device size in blocks. Before
returning the value, we must subtract the nubmer of starting
sectors. The number of starting sectors may not be divisible by block
size.

Note that currently, no target is using dm_bufio_set_sector_offset and
dm_bufio_get_device_size simultaneously, so this change has no effect.
However, an upcoming dm-verity-fec fix needs this change.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-04 14:53:54 -05:00
Linus Torvalds
cee407c5cc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:

 - Doc fixes

 - selftests fixes

 - Add runstate information to the new Xen support

 - Allow compiling out the Xen interface

 - 32-bit PAE without EPT bugfix

 - NULL pointer dereference bugfix

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Clear the CR4 register on reset
  KVM: x86/xen: Add support for vCPU runstate information
  KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info
  selftests: kvm: Mmap the entire vcpu mmap area
  KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1
  KVM: x86: allow compiling out the Xen hypercall interface
  KVM: xen: flush deferred static key before checking it
  KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
  KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref
  KVM: x86: remove misplaced comment on active_mmu_pages
  KVM: Documentation: rectify rst markup in kvm_run->flags
  Documentation: kvm: fix messy conversion from .txt to .rst
2021-03-04 11:26:17 -08:00
Linus Torvalds
c5a58f877c Merge tag 'for-linus-5.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
 "Two security issues (XSA-367 and XSA-369)"

* tag 'for-linus-5.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix p2m size in dom0 for disabled memory hotplug case
  xen-netback: respect gnttab_map_refs()'s return value
  Xen/gnttab: handle p2m update errors on a per-slot basis
2021-03-04 11:24:47 -08:00
Linus Torvalds
43df5242af Merge tag 'sound-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Slightly bulky changes are seen at this time, mostly for dealing with
  the messed up Kconfig for ASoC Intel SOF stuff. The driver and its
  code was split to each module per platform now, which is far more
  straightforward. This should cover the randconfig problems, and more
  importantly, improve the actual device handling as well.

  Other than that, nothing particular stands out: the HDMI PCM
  assignment fix for Intel Tigerlake, MIPS n64 error handling fix, and
  the usual suspects, HD-audio / USB-audio quirks"

* tag 'sound-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
  ALSA: hda/realtek: Apply dual codec quirks for MSI Godlike X570 board
  ALSA: hda/realtek: Add quirk for Intel NUC 10
  ALSA: hda/hdmi: let new platforms assign the pcm slot dynamically
  ALSA: hda/realtek: Add quirk for Clevo NH55RZQ
  ALSA: hda: intel-sdw-acpi: add missing include files
  ALSA: hda: move Intel SoundWire ACPI scan to dedicated module
  ASoC: SOF: Intel: SoundWire: simplify Kconfig
  ASoC: SOF: pci: move DSP_CONFIG use to platform-specific drivers
  ASoC: SOF: pci: split PCI into different drivers
  ASoC: SOF: ACPI: avoid reverse module dependency
  ASoC: soc-acpi: allow for partial match in parent name
  ALSA: hda: intel-nhlt: verify config type
  ALSA: hda: fix kernel-doc warnings
  ALSA: usb-audio: Fix Pioneer DJM devices URB_CONTROL request direction to set samplerate
  ALSA: usb-audio: use Corsair Virtuoso mapping for Corsair Virtuoso SE
  ALSA: hda/realtek: Enable headset mic of Acer SWIFT with ALC256
  ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
  ALSA: usb-audio: Allow modifying parameters with succeeding hw_params calls
  ALSA: usb-audio: Drop bogus dB range in too low level
  ALSA: usb-audio: Don't abort even if the clock rate differs
  ...
2021-03-04 11:18:13 -08:00
Dinghao Liu
7a76638163 ixgbe: Fix memleak in ixgbe_configure_clsu32
When ixgbe_fdir_write_perfect_filter_82599() fails,
input allocated by kzalloc() has not been freed,
which leads to memleak.

Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-04 11:05:06 -08:00
Antony Antony
d785e1fec6 ixgbe: fail to create xfrm offload of IPsec tunnel mode SA
Based on talks and indirect references ixgbe IPsec offlod do not
support IPsec tunnel mode offload. It can only support IPsec transport
mode offload. Now explicitly fail when creating non transport mode SA
with offload to avoid false performance expectations.

Fixes: 63a67fe229 ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony <antony@phenome.org>
Acked-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-04 11:05:05 -08:00
Naohiro Aota
badae9c869 btrfs: zoned: do not account freed region of read-only block group as zone_unusable
We migrate zone unusable bytes to read-only bytes when a block group is
set to read-only, and account all the free region as bytes_readonly.
Thus, we should not increase block_group->zone_unusable when the block
group is read-only.

Fixes: 169e0da91a ("btrfs: zoned: track unusable bytes for zones")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-04 16:16:58 +01:00
Naohiro Aota
d734492a14 btrfs: zoned: use sector_t for zone sectors
We need to use sector_t for zone_sectors, or it would set the zone size
to zero when the size >= 4GB (= 2^24 sectors) by shifting the
zone_sectors value by SECTOR_SHIFT. We're assuming zones sizes up to
8GiB.

Fixes: 5b31646898 ("btrfs: get zone information of zoned block devices")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-04 16:16:55 +01:00
Ilya Leoshkevich
83a2881903 bpf: Account for BPF_FETCH in insn_has_def32()
insn_has_def32() returns false for 32-bit BPF_FETCH insns. This makes
adjust_insn_aux_data() incorrectly set zext_dst, as can be seen in [1].
This happens because insn_no_def() does not know about the BPF_FETCH
variants of BPF_STX.

Fix in two steps.

First, replace insn_no_def() with insn_def_regno(), which returns the
register an insn defines. Normally insn_no_def() calls are followed by
insn->dst_reg uses; replace those with the insn_def_regno() return
value.

Second, adjust the BPF_STX special case in is_reg64() to deal with
queries made from opt_subreg_zext_lo32_rnd_hi32(), where the state
information is no longer available. Add a comment, since the purpose
of this special case is not clear at first glance.

  [1] https://lore.kernel.org/bpf/20210223150845.1857620-1-jackmanb@google.com/

Fixes: 5ffa25502b ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com
2021-03-04 16:02:31 +01:00
Maciej Fijalkowski
2b2aedabc4 libbpf: Clear map_info before each bpf_obj_get_info_by_fd
xsk_lookup_bpf_maps, based on prog_fd, looks whether current prog has a
reference to XSKMAP. BPF prog can include insns that work on various BPF
maps and this is covered by iterating through map_ids.

The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling
needs to be cleared at each iteration, so that it doesn't contain any
outdated fields and that is currently missing in the function of
interest.

To fix that, zero-init map_info via memset before each
bpf_obj_get_info_by_fd call.

Also, since the area of this code is touched, in general strcmp is
considered harmful, so let's convert it to strncmp and provide the
size of the array name for current map_info.

While at it, do s/continue/break/ once we have found the xsks_map to
terminate the search.

Fixes: 5750902a6e ("libbpf: proper XSKMAP cleanup")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210303185636.18070-4-maciej.fijalkowski@intel.com
2021-03-04 15:53:37 +01:00
Maciej Fijalkowski
6bc6699881 samples, bpf: Add missing munmap in xdpsock
We mmap the umem region, but we never munmap it.
Add the missing call at the end of the cleanup.

Fixes: 3945b37a97 ("samples/bpf: use hugepages in xdpsock app")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210303185636.18070-3-maciej.fijalkowski@intel.com
2021-03-04 15:53:32 +01:00
Maciej Fijalkowski
c95c34f01b xsk: Remove dangling function declaration from header file
xdp_umem_query() is dead for a long time, drop the declaration from
include/linux/netdevice.h

Fixes: c9b47cc1fa ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210303185636.18070-2-maciej.fijalkowski@intel.com
2021-03-04 15:52:44 +01:00
Steven Rostedt (VMware)
f9f344479d tracing: Fix comment about the trace_event_call flags
In the declaration of the struct trace_event_call, the flags has the bits
defined in the comment above it. But these bits are also defined by the
TRACE_EVENT_FL_* enums just above the declaration of the struct. As the
comment about the flags in the struct has become stale and incorrect, just
replace it with a reference to the TRACE_EVENT_FL_* enum above.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:51:50 -05:00
Steven Rostedt (VMware)
ee666a1855 tracing: Skip selftests if tracing is disabled
If tracing is disabled for some reason (traceoff_on_warning, command line,
etc), the ftrace selftests are guaranteed to fail, as their results are
defined by trace data in the ring buffers. If the ring buffers are turned
off, the tests will fail, due to lack of data.

Because tracing being disabled is for a specific reason (warning, user
decided to, etc), it does not make sense to enable tracing to run the self
tests, as the test output may corrupt the reason for the tracing to be
disabled.

Instead, simply skip the self tests and report that they are being skipped
due to tracing being disabled.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:51:25 -05:00
Vamshi K Sthambamkadi
f40fc799af tracing: Fix memory leak in __create_synth_event()
kmemleak report:
unreferenced object 0xc5a6f708 (size 8):
  comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s)
  hex dump (first 8 bytes):
    00 c1 3d 60 14 83 1f 8a                          ..=`....
  backtrace:
    [<f0aa4ac4>] __kmalloc_track_caller+0x2a6/0x460
    [<7d3d60a6>] kstrndup+0x37/0x70
    [<45a0e739>] argv_split+0x1c/0x120
    [<c17982f8>] __create_synth_event+0x192/0xb00
    [<0708b8a3>] create_synth_event+0xbb/0x150
    [<3d1941e1>] create_dyn_event+0x5c/0xb0
    [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140
    [<04deb2ef>] dyn_event_write+0x10/0x20
    [<8779ac95>] vfs_write+0xa9/0x3c0
    [<ed93722a>] ksys_write+0x89/0xc0
    [<b9ca0507>] __ia32_sys_write+0x15/0x20
    [<7ce02d85>] __do_fast_syscall_32+0x45/0x80
    [<cb0ecb35>] do_fast_syscall_32+0x29/0x60
    [<2467454a>] do_SYSENTER_32+0x15/0x20
    [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc
unreferenced object 0xc5a6f078 (size 8):
  comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s)
  hex dump (first 8 bytes):
    08 f7 a6 c5 00 00 00 00                          ........
  backtrace:
    [<bbac096a>] __kmalloc+0x2b6/0x470
    [<aa2624b4>] argv_split+0x82/0x120
    [<c17982f8>] __create_synth_event+0x192/0xb00
    [<0708b8a3>] create_synth_event+0xbb/0x150
    [<3d1941e1>] create_dyn_event+0x5c/0xb0
    [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140
    [<04deb2ef>] dyn_event_write+0x10/0x20
    [<8779ac95>] vfs_write+0xa9/0x3c0
    [<ed93722a>] ksys_write+0x89/0xc0
    [<b9ca0507>] __ia32_sys_write+0x15/0x20
    [<7ce02d85>] __do_fast_syscall_32+0x45/0x80
    [<cb0ecb35>] do_fast_syscall_32+0x29/0x60
    [<2467454a>] do_SYSENTER_32+0x15/0x20
    [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc

In __create_synth_event(), while iterating field/type arguments, the
argv_split() will return array of atleast 2 elements even when zero
arguments(argc=0) are passed. for e.g. when there is double delimiter
or string ends with delimiter

To fix call argv_free() even when argc=0.

Link: https://lkml.kernel.org/r/20210304094521.GA1826@cosmos

Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:45:57 -05:00
Steven Rostedt (VMware)
6549de1fe3 ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
When the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is enabled, and the time
stamps are detected as not being valid, it reports information about the
write stamp, but does not show the before_stamp which is still useful
information. Also, it should give a warning once, such that tests detect
this happening.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:45:17 -05:00
Steven Rostedt (VMware)
6f6be606e7 ring-buffer: Force before_stamp and write_stamp to be different on discard
Part of the logic of the new time stamp code depends on the before_stamp and
the write_stamp to be different if the write_stamp does not match the last
event on the buffer, as it will be used to calculate the delta of the next
event written on the buffer.

The discard logic depends on this, as the next event to come in needs to
inject a full timestamp as it can not rely on the last event timestamp in
the buffer because it is unknown due to events after it being discarded. But
by changing the write_stamp back to the time before it, it forces the next
event to use a full time stamp, instead of relying on it.

The issue came when a full time stamp was used for the event, and
rb_time_delta() returns zero in that case. The update to the write_stamp
(which subtracts delta) made it not change. Then when the event is removed
from the buffer, because the before_stamp and write_stamp still match, the
next event written would calculate its delta from the write_stamp, but that
would be wrong as the write_stamp is of the time of the event that was
discarded.

In the case that the delta change being made to write_stamp is zero, set the
before_stamp to zero as well, and this will force the next event to inject a
full timestamp and not use the current write_stamp.

Cc: stable@vger.kernel.org
Fixes: a389d86f7f ("ring-buffer: Have nested events still record running time stamp")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:45:17 -05:00
Rolf Eike Beer
69268094a1 tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
It's "cond_resched()" not "cond_sched()".

Link: https://lkml.kernel.org/r/1863065.aFVDpXsuPd@devpool47

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:45:17 -05:00
Yordan Karadzhov (VMware)
70d443d846 tracing: Remove duplicate declaration from trace.h
A declaration of function "int trace_empty(struct trace_iterator *iter)"
shows up twice in the header file kernel/trace/trace.h

Link: https://lkml.kernel.org/r/20210304092348.208033-1-y.karadz@gmail.com

Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:44:47 -05:00
Jens Axboe
f01272541d io-wq: ensure all pending work is canceled on exit
If we race on shutting down the io-wq, then we should ensure that any
work that was queued after workers shutdown is canceled. Harden the
add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
it's set.

Add a WARN_ON() for having any work before we kill the io-wq context.

Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:10 -07:00
Jens Axboe
e4b4a13f49 io_uring: ensure that threads freeze on suspend
Alex reports that his system fails to suspend using 5.12-rc1, with the
following dump:

[  240.650300] PM: suspend entry (deep)
[  240.650748] Filesystems sync: 0.000 seconds
[  240.725605] Freezing user space processes ...
[  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
[  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
[  260.739504] Call Trace:
[  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739515]  ? pick_next_task_fair+0x197/0x1cde
[  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
[  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
[  260.739525]  ? __schedule+0x57/0x6d6
[  260.739529]  ? del_timer_sync+0xb9/0x115
[  260.739533]  ? schedule+0x63/0xd5
[  260.739536]  ? schedule_timeout+0x219/0x356
[  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739544]  ? io_wq_manager+0x73/0xb1
[  260.739549]  ? io_wq_create+0x262/0x262
[  260.739553]  ? ret_from_fork+0x22/0x30
[  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
[  260.739561] Call Trace:
[  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
[  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  260.739574]  ? __schedule+0x5b7/0x6d6
[  260.739578]  ? del_timer_sync+0x70/0x115
[  260.739581]  ? schedule_timeout+0x211/0x356
[  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739588]  ? io_wq_check_workers+0x15/0x11f
[  260.739592]  ? io_wq_manager+0x69/0xb1
[  260.739596]  ? io_wq_create+0x262/0x262
[  260.739600]  ? ret_from_fork+0x22/0x30
[  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
[  260.739607] Call Trace:
[  260.739609]  ? __schedule+0x5b7/0x6d6
[  260.739614]  ? schedule+0x63/0xd5
[  260.739617]  ? schedule_timeout+0x219/0x356
[  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739624]  ? task_thread.isra.0+0x148/0x3af
[  260.739628]  ? task_thread_unbound+0xa/0xa
[  260.739632]  ? task_thread_bound+0x7/0x7
[  260.739636]  ? ret_from_fork+0x22/0x30
[  260.739647] OOM killer enabled.
[  260.739648] Restarting tasks ... done.
[  260.740077] PM: suspend exit

Play nice and ensure that any thread we create will call try_to_freeze()
at an opportune time so that memory suspend can proceed. For the io-wq
worker threads, mark them as PF_NOFREEZE. They could potentially be
blocked for a long time.

Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:09 -07:00
Pavel Begunkov
b23fcf477f io_uring: remove extra in_idle wake up
io_dismantle_req() is always followed by io_put_task(), which already do
proper in_idle wake ups, so we can skip waking the owner task in
io_dismantle_req(). The rules are simpler now, do io_put_task() shortly
after ending a request, and it will be fine.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:07 -07:00
Pavel Begunkov
ebf9366707 io_uring: inline __io_queue_async_work()
__io_queue_async_work() is only called from io_queue_async_work(),
inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:05 -07:00
Pavel Begunkov
f85c310ac3 io_uring: inline io_req_clean_work()
Inline io_req_clean_work(), less code and easier to analyse
tctx dependencies and refs usage.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:04 -07:00
Pavel Begunkov
64c7212391 io_uring: choose right tctx->io_wq for try cancel
When we cancel SQPOLL, @task in io_uring_try_cancel_requests() will
differ from current. Use the right tctx from passed in @task, and don't
forget that it can be NULL when the io_uring ctx exits.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:03 -07:00
Jens Axboe
3e6a0d3c75 io_uring: fix -EAGAIN retry with IOPOLL
We no longer revert the iovec on -EIOCBQUEUED, see commit ab2125df92,
and this started causing issues for IOPOLL on devies that run out of
request slots. Turns out what outside of needing a revert for those, we
also had a bug where we didn't properly setup retry inside the submission
path. That could cause re-import of the iovec, if any, and that could lead
to spurious results if the application had those allocated on the stack.

Catch -EAGAIN retry and make the iovec stable for IOPOLL, just like we do
for !IOPOLL retries.

Cc: <stable@vger.kernel.org> # 5.9+
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reported-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:38:01 -07:00
Jens Axboe
dc7bbc9ef3 io-wq: fix error path leak of buffered write hash map
The 'err' path should include the hash put, we already grabbed a reference
once we get that far.

Fixes: e941894eae ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:59 -07:00
Pavel Begunkov
16270893d7 io_uring: remove sqo_task
Now, sqo_task is used only for a warning that is not interesting anymore
since sqo_dead is gone, remove all of that including ctx->sqo_task.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:57 -07:00
Pavel Begunkov
70aacfe661 io_uring: kill sqo_dead and sqo submission halting
As SQPOLL task doesn't poke into ->sqo_task anymore, there is no need to
kill the sqo when the master task exits. Before it was necessary to
avoid races accessing sqo_task->files with removing them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: don't forget to enable SQPOLL before exit, if started disabled]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:55 -07:00
Jens Axboe
1c3b3e6527 io_uring: ignore double poll add on the same waitqueue head
syzbot reports a deadlock, attempting to lock the same spinlock twice:

============================================
WARNING: possible recursive locking detected
5.11.0-syzkaller #0 Not tainted
--------------------------------------------
swapper/1/0 is trying to acquire lock:
ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960

but task is already holding lock:
ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&runtime->sleep);
  lock(&runtime->sleep);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by swapper/1/0:
 #0: ffff888147474908 (&group->lock){..-.}-{2:2}, at: _snd_pcm_stream_lock_irqsave+0x9f/0xd0 sound/core/pcm_native.c:170
 #1: ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
 check_deadlock kernel/locking/lockdep.c:2872 [inline]
 validate_chain kernel/locking/lockdep.c:3661 [inline]
 __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:354 [inline]
 io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
 __wake_up_common+0x147/0x650 kernel/sched/wait.c:108
 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
 snd_pcm_update_state+0x46a/0x540 sound/core/pcm_lib.c:203
 snd_pcm_update_hw_ptr0+0xa75/0x1a50 sound/core/pcm_lib.c:464
 snd_pcm_period_elapsed+0x160/0x250 sound/core/pcm_lib.c:1805
 dummy_hrtimer_callback+0x94/0x1b0 sound/drivers/dummy.c:378
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
 hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
 invoke_softirq kernel/softirq.c:221 [inline]
 __irq_exit_rcu kernel/softirq.c:422 [inline]
 irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516
Code: dd 38 6e f8 84 db 75 ac e8 54 32 6e f8 e8 0f 1c 74 f8 e9 0c 00 00 00 e8 45 32 6e f8 0f 00 2d 4e 4a c5 00 e8 39 32 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 14 3a 6e f8 48 85 db
RSP: 0018:ffffc90000d47d18 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8880115c3780 RSI: ffffffff89052537 RDI: 0000000000000000
RBP: ffff888141127064 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff81794168 R11: 0000000000000000 R12: 0000000000000001
R13: ffff888141127000 R14: ffff888141127064 R15: ffff888143331804
 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:647
 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x3e1/0x590 kernel/sched/idle.c:300
 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397
 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272
 secondary_startup_64_no_verify+0xb0/0xbb

which is due to the driver doing poll_wait() twice on the same
wait_queue_head. That is perfectly valid, but from checking the rest
of the kernel tree, it's the only driver that does this.

We can handle this just fine, we just need to ignore the second addition
as we'll get woken just fine on the first one.

Cc: stable@vger.kernel.org # 5.8+
Fixes: 18bceab101 ("io_uring: allow POLL_ADD with double poll_wait() users")
Reported-by: syzbot+28abd693db9e92c160d8@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:14 -07:00
Jens Axboe
3ebba796fa io_uring: ensure that SQPOLL thread is started for exit
If we create it in a disabled state because IORING_SETUP_R_DISABLED is
set on ring creation, we need to ensure that we've kicked the thread if
we're exiting before it's been explicitly disabled. Otherwise we can run
into a deadlock where exit is waiting go park the SQPOLL thread, but the
SQPOLL thread itself is waiting to get a signal to start.

That results in the below trace of both tasks hung, waiting on each other:

INFO: task syz-executor458:8401 blocked for more than 143 seconds.
      Not tainted 5.11.0-next-20210226-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor458 state:D stack:27536 pid: 8401 ppid:  8400 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4324 [inline]
 __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
 schedule+0xcf/0x270 kernel/sched/core.c:5154
 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
 io_sq_thread_park fs/io_uring.c:7115 [inline]
 io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103
 io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745
 __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840
 io_uring_files_cancel include/linux/io_uring.h:47 [inline]
 do_exit+0x299/0x2a60 kernel/exit.c:780
 do_group_exit+0x125/0x310 kernel/exit.c:922
 __do_sys_exit_group kernel/exit.c:933 [inline]
 __se_sys_exit_group kernel/exit.c:931 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43e899
RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000
R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds.
task:iou-sqp-8401    state:D stack:30272 pid: 8402 ppid:  8400 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4324 [inline]
 __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
 schedule+0xcf/0x270 kernel/sched/core.c:5154
 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
 io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds.

Reported-by: syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:14 -07:00
Pavel Begunkov
28c4721b80 io_uring: replace cmpxchg in fallback with xchg
io_run_ctx_fallback() can use xchg() instead of cmpxchg(). It's simpler
and faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:14 -07:00
Pavel Begunkov
2c32395d81 io_uring: fix __tctx_task_work() ctx race
There is an unlikely but possible race using a freed context. That's
because req->task_work.func() can free a request, but we won't
necessarily find a completion in submit_state.comp and so all ctx refs
may be put by the time we do mutex_lock(&ctx->uring_ctx);

There are several reasons why it can miss going through
submit_state.comp: 1) req->task_work.func() didn't complete it itself,
but punted to iowq (e.g. reissue) and it got freed later, or a similar
situation with it overflowing and getting flushed by someone else, or
being submitted to IRQ completion, 2) As we don't hold the uring_lock,
someone else can do io_submit_flush_completions() and put our ref.
3) Bugs and code obscurities, e.g. failing to propagate issue_flags
properly.

One example is as follows

  CPU1                                  |  CPU2
=======================================================================
@req->task_work.func()                  |
  -> @req overflwed,                    |
     so submit_state.comp,nr==0         |
                                        | flush overflows, and free @req
                                        | ctx refs == 0, free it
ctx is dead, but we do                  |
	lock + flush + unlock           |

So take a ctx reference for each new ctx we see in __tctx_task_work(),
and do release it until we do all our flushing.

Fixes: 65453d1efb ("io_uring: enable req cache for task_work items")
Reported-by: syzbot+a157ac7c03a56397f553@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: fold in my one-liner and fix ref mismatch]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:05 -07:00
Jens Axboe
0d30b3e7ee io_uring: kill io_uring_flush()
This was always a weird work-around or file referencing, and we don't
need it anymore. Get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:37:03 -07:00
Jens Axboe
914390bcfd io_uring: kill unnecessary io_run_ctx_fallback() in io_ring_exit_work()
We already run the fallback task_work in io_uring_try_cancel_requests(),
no need to duplicate at ring exit explicitly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:36:28 -07:00
Jens Axboe
5730b27e84 io_uring: move cred assignment into io_issue_sqe()
If we move it in there, then we no longer have to care about it in io-wq.
This means we can drop the cred handling in io-wq, and we can drop the
REQ_F_WORK_INITIALIZED flag and async init functions as that was the last
user of it since we moved to the new workers. Then we can also drop
io_wq_work->creds, and just hold the personality u16 in there instead.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:36:28 -07:00
Jens Axboe
1575f21a09 io_uring: kill unnecessary REQ_F_WORK_INITIALIZED checks
We're no longer checking anything that requires the work item to be
initialized, as we're not carrying any file related state there.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:36:26 -07:00
Jens Axboe
4010fec41f io_uring: remove unused argument 'tsk' from io_req_caches_free()
We prune the full cache regardless, get rid of the dead argument.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:36:24 -07:00
Pavel Begunkov
8452d4a674 io_uring: destroy io-wq on exec
Destroy current's io-wq backend and tctx on __io_uring_task_cancel(),
aka exec(). Looks it's not strictly necessary, because it will be done
at some point when the task dies and changes of creds/files/etc. are
handled, but better to do that earlier to free io-wq and not potentially
lock previous mm and other resources for the time being.

It's safe to do because we wait for all requests of the current task to
complete, so no request will use tctx afterwards. Note, that
io_uring_files_cancel() may leave some requests for later reaping, so it
leaves tctx intact, that's ok as the task is dying anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:36:22 -07:00
Pavel Begunkov
ef8eaa4e65 io_uring: warn on not destroyed io-wq
Make sure that we killed an io-wq by the time a task is dead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:35:00 -07:00
Jens Axboe
1d5f360dd1 io_uring: fix race condition in task_work add and clear
We clear the bit marking the ctx task_work as active after having run
the queued work, but we really should be clearing it before. Otherwise
we can hit a tiny race ala:

CPU0					CPU1
io_task_work_add()			tctx_task_work()
					run_work
	add_to_list
	test_and_set_bit
					clear_bit
		already set

and CPU0 will return thinking the task_work is queued, while in reality
it's already being run. If we hit the condition after __tctx_task_work()
found no more work, but before we've cleared the bit, then we'll end up
thinking it's queued and will be run. In reality it is queued, but we
didn't queue the ctx task_work to ensure that it gets run.

Fixes: 7cbf1722d5 ("io_uring: provide FIFO ordering for task_work")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:35:00 -07:00
Jens Axboe
afcc4015d1 io-wq: provide an io_wq_put_and_exit() helper
If we put the io-wq from io_uring, we really want it to exit. Provide
a helper that does that for us. Couple that with not having the manager
hold a reference to the 'wq' and the normal SQPOLL exit will tear down
the io-wq context appropriate.

On the io-wq side, our wq context is per task, so only the task itself
is manipulating ->manager and hence it's safe to check and clear without
any extra locking. We just need to ensure that the manager task stays
around, in case it exits.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:34:39 -07:00
Jens Axboe
8629397e6e io_uring: don't use complete_all() on SQPOLL thread exit
We want to reuse this completion, and a single complete should do just
fine. Ensure that we park ourselves first if requested, as that is what
lead to the initial deadlock in this area. If we've got someone attempting
to park us, then we can't proceed without having them finish first.

Fixes: 37d1e2e364 ("io_uring: move SQPOLL thread io-wq forked worker")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:34:04 -07:00
Pavel Begunkov
ba50a036f2 io_uring: run fallback on cancellation
io_uring_try_cancel_requests() matches not only current's requests, but
also of other exiting tasks, so we need to actively cancel them and not
just wait, especially since the function can be called on flush during
do_exit() -> exit_files().
Even if it's not a problem for now, it's much nicer to know that the
function tries to cancel everything it can.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:34:03 -07:00
Jens Axboe
e54945ae94 io_uring: SQPOLL stop error handling fixes
If we fail to fork an SQPOLL worker, we can hit cancel, and hence
attempted thread stop, with the thread already being stopped. Ensure
we check for that.

Also guard thread stop fully by the sqd mutex, just like we do for
park.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:34:01 -07:00
Jens Axboe
470ec4ed8c io-wq: fix double put of 'wq' in error path
We are already freeing the wq struct in both spots, so don't put it and
get it freed twice.

Reported-by: syzbot+7bf785eedca35ca05501@syzkaller.appspotmail.com
Fixes: 4fb6ac3262 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:34:00 -07:00
Jens Axboe
d364d9e5db io-wq: wait for manager exit on wq destroy
The manager waits for the workers, hence the manager is always valid if
workers are running. Now also have wq destroy wait for the manager on
exit, so we now everything is gone.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:33:58 -07:00
Jens Axboe
dbf996202e io-wq: rename wq->done completion to wq->started
This is a leftover from a different use cases, it's used to wait for
the manager to startup. Rename it as such.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:32:54 -07:00
Jens Axboe
613eeb600e io-wq: don't ask for a new worker if we're exiting
If we're in the process of shutting down the async context, then don't
create new workers if we already have at least the fixed one.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:32:53 -07:00
Jens Axboe
fb3a1f6c74 io-wq: have manager wait for all workers to exit
Instead of having to wait separately on workers and manager, just have
the manager wait on the workers. We use an atomic_t for the reference
here, as we need to start at 0 and allow increment from that. Since the
number of workers is naturally capped by the allowed nr of processes,
and that uses an int, there is no risk of overflow.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-04 06:32:33 -07:00
Zenghui Yu
444d66a23c iommu/vt-d: Fix status code for Allocate/Free PASID command
As per Intel vt-d spec, Rev 3.0 (section 10.4.45 "Virtual Command Response
Register"), the status code of "No PASID available" error in response to
the Allocate PASID command is 2, not 1. The same for "Invalid PASID" error
in response to the Free PASID command.

We will otherwise see confusing kernel log under the command failure from
guest side. Fix it.

Fixes: 24f27d32ab ("iommu/vt-d: Enlightened PASID allocation")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210227073909.432-1-yuzenghui@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-04 13:32:04 +01:00
Lu Baolu
82c3cefb9f iommu: Don't use lazy flush for untrusted device
The lazy IOTLB flushing setup leaves a time window, in which the device
can still access some system memory, which has already been unmapped by
the device driver. It's not suitable for untrusted devices. A malicious
device might use this to attack the system by obtaining data that it
shouldn't obtain.

Fixes: c588072bba ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210225061454.2864009-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-04 13:22:28 +01:00
Nicolin Chen
765a9d1d02 iommu/tegra-smmu: Fix mc errors on tegra124-nyan
Commit 25938c73cd ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to
of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and
tegra_smmu_configure() that are typically done in the IOMMU core also.

This approach works for both existing devices that have DT nodes and other
devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3
upon testing. However, Page Fault errors are reported on tegra124-Nyan:

  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 Page fault (SMMU translation error [--S])

After debugging, I found that the mentioned commit changed some function
callback sequence of tegra-smmu's, resulting in enabling SMMU for display
client before display driver gets initialized. I couldn't reproduce exact
same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code.

Actually this Page Fault is a known issue, as on most of Tegra platforms,
display gets enabled by the bootloader for the splash screen feature, so
it keeps filling the framebuffer memory. A proper fix to this issue is to
1:1 linear map the framebuffer memory to IOVA space so the SMMU will have
the same address as the physical address in its page table. Yet, Thierry
has been working on the solution above for a year, and it hasn't merged.

Therefore, let's partially revert the mentioned commit to fix the errors.

The reason why we do a partial revert here is that we can still set priv
in ->of_xlate() callback for PCI devices. Meanwhile, devices existing in
DT, like display, will go through tegra_smmu_configure() at the stage of
bus_set_iommu() when SMMU gets probed(), as what it did before we merged
the mentioned commit.

Once we have the linear map solution for framebuffer memory, this change
can be cleaned away.

[Big thank to Guillaume who reported and helped debugging/verification]

Fixes: 25938c73cd ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210218220702.1962-1-nicoleotsuka@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-04 13:20:54 +01:00
Andrey Ryabinin
140456f994 iommu/amd: Fix sleeping in atomic in increase_address_space()
increase_address_space() calls get_zeroed_page(gfp) under spin_lock with
disabled interrupts. gfp flags passed to increase_address_space() may allow
sleeping, so it comes to this:

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4342
 in_atomic(): 1, irqs_disabled(): 1, pid: 21555, name: epdcbbf1qnhbsd8

 Call Trace:
  dump_stack+0x66/0x8b
  ___might_sleep+0xec/0x110
  __alloc_pages_nodemask+0x104/0x300
  get_zeroed_page+0x15/0x40
  iommu_map_page+0xdd/0x3e0
  amd_iommu_map+0x50/0x70
  iommu_map+0x106/0x220
  vfio_iommu_type1_ioctl+0x76e/0x950 [vfio_iommu_type1]
  do_vfs_ioctl+0xa3/0x6f0
  ksys_ioctl+0x66/0x70
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x4e/0x100
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by moving get_zeroed_page() out of spin_lock/unlock section.

Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210217143004.19165-1-arbn@yandex-team.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-04 13:15:56 +01:00
Takashi Iwai
06abcb18b3 ALSA: usb-audio: Apply the control quirk to Plantronics headsets
Other Plantronics headset models seem requiring the same workaround as
C320-M to add the 20ms delay for the control messages, too.  Apply the
workaround generically for devices with the vendor ID 0x047f.

Note that the problem didn't surface before 5.11 just with luck.
Since 5.11 got a big code rewrite about the stream handling, the
parameter setup procedure has changed, and this seemed triggering the
problem more often.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1182552
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210304085009.4770-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-04 09:50:55 +01:00
Takashi Iwai
fec60c3bc5 ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
Dell AE515 sound bar (413c:a506) spews the error messages when the
driver tries to read the current sample frequency, hence it needs to
be on the list in snd_usb_get_sample_rate_quirk().

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211551
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210304083021.2152-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-04 09:31:08 +01:00
Mark Pearson
a14a621999 ALSA: hda: ignore invalid NHLT table
On some Lenovo systems if the microphone is disabled in the BIOS
only the NHLT table header is created, with no data. This means
the endpoints field is not correctly set to zero - leading to an
unintialised variable and hence invalid descriptors are parsed
leading to page faults.

The Lenovo firmware team is addressing this, but adding a check
preventing invalid tables being parsed is worthwhile.

Tested on a Lenovo T14.

Tested-by: Philipp Leskovitz <philipp.leskovitz@secunet.com>
Reported-by: Philipp Leskovitz <philipp.leskovitz@secunet.com>
Signed-off-by: Mark Pearson <markpearson@lenovo.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210302141003.7342-1-markpearson@lenovo.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-04 09:14:36 +01:00
Zhang Yunkai
460c9f1c94 arch/parisc/kernel: remove duplicate include in ptrace
'linux/compat.h' included in 'arch/parisc/kernel/ptrace.c' is duplicated.
It is also included in the 24th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Helge Deller <deller@gmx.de>
2021-03-04 09:12:29 +01:00
Kai-Heng Feng
9799110825 ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
Rear audio on Lenovo ThinkStation P620 stops working after commit
1965c4364b ("ALSA: usb-audio: Disable autosuspend for Lenovo
ThinkStation P620"):
[    6.013526] usbcore: registered new interface driver snd-usb-audio
[    6.023064] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.023083] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4
[    6.023090] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.023098] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4
[    6.023103] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.023110] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4
[    6.045846] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.045866] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4
[    6.045877] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.045886] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4
[    6.045894] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x100, wIndex = 0x0, type = 1
[    6.045908] usb 3-6: cannot get ctl value: req = 0x81, wValue = 0x202, wIndex = 0x0, type = 4

I overlooked the issue because when I was working on the said commit,
only the front audio is tested. Apology for that.

Changing supports_autosuspend in driver is too late for disabling
autosuspend, because it was already used by USB probe routine, so it can
break the balance on the following code that depends on
supports_autosuspend.

Fix it by using usb_disable_autosuspend() helper, and balance the
suspend count in disconnect callback.

Fixes: 1965c4364b ("ALSA: usb-audio: Disable autosuspend for Lenovo ThinkStation P620")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210304043419.287191-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-04 09:09:37 +01:00
Kevin Wang
1aa46901ee drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
the register offset isn't needed division by 4 to pass RREG32_PCIE()

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-03 23:05:16 -05:00
Colin Ian King
b934dd9b44 drm/amd/display: fix the return of the uninitialized value in ret
Currently if stream->signal is neither SIGNAL_TYPE_DISPLAY_PORT_MST or
SIGNAL_TYPE_DISPLAY_PORT then variable ret is uninitialized and this is
checked for > 0 at the end of the function.  Ret should be initialized,
I believe setting it to zero is a correct default.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: bd0c064c16 ("drm/amd/display: Add return code instead of boolean for future use")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-03 23:05:16 -05:00
Alex Deucher
25951362db drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
It works fine and was only disabled because primary GPUs
don't enter runpm if there is a console bound to the fbdev due
to the kmap.  This will at least allow runpm on secondary cards.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-03 23:05:16 -05:00
Evan Quan
6efda16713 drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
Arcturus has a different register address from other SMU V11
ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-03 23:05:02 -05:00
Alex Deucher
992ace410c drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
Per discussions with PMFW team, the driver only needs to
notify the PMFW when the RLC is disabled.  The RLC FW will notify
the PMFW directly when it's enabled.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-03 22:50:26 -05:00
Alex Deucher
e3746696e7 drm/amdgpu/pm: make unsupported power profile messages debug
Making them an error confuses users and the errors are harmless
as not all asics support all profiles.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1488
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-03 22:49:44 -05:00
Asher.Song
0c61ac8134 drm/amdgpu:disable VCN for Navi12 SKU
Navi12 0x7360/C7 SKU has no video support, so remove it.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Asher.Song <Asher.Song@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-03 22:48:33 -05:00
Alex Deucher
31ada99bdd drm/amdgpu: Only check for S0ix if AMD_PMC is configured
The S0ix check only makes sense if the AMD PMC driver is
present.  We need to use the legacy S3 pathes when the
PMC driver is not present.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-03 22:46:55 -05:00
Pablo Neira Ayuso
bd1777b3a8 netfilter: nftables: bogus check for netlink portID with table owner
The existing branch checks for 0 != table->nlpid which always evaluates
true for tables that have an owner.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-04 04:02:54 +01:00
Pablo Neira Ayuso
2888b080d0 netfilter: nftables: fix possible double hook unregistration with table owner
Skip hook unregistration of owner tables from the netns exit path,
nft_rcv_nl_event() unregisters the table hooks before tearing down
the table content.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-04 03:58:55 +01:00
zhang kai
a9ecb0cbf0 rtnetlink: using dev_base_seq from target net
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:59:17 -08:00
Jisheng Zhang
d65614a01d net: 9p: advance iov on empty read
I met below warning when cating a small size(about 80bytes) txt file
on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we
miss iov_iter_advance() if the read count is 0 for zerocopy case, so
we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is
full. Fix it by removing the exception for 0 to ensure to call
iov_iter_advance() even on empty read for zerocopy case.

[    8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40
[    8.280028] Modules linked in:
[    8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6
[    8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40
[    8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b
[    8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246
[    8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000
[    8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90
[    8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058
[    8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050
[    8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600
[    8.285439] FS:  00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[    8.285844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0
[    8.286710] Call Trace:
[    8.288279]  generic_file_splice_read+0x31/0x1a0
[    8.289273]  ? do_splice_to+0x2f/0x90
[    8.289511]  splice_direct_to_actor+0xcc/0x220
[    8.289788]  ? pipe_to_sendpage+0xa0/0xa0
[    8.290052]  do_splice_direct+0x8b/0xd0
[    8.290314]  do_sendfile+0x1ad/0x470
[    8.290576]  do_syscall_64+0x2d/0x40
[    8.290818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    8.291409] RIP: 0033:0x7f24fd7dca0a
[    8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8
[    8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028
[    8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a
[    8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[    8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[    8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003
[    8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[    8.295782] ---[ end trace 63317af81b3ca24b ]---

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:57:59 -08:00
Hayes Wang
4b5dc1a94d Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
This reverts commit 134f98bcf1.

The r8153_mac_clk_spd() is used for RTL8153A only, because the register
table of RTL8153B is different from RTL8153A. However, this function would
be called when RTL8153B calls r8153_first_init() and r8153_enter_oob().
That causes RTL8153B becomes unstable when suspending and resuming. The
worst case may let the device stop working.

Besides, revert this commit to disable MAC clock speed down for RTL8153A.
It would avoid the known issue when enabling U1. The data of the first
control transfer may be wrong when exiting U1.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:56:03 -08:00
Matthias Schiffer
3e59e88567 net: l2tp: reduce log level of messages in receive path, add counter instead
Commit 5ee759cda5 ("l2tp: use standard API for warning log messages")
changed a number of warnings about invalid packets in the receive path
so that they are always shown, instead of only when a special L2TP debug
flag is set. Even with rate limiting these warnings can easily cause
significant log spam - potentially triggered by a malicious party
sending invalid packets on purpose.

In addition these warnings were noticed by projects like Tunneldigger [1],
which uses L2TP for its data path, but implements its own control
protocol (which is sufficiently different from L2TP data packets that it
would always be passed up to userspace even with future extensions of
L2TP).

Some of the warnings were already redundant, as l2tp_stats has a counter
for these packets. This commit adds one additional counter for invalid
packets that are passed up to userspace. Packets with unknown session are
not counted as invalid, as there is nothing wrong with the format of
these packets.

With the additional counter, all of these messages are either redundant
or benign, so we reduce them to pr_debug_ratelimited().

[1] https://github.com/wlanslovenija/tunneldigger/issues/160

Fixes: 5ee759cda5 ("l2tp: use standard API for warning log messages")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:55:02 -08:00
Atish Patra
b12422362c net: macb: Add default usrio config to default gem config
There is no usrio config defined for default gem config leading to
a kernel panic devices that don't define a data. This issue can be
reprdouced with microchip polar fire soc where compatible string
is defined as "cdns,macb".

Fixes: edac63861d ("add userio bits as platform configuration")

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:53:45 -08:00
David S. Miller
ef9a6df09c Merge tag 'wireless-drivers-2021-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:

====================
wireless-drivers fixes for v5.12

Second set of fixes for v5.12. Only three iwlwifi fixes this time, the
crash with MVM being the most important one and reported by multiple
people.

iwlwifi

* fix kernel crash regression when using LTO with MVM devices

* fix printk format warnings

* fix potential deadlock found by lockdep
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 16:35:24 -08:00
John Ernberg
fc7c5c208e ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
The microphone in the Plantronics C320-M headset will randomly
fail to initialize properly, at least when using Microsoft Teams.
Introducing a 20ms delay on the control messages appears to
resolve the issue.

Link: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/issues/1065
Tested-by: Andreas Kempe <kempe@lysator.liu.se>
Signed-off-by: John Ernberg <john.ernberg@actia.se>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210303181405.39835-1-john.ernberg@actia.se
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-03 20:19:23 +01:00
Andy Shevchenko
ff70784ab9 ACPI: bus: Constify is_acpi_node() and friends (part 2)
Commit 8b9d680258 ("ACPI: Constify acpi_bus helper functions,
switch to macros") only changed functions for CONFIG_ACPI=y case.
This part adjusts the rest.

Fixes: 8b9d680258 ("ACPI: Constify acpi_bus helper functions, switch to macros")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-03 19:50:47 +01:00
Leon Romanovsky
cca7f12b93 RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
Fix the following W=1 compilation warning:

drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead

Fixes: 461bb2eee4 ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle")
Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-03 13:22:10 -04:00
Leon Romanovsky
f91803998c RDMA/mlx5: Set correct kernel-doc identifier
The W=1 allmodconfig build produces the following warning:

drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line:
  * Parse a series of data segments for page fault handling.

Fix it by changing /** to be /* as it is written in kernel-doc
documentation.

Fixes: 5e769e444d ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'")
Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-03 13:21:24 -04:00
Jakub Kicinski
dbbe7c962c docs: networking: drop special stable handling
Leave it to Greg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 08:49:08 -08:00
Ong Boon Leong
879c348c35 net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10
We introduce dwmac410_dma_init_channel() here for both EQoS v4.10 and
above which use different DMA_CH(n)_Interrupt_Enable bit definitions for
NIE and AIE.

Fixes: 48863ce594 ("stmmac: add DMA support for GMAC 4.xx")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ramesh Babu B <ramesh.babu.b@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 08:48:12 -08:00
Michal Suchanek
6881b07fdd ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.
GCC 7.5 reports:
../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init':
../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]

The variable is initialized only if(reset) and used only if(reset &&
something) so this is a false positive. However, there is no reason to
not initialize the variables unconditionally avoiding the warning.

Fixes: 635e442f4a ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 08:46:24 -08:00
Dan Carpenter
2378b2c9ec octeontx2-af: cn10k: fix an array overflow in is_lmac_valid()
The value of "lmac_id" can be controlled by the user and if it is larger
then the number of bits in long then it reads outside the bitmap.
The highest valid value is less than MAX_LMAC_PER_CGX (4).

Fixes: 91c6945ea1 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03 08:44:24 -08:00
Jiri Kosina
295d4cd82b iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep warning in iwl_pcie_rx_handle())
We can't call netif_napi_add() with rxq-lock held, as there is a potential
for deadlock as spotted by lockdep (see below). rxq->lock is not
protecting anything over the netif_napi_add() codepath anyway, so let's
drop it just before calling into NAPI.

 ========================================================
 WARNING: possible irq lock inversion dependency detected
 5.12.0-rc1-00002-gbada49429032 #5 Not tainted
 --------------------------------------------------------
 irq/136-iwlwifi/565 just changed the state of lock:
 ffff89f28433b0b0 (&rxq->lock){+.-.}-{2:2}, at: iwl_pcie_rx_handle+0x7f/0x960 [iwlwifi]
 but this lock took another, SOFTIRQ-unsafe lock in the past:
  (napi_hash_lock){+.+.}-{2:2}

 and interrupts could create inverse lock ordering between them.

 other info that might help us debug this:
  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(napi_hash_lock);
                                local_irq_disable();
                                lock(&rxq->lock);
                                lock(napi_hash_lock);
   <Interrupt>
     lock(&rxq->lock);

  *** DEADLOCK ***

 1 lock held by irq/136-iwlwifi/565:
  #0: ffff89f2b1440170 (sync_cmd_lockdep_map){+.+.}-{0:0}, at: iwl_pcie_irq_handler+0x5/0xb30

 the shortest dependencies between 2nd lock and 1st lock:
  -> (napi_hash_lock){+.+.}-{2:2} {
     HARDIRQ-ON-W at:
                       lock_acquire+0x277/0x3d0
                       _raw_spin_lock+0x2c/0x40
                       netif_napi_add+0x14b/0x270
                       e1000_probe+0x2fe/0xee0 [e1000e]
                       local_pci_probe+0x42/0x90
                       pci_device_probe+0x10b/0x1c0
                       really_probe+0xef/0x4b0
                       driver_probe_device+0xde/0x150
                       device_driver_attach+0x4f/0x60
                       __driver_attach+0x9c/0x140
                       bus_for_each_dev+0x79/0xc0
                       bus_add_driver+0x18d/0x220
                       driver_register+0x5b/0xf0
                       do_one_initcall+0x5b/0x300
                       do_init_module+0x5b/0x21c
                       load_module+0x1dae/0x22c0
                       __do_sys_finit_module+0xad/0x110
                       do_syscall_64+0x33/0x80
                       entry_SYSCALL_64_after_hwframe+0x44/0xae
     SOFTIRQ-ON-W at:
                       lock_acquire+0x277/0x3d0
                       _raw_spin_lock+0x2c/0x40
                       netif_napi_add+0x14b/0x270
                       e1000_probe+0x2fe/0xee0 [e1000e]
                       local_pci_probe+0x42/0x90
                       pci_device_probe+0x10b/0x1c0
                       really_probe+0xef/0x4b0
                       driver_probe_device+0xde/0x150
                       device_driver_attach+0x4f/0x60
                       __driver_attach+0x9c/0x140
                       bus_for_each_dev+0x79/0xc0
                       bus_add_driver+0x18d/0x220
                       driver_register+0x5b/0xf0
                       do_one_initcall+0x5b/0x300
                       do_init_module+0x5b/0x21c
                       load_module+0x1dae/0x22c0
                       __do_sys_finit_module+0xad/0x110
                       do_syscall_64+0x33/0x80
                       entry_SYSCALL_64_after_hwframe+0x44/0xae
     INITIAL USE at:
                      lock_acquire+0x277/0x3d0
                      _raw_spin_lock+0x2c/0x40
                      netif_napi_add+0x14b/0x270
                      e1000_probe+0x2fe/0xee0 [e1000e]
                      local_pci_probe+0x42/0x90
                      pci_device_probe+0x10b/0x1c0
                      really_probe+0xef/0x4b0
                      driver_probe_device+0xde/0x150
                      device_driver_attach+0x4f/0x60
                      __driver_attach+0x9c/0x140
                      bus_for_each_dev+0x79/0xc0
                      bus_add_driver+0x18d/0x220
                      driver_register+0x5b/0xf0
                      do_one_initcall+0x5b/0x300
                      do_init_module+0x5b/0x21c
                      load_module+0x1dae/0x22c0
                      __do_sys_finit_module+0xad/0x110
                      do_syscall_64+0x33/0x80
                      entry_SYSCALL_64_after_hwframe+0x44/0xae
   }
   ... key      at: [<ffffffffae84ef38>] napi_hash_lock+0x18/0x40
   ... acquired at:
    _raw_spin_lock+0x2c/0x40
    netif_napi_add+0x14b/0x270
    _iwl_pcie_rx_init+0x1f4/0x710 [iwlwifi]
    iwl_pcie_rx_init+0x1b/0x3b0 [iwlwifi]
    iwl_trans_pcie_start_fw+0x2ac/0x6a0 [iwlwifi]
    iwl_mvm_load_ucode_wait_alive+0x116/0x460 [iwlmvm]
    iwl_run_init_mvm_ucode+0xa4/0x3a0 [iwlmvm]
    iwl_op_mode_mvm_start+0x9ed/0xbf0 [iwlmvm]
    _iwl_op_mode_start.isra.4+0x42/0x80 [iwlwifi]
    iwl_opmode_register+0x71/0xe0 [iwlwifi]
    iwl_mvm_init+0x34/0x1000 [iwlmvm]
    do_one_initcall+0x5b/0x300
    do_init_module+0x5b/0x21c
    load_module+0x1dae/0x22c0
    __do_sys_finit_module+0xad/0x110
    do_syscall_64+0x33/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

[ ... lockdep output trimmed .... ]

Fixes: 25edc8f259 ("iwlwifi: pcie: properly implement NAPI")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2103021134060.12405@cbobk.fhfr.pm
2021-03-03 17:59:16 +02:00
Pierre-Louis Bossart
436b265671 iwlwifi: fix ARCH=i386 compilation warnings
An unsigned long variable should rely on '%lu' format strings, not '%zd'

Fixes: a1a6a4cf49 ("iwlwifi: pnvm: implement reading PNVM from UEFI")
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210302011640.1276636-1-pierre-louis.bossart@linux.intel.com
2021-03-03 17:57:33 +02:00
Wei Yongjun
a22549f127 iwlwifi: mvm: add terminate entry for dmi_system_id tables
Make sure dmi_system_id tables are NULL terminated. This crashed when LTO was enabled:

BUG: KASAN: global-out-of-bounds in dmi_check_system+0x5a/0x70
Read of size 1 at addr ffffffffc16af750 by task NetworkManager/1913

CPU: 4 PID: 1913 Comm: NetworkManager Not tainted 5.12.0-rc1+ #10057
Hardware name: LENOVO 20THCTO1WW/20THCTO1WW, BIOS N2VET27W (1.12 ) 12/21/2020
Call Trace:
 dump_stack+0x90/0xbe
 print_address_description.constprop.0+0x1d/0x140
 ? dmi_check_system+0x5a/0x70
 ? dmi_check_system+0x5a/0x70
 kasan_report.cold+0x7b/0xd4
 ? dmi_check_system+0x5a/0x70
 __asan_load1+0x4d/0x50
 dmi_check_system+0x5a/0x70
 iwl_mvm_up+0x1360/0x1690 [iwlmvm]
 ? iwl_mvm_send_recovery_cmd+0x270/0x270 [iwlmvm]
 ? setup_object.isra.0+0x27/0xd0
 ? kasan_poison+0x20/0x50
 ? ___slab_alloc.constprop.0+0x483/0x5b0
 ? mempool_kmalloc+0x17/0x20
 ? ftrace_graph_ret_addr+0x2a/0xb0
 ? kasan_poison+0x3c/0x50
 ? cfg80211_iftype_allowed+0x2e/0x90 [cfg80211]
 ? __kasan_check_write+0x14/0x20
 ? mutex_lock+0x86/0xe0
 ? __mutex_lock_slowpath+0x20/0x20
 __iwl_mvm_mac_start+0x49/0x290 [iwlmvm]
 iwl_mvm_mac_start+0x37/0x50 [iwlmvm]
 drv_start+0x73/0x1b0 [mac80211]
 ieee80211_do_open+0x53e/0xf10 [mac80211]
 ? ieee80211_check_concurrent_iface+0x266/0x2e0 [mac80211]
 ieee80211_open+0xb9/0x100 [mac80211]
 __dev_open+0x1b8/0x280

Fixes: a2ac0f48a0 ("iwlwifi: mvm: implement approved list for the PPAG feature")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Victor Michel <vic.michel.web@gmail.com>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
[kvalo@codeaurora.org: improve commit log]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210223140039.1708534-1-weiyongjun1@huawei.com
2021-03-03 17:56:11 +02:00
Jarkko Sakkinen
232a37ea3a tpm: Remove unintentional dump_stack() call
Somewhere along the line, probably during a rebase, an unintentional
dump_stack() got included. Revert this change.

Reported-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Fixes: 90cba8d20f ("tpm/ppi: Constify static struct attribute_group")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-03-03 17:43:52 +02:00
Takashi Iwai
26af17722a ALSA: hda/realtek: Apply dual codec quirks for MSI Godlike X570 board
There is another MSI board (1462:cc34) that has dual Realtek codecs,
and we need to apply the existing quirk for fixing the conflicts of
Master control.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211743
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210303142346.28182-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-03 15:25:03 +01:00
Dan Carpenter
77516d25f5 rsxx: Return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining but
we want to return -EFAULT to the user if it can't complete the copy.
The "st" variable only holds zero on success or negative error codes on
failure so the type should be int.

Fixes: 36f988e978 ("rsxx: Adding in debugfs entries.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-03 06:48:44 -07:00
Werner Sembach
73e7161eab ALSA: hda/realtek: Add quirk for Intel NUC 10
This adds a new SND_PCI_QUIRK(...) and applies it to the Intel NUC 10
devices. This fixes the issue of the devices not having audio input and
output on the headset jack because the kernel does not recognize when
something is plugged in.

The new quirk was inspired by the quirk for the Intel NUC 8 devices, but
it turned out that the NUC 10 uses another pin. This information was
acquired by black box testing likely pins.

Co-developed-by: Eckhart Mohr <e.mohr@tuxedocomputers.com>
Signed-off-by: Eckhart Mohr <e.mohr@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210302180414.23194-1-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-03 09:22:52 +01:00
Juergen Gross
882213990d xen: fix p2m size in dom0 for disabled memory hotplug case
Since commit 9e2369c06c ("xen: add helpers to allocate unpopulated
memory") foreign mappings are using guest physical addresses allocated
via ZONE_DEVICE functionality.

This will result in problems for the case of no balloon memory hotplug
being configured, as the p2m list will only cover the initial memory
size of the domain. Any ZONE_DEVICE allocated address will be outside
the p2m range and thus a mapping can't be established with that memory
address.

Fix that by extending the p2m size for that case. At the same time add
a check for a to be created mapping to be within the p2m limits in
order to detect errors early.

While changing a comment, remove some 32-bit leftovers.

This is XSA-369.

Fixes: 9e2369c06c ("xen: add helpers to allocate unpopulated memory")
Cc: <stable@vger.kernel.org> # 5.9
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-03-03 08:32:13 +01:00
Jan Beulich
2991397d23 xen-netback: respect gnttab_map_refs()'s return value
Commit 3194a1746e ("xen-netback: don't "handle" error by BUG()")
dropped respective a BUG_ON() without noticing that with this the
variable's value wouldn't be consumed anymore. With gnttab_set_map_op()
setting all status fields to a non-zero value, in case of an error no
slot should have a status of GNTST_okay (zero).

This is part of XSA-367.

Cc: <stable@vger.kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/d933f495-619a-0086-5fb4-1ec3cf81a8fc@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-03-03 08:32:11 +01:00
Jan Beulich
8310b77b48 Xen/gnttab: handle p2m update errors on a per-slot basis
Bailing immediately from set_foreign_p2m_mapping() upon a p2m updating
error leaves the full batch in an ambiguous state as far as the caller
is concerned. Instead flags respective slots as bad, unmapping what
was mapped there right away.

HYPERVISOR_grant_table_op()'s return value and the individual unmap
slots' status fields get used only for a one-time - there's not much we
can do in case of a failure.

Note that there's no GNTST_enomem or alike, so GNTST_general_error gets
used.

The map ops' handle fields get overwritten just to be on the safe side.

This is part of XSA-367.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/96cccf5d-e756-5f53-b91a-ea269bfb9be0@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-03-03 08:32:10 +01:00
Linus Torvalds
f69d02e37a Merge tag 'misc-5.12-2021-03-02' of git://git.kernel.dk/linux-block
Pull misc fixes from Jens Axboe:
 "Two misc fixes that don't belong in other branches:

   - Fix a regression with ia64 signals, introduced by the
     TIF_NOTIFY_SIGNAL change in 5.11.

   - Fix the current swapfile regression from this merge window"

* tag 'misc-5.12-2021-03-02' of git://git.kernel.dk/linux-block:
  swap: fix swapfile read/write offset
  ia64: don't call handle_signal() unless there's actually a signal queued
2021-03-02 18:18:17 -08:00
Jens Axboe
caf6912f3f swap: fix swapfile read/write offset
We're not factoring in the start of the file for where to write and
read the swapfile, which leads to very unfortunate side effects of
writing where we should not be...

Fixes: 48d15436fd ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-02 17:25:46 -07:00
Jens Axboe
f5f4fc4649 ia64: don't call handle_signal() unless there's actually a signal queued
Sergei and John both reported that ia64 failed to boot in 5.11, and it
was related to signals. Turns out the ia64 signal handling is a bit odd,
it doesn't check the return value of get_signal() for whether there's a
signal to deliver or not. With the introduction of TIF_NOTIFY_SIGNAL,
then task_work could trigger it.

Fix it by only calling handle_signal() if we actually have a real signal
to deliver. This brings it in line with all other archs, too.

Fixes: b269c229b0 ("ia64: add support for TIF_NOTIFY_SIGNAL")
Reported-by: Sergei Trofimovich <slyich@gmail.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-02 17:22:11 -07:00
Biao Huang
95b39f07a1 net: ethernet: mtk-star-emac: fix wrong unmap in RX handling
mtk_star_dma_unmap_rx() should unmap the dma_addr of old skb rather than
that of new skb.
Assign new_dma_addr to desc_data.dma_addr after all handling of old skb
ends to avoid unexpected receive side error.

Fixes: f96e9641e9 ("net: ethernet: mtk-star-emac: fix error path in RX handling")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-02 15:28:07 -08:00
Wong Vee Khee
fa706dce2f stmmac: intel: Fix mdio bus registration issue for TGL-H/ADL-S
On Intel platforms which consist of two Ethernet Controllers such as
TGL-H and ADL-S, a unique MDIO bus id is required for MDIO bus to be
successful registered:

[   13.076133] sysfs: cannot create duplicate filename '/class/mdio_bus/stmmac-1'
[   13.083404] CPU: 8 PID: 1898 Comm: systemd-udevd Tainted: G     U            5.11.0-net-next #106
[   13.092410] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
[   13.105709] Call Trace:
[   13.108176]  dump_stack+0x64/0x7c
[   13.111553]  sysfs_warn_dup+0x56/0x70
[   13.115273]  sysfs_do_create_link_sd.isra.2+0xbd/0xd0
[   13.120371]  device_add+0x4df/0x840
[   13.123917]  ? complete_all+0x2a/0x40
[   13.127636]  __mdiobus_register+0x98/0x310 [libphy]
[   13.132572]  stmmac_mdio_register+0x1c5/0x3f0 [stmmac]
[   13.137771]  ? stmmac_napi_add+0xa5/0xf0 [stmmac]
[   13.142493]  stmmac_dvr_probe+0x806/0xee0 [stmmac]
[   13.147341]  intel_eth_pci_probe+0x1cb/0x250 [dwmac_intel]
[   13.152884]  pci_device_probe+0xd2/0x150
[   13.156897]  really_probe+0xf7/0x4d0
[   13.160527]  driver_probe_device+0x5d/0x140
[   13.164761]  device_driver_attach+0x4f/0x60
[   13.168996]  __driver_attach+0xa2/0x140
[   13.172891]  ? device_driver_attach+0x60/0x60
[   13.177300]  bus_for_each_dev+0x76/0xc0
[   13.181188]  bus_add_driver+0x189/0x230
[   13.185083]  ? 0xffffffffc0795000
[   13.188446]  driver_register+0x5b/0xf0
[   13.192249]  ? 0xffffffffc0795000
[   13.195577]  do_one_initcall+0x4d/0x210
[   13.199467]  ? kmem_cache_alloc_trace+0x2ff/0x490
[   13.204228]  do_init_module+0x5b/0x21c
[   13.208031]  load_module+0x2a0c/0x2de0
[   13.211838]  ? __do_sys_finit_module+0xb1/0x110
[   13.216420]  __do_sys_finit_module+0xb1/0x110
[   13.220825]  do_syscall_64+0x33/0x40
[   13.224451]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   13.229515] RIP: 0033:0x7fc2b1919ccd
[   13.233113] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
[   13.251912] RSP: 002b:00007ffcea2e5b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   13.259527] RAX: ffffffffffffffda RBX: 0000560558920f10 RCX: 00007fc2b1919ccd
[   13.266706] RDX: 0000000000000000 RSI: 00007fc2b1a881e3 RDI: 0000000000000012
[   13.273887] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   13.281036] R10: 0000000000000012 R11: 0000000000000246 R12: 00007fc2b1a881e3
[   13.288183] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcea2e5d58
[   13.295389] libphy: mii_bus stmmac-1 failed to register

Fixes: 88af9bd4ef ("stmmac: intel: Add ADL-S 1Gbps PCI IDs")
Fixes: 8450e23f14 ("stmmac: intel: Add PCI IDs for TGL-H platform")
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-02 15:27:14 -08:00
Chen Jun
999340d511 ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
On little endian system, Use aarch64_be(gcc v7.3) downloaded from
linaro.org to build image with CONFIG_CPU_BIG_ENDIAN = y,
CONFIG_FTRACE = y, CONFIG_DYNAMIC_FTRACE = y.

gcc will create symbols of _mcount but recordmcount can not create
mcount_loc for *.o.
aarch64_be-linux-gnu-objdump -r fs/namei.o | grep mcount
00000000000000d0 R_AARCH64_CALL26  _mcount
...
0000000000007190 R_AARCH64_CALL26  _mcount

The reason is than funciton arm64_is_fake_mcount can not work correctly.
A symbol of _mcount in *.o compiled with big endian compiler likes:
00 00 00 2d 00 00 01 1b
w(rp->r_info) will return 0x2d instead of 0x011b. Because w() takes
uint32_t as parameter, which truncates rp->r_info.

Use w8() instead w() to read relp->r_info

Link: https://lkml.kernel.org/r/20210222135840.56250-1-chenjun102@huawei.com

Fixes: ea0eada456 ("recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.")
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-02 17:27:18 -05:00
Helge Deller
778e45d772 parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
The kernel test robot reported multiple linkage problems like this:

    hppa64-linux-ld: init/main.o(.init.text+0x56c): cannot reach printk
    init/main.o: in function `unknown_bootoption':
    (.init.text+0x56c): relocation truncated to fit: R_PARISC_PCREL22F against
	symbol `printk' defined in .text.unlikely section in kernel/printk/printk.o

There are two ways to solve it:
a) Enable the -mlong-call compiler option (CONFIG_MLONGCALLS),
b) Add long branch stub support in 64-bit linker.

While b) is the long-term solution, this patch works around the issue by
automatically enabling the CONFIG_MLONGCALLS option when
CONFIG_COMPILE_TEST is set, which indicates that a non-production kernel
(e.g. 0-day kernel) is built.

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 00e35f2b0e ("parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES")
Cc: stable@vger.kernel.org # v5.6+
2021-03-02 21:21:17 +01:00
Dmitry Osipenko
7db688e99c pstore/ram: Rate-limit "uncorrectable error in header" message
There is a quite huge "uncorrectable error in header" flood in KMSG
on a clean system boot since there is no pstore buffer saved in RAM.
Let's silence the redundant noisy messages by rate-limiting the printk
message. Now there are maximum 10 messages printed repeatedly instead
of 35+.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210302095850.30894-1-digetx@gmail.com
2021-03-02 11:52:31 -08:00
Babu Moger
9e46f6c6c9 KVM: SVM: Clear the CR4 register on reset
This problem was reported on a SVM guest while executing kexec.
Kexec fails to load the new kernel when the PCID feature is enabled.

When kexec starts loading the new kernel, it starts the process by
resetting the vCPU's and then bringing each vCPU online one by one.
The vCPU reset is supposed to reset all the register states before the
vCPUs are brought online. However, the CR4 register is not reset during
this process. If this register is already setup during the last boot,
all the flags can remain intact. The X86_CR4_PCIDE bit can only be
enabled in long mode. So, it must be enabled much later in SMP
initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
cause a boot failures.

Fix the issue by resetting the CR4 register in init_vmcb().

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:39:11 -05:00
David Woodhouse
30b5c851af KVM: x86/xen: Add support for vCPU runstate information
This is how Xen guests do steal time accounting. The hypervisor records
the amount of time spent in each of running/runnable/blocked/offline
states.

In the Xen accounting, a vCPU is still in state RUNSTATE_running while
in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules
does the state become RUNSTATE_blocked. In KVM this means that even when
the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.

The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given
amount of time to the blocked state and subtract it from the running
state.

The state_entry_time corresponds to get_kvmclock_ns() at the time the
vCPU entered the current state, and the total times of all four states
should always add up to state_entry_time.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:30:54 -05:00
David Woodhouse
7d7c5f76e5 KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info
When clearing the per-vCPU shared regions, set the return value to zero
to indicate success. This was causing spurious errors to be returned to
userspace on soft reset.

Also add a paranoid BUILD_BUG_ON() for compat structure compatibility.

Fixes: 0c165b3c01 ("KVM: x86/xen: Allow reset of Xen attributes")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:30:54 -05:00
Aaron Lewis
6528fc0a11 selftests: kvm: Mmap the entire vcpu mmap area
The vcpu mmap area may consist of more than just the kvm_run struct.
Allocate enough space for the entire vcpu mmap area. Without this, on
x86, the PIO page, for example, will be missing.  This is problematic
when dealing with an unhandled exception from the guest as the exception
vector will be incorrectly reported as 0x0.

Message-Id: <20210210165035.3712489-1-aaronlewis@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Co-developed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:30:53 -05:00
Kai Huang
7d2cdad0da KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1
It should be 7.23 instead of 7.22, which has already been taken by
KVM_CAP_X86_BUS_LOCK_EXIT.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210226094832.380394-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:30:53 -05:00
Paolo Bonzini
b59b153d10 KVM: x86: allow compiling out the Xen hypercall interface
The Xen hypercall interface adds to the attack surface of the hypervisor
and will be used quite rarely.  Allow compiling it out.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:30:45 -05:00
Joseph Qi
4168a8d27e block/bfq: update comments and default value in docs for fifo_expire
Correct the comments since bfq_fifo_expire[0] is for async request,
while bfq_fifo_expire[1] is for sync request.
Also update docs, according the source code, the default
fifo_expire_async is 250ms, and fifo_expire_sync is 125ms.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-02 11:25:38 -07:00
Tian Tao
e4ef09e512 rsxx: remove unused including <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-02 10:40:27 -07:00
Hui Wang
13046370c4 ALSA: hda/hdmi: let new platforms assign the pcm slot dynamically
If the platform set the dyn_pcm_assign to true, it will call
hdmi_find_pcm_slot() to find a pcm slot when hdmi/dp monitor is
connected and need to create a pcm.

So far only intel_hsw_common_init() and patch_nvhdmi() set the
dyn_pcm_assign to true, here we let tgl platforms assign the pcm slot
dynamically first, if the driver runs for a period of time and there
is no regression reported, we could set no_fixed_assgin to true in
the intel_hsw_common_init(), and then set it to true in the
patch_nvhdmi().

This change comes from the discussion between Takashi and
Kai Vehmanen. Please refer to:
https://github.com/alsa-project/alsa-lib/pull/118

Suggested-and-reviewed-by: Takashi Iwai <tiwai@suse.de>
Suggested-and-reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20210301111202.2684-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 18:35:56 +01:00
Eckhart Mohr
48698c973e ALSA: hda/realtek: Add quirk for Clevo NH55RZQ
This applies a SND_PCI_QUIRK(...) to the Clevo NH55RZQ barebone. This
fixes the issue of the device not recognizing a pluged in microphone.

The device has both, a microphone only jack, and a speaker + microphone
combo jack. The combo jack already works. The microphone-only jack does
not recognize when a device is pluged in without this patch.

Signed-off-by: Eckhart Mohr <e.mohr@tuxedocomputers.com>
Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/0eee6545-5169-ef08-6cfa-5def8cd48c86@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 18:32:14 +01:00
Takashi Iwai
9b838a3c32 Merge tag 'tags/sound-sdw-kconfig-fixes' into for-linus
ALSA/ASoC/SOF/SoundWire: fix Kconfig issues

In January, Intel kbuild bot and Arnd Bergmann reported multiple
issues with randconfig. This patchset builds on Arnd's suggestions to

a) expose ACPI and PCI devices in separate modules, while sof-acpi-dev
and sof-pci-dev become helpers. This will result in minor changes
required for developers/testers, i.e. modprobe snd-sof-pci will no
longer result in a probe. The SOF CI was already updated to deal with
this module dependency change and introduction of new modules.

b) Fix SOF/SoundWire/DSP_config dependencies by moving the code
required to detect SoundWire presence in ACPI tables to sound/hda.

Link: https://lore.kernel.org/r/20210302003125.1178419-1-pierre-louis.bossart@linux.intel.com
2021-03-02 18:30:07 +01:00
Qu Wenruo
c28ea613fa btrfs: subpage: fix the false data csum mismatch error
[BUG]
When running fstresss, we can hit strange data csum mismatch where the
on-disk data is in fact correct (passes scrub).

With some extra debug info added, we have the following traces:

  0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192
  0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096
  0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192
  0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864
  0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096
  0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864
  0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192
  0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384
  1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096
  1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864
  1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!!
  7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0

[CAUSE]
Normally we expect all submitted bio reads to only touch the range we
specified, and under subpage context, it means we should only touch the
range specified in each bvec.

But in data read path, inside end_bio_extent_readpage(), we have page
zeroing which only takes regular page size into consideration.

This means for subpage if we have an inode whose content looks like below:

  0       16K     32K     48K     64K
  |///////|       |///////|       |

  |//| = data needs to be read from disk
  |  | = hole

And i_size is 64K initially.

Then the following race can happen:

		T1		|		T2
--------------------------------+--------------------------------
btrfs_do_readpage()		|
|- isize = 64K;			|
|  At this time, the isize is 	|
|  64K				|
|				|
|- submit_extent_page()		|
|  submit previous assembled bio|
|  assemble bio for [0, 16K)	|
|				|
|- submit_extent_page()		|
   submit read bio for [0, 16K) |
   assemble read bio for	|
   [32K, 48K)			|
 				|
				| btrfs_setsize()
				| |- i_size_write(, 16K);
				|    Now i_size is only 16K
end_io() for [0K, 16K)		|
|- end_bio_extent_readpage()	|
   |- btrfs_verify_data_csum()  |
   |  No csum error		|
   |- i_size = 16K;		|
   |- zero_user_segment(16K,	|
      PAGE_SIZE);		|
      !!! We zeroed range	|
      !!! [32K, 48K)		|
				| end_io for [32K, 48K)
				| |- end_bio_extent_readpage()
				|    |- btrfs_verify_data_csum()
				|       ! CSUM MISMATCH !
				|       ! As the range is zeroed now !

[FIX]
To fix the problem, make end_bio_extent_readpage() to only zero the
range of bvec.

The bug only affects subpage read-write support, as for full read-only
mount we can't change i_size thus won't hit the race condition.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:48:00 +01:00
Filipe Manana
fd57a98d6f btrfs: fix warning when creating a directory with smack enabled
When we have smack enabled, during the creation of a directory smack may
attempt to add a "smack transmute" xattr on the inode, which results in
the following warning and trace:

  WARNING: CPU: 3 PID: 2548 at fs/btrfs/transaction.c:537 start_transaction+0x489/0x4f0
  Modules linked in: nft_objref nf_conntrack_netbios_ns (...)
  CPU: 3 PID: 2548 Comm: mkdir Not tainted 5.9.0-rc2smack+ #81
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:start_transaction+0x489/0x4f0
  Code: e9 be fc ff ff (...)
  RSP: 0018:ffffc90001887d10 EFLAGS: 00010202
  RAX: ffff88816f1e0000 RBX: 0000000000000201 RCX: 0000000000000003
  RDX: 0000000000000201 RSI: 0000000000000002 RDI: ffff888177849000
  RBP: ffff888177849000 R08: 0000000000000001 R09: 0000000000000004
  R10: ffffffff825e8f7a R11: 0000000000000003 R12: ffffffffffffffe2
  R13: 0000000000000000 R14: ffff88803d884270 R15: ffff8881680d8000
  FS:  00007f67317b8440(0000) GS:ffff88817bcc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f67247a22a8 CR3: 000000004bfbc002 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? slab_free_freelist_hook+0xea/0x1b0
   ? trace_hardirqs_on+0x1c/0xe0
   btrfs_setxattr_trans+0x3c/0xf0
   __vfs_setxattr+0x63/0x80
   smack_d_instantiate+0x2d3/0x360
   security_d_instantiate+0x29/0x40
   d_instantiate_new+0x38/0x90
   btrfs_mkdir+0x1cf/0x1e0
   vfs_mkdir+0x14f/0x200
   do_mkdirat+0x6d/0x110
   do_syscall_64+0x2d/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f673196ae6b
  Code: 8b 05 11 (...)
  RSP: 002b:00007ffc3c679b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
  RAX: ffffffffffffffda RBX: 00000000000001ff RCX: 00007f673196ae6b
  RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffc3c67a30d
  RBP: 00007ffc3c67a30d R08: 00000000000001ff R09: 0000000000000000
  R10: 000055d3e39fe930 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007ffc3c679cd8 R14: 00007ffc3c67a30d R15: 00007ffc3c679ce0
  irq event stamp: 11029
  hardirqs last  enabled at (11037): [<ffffffff81153fe6>] console_unlock+0x486/0x670
  hardirqs last disabled at (11044): [<ffffffff81153c01>] console_unlock+0xa1/0x670
  softirqs last  enabled at (8864): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20
  softirqs last disabled at (8851): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20

This happens because at btrfs_mkdir() we call d_instantiate_new() while
holding a transaction handle, which results in the following call chain:

  btrfs_mkdir()
     trans = btrfs_start_transaction(root, 5);

     d_instantiate_new()
        smack_d_instantiate()
            __vfs_setxattr()
                btrfs_setxattr_trans()
                   btrfs_start_transaction()
                      start_transaction()
                         WARN_ON()
                           --> a tansaction start has TRANS_EXTWRITERS
                               set in its type
                         h->orig_rsv = h->block_rsv
                         h->block_rsv = NULL

     btrfs_end_transaction(trans)

Besides the warning triggered at start_transaction, we set the handle's
block_rsv to NULL which may cause some surprises later on.

So fix this by making btrfs_setxattr_trans() not start a transaction when
we already have a handle on one, stored in current->journal_info, and use
that handle. We are good to use the handle because at btrfs_mkdir() we did
reserve space for the xattr and the inode item.

Reported-by: Casey Schaufler <casey@schaufler-ca.com>
CC: stable@vger.kernel.org # 5.4+
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://lore.kernel.org/linux-btrfs/434d856f-bd7b-4889-a6ec-e81aaebfa735@schaufler-ca.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:47:56 +01:00
Nikolay Borisov
4d14c5cde5 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070eac ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:17:09 +01:00
Nikolay Borisov
80e9baed72 btrfs: export and rename qgroup_reserve_meta
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:58:30 +01:00
Nikolay Borisov
0f9c03d824 btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
Following commit f218ea6c47 ("btrfs: delayed-inode: Remove wrong
qgroup meta reservation calls") this function now reserves num_bytes,
rather than the fixed amount of nodesize. As such this requires the
same amount to be freed in case of failure. Fix this by adjusting
the amount we are freeing.

Fixes: f218ea6c47 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:56:04 +01:00
Boris Burkov
c55a4319c4 btrfs: fix spurious free_space_tree remount warning
The intended logic of the check is to catch cases where the desired
free_space_tree setting doesn't match the mounted setting, and the
remount is anything but ro->rw. However, it makes the mistake of
checking equality on a masked integer (btrfs_test_opt) against a boolean
(btrfs_fs_compat_ro).

If you run the reproducer:
  $ mount -o space_cache=v2 dev mnt
  $ mount -o remount,ro mnt

you would expect no warning, because the remount is not attempting to
change the free space tree setting, but we do see the warning.

To fix this, add explicit bool type casts to the condition.

I tested a variety of transitions:
sudo mount -o space_cache=v2 /dev/vg0/lv0 mnt/lol
(fst enabled)
mount -o remount,ro mnt/lol
(no warning, no fst change)
sudo mount -o remount,rw,space_cache=v1,clear_cache
(no warning, ro->rw)
sudo mount -o remount,rw,space_cache=v2 mnt
(warning, rw->rw with change)
sudo mount -o remount,ro mnt
(no warning, no fst change)
sudo mount -o remount,rw,space_cache=v2 mnt
(no warning, no fst change)

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 5.11
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:55:55 +01:00
Dan Carpenter
5011c5a663 btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
The problem is we're copying "inherit" from user space but we don't
necessarily know that we're copying enough data for a 64 byte
struct.  Then the next problem is that 'inherit' has a variable size
array at the end, and we have to verify that array is the size we
expected.

Fixes: 6f72c7e20d ("Btrfs: add qgroup inheritance")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:55:47 +01:00
Nikolay Borisov
4f6a49de64 btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
If btrfs_qgroup_reserve_data returns an error (i.e quota limit reached)
the handling logic directly goes to the 'out' label without first
unlocking the extent range between lockstart, lockend. This results in
deadlocks as other processes try to lock the same extent.

Fixes: a7f8b1c2ac ("btrfs: file: reserve qgroup space after the hole punch range is locked")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:55:44 +01:00
Randy Dunlap
aedb9d9089 btrfs: ref-verify: use 'inline void' keyword ordering
Fix build warnings of function signature when CONFIG_STACKTRACE is not
enabled by reordering the 'inline' and 'void' keywords.

../fs/btrfs/ref-verify.c:221:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
 static void inline __save_stack_trace(struct ref_action *ra)
../fs/btrfs/ref-verify.c:225:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
 static void inline __print_stack_trace(struct btrfs_fs_info *fs_info,

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 16:55:40 +01:00
Pablo Neira Ayuso
9cc0001a18 netfilter: nftables: disallow updates on table ownership
Disallow updating the ownership bit on an existing table: Do not allow
to grab ownership on an existing table. Do not allow to drop ownership
on an existing table.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-02 16:37:22 +01:00
Pierre-Louis Bossart
ffd7e705fa ALSA: hda: intel-sdw-acpi: add missing include files
We rely on implicit includes, list out explicitly what this code
relies on.

Suggested-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-8-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:33:17 +01:00
Pierre-Louis Bossart
08c2a4bc9f ALSA: hda: move Intel SoundWire ACPI scan to dedicated module
The ACPI scan capabilities is called from the intel-dspconfig as well
as the SOF/HDaudio drivers. This creates dependencies and randconfig issues
when HDaudio and SOF/SoundWire are not all configured as modules.

To simplify Kconfig dependencies between HDAudio, SoundWire, SOF and
intel-dspconfig, move the ACPI scan helpers to a dedicated
module. This follows the same idea as NHLT helpers which are already
handled as a dedicated module.

The only functional change is that the kernel parameter to filter
links is now handled by a different module, but that was only provided
for developers needing work-arounds for early BIOS releases.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-7-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:33:00 +01:00
Pierre-Louis Bossart
cf5807f5f8 ASoC: SOF: Intel: SoundWire: simplify Kconfig
The Kconfig file is way too convoluted. Track platforms where
SoundWire is supported, and add simpler conditions to make sure there
is no module/built-in issue.

The use of 'depends on' is less intuitive if a required 'depend' is
missing, but that's a small price to pay for clarity and simplicity.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:32:43 +01:00
Pierre-Louis Bossart
194fe0fc34 ASoC: SOF: pci: move DSP_CONFIG use to platform-specific drivers
There is no reason why we should call the intel_dspcfg helpers from
common code, this should be moved in Intel-specific code and only
called from platforms where a conflict may occur with the HDaudio or
SST/Skylake driver.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-5-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:32:20 +01:00
Pierre-Louis Bossart
8d4ba1be3d ASoC: SOF: pci: split PCI into different drivers
Move PCI IDs and device-specific definitions out of common code. No
functionality change for now, just code split and removal of
IF_ENABLED() which made the configurations too complicated in case of
reuse of IP across generations.

Additional changes to address the DSP_CONFIG case and SoundWire
depends/select confusions are provided in follow-up patches.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:32:00 +01:00
Arnd Bergmann
8a49cd11e6 ASoC: SOF: ACPI: avoid reverse module dependency
The SOF-ACPI driver is backwards from the normal Linux model, it has a
generic driver that knows about all the specific drivers, as opposed to
having hardware specific drivers that link against a common framework.

This requires ugly Kconfig magic and leads to missed dependencies as
seen in this link error:

arm-linux-gnueabi-ld: sound/soc/sof/sof-pci-dev.o: in function `sof_acpi_probe':
sof-pci-dev.c:(.text+0x1c): undefined reference to `snd_intel_dsp_driver_probe'

Change it to use the normal probe order of starting with a specific
device in a driver, turning the sof-acpi-dev.c driver into a
library (exported symbols are name-spaced to avoid symbol pollution).

For backwards-compatibility with previous Kconfigs, the default values
for platform drivers uses the top-level ACPI configurations. The
modules were also renamed to allow for gradual transitions in test
scripts.

Co-developed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:31:40 +01:00
Pierre-Louis Bossart
c7929b15b6 ASoC: soc-acpi: allow for partial match in parent name
To change the module dependencies and simplify Kconfigs, we need to
introduce new driver names (sof-audio-acpi-intel-byt and
sof-audio-acpi-intel-bdw), and move from an exact string match to a
partial one.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210302003125.1178419-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 15:31:06 +01:00
Ben Skeggs
78652ff69b drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class
Fixes: 496162037c ("drm/nouveau/fifo: add id_engine hook")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2021-03-02 21:48:42 +10:00
Jordan Niethe
5c88a17e15 powerpc/sstep: Fix VSX instruction emulation
Commit af99da7433 ("powerpc/sstep: Support VSX vector paired storage
access instructions") added loading and storing 32 word long data into
adjacent VSRs. However the calculation used to determine if two VSRs
needed to be loaded/stored inadvertently prevented the load/storing
taking place for instructions with a data length less than 16 words.

This causes the emulation to not function correctly, which can be seen
by the alignment_handler selftest:

$ ./alignment_handler
[snip]
test: test_alignment_handler_vsx_207
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 2.07B
        Doing lxsspx:   PASSED
        Doing lxsiwax:  FAILED: Wrong Data
        Doing lxsiwzx:  PASSED
        Doing stxsspx:  PASSED
        Doing stxsiwx:  PASSED
failure: test_alignment_handler_vsx_207
test: test_alignment_handler_vsx_300
tags: git_version:powerpc-5.12-1-0-g82d2c16b350f
VSX: 3.00B
        Doing lxsd:     PASSED
        Doing lxsibzx:  PASSED
        Doing lxsihzx:  PASSED
        Doing lxssp:    FAILED: Wrong Data
        Doing lxv:      PASSED
        Doing lxvb16x:  PASSED
        Doing lxvh8x:   PASSED
        Doing lxvx:     PASSED
        Doing lxvwsx:   FAILED: Wrong Data
        Doing lxvl:     PASSED
        Doing lxvll:    PASSED
        Doing stxsd:    PASSED
        Doing stxsibx:  PASSED
        Doing stxsihx:  PASSED
        Doing stxssp:   PASSED
        Doing stxv:     PASSED
        Doing stxvb16x: PASSED
        Doing stxvh8x:  PASSED
        Doing stxvx:    PASSED
        Doing stxvl:    PASSED
        Doing stxvll:   PASSED
failure: test_alignment_handler_vsx_300
[snip]

Fix this by making sure all VSX instruction emulation correctly
load/store from the VSRs.

Fixes: af99da7433 ("powerpc/sstep: Support VSX vector paired storage access instructions")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225031946.1458206-1-jniethe5@gmail.com
2021-03-02 22:41:51 +11:00
Athira Rajeev
5ae5fbd210 powerpc/perf: Fix handling of privilege level checks in perf interrupt context
Running "perf mem record" in powerpc platforms with selinux enabled
resulted in soft lockup's. Below call-trace was seen in the logs:

  CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
  NIP:  c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000
  REGS: c000007fffab7d60 TRAP: 0100   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x94/0x120
  LR  _raw_spin_lock_irqsave+0x90/0x120
  Call Trace:
    0xc00000000fd47260 (unreliable)
    skb_queue_tail+0x3c/0x90
    audit_log_end+0x6c/0x180
    common_lsm_audit+0xb0/0xe0
    slow_avc_audit+0xa4/0x110
    avc_has_perm+0x1c4/0x260
    selinux_perf_event_open+0x74/0xd0
    security_perf_event_open+0x68/0xc0
    record_and_restart+0x6e8/0x7f0
    perf_event_interrupt+0x22c/0x560
    performance_monitor_exception0x4c/0x60
    performance_monitor_common_virt+0x1c8/0x1d0
  interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
  NIP:  c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0
  REGS: c00000000fd47860 TRAP: 0f00   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x38/0x120
  LR  skb_queue_tail+0x3c/0x90
  interrupt: f00
    0x38 (unreliable)
    0xc00000000aae6200
    audit_log_end+0x6c/0x180
    audit_log_exit+0x344/0xf80
    __audit_syscall_exit+0x2c0/0x320
    do_syscall_trace_leave+0x148/0x200
    syscall_exit_prepare+0x324/0x390
    system_call_common+0xfc/0x27c

The above trace shows that while the CPU was handling a performance
monitor exception, there was a call to security_perf_event_open()
function. In powerpc core-book3s, this function is called from
perf_allow_kernel() check during recording of data address in the
sample via perf_get_data_addr().

Commit da97e18458 ("perf_event: Add support for LSM and SELinux
checks") introduced security enhancements to perf. As part of this
commit, the new security hook for perf_event_open() was added in all
places where perf paranoid check was previously used. In powerpc
core-book3s code, originally had paranoid checks in
perf_get_data_addr() and power_pmu_bhrb_read(). So
perf_paranoid_kernel() checks were replaced with perf_allow_kernel()
in these PMU helper functions as well.

The intention of paranoid checks in core-book3s was to verify
privilege access before capturing some of the sample data. Along with
paranoid checks, perf_allow_kernel() also does a
security_perf_event_open(). Since these functions are accessed while
recording a sample, we end up calling selinux_perf_event_open() in PMI
context. Some of the security functions use spinlock like
sidtab_sid2str_put(). If a perf interrupt hits under a spin lock and
if we end up in calling selinux hook functions in PMI handler, this
could cause a dead lock.

Since the purpose of this security hook is to control access to
perf_event_open(), it is not right to call this in interrupt context.

The paranoid checks in powerpc core-book3s were done at interrupt time
which is also not correct.

Reference commits:
  Commit cd1231d703 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
  Commit bb19af8160 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")

We only allow creation of events that have already passed the
privilege checks in perf_event_open(). So these paranoid checks are
not needed at event time. As a fix, patch uses
'event->attr.exclude_kernel' check to prevent exposing kernel address
for userspace only sampling.

Fixes: cd1231d703 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Cc: stable@vger.kernel.org # v4.17+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-03-02 22:41:51 +11:00
Christophe Leroy
acdad8fb4a powerpc: Force inlining of mmu_has_feature to fix build failure
The test robot has managed to generate a random config leading
to following build failure:

  LD      .tmp_vmlinux.kallsyms1
powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `ptep_set_access_flags':
pgtable.c:(.text.ptep_set_access_flags+0xf0): undefined reference to `hash__flush_tlb_page'
powerpc64-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `MMU_init_hw_patch':
mmu.c:(.init.text+0x452): undefined reference to `patch__hash_page_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x45e): undefined reference to `patch__hash_page_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x46a): undefined reference to `patch__hash_page_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x476): undefined reference to `patch__hash_page_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x482): undefined reference to `patch__hash_page_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x48e): undefined reference to `patch__hash_page_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x49e): undefined reference to `patch__hash_page_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x4aa): undefined reference to `patch__hash_page_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x4b6): undefined reference to `patch__hash_page_C'
powerpc64-linux-ld: mmu.c:(.init.text+0x4c2): undefined reference to `patch__hash_page_C'
powerpc64-linux-ld: mmu.c:(.init.text+0x4ce): undefined reference to `patch__flush_hash_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x4da): undefined reference to `patch__flush_hash_A0'
powerpc64-linux-ld: mmu.c:(.init.text+0x4e6): undefined reference to `patch__flush_hash_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x4f2): undefined reference to `patch__flush_hash_A1'
powerpc64-linux-ld: mmu.c:(.init.text+0x4fe): undefined reference to `patch__flush_hash_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x50a): undefined reference to `patch__flush_hash_A2'
powerpc64-linux-ld: mmu.c:(.init.text+0x522): undefined reference to `patch__flush_hash_B'
powerpc64-linux-ld: mmu.c:(.init.text+0x532): undefined reference to `patch__flush_hash_B'
powerpc64-linux-ld: arch/powerpc/mm/book3s32/mmu.o: in function `update_mmu_cache':
mmu.c:(.text.update_mmu_cache+0xa0): undefined reference to `add_hash_page'
powerpc64-linux-ld: mm/memory.o: in function `zap_pte_range':
memory.c:(.text.zap_pte_range+0x160): undefined reference to `flush_hash_pages'
powerpc64-linux-ld: mm/memory.o: in function `handle_pte_fault':
memory.c:(.text.handle_pte_fault+0x180): undefined reference to `hash__flush_tlb_page'

This is due to mmu_has_feature() not being inlined. See extract of build of
mmu.c with -Winline:

In file included from ./include/linux/mm_types.h:19,
                 from ./include/linux/mmzone.h:21,
                 from ./include/linux/gfp.h:6,
                 from ./include/linux/mm.h:10,
                 from arch/powerpc/mm/book3s32/mmu.c:21:
./arch/powerpc/include/asm/mmu.h: In function 'find_free_bat':
./arch/powerpc/include/asm/mmu.h:231:20: warning: inlining failed in call to 'early_mmu_has_feature': call is unlikely and code size would grow [-Winline]
  231 | static inline bool early_mmu_has_feature(unsigned long feature)
      |                    ^~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/mmu.h:291:9: note: called from here
  291 |  return early_mmu_has_feature(feature);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime
and elimination of non possible parts of code at compile time.
For this to work, mmu_has_feature() and early_mmu_has_feature()
must be inlined.

Fixes: 259149cf7c ("powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cf61345912c078c96f171afd0fcc48ef27cbdc3f.1614443418.git.christophe.leroy@csgroup.eu
2021-03-02 22:41:50 +11:00
Uwe Kleine-König
386a966f5c vio: make remove callback return void
The driver core ignores the return value of struct bus_type::remove()
because there is only little that can be done. To simplify the quest to
make this function return void, let struct vio_driver::remove() return
void, too. All users already unconditionally return 0, this commit makes
it obvious that returning an error code is a bad idea.

Note there are two nominally different implementations for a vio bus:
one in arch/sparc/kernel/vio.c and the other in
arch/powerpc/platforms/pseries/vio.c. This patch only adapts the powerpc
one.

Before this patch for a device that was bound to a driver without a
remove callback vio_cmo_bus_remove(viodev) wasn't called. As the device
core still considers the device unbound after vio_bus_remove() returns
calling this unconditionally is the consistent behaviour which is
implemented here.

Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[mpe: Drop unneeded hvcs_remove() forward declaration, squash in
 change from sfr to drop ibmvnic_remove() forward declaration]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225221834.160083-1-uwe@kleine-koenig.org
2021-03-02 22:41:23 +11:00
Yauheni Kaliuta
6185266c5a selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier
The verifier test labelled "valid read map access into a read-only array
2" calls the bpf_csum_diff() helper and checks its return value. However,
architecture implementations of csum_partial() (which is what the helper
uses) differ in whether they fold the return value to 16 bit or not. For
example, x86 version has ...

	if (unlikely(odd)) {
		result = from32to16(result);
		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
	}

... while generic lib/checksum.c does:

	result = from32to16(result);
	if (odd)
		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

This makes the helper return different values on different architectures,
breaking the test on non-x86. To fix this, add an additional instruction
to always mask the return value to 16 bits, and update the expected return
value accordingly.

Fixes: fb2abb73e5 ("bpf, selftest: test {rd, wr}only flags and direct value access")
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210228103017.320240-1-yauheni.kaliuta@redhat.com
2021-03-02 11:50:00 +01:00
Ilya Leoshkevich
42a382a466 selftests/bpf: Use the last page in test_snprintf_btf on s390
test_snprintf_btf fails on s390, because NULL points to a readable
struct lowcore there. Fix by using the last page instead.

Error message example:

    printing fffffffffffff000 should generate error, got (361)

Fixes: 076a95f5af ("selftests/bpf: Add bpf_snprintf_btf helper tests")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210227051726.121256-1-iii@linux.ibm.com
2021-03-02 11:30:59 +01:00
Pierre-Louis Bossart
a864e8f159 ALSA: hda: intel-nhlt: verify config type
Multiple bug reports report issues with the SOF and SST drivers when
dealing with single microphone cases.

We currently read the DMIC array information unconditionally but we
don't check that the configuration type is actually a mic array.

When the DMIC link does not rely on a mic array configuration, the
recommendation is to check the format information to infer the maximum
number of channels, and map this to the number of microphones.

This leaves a potential for a mismatch between actual microphones
available in hardware and what the ACPI table contains, but we have no
other source of information.

Note that single microphone configurations can alternatively be
handled with a 'mic array' configuration along with a 'vendor-defined'
geometry.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201251
BugLink: https://github.com/thesofproject/linux/issues/2725
Fixes: 7a33ea70e1 ('ALSA: hda: intel-nhlt: handle NHLT VENDOR_DEFINED DMIC geometry')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210302000146.1177770-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 10:22:47 +01:00
Pierre-Louis Bossart
3531ba21f5 ALSA: hda: fix kernel-doc warnings
v5.12-rc1 flags new warnings with make W=1, fix missing or broken
function descriptors.

sound/pci/hda/hda_codec.c:3492: warning: expecting prototype for
snd_hda_input_mux_info_info(). Prototype was for
snd_hda_input_mux_info() instead

sound/pci/hda/hda_codec.c:3521: warning: expecting prototype for
snd_hda_input_mux_info_put(). Prototype was for
snd_hda_input_mux_put() instead

sound/pci/hda/hda_codec.c:3958: warning: expecting prototype for
_snd_hda_pin_ctl(). Prototype was for _snd_hda_set_pin_ctl() instead

sound/pci/hda/hda_jack.c:223: warning: expecting prototype for
snd_hda_set_dirty_all(). Prototype was for
snd_hda_jack_set_dirty_all() instead

sound/pci/hda/hda_jack.c:309: warning: expecting prototype for
snd_hda_jack_detect_enable_mst(). Prototype was for
snd_hda_jack_detect_enable_callback_mst() instead

sound/pci/hda/hda_generic.c:3933: warning: expecting prototype for
snd_dha_gen_add_mute_led_cdev(). Prototype was for
snd_hda_gen_add_mute_led_cdev() instead

sound/pci/hda/hda_generic.c:4093: warning: expecting prototype for
snd_dha_gen_add_micmute_led_cdev(). Prototype was for
snd_hda_gen_add_micmute_led_cdev() instead

sound/pci/hda/patch_ca0132.c:2357: warning: expecting prototype for
Prepare and send the SCP message to DSP(). Prototype was for
dspio_scp() instead

sound/pci/hda/patch_ca0132.c:2883: warning: expecting prototype for
Allocate router ports(). Prototype was for dsp_allocate_router_ports()
instead

sound/pci/hda/patch_ca0132.c:3202: warning: expecting prototype for
Write a block of data into DSP code or data RAM using pre(). Prototype
was for dspxfr_one_seg() instead

sound/pci/hda/patch_ca0132.c:3397: warning: expecting prototype for
data overlay to DSP memories(). Prototype was for dspxfr_image()
instead

sound/hda/hdac_regmap.c:393: warning: expecting prototype for
snd_hdac_regmap_init(). Prototype was for snd_hdac_regmap_exit()
instead

sound/hda/ext/hdac_ext_controller.c:142: warning: expecting prototype
for snd_hdac_ext_bus_get_link_index(). Prototype was for
snd_hdac_ext_bus_get_link() instead

sound/hda/ext/hdac_ext_stream.c:140: warning: expecting prototype for
snd_hdac_ext_linkstream_start(). Prototype was for
snd_hdac_ext_link_stream_start() instead

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210301174617.116960-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-02 10:21:36 +01:00
Jason Yan
5477edcaca gcc-plugins: latent_entropy: remove unneeded semicolon
Fix the following coccicheck warning:

scripts/gcc-plugins/latent_entropy_plugin.c:539:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200418070521.10931-1-yanaijie@huawei.com
2021-03-01 19:19:50 -08:00
Jason Yan
b924a8197a gcc-plugins: structleak: remove unneeded variable 'ret'
Fix the following coccicheck warning:

scripts/gcc-plugins/structleak_plugin.c:177:14-17: Unneeded variable:
"ret". Return "0" on line 207

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200418070505.10715-1-yanaijie@huawei.com
2021-03-01 19:19:44 -08:00
Eric Dumazet
8811f4a983 tcp: add sanity tests to TCP_QUEUE_SEQ
Qingyu Li reported a syzkaller bug where the repro
changes RCV SEQ _after_ restoring data in the receive queue.

mprotect(0x4aa000, 12288, PROT_READ)    = 0
mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)

syslog shows:
[  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
[  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0

This should not be allowed. TCP_QUEUE_SEQ should only be used
when queues are empty.

This patch fixes this case, and the tx path as well.

Fixes: ee9952831c ("tcp: Initial repair mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 15:32:05 -08:00
Andrea Parri (Microsoft)
3946688edb hv_netvsc: Fix validation in netvsc_linkstatus_callback()
Contrary to the RNDIS protocol specification, certain (pre-Fe)
implementations of Hyper-V's vSwitch did not account for the status
buffer field in the length of an RNDIS packet; the bug was fixed in
newer implementations.  Validate the status buffer fields using the
length of the 'vmtransfer_page' packet (all implementations), that
is known/validated to be less than or equal to the receive section
size and not smaller than the length of the RNDIS message.

Reported-by: Dexuan Cui <decui@microsoft.com>
Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Fixes: 505e3f00c3 ("hv_netvsc: Add (more) validation for untrusted Hyper-V values")
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 15:30:52 -08:00
DENG Qingfang
9200f515c4 net: dsa: tag_mtk: fix 802.1ad VLAN egress
A different TPID bit is used for 802.1ad VLAN frames.

Reported-by: Ilario Gelmetti <iochesonome@gmail.com>
Fixes: f0af34317f ("net: dsa: mediatek: combine MediaTek tag with VLAN tag")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 15:29:43 -08:00
Willem de Bruijn
b228c9b058 net: expand textsearch ts_state to fit skb_seq_state
The referenced commit expands the skb_seq_state used by
skb_find_text with a 4B frag_off field, growing it to 48B.

This exceeds container ts_state->cb, causing a stack corruption:

[   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: skb_find_text+0xc5/0xd0
[   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
[   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-2 04/01/2014
[   73.260078] Call Trace:
[   73.264677]  dump_stack+0x57/0x6a
[   73.267866]  panic+0xf6/0x2b7
[   73.270578]  ? skb_find_text+0xc5/0xd0
[   73.273964]  __stack_chk_fail+0x10/0x10
[   73.277491]  skb_find_text+0xc5/0xd0
[   73.280727]  string_mt+0x1f/0x30
[   73.283639]  ipt_do_table+0x214/0x410

The struct is passed between skb_find_text and its callbacks
skb_prepare_seq_read, skb_seq_read and skb_abort_seq read through
the textsearch interface using TS_SKB_CB.

I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.

skb->cb was increased from 40B to 48B after ts_state was introduced,
in commit 3e3850e989 ("[NETFILTER]: Fix xfrm lookup in
ip_route_me_harder/ip6_route_me_harder").

Increase ts_state.cb[] to 48 to fit the struct.

Also add a BUILD_BUG_ON to avoid a repeat.

The alternative is to directly add a dependency from textsearch onto
linux/skbuff.h, but I think the intent is textsearch to have no such
dependencies on its callers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
Fixes: 97550f6fa5 ("net: compound page support in skb_seq_read")
Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 15:25:24 -08:00
Masanari Iida
2353db75c3 docs: networking: bonding.rst Fix a typo in bonding.rst
This patch fixes a spelling typo in bonding.rst.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:53:21 -08:00
David S. Miller
2eb4898255 Merge tag 'linux-can-fixes-for-5.12-20210301' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2021-03-01

this is a pull request of 6 patches for net/master.

The first 3 patches are by Joakim Zhang for the flexcan driver and fix
the probing and starting of the chip.

The next patch is by me, for the mcp251xfd driver and reverts the BQL
support. BQL support got mainline with rc1 and assumes that CAN frames
are always echoed, which is not the case. A proper fix requires
changes more changes and will be rolled out via linux-can-next later.

Oleksij Rempel's patch fixes the socket ref counting if socket was
closed before setting skb ownership.

Torin Cooper-Bennun's patch for the tcan4x5x driver fixes a race
condition, where the chip is first attached the bus and then the MRAM
is initialized, which may result in lost data.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:37:08 -08:00
David S. Miller
8a00946e1a Merge branch 'enetc-fixes'
Vladimir Oltean says:

====================
Fixes for NXP ENETC driver

This contains an assorted set of fixes collected over the past 2 weeks
on the enetc driver. Some are related to VLAN processing, some to
physical link settings, some are fixups of previous hardware workarounds,
and some are simply zero-day data path bugs that for some reason were
never caught or at least identified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
3a5d12c9be net: enetc: keep RX ring consumer index in sync with hardware
The RX rings have a producer index owned by hardware, where newly
received frame buffers are placed, and a consumer index owned by
software, where newly allocated buffers are placed, in expectation of
hardware being able to place frame data in them.

Hardware increments the producer index when a frame is received, however
it is not allowed to increment the producer index to match the consumer
index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
BDs. Whenever the producer index matches the value of the consumer
index, the ring has no unprocessed received frames and all BDs in the
ring have been initialized/prepared by software, i.e. hardware owns all
BDs in the ring.

The code uses the next_to_clean variable to keep track of the producer
index, and the next_to_use variable to keep track of the consumer index.

The RX rings are seeded from enetc_refill_rx_ring, which is called from
two places:

1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
   i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:

.ndo_open
-> enetc_open
   -> enetc_setup_bdrs
      -> enetc_setup_rxbdr
         -> enetc_refill_rx_ring

2. then during the data path processing, it is refilled with 16 buffers
   at a time:

enetc_msix
-> napi_schedule
   -> enetc_poll
      -> enetc_clean_rx_ring
         -> enetc_refill_rx_ring

There is just one problem: the initial seeding done during .ndo_open
updates just the producer index (ENETC_RBPIR) with 0, and the software
next_to_clean and next_to_use variables. Notably, it will not update the
consumer index to make the hardware aware of the newly added buffers.

Wait, what? So how does it work?

Well, the reset values of the producer index and of the consumer index
of a ring are both zero. As per the description in the second paragraph,
it means that the ring is full of buffers waiting for hardware to put
frames in them, which by coincidence is almost true, because we have in
fact seeded 511 buffers into the ring.

But will the hardware attempt to access the 512th entry of the ring,
which has an invalid BD in it? Well, no, because in order to do that, it
would have to first populate the first 511 entries, and the NAPI
enetc_poll will kick in by then. Eventually, after 16 processed slots
have become available in the RX ring, enetc_clean_rx_ring will call
enetc_refill_rx_ring and then will [ finally ] update the consumer index
with the new software next_to_use variable. From now on, the
next_to_clean and next_to_use variables are in sync with the producer
and consumer ring indices.

So the day is saved, right? Well, not quite. Freeing the memory
allocated for the rings is done in:

enetc_close
-> enetc_clear_bdrs
   -> enetc_clear_rxbdr
      -> this just disables the ring
-> enetc_free_rxtx_rings
   -> enetc_free_rx_ring
      -> sets next_to_clean and next_to_use to 0

but again, nothing is committed to the hardware producer and consumer
indices (yay!). The assumption is that the ring is disabled, so the
indices don't matter anyway, and it's the responsibility of the "open"
code path to set those up.

.. Except that the "open" code path does not set those up properly.

While initially, things almost work, during subsequent enetc_close ->
enetc_open sequences, we have problems. To be precise, the enetc_open
that is subsequent to enetc_close will again refill the ring with 511
entries, but it will leave the consumer index untouched. Untouched
means, of course, equal to the value it had before disabling the ring
and draining the old buffers in enetc_close.

But as mentioned, enetc_setup_rxbdr will at least update the producer
index though, through this line of code:

	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);

so at this stage we'll have:

next_to_clean=0 (in hardware 0)
next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)

Again, the next_to_clean and producer index are in sync and set to
correct values, so the driver manages to limp on. Eventually, 16 ring
entries will be consumed by enetc_poll, and the savior
enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
update the hardware consumer ring based upon the new next_to_use.

So.. it works?
Well, by coincidence, it almost does, but there's a circumstance where
enetc_clean_rx_ring won't be there to save us. If the previous value of
the consumer index was 15, there's a problem, because the NAPI poll
sequence will only issue a refill when 16 or more buffers have been
consumed.

It's easiest to illustrate this with an example:

ip link set eno0 up
ip addr add 192.168.100.1/24 dev eno0
ping 192.168.100.1 -c 20 # ping this port from another board
ip link set eno0 down
ip link set eno0 up
ping 192.168.100.1 -c 20 # ping it again from the same other board

One by one:

1. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 0)

2. ping 192.168.100.1 -c 20 # ping this port from another board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)

20 packets transmitted, 20 packets received, 0% packet loss

3. ip link set eno0 down
enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)

4. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 15)

5. ping 192.168.100.1 -c 20 # ping it again from the same other board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)

20 packets transmitted, 12 packets received, 40% packet loss

And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
to 15 for that to happen), no nothing. The hardware enters the condition where
the producer (14) + 1 is equal to the consumer (15) index, which makes it
believe it has no more free buffers to put packets in, so it starts discarding
them:

ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
NIC statistics:
     Rx ring  0 discarded frames: 8

Summarized, if the interface receives between 16 and 32 (mod 512) frames
and then there is a link flap, then the port will eventually die with no
way to recover. If it receives less than 16 (mod 512) frames, then the
initial NAPI poll [ before the link flap ] will not update the consumer
index in hardware (it will remain zero) which will be ok when the buffers
are later reinitialized. If more than 32 (mod 512) frames are received,
the initial NAPI poll has the chance to refill the ring twice, updating
the consumer index to at least 32. So after the link flap, the consumer
index is still wrong, but the post-flap NAPI poll gets a chance to
refill the ring once (because it passes through cleaned_cnt=15) and
makes the consumer index be again back in sync with next_to_use.

The solution to this problem is actually simple, we just need to write
next_to_use into the hardware consumer index at enetc_open time, which
always brings it back in sync after an initial buffer seeding process.

The simpler thing would be to put the write to the consumer index into
enetc_refill_rx_ring directly, but there are issues with the MDIO
locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
top-level and we use the unlocked enetc_wr_reg_hot, whereas in
enetc_open, the enetc_lock_mdio() is not taken at the top level, but
instead by each individual enetc_wr_reg, so we are forced to put an
additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
the code is left as a refactoring exercise.

Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
96a5223b91 net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
The Station Interface Receive Interrupt Detect Register (SIRXIDR)
contains a 16-bit wide mask of 'interrupt detected' events for each ring
associated with a port. Bit i is write-1-to-clean for RX ring i.

I have no explanation whatsoever how this line of code came to be
inserted in the blamed commit. I checked the downstream versions of that
patch and none of them have it.

The somewhat comical aspect of it is that we're writing a binary number
to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
Since the RX rings have 512 buffer descriptors, we end up writing 511 to
this register, which is 0x1ff, so we are effectively clearing the
'interrupt detected' event for rings 0-8.

This register is not what is used for interrupt handling though - it
only provides a summary for the entire SI. The hardware provides one
separate Interrupt Detect Register per RX ring, which auto-clears upon
read. So there doesn't seem to be any adverse effect caused by this
bogus write.

There is, however, one reason why this should be handled as a bugfix:
next_to_clean _should_ be committed to hardware, just not to that
register, and this was obscuring the fact that it wasn't. This is fixed
in the next patch, and removing the bogus line now allows the fix patch
to be backported beyond that point.

Fixes: fd5736bf9f ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
c76a97218d net: enetc: force the RGMII speed and duplex instead of operating in inband mode
The ENETC port 0 MAC supports in-band status signaling coming from a PHY
when operating in RGMII mode, and this feature is enabled by default.

It has been reported that RGMII is broken in fixed-link, and that is not
surprising considering the fact that no PHY is attached to the MAC in
that case, but a switch.

This brings us to the topic of the patch: the enetc driver should have
not enabled the optional in-band status signaling for RGMII unconditionally,
but should have forced the speed and duplex to what was resolved by
phylink.

Note that phylink does not accept the RGMII modes as valid for in-band
signaling, and these operate a bit differently than 1000base-x and SGMII
(notably there is no clause 37 state machine so no ACK required from the
MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
not transmitting something else, so it should be safe to leave a PHY
with this option unconditionally enabled even if we ignore it). The spec
talks about this here:
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf

Fixes: 71b77a7a27 ("enetc: Migrate to PHYLINK and PCS_LYNX")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
a74dbce9d4 net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
Quoting from the blamed commit:

    In promiscuous mode, it is more intuitive that all traffic is received,
    including VLAN tagged traffic. It appears that it is necessary to set
    the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
    also temporarily enabled. On exit from promiscuous mode, the setting
    made by ethtool is restored.

Intuitive or not, there isn't any definition issued by a standards body
which says that promiscuity has anything to do with VLAN filtering - it
only has to do with accepting packets regardless of destination MAC address.

In fact people are already trying to use this misunderstanding/bug of
the enetc driver as a justification to transform promiscuity into
something it never was about: accepting every packet (maybe that would
be the "rx-all" netdev feature?):
https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/

This is relevant because there are use cases in the kernel (such as
tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
(yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
such as enetc, so for those, disabling rx-vlan-filter is currently the
only right solution to make these setups work:
https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
The blamed patch has unintentionally introduced one more way for this to
work, which is to enable IFF_PROMISC, however this is non-portable
because port promiscuity is not meant to disable VLAN filtering.
Therefore, it could invite people to write broken scripts for enetc, and
then wonder why they are broken when migrating to other drivers that
don't handle promiscuity in the same way.

Fixes: 7070eea5e9 ("enetc: permit configuration of rx-vlan-filter with ethtool")
Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
827b6fd046 net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
When the enetc ports have rx-vlan-offload enabled, they report a TPID of
ETH_P_8021Q regardless of what was actually in the packet. When
rx-vlan-offload is disabled, packets have the proper TPID. Fix this
inconsistency by finishing the TODO left in the code.

Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
6d36ecdbc4 net: enetc: take the MDIO lock only once per NAPI poll cycle
The workaround for the ENETC MDIO erratum caused a performance
degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
64B packets). This is due to excessive locking and unlocking in the fast
path, which can be avoided.

By taking the MDIO read-side lock only once per NAPI poll cycle, we are
able to regain 54 Kpps (65%) of the performance hit. The rest of the
performance degradation comes from the TX data path, but unfortunately
it doesn't look like we can optimize that away easily, even with
netdev_xmit_more(), there just isn't any skb batching done, to help with
taking the MDIO lock less often than once per packet.

We need to change the register accessor type for enetc_get_tx_tstamp,
because it now runs under the enetc_lock_mdio as per the new call path
detailed below:

enetc_msix
-> napi_schedule
   -> enetc_poll
      -> enetc_lock_mdio
      -> enetc_clean_tx_ring
         -> enetc_get_tx_tstamp
      -> enetc_clean_rx_ring
      -> enetc_unlock_mdio

Fixes: fd5736bf9f ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
3222b5b613 net: enetc: initialize RFS/RSS memories for unused ports too
Michael reports that since linux-next-20210211, the AER messages for ECC
errors have started reappearing, and this time they can be reliably
reproduced with the first ping on one of his LS1028A boards.

$ ping 1[   33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
72.16.0.1
PING [   33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
172.16.0.1 (172.16.0.1): 56 data bytes
64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms

$ devmem 0x1f8010e10 32
0xC0000006

It isn't clear why this is necessary, but it seems that for the errors
to go away, we must clear the entire RFS and RSS memory, not just for
the ports in use.

Sadly the code is structured in such a way that we can't have unified
logic for the used and unused ports. For the minimal initialization of
an unused port, we need just to enable and ioremap the PF memory space,
and a control buffer descriptor ring. Unused ports must then free the
CBDR because the driver will exit, but used ports can not pick up from
where that code path left, since the CBDR API does not reinitialize a
ring when setting it up, so its producer and consumer indices are out of
sync between the software and hardware state. So a separate
enetc_init_unused_port function was created, and it gets called right
after the PF memory space is enabled.

Fixes: 07bf34a50e ("net: enetc: initialize the RFS and RSS memories")
Reported-by: Michael Walle <michael@walle.cc>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Vladimir Oltean
c646d10dda net: enetc: don't overwrite the RSS indirection table when initializing
After the blamed patch, all RX traffic gets hashed to CPU 0 because the
hashing indirection table set up in:

enetc_pf_probe
-> enetc_alloc_si_resources
   -> enetc_configure_si
      -> enetc_setup_default_rss_table

is overwritten later in:

enetc_pf_probe
-> enetc_init_port_rss_memory

which zero-initializes the entire port RSS table in order to avoid ECC errors.

The trouble really is that enetc_init_port_rss_memory really neads
enetc_alloc_si_resources to be called, because it depends upon
enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
thing could have been better thought out, it has nothing to do in a
function called "alloc_si_resources", especially since its counterpart,
"free_si_resources", does nothing to unwind the configuration of the SI.

The point is, we need to pull out enetc_configure_si out of
enetc_alloc_resources, and move it after enetc_init_port_rss_memory.
This allows us to set up the default RSS indirection table after
initializing the memory.

Fixes: 07bf34a50e ("net: enetc: initialize the RFS and RSS memories")
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
Yejune Deng
8bd2a05527 inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold
In inet_initpeers(), struct inet_peer on IA32 uses 128 bytes in nowdays.
Get rid of the cascade and use div64_ul() and clamp_val() calculate that
will not need to be adjusted in the future as suggested by Eric Dumazet.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:32:12 -08:00
Pavel Skripkin
093b036aa9 net/qrtr: fix __netdev_alloc_skb call
syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(),
which tries to allocate skb. Since the value comes from the untrusted source
there is no need to raise a warning in __alloc_pages_nodemask().

[1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
Call Trace:
 __alloc_pages include/linux/gfp.h:511 [inline]
 __alloc_pages_node include/linux/gfp.h:524 [inline]
 alloc_pages_node include/linux/gfp.h:538 [inline]
 kmalloc_large_node+0x60/0x110 mm/slub.c:3999
 __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
 __kmalloc_reserve net/core/skbuff.c:150 [inline]
 __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
 __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
 netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
 qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
 qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
 call_write_iter include/linux/fs.h:1901 [inline]
 new_sync_write+0x426/0x650 fs/read_write.c:518
 vfs_write+0x791/0xa30 fs/read_write.c:605
 ksys_write+0x12d/0x250 fs/read_write.c:658
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Acked-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:24:03 -08:00
David S. Miller
5db4f74ec8 Merge branch 'sh_eth-masks'
Sergey Shtylyov says:

====================
Fix TRSCER masks in the Ether driver

Here are 3 patches against DaveM's 'net' repo. I'm fixing the TRSCER masks in
the driver to match the manuals...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:22:34 -08:00
Sergey Shtylyov
165bc5a4f3 sh_eth: fix TRSCER mask for R7S9210
According  to the RZ/A2M Group User's Manual: Hardware, Rev. 2.00,
the TRSCER register has bit 9 reserved, hence we can't use the driver's
default TRSCER mask.  Add the explicit initializer for sh_eth_cpu_data::
trscer_err_mask for R7S9210.

Fixes: 6e0bb04d0e ("sh_eth: Add R7S9210 support")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:22:34 -08:00
Sergey Shtylyov
75be7fb7f9 sh_eth: fix TRSCER mask for R7S72100
According  to  the RZ/A1H Group, RZ/A1M Group User's Manual: Hardware,
Rev. 4.00, the TRSCER register has bit 9 reserved, hence we can't use
the driver's default TRSCER mask.  Add the explicit initializer for
sh_eth_cpu_data::trscer_err_mask for R7S72100.

Fixes: db893473d3 ("sh_eth: Add support for r7s72100")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:22:34 -08:00
Sergey Shtylyov
8c91bc3d44 sh_eth: fix TRSCER mask for SH771x
According  to  the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
differently), with all the other bits reserved. Apparently, this was not
the case with some early revisions of the manual as we have the other
bits declared (and set) in the original driver.  Follow the suit and add
the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...

Fixes: 86a74ff21a ("net: sh_eth: add support for Renesas SuperH Ethernet")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:22:34 -08:00
Tong Zhang
a2bd45834e atm: lanai: dont run lanai_dev_close if not open
lanai_dev_open() can fail. When it fail, lanai->base is unmapped and the
pci device is disabled. The caller, lanai_init_one(), then tries to run
atm_dev_deregister(). This will subsequently call lanai_dev_close() and
use the already released MMIO area.

To fix this issue, set the lanai->base to NULL if open fail,
and test the flag in lanai_dev_close().

[    8.324153] lanai: lanai_start() failed, err=19
[    8.324819] lanai(itf 0): shutting down interface
[    8.325211] BUG: unable to handle page fault for address: ffffc90000180024
[    8.325781] #PF: supervisor write access in kernel mode
[    8.326215] #PF: error_code(0x0002) - not-present page
[    8.326641] PGD 100000067 P4D 100000067 PUD 100139067 PMD 10013a067 PTE 0
[    8.327206] Oops: 0002 [#1] SMP KASAN NOPTI
[    8.327557] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b49040c7 #12
[    8.328229] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-4
[    8.329145] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[    8.329587] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[    8.330917] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
[    8.331196] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
[    8.331572] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
[    8.331948] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
[    8.332326] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
[    8.332701] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
[    8.333077] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
[    8.333502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.333806] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
[    8.334182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.334557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.334932] Call Trace:
[    8.335066]  atm_dev_deregister+0x161/0x1a0 [atm]
[    8.335324]  lanai_init_one.cold+0x20c/0x96d [lanai]
[    8.335594]  ? lanai_send+0x2a0/0x2a0 [lanai]
[    8.335831]  local_pci_probe+0x6f/0xb0
[    8.336039]  pci_device_probe+0x171/0x240
[    8.336255]  ? pci_device_remove+0xe0/0xe0
[    8.336475]  ? kernfs_create_link+0xb6/0x110
[    8.336704]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
[    8.336983]  really_probe+0x161/0x420
[    8.337181]  driver_probe_device+0x6d/0xd0
[    8.337401]  device_driver_attach+0x82/0x90
[    8.337626]  ? device_driver_attach+0x90/0x90
[    8.337859]  __driver_attach+0x60/0x100
[    8.338065]  ? device_driver_attach+0x90/0x90
[    8.338298]  bus_for_each_dev+0xe1/0x140
[    8.338511]  ? subsys_dev_iter_exit+0x10/0x10
[    8.338745]  ? klist_node_init+0x61/0x80
[    8.338956]  bus_add_driver+0x254/0x2a0
[    8.339164]  driver_register+0xd3/0x150
[    8.339370]  ? 0xffffffffc0028000
[    8.339550]  do_one_initcall+0x84/0x250
[    8.339755]  ? trace_event_raw_event_initcall_finish+0x150/0x150
[    8.340076]  ? free_vmap_area_noflush+0x1a5/0x5c0
[    8.340329]  ? unpoison_range+0xf/0x30
[    8.340532]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    8.340806]  ? unpoison_range+0xf/0x30
[    8.341014]  ? unpoison_range+0xf/0x30
[    8.341217]  do_init_module+0xf8/0x350
[    8.341419]  load_module+0x3fe6/0x4340
[    8.341621]  ? vm_unmap_ram+0x1d0/0x1d0
[    8.341826]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    8.342101]  ? module_frob_arch_sections+0x20/0x20
[    8.342358]  ? __do_sys_finit_module+0x108/0x170
[    8.342604]  __do_sys_finit_module+0x108/0x170
[    8.342841]  ? __ia32_sys_init_module+0x40/0x40
[    8.343083]  ? file_open_root+0x200/0x200
[    8.343298]  ? do_sys_open+0x85/0xe0
[    8.343491]  ? filp_open+0x50/0x50
[    8.343675]  ? exit_to_user_mode_prepare+0xfc/0x130
[    8.343935]  do_syscall_64+0x33/0x40
[    8.344132]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    8.344401] RIP: 0033:0x7f08eb887cf7
[    8.344594] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 41
[    8.345565] RSP: 002b:00007ffcd5c98ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.345962] RAX: ffffffffffffffda RBX: 00000000008fea70 RCX: 00007f08eb887cf7
[    8.346336] RDX: 0000000000000000 RSI: 00000000008fd9e0 RDI: 0000000000000003
[    8.346711] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
[    8.347085] R10: 00007f08eb8eb300 R11: 0000000000000246 R12: 00000000008fd9e0
[    8.347460] R13: 0000000000000000 R14: 00000000008fddd0 R15: 0000000000000001
[    8.347836] Modules linked in: lanai(+) atm
[    8.348065] CR2: ffffc90000180024
[    8.348244] ---[ end trace 7fdc1c668f2003e5 ]---
[    8.348490] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[    8.348772] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[    8.349745] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
[    8.350022] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
[    8.350397] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
[    8.350772] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
[    8.351151] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
[    8.351525] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
[    8.351918] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
[    8.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.352647] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
[    8.353022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.353397] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.353958] modprobe (95) used greatest stack depth: 26216 bytes left

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:18:54 -08:00
Tong Zhang
4deb550bc3 atm: eni: dont release is never initialized
label err_eni_release is reachable when eni_start() fail.
In eni_start() it calls dev->phy->start() in the last step, if start()
fail we don't need to call phy->stop(), if start() is never called, we
neither need to call phy->stop(), otherwise null-ptr-deref will happen.

In order to fix this issue, don't call phy->stop() in label err_eni_release

[    4.875714] ==================================================================
[    4.876091] BUG: KASAN: null-ptr-deref in suni_stop+0x47/0x100 [suni]
[    4.876433] Read of size 8 at addr 0000000000000030 by task modprobe/95
[    4.876778]
[    4.876862] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b49040c7 #2
[    4.877290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd94
[    4.877876] Call Trace:
[    4.878009]  dump_stack+0x7d/0xa3
[    4.878191]  kasan_report.cold+0x10c/0x10e
[    4.878410]  ? __slab_free+0x2f0/0x340
[    4.878612]  ? suni_stop+0x47/0x100 [suni]
[    4.878832]  suni_stop+0x47/0x100 [suni]
[    4.879043]  eni_do_release+0x3b/0x70 [eni]
[    4.879269]  eni_init_one.cold+0x1152/0x1747 [eni]
[    4.879528]  ? _raw_spin_lock_irqsave+0x7b/0xd0
[    4.879768]  ? eni_ioctl+0x270/0x270 [eni]
[    4.879990]  ? __mutex_lock_slowpath+0x10/0x10
[    4.880226]  ? eni_ioctl+0x270/0x270 [eni]
[    4.880448]  local_pci_probe+0x6f/0xb0
[    4.880650]  pci_device_probe+0x171/0x240
[    4.880864]  ? pci_device_remove+0xe0/0xe0
[    4.881086]  ? kernfs_create_link+0xb6/0x110
[    4.881315]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
[    4.881594]  really_probe+0x161/0x420
[    4.881791]  driver_probe_device+0x6d/0xd0
[    4.882010]  device_driver_attach+0x82/0x90
[    4.882233]  ? device_driver_attach+0x90/0x90
[    4.882465]  __driver_attach+0x60/0x100
[    4.882671]  ? device_driver_attach+0x90/0x90
[    4.882903]  bus_for_each_dev+0xe1/0x140
[    4.883114]  ? subsys_dev_iter_exit+0x10/0x10
[    4.883346]  ? klist_node_init+0x61/0x80
[    4.883557]  bus_add_driver+0x254/0x2a0
[    4.883764]  driver_register+0xd3/0x150
[    4.883971]  ? 0xffffffffc0038000
[    4.884149]  do_one_initcall+0x84/0x250
[    4.884355]  ? trace_event_raw_event_initcall_finish+0x150/0x150
[    4.884674]  ? unpoison_range+0xf/0x30
[    4.884875]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    4.885150]  ? unpoison_range+0xf/0x30
[    4.885352]  ? unpoison_range+0xf/0x30
[    4.885557]  do_init_module+0xf8/0x350
[    4.885760]  load_module+0x3fe6/0x4340
[    4.885960]  ? vm_unmap_ram+0x1d0/0x1d0
[    4.886166]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    4.886441]  ? module_frob_arch_sections+0x20/0x20
[    4.886697]  ? __do_sys_finit_module+0x108/0x170
[    4.886941]  __do_sys_finit_module+0x108/0x170
[    4.887178]  ? __ia32_sys_init_module+0x40/0x40
[    4.887419]  ? file_open_root+0x200/0x200
[    4.887634]  ? do_sys_open+0x85/0xe0
[    4.887826]  ? filp_open+0x50/0x50
[    4.888009]  ? fpregs_assert_state_consistent+0x4d/0x60
[    4.888287]  ? exit_to_user_mode_prepare+0x2f/0x130
[    4.888547]  do_syscall_64+0x33/0x40
[    4.888739]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    4.889010] RIP: 0033:0x7ff62fcf1cf7
[    4.889202] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f71
[    4.890172] RSP: 002b:00007ffe6644ade8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    4.890570] RAX: ffffffffffffffda RBX: 0000000000f2ca70 RCX: 00007ff62fcf1cf7
[    4.890944] RDX: 0000000000000000 RSI: 0000000000f2b9e0 RDI: 0000000000000003
[    4.891318] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
[    4.891691] R10: 00007ff62fd55300 R11: 0000000000000246 R12: 0000000000f2b9e0
[    4.892064] R13: 0000000000000000 R14: 0000000000f2bdd0 R15: 0000000000000001
[    4.892439] ==================================================================

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:17:36 -08:00
Guangbin Huang
d9032dba5a net: phy: fix save wrong speed and duplex problem if autoneg is on
If phy uses generic driver and autoneg is on, enter command
"ethtool -s eth0 speed 50" will not change phy speed actually, but
command "ethtool eth0" shows speed is 50Mb/s because phydev->speed
has been set to 50 and no update later.

And duplex setting has same problem too.

However, if autoneg is on, phy only changes speed and duplex according to
phydev->advertising, but not phydev->speed and phydev->duplex. So in this
case, phydev->speed and phydev->duplex don't need to be set in function
phy_ethtool_ksettings_set() if autoneg is on.

Fixes: 51e2a3846e ("PHY: Avoid unnecessary aneg restarts")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:16:16 -08:00
Jason A. Donenfeld
4372339efc net: always use icmp{,v6}_ndo_send from ndo_start_xmit
There were a few remaining tunnel drivers that didn't receive the prior
conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
memory corrution (see ee576c47db ("net: icmp: pass zeroed opts from
icmp{,v6}_ndo_send before sending") for details), there's even more
imperative to have these all converted. So this commit goes through the
remaining cases that I could find and does a boring translation to the
ndo variety.

The Fixes: line below is the merge that originally added icmp{,v6}_
ndo_send and converted the first batch of icmp{,v6}_send users. The
rationale then for the change applies equally to this patch. It's just
that these drivers were left out of the initial conversion because these
network devices are hiding in net/ rather than in drivers/net/.

Cc: Florian Westphal <fw@strlen.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 803381f9f1 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:11:35 -08:00
DENG Qingfang
9eb8bc593a net: dsa: tag_rtl4_a: fix egress tags
Commit 86dd9868b8 has several issues, but was accepted too soon
before anyone could take a look.

- Double free. dsa_slave_xmit() will free the skb if the xmit function
  returns NULL, but the skb is already freed by eth_skb_pad(). Use
  __skb_put_padto() to avoid that.
- Unnecessary allocation. It has been done by DSA core since commit
  a3b0b64797.
- A u16 pointer points to skb data. It should be __be16 for network
  byte order.
- Typo in comments. "numer" -> "number".

Fixes: 86dd9868b8 ("net: dsa: tag_rtl4_a: Support also egress tags")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:09:55 -08:00
Jan Beulich
826d82170b xen-netback: use local var in xenvif_tx_check_gop() instead of re-calculating
shinfo already holds the result of skb_shinfo(skb) at this point - no
need to re-invoke the construct even twice.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:05:08 -08:00
Ioana Ciornei
73f476aa19 net: phy: ti: take into account all possible interrupt sources
The previous implementation of .handle_interrupt() did not take into
account the fact that all the interrupt status registers should be
acknowledged since multiple interrupt sources could be asserted.

Fix this by reading all the status registers before exiting with
IRQ_NONE or triggering the PHY state machine.

Fixes: 1d1ae3c6ca ("net: phy: ti: implement generic .handle_interrupt() callback")
Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210226153020.867852-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-03-01 11:46:55 -08:00
Linus Torvalds
7a7fd0de4a Merge branch 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull kmap conversion updates from David Sterba:
 "This contains changes regarding kmap API use and eg conversion from
  kmap_atomic to kmap_local_page.

  The API belongs to memory management but to save cross-tree
  dependency headaches we've agreed to take it through the btrfs tree
  because there are some trivial conversions possible, while the rest
  will need some time and getting the easy cases out of the way would be
  convenient.

  The changes can be grouped:

   - function exports, new helpers

   - new VM_BUG_ON for additional verification; it's been discussed if
     it should be VM_BUG_ON or BUG_ON, the former was chosen due to
     performance reasons

   - code replaced by relevant helpers"

[ This is an updated version of a request that originally came in during
  the merge window, but I asked for some updates:

    https://lore.kernel.org/lkml/cover.1614090658.git.dsterba@suse.com/

  which is why this got merge after the merge window closed.  - Linus ]

* 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: use copy_highpage() instead of 2 kmaps()
  btrfs: use memcpy_[to|from]_page() and kmap_local_page()
  mm/highmem: Add VM_BUG_ON() to mem*_page() calls
  mm/highmem: Introduce memcpy_page(), memmove_page(), and memset_page()
  mm/highmem: Convert memcpy_[to|from]_page() to kmap_local_page()
  mm/highmem: Lift memcpy_[to|from]_page to core
2021-03-01 11:24:18 -08:00
Linus Torvalds
c608aca57d Merge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "This is the first batch of fixes that usually arrive during the merge
  window code freeze. Regressions and stable material.

  Regressions:

   - fix deadlock in log sync in zoned mode

   - fix bugs in subpage mode still wrongly assuming sectorsize == page
     size

  Fixes:

   - fix missing kunmap of the Q stripe in RAID6

   - block group fixes:
      - fix race between extent freeing/allocation when using bitmaps
      - avoid double put of block group when emptying cluster

   - swapfile fixes:
      - fix swapfile writes vs running scrub
      - fix swapfile activation vs snapshot creation

   - fix stale data exposure after cloning a hole with NO_HOLES enabled

   - remove tree-checker check that does not work in case information
     from other leaves is necessary"

* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix deadlock on log sync
  btrfs: avoid double put of block group when emptying cluster
  btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
  btrfs: tree-checker: do not error out if extent ref hash doesn't match
  btrfs: fix race between swap file activation and snapshot creation
  btrfs: fix race between writes to swap files and scrub
  btrfs: avoid checking for RO block group twice during nocow writeback
  btrfs: fix race between extent freeing/allocation when using bitmaps
  btrfs: make check_compressed_csum() to be subpage compatible
  btrfs: make btrfs_submit_compressed_read() subpage compatible
  btrfs: fix raid6 qstripe kmap
2021-03-01 11:17:37 -08:00
YueHaibing
3a9b3d4536 IB/mlx5: Add missing error code
Set err to -ENOMEM if kzalloc fails instead of 0.

Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01 14:49:09 -04:00
Julian Braha
475f23b8c6 RDMA/rxe: Fix missing kconfig dependency on CRYPTO
When RDMA_RXE is enabled and CRYPTO is disabled, Kbuild gives the
following warning:

 WARNING: unmet direct dependencies detected for CRYPTO_CRC32
   Depends on [n]: CRYPTO [=n]
   Selected by [y]:
   - RDMA_RXE [=y] && (INFINIBAND_USER_ACCESS [=y] || !INFINIBAND_USER_ACCESS [=y]) && INET [=y] && PCI [=y] && INFINIBAND [=y] && INFINIBAND_VIRT_DMA [=y]

This is because RDMA_RXE selects CRYPTO_CRC32, without depending on or
selecting CRYPTO, despite that config option being subordinate to CRYPTO.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Link: https://lore.kernel.org/r/21525878.NYvzQUHefP@ubuntu-mate-laptop
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01 14:46:31 -04:00
Saeed Mahameed
221384df61 RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
ib_send_cm_sidr_rep() {
	spin_lock_irqsave()
        cm_send_sidr_rep_locked() {
                ...
        	spin_lock_irq()
                ....
                spin_unlock_irq() <--- this will enable interrupts
        }
        spin_unlock_irqrestore()
}

spin_unlock_irqrestore() expects interrupts to be disabled but the
internal spin_unlock_irq() will always enable hard interrupts.

Fix this by replacing the internal spin_{lock,unlock}_irq() with
irqsave/restore variants.

It fixes the following kernel trace:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20

 Call Trace:
  _raw_spin_unlock_irqrestore+0x4e/0x50
  ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm]
  cma_send_sidr_rep+0xa1/0x160 [rdma_cm]
  rdma_accept+0x25e/0x350 [rdma_cm]
  ucma_accept+0x132/0x1cc [rdma_ucm]
  ucma_write+0xbf/0x140 [rdma_ucm]
  vfs_write+0xc1/0x340
  ksys_write+0xb3/0xe0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 87c4c774cb ("RDMA/cm: Protect access to remote_sidr_table")
Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01 14:43:16 -04:00
Rob Herring
b5a95bb188 dt-bindings: media: Use graph and video-interfaces schemas, round 2
A couple of media schemas got applied without using or incorrectly
using the video-interfaces.yaml and graph.yaml schemas. Fix them up
before we have more copy-n-paste errors.

Fixes: 41b3e23376 ("media: dt-bindings: media: Add bindings for imx334")
Fixes: d899e5f1db ("media: dt-bindings: media: imx258: add bindings for IMX258 sensor")
Fixes: 918b866edf ("media: dt-bindings: Remove old ov5647.yaml file, update ovti,ov5647.yaml")
Fixes: 22f2b47517 ("media: dt-bindings: media: i2c: Add OV8865 bindings documentation")
Fixes: 29a202fa7a ("media: dt-bindings: media: i2c: Add OV5648 bindings documentation")
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Dave Stevenson <dave.stevenson@raspberrypi.com>
Cc: Jacopo Mondi <jacopo@jmondi.org>
Cc: "Paul J. Murphy" <paul.j.murphy@intel.com>
Cc: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Cc: linux-media@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20210223210127.55455-1-robh@kernel.org
2021-03-01 11:30:21 -06:00
Jens Axboe
65d4302317 io-wq: wait for worker startup when forking a new one
We need to have our worker count updated before continuing, to avoid
cases where we repeatedly think we need a new worker, but a fork is
already in progress.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-01 10:21:06 -07:00
Jean Delvare
5218e12e9f block: Drop leftover references to RQF_SORTED
Commit a1ce35fa49 ("block: remove dead
elevator code") removed all users of RQF_SORTED. However it is still
defined, and there is one reference left to it (which in effect is
dead code). Clear it all up.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-01 10:15:07 -07:00
Daniel Lezcano
9e2be308f0 powercap/drivers/dtpm: Add the experimental label to the option description
The DTPM framework will evolve in the next cycles. Let's add a
temporary EXPERIMENTAL tag to the option so users will be aware
the API may change over time.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-01 17:43:29 +01:00
Daniel Lezcano
f3c1410579 powercap/drivers/dtpm: Fix root node initialization
The root node is not set to NULL when the dtpm root node is
removed. Consequently, it is not possible to create a new root
as it is already set.

Set the root node to NULL when the last node is removed.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-01 17:43:29 +01:00
Rafael J. Wysocki
44cc89f764 PM: runtime: Update device status before letting suppliers suspend
Because the PM-runtime status of the device is not updated in
__rpm_callback(), attempts to suspend the suppliers of the given
device triggered by rpm_put_suppliers() called by it may fail.

Fix this by making __rpm_callback() update the device's status to
RPM_SUSPENDED before calling rpm_put_suppliers() if the current
status of the device is RPM_SUSPENDING and the callback just invoked
by it has returned 0 (success).

While at it, modify the code in __rpm_callback() to always check
the device's PM-runtime status under its PM lock.

Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/
Fixes: 21d5c57b37 ("PM / runtime: Use device links")
Reported-by: Elaine Zhang <zhangqing@rock-chips.com>
Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Elaine Zhang <zhangiqng@rock-chips.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-01 17:40:38 +01:00
Nicolas MURE
2c9119001d ALSA: usb-audio: Fix Pioneer DJM devices URB_CONTROL request direction to set samplerate
This commit only contains the fix about the `URB_CONTROL` request
direction to set the samplerate of Pioneer DJM devices (`URB_CONTROL out`).

Fixes: 3b85f5fc75 ("ALSA: usb-audio: Add DJM450 to Pioneer format quirk")
Signed-off-by: Nicolas MURE <nicolas.mure2019@gmail.com>
Link: https://lore.kernel.org/r/20210301142927.14552-1-nicolas.mure2019@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-01 15:44:53 +01:00
Torin Cooper-Bennun
2712625200 can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode
This patch prevents a potentially destructive race condition. The
device is fully operational on the bus after entering Normal Mode, so
zeroing the MRAM after entering this mode may lead to loss of
information, e.g. new received messages.

This patch fixes the problem by first initializing the MRAM, then
bringing the device into Normale Mode.

Fixes: 5443c226ba ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 11:45:15 +01:00
Oleksij Rempel
e940e0895a can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership
There are two ref count variables controlling the free()ing of a socket:
- struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
- struct sock::sk_wmem_alloc - which accounts the memory allocated by
  the skbs in the send path.

In case there are still TX skbs on the fly and the socket() is closed,
the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
clones an "echo" skb, calls sock_hold() on the original socket and
references it. This produces the following back trace:

| WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
| refcount_t: addition on 0; use-after-free.
| Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
| CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| Backtrace:
| [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
| [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
| [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
| [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
| [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
| [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
| [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
| [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
| [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
| [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
| [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
| [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)

To fix this problem, only set skb ownership to sockets which have still
a ref count > 0.

Fixes: 0ae89beb28 ("can: add destructor for self generated skbs")
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Andre Naujoks <nautsch2@gmail.com>
Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 11:45:04 +01:00
Marc Kleine-Budde
2afe72ead5 can: mcp251xfd: revert "can: mcp251xfd: add BQL support"
In the following 4 patches

| 99842c9685 can: dev: can_rx_offload_get_echo_skb(): extend to return can frame length
| 9420e1d495 can: dev: can_get_echo_skb(): extend to return can frame length
| 1dcb6e57db can: dev: can_put_echo_skb(): extend to handle frame_len
| f0ef72febc can: dev: extend struct can_skb_priv to hold CAN frame length

the CAN echo SKB support was extended to hold the CAN frame
length (which is the length of the CAN frame on the wire). It is meant
as a helper for BQL support, to avoid the re-calculation of the frame
length before sending it and on TX-completion.

However if the CAN frame is send without the request to be looped back
the SKB is discarded in can_put_echo_skb() and the subsequent
can_get_echo_skb() and can_rx_offload_get_echo_skb() return 0 for the
CAN frame length. This results in BQL stalling the TX queue after a
few packages.

Until the BQL helpers can_get_echo_skb() and
can_rx_offload_get_echo_skb() are fixed, revert the BQL support for
the mcp251xfd driver.

This reverts commit 4162e18e94.

Fixes: 4162e18e94 ("can: mcp251xfd: add BQL support")
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20210228083347.28580-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 11:44:45 +01:00
Joakim Zhang
c63820045e can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
Invoke flexcan_chip_freeze() to enter freeze mode, since need poll
freeze mode acknowledge.

Fixes: e955cead03 ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 10:34:56 +01:00
Joakim Zhang
ec15e27cc8 can: flexcan: enable RX FIFO after FRZ/HALT valid
RX FIFO enable failed could happen when do system reboot stress test:

[    0.303958] flexcan 5a8d0000.can: 5a8d0000.can supply xceiver not found, using dummy regulator
[    0.304281] flexcan 5a8d0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.314640] flexcan 5a8d0000.can: registering netdev failed
[    0.320728] flexcan 5a8e0000.can: 5a8e0000.can supply xceiver not found, using dummy regulator
[    0.320991] flexcan 5a8e0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.331360] flexcan 5a8e0000.can: registering netdev failed
[    0.337444] flexcan 5a8f0000.can: 5a8f0000.can supply xceiver not found, using dummy regulator
[    0.337716] flexcan 5a8f0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.348117] flexcan 5a8f0000.can: registering netdev failed

RX FIFO should be enabled after the FRZ/HALT are valid. But the current
code enable RX FIFO and FRZ/HALT at the same time.

Fixes: e955cead03 ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-3-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 10:34:56 +01:00
Joakim Zhang
449052cfeb can: flexcan: assert FRZ bit in flexcan_chip_freeze()
Assert HALT bit to enter freeze mode, there is a premise that FRZ bit is
asserted. This patch asserts FRZ bit in flexcan_chip_freeze, although
the reset value is 1b'1. This is a prepare patch, later patch will
invoke flexcan_chip_freeze() to enter freeze mode, which polling freeze
mode acknowledge.

Fixes: b1aa1c7a21 ("can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze")
Link: https://lore.kernel.org/r/20210218110037.16591-2-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-01 10:34:56 +01:00
Andrea Fagiani
11302bb69e ALSA: usb-audio: use Corsair Virtuoso mapping for Corsair Virtuoso SE
The Corsair Virtuoso SE RGB Wireless is a USB headset with a mic and a
sidetone feature. Assign the Corsair Virtuoso name map to the SE product
ids as well, in order to label its mixer appropriately and allow
userspace to pick the correct volume controls.

Signed-off-by: Andrea Fagiani <andfagiani@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/40bbdf55-f854-e2ee-87b4-183e6451352c@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-01 09:24:47 +01:00
Damien Le Moal
0f47227705 block: revert "block: fix bd_size_lock use"
With the removal of the skd driver, using IRQ safe locking of a bdev
bd_size_lock spinlock to protect the bdev inode size is not necessary
anymore as there is no other known driver using this lock under an IRQ
disabled context (e.g. calling set_capacity() with IRQ disabled).
Revert commit 0fe37724f8 ("block: fix bd_size_lock use") which
introduced the IRQ safe change.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-28 20:12:02 -07:00
Christophe Leroy
91b6c5dbe9 powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit()
As reported by kernel test robot, a randconfig with high amount of
debuging options can lead to build failure for undefined reference
to replay_soft_interrupts() on ppc32.

This is due to gcc not seeing that __prep_irq_for_enabled_exit()
always returns true on ppc32 because it doesn't inline it for
some reason.

Force inlining of __prep_irq_for_enabled_exit() to fix the build.

Fixes: 344bb20b15 ("powerpc/syscall: Make interrupt.c buildable on PPC32")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/53f3a1f719441761000c41154602bf097d4350b5.1614148356.git.christophe.leroy@csgroup.eu
2021-03-01 12:33:31 +11:00
Christophe Leroy
c119565a15 powerpc/603: Fix protection of user pages mapped with PROT_NONE
On book3s/32, page protection is defined by the PP bits in the PTE
which provide the following protection depending on the access
keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection,
PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0
and user accesses are performed with key 1. As PP00 is used for
non _PAGE_USER pages, user can't access kernel pages not flagged
_PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed
with key 0, therefore pages not flagged _PAGE_USER are still
accessible to the user.

This shouldn't be an issue, because userspace is expected to be
accessible to the user. But unlike most other architectures, powerpc
implements PROT_NONE protection by removing _PAGE_USER flag instead of
flagging the page as not valid. This means that pages in userspace
that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the
TLB miss handler by checking _PAGE_USER permission on userspace
accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have
an hash table, and hash_page() function already implement the
verification of _PAGE_USER permission on userspace pages.

Fixes: f342adca3a ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu
2021-03-01 12:33:31 +11:00
Greg Kurz
f9619d5e51 powerpc/pseries: Don't enforce MSI affinity with kdump
Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per-design with managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq :
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
  block driver, that was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeed with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55b3 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org
2021-03-01 12:33:31 +11:00
Michael Ellerman
eead089311 powerpc/4xx: Fix build errors from mfdcr()
lkp reported a build error in fsp2.o:

  CC      arch/powerpc/platforms/44x/fsp2.o
  {standard input}:577: Error: unsupported relocation against base

Which comes from:

  pr_err("GESR0: 0x%08x\n", mfdcr(base + PLB4OPB_GESR0));

Where our mfdcr() macro is stringifying "base + PLB4OPB_GESR0", and
passing that to the assembler, which obviously doesn't work.

The mfdcr() macro already checks that the argument is constant using
__builtin_constant_p(), and if not calls the out-of-line version of
mfdcr(). But in this case GCC is smart enough to notice that "base +
PLB4OPB_GESR0" will be constant, even though it's not something we can
immediately stringify into a register number.

Segher pointed out that passing the register number to the inline asm
as a constant would be better, and in fact it fixes the build error,
presumably because it gives GCC a chance to resolve the value.

While we're at it, change mtdcr() similarly.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210218123058.748882-1-mpe@ellerman.id.au
2021-03-01 12:33:31 +11:00
Jakub Kicinski
447621e373 Merge branch 'net-hns3-fixes-fot-net'
Huazhong Tan says:

====================
net: hns3: fixes fot -net

The patchset includes some fixes for the HNS3 ethernet driver.
====================

Link: https://lore.kernel.org/r/1614410693-8107-1-git-send-email-tanhuazhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 12:04:02 -08:00
Jian Shen
b36fc875bc net: hns3: fix bug when calculating the TCAM table info
The function hclge_fd_convert_tuple() is used to convert tuples
and tuples mask to TCAM x and y.  But it misuses the source mac
as source mac mask when convert INNER_SRC_MAC, which may cause
the flow director rule works unexpectedly. So fix it.

Fixes: 1173286802 ("net: hns3: Add input key and action config support for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 12:04:00 -08:00
Jian Shen
c75ec148a3 net: hns3: fix query vlan mask value error for flow director
Currently, the driver returns VLAN_VID_MASK for vlan mask field,
when get flow director rule information for rule doesn't use vlan.
It may cause the vlan mask value display as 0xf000 in this
case, like below:

estuary:/$ ethtool -u eth1
50 RX rings available
Total 1 rules

Filter: 2
Rule Type: TCP over IPv4
Src IP addr: 0.0.0.0 mask: 255.255.255.255
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Src port: 0 mask: 0xffff
Dest port: 0 mask: 0xffff
VLAN EtherType: 0x0 mask: 0xffff
VLAN: 0x0 mask: 0xf000
User-defined: 0x1234 mask: 0x0
Action: Direct to queue 3

Fix it by return 0.

Fixes: 05c2314fe6 ("net: hns3: Add support for rule query of flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 12:03:59 -08:00
Jian Shen
ae85ddda0f net: hns3: fix error mask definition of flow director
Currently, some bit filed definitions of flow director TCAM
configuration command are incorrect. Since the wrong MSB is
always 0, and these fields are assgined in order, so it still works.

Fix it by redefine them.

Fixes: 1173286802 ("net: hns3: Add input key and action config support for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 12:03:59 -08:00
Daniel Borkmann
89e5c58fc1 net: Fix gro aggregation for udp encaps with zero csum
We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
csum for the UDP header itself is 0. In that case, GRO aggregation does
not take place on the phys dev, but instead is deferred to the vxlan/geneve
driver (see trace below).

The reason is essentially that GRO aggregation bails out in udp_gro_receive()
for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
others) where for non-zero csums 2abb7cdc0d ("udp: Add support for doing
checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
and napi context has csum_valid set. This is however not the case for zero
UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

At the same time 57c67ff4bd ("udp: additional GRO support") added matches
on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
so it certainly is expected to handle zero csums in udp_gro_receive(). The
purpose of the check added via 662880f442 ("net: Allow GRO to use and set
levels of checksum unnecessary") seems to catch bad csum and stop aggregation
right away.

One way to fix aggregation in the zero case is to only perform the !csum_valid
check in udp_gro_receive() if uh->check is infact non-zero.

Before:

  [...]
  swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
  swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
  swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    13129.24

After:

  [...]
  swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
  swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    24576.53

Fixes: 57c67ff4bd ("udp: additional GRO support")
Fixes: 662880f442 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 12:01:08 -08:00
Rafał Miłecki
d313d16bba net: broadcom: bcm4908_enet: enable RX after processing packets
When receiving a lot of packets hardware may run out of free
descriptiors and stop RX ring. Enable it every time after handling
received packets.

Fixes: 4feffeadbc ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210226132038.29849-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 11:51:31 -08:00
Yinjun Zhang
a4fc088ad4 ethtool: fix the check logic of at least one channel for RX/TX
The command "ethtool -L <intf> combined 0" may clean the RX/TX channel
count and skip the error path, since the attrs
tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
are NULL in this case when recent ethtool is used.

Tested using ethtool v5.10.

Fixes: 7be92514b9 ("ethtool: check if there is at least one channel for TX/RX in the core")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Link: https://lore.kernel.org/r/20210225125102.23989-1-simon.horman@netronome.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28 11:49:07 -08:00
Chris Chiu
d0e185616a ALSA: hda/realtek: Enable headset mic of Acer SWIFT with ALC256
The Acer SWIFT Swift SF314-54/55 laptops with ALC256 cannot detect
both the headset mic and the internal mic. Introduce new fixup
to enable the jack sense and the headset mic. However, the internal
mic actually connects to Intel SST audio. It still needs Intel SST
support to make internal mic capture work.

Signed-off-by: Chris Chiu <chris.chiu@canonical.com>
Acked-by: Jian-Hong Pan <jhp@endlessos.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210226010440.8474-1-chris.chiu@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-28 09:14:07 +01:00
Colin Ian King
26a9630c72 ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
Currently the mask operation on variable conf is just 3 bits so
the switch statement case value of 8 is unreachable dead code.
The function daio_mgr_dao_init can be passed a 4 bit value,
function dao_rsc_init calls it with conf set to:

     conf = (desc->msr & 0x7) | (desc->passthru << 3);

so clearly when desc->passthru is set to 1 then conf can be
at least 8.

Fix this by changing the mask to 0xf.

Fixes: 8cc7236148 ("ALSA: SB X-Fi driver merge")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210227001527.1077484-1-colin.king@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-28 09:12:41 +01:00
Takashi Iwai
5f5e6a3e8b ALSA: usb-audio: Allow modifying parameters with succeeding hw_params calls
The recent fix for the hw constraints for implicit feedback streams
via commit e4ea77f8e5 ("ALSA: usb-audio: Always apply the hw
constraints for implicit fb sync") added the check of the matching
endpoints and whether those EPs are already opened.  This is needed
and correct, per se, even for the normal streams without the implicit
feedback, as the endpoint setup is exclusive.

However, it's reported that there seem applications that behave in
unexpected ways to update the hw_params without clearing the previous
setup via hw_free, and those hit a problem now: then hw_params is
called with still the previous EP setup kept, hence it's restricted
with the previous own setup.  Although the obvious fix is to call
snd_pcm_hw_free() API in the application side, it's a kind of
unwelcome change.

This patch tries to ease the situation: in the endpoint check, we add
a couple of more conditions and now skip the endpoint that is being
used only by the stream in question itself.  That is, in addition to
the presence check of ep (ep->cur_audiofmt is non-NULL), when the
following conditions are met, we skip such an ep:
- ep->opened == 1, and
- ep->cur_audiofmt == subs->cur_audiofmt.

subs->cur_audiofmt is non-NULL only if it's a re-setup of hw_params,
and ep->cur_audiofmt points to the currently set up parameters.  So if
those match, it must be this stream itself.

Fixes: e4ea77f8e5 ("ALSA: usb-audio: Always apply the hw constraints for implicit fb sync")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211941
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210228080138.9936-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-28 09:02:08 +01:00
Vasily Averin
8e24edddad netfilter: x_tables: gpf inside xt_find_revision()
nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists
without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload
and cause host to crash:

general protection fault: 0000 [#1]
Modules linked in: ... [last unloaded: xt_cluster]
CPU: 0 PID: 542455 Comm: iptables
RIP: 0010:[<ffffffff8ffbd518>]  [<ffffffff8ffbd518>] strcmp+0x18/0x40
RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
(VvS: %R15 -- &xt_match,  %RDI -- &xt_match.name,
xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
Call Trace:
 [<ffffffff902ccf44>] match_revfn+0x54/0xc0
 [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
 [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
 [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
 [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
 [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
 [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
 [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
 [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
 [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a

Fixes: 656caff20e ("netfilter 04/09: x_tables: fix match/target revision lookup")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:25:16 +01:00
Florian Westphal
c2c16ccba2 selftests: netfilter: test nat port clash resolution interaction with tcp early demux
Convert Antonio Ojeas bug reproducer to a kselftest.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:25:16 +01:00
Florian Westphal
07b5a76e18 netfilter: conntrack: avoid misleading 'invalid' in log message
The packet is not flagged as invalid: conntrack will accept it and
its associated with the conntrack entry.

This happens e.g. when receiving a retransmitted SYN in SYN_RECV state.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:25:16 +01:00
Florian Westphal
03a3ca37e4 netfilter: nf_nat: undo erroneous tcp edemux lookup
Under extremely rare conditions TCP early demux will retrieve the wrong
socket.

1. local machine establishes a connection to a remote server, S, on port
   p.

   This gives:
   laddr:lport -> S:p
   ... both in tcp and conntrack.

2. local machine establishes a connection to host H, on port p2.
   2a. TCP stack choses same laddr:lport, so we have
   laddr:lport -> H:p2 from TCP point of view.
   2b). There is a destination NAT rewrite in place, translating
        H:p2 to S:p.  This results in following conntrack entries:

   I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
   II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)

   NAT engine has rewritten laddr:lport to laddr:lport2 to map
   the reply packet to the correct origin.

   When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
   will undo-the SNAT transformation, rewriting IP header to
   S:p -> laddr:lport

   This causes TCP early demux to associate the skb with the TCP socket
   of the first connection.

   The INPUT hook will then reverse the DNAT transformation, rewriting
   the IP header to H:p2 -> laddr:lport.

Because packet ends up with the wrong socket, the new connection
never completes: originator stays in SYN_SENT and conntrack entry
remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
until it gives up.

To resolve this, orphan the skb after the input rewrite:
Because the source IP address changed, the socket must be incorrect.
We can't move the DNAT undo to prerouting due to backwards
compatibility, doing so will make iptables/nftables rules to no longer
match the way they did.

After orphan, the packet will be handed to the next protocol layer
(tcp, udp, ...) and that will repeat the socket lookup just like as if
early demux was disabled.

Fixes: 41063e9dd1 ("ipv4: Early TCP socket demux.")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:25:16 +01:00
Klemen Košir
c57ea2d7d8 netfilter: conntrack: Remove a double space in a log message
Removed an extra space in a log message and an extra blank line in code.

Signed-off-by: Klemen Košir <klemen.kosir@kream.io>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:25:16 +01:00
Dmitry V. Levin
c33cb0020e uapi: nfnetlink_cthelper.h: fix userspace compilation error
Apparently, <linux/netfilter/nfnetlink_cthelper.h> and
<linux/netfilter/nfnetlink_acct.h> could not be included into the same
compilation unit because of a cut-and-paste typo in the former header.

Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Cc: <stable@vger.kernel.org> # v3.6
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-28 00:24:41 +01:00
Takashi Iwai
21cba9c535 ALSA: usb-audio: Drop bogus dB range in too low level
Some USB audio firmware seem to report broken dB values for the volume
controls, and this screws up applications like PulseAudio who blindly
trusts the given data.  For example, Edifier G2000 reports a PCM
volume from -128dB to -127dB, and this results in barely inaudible
sound.

This patch adds a sort of sanity check at parsing the dB values in
USB-audio driver and disables the dB reporting if the range looks
bogus.  Here, we assume -96dB as the bottom line of the max dB.

Note that, if one can figure out that proper dB range later, it can be
patched in the mixer maps.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211929
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210227105737.3656-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-27 11:58:05 +01:00
Takashi Iwai
dcf269b3f7 ALSA: usb-audio: Don't abort even if the clock rate differs
The commit 93db51d06b ("ALSA: usb-audio: Check valid altsetting at
parsing rates for UAC2/3") changed the behavior of the function
set_sample_rate_v2v3() slightly to treat the inconsistent sample rate
as an error.  It was done by assumption that the sample rate
validation should have been done at the parser phase as implemented in
that patch.  But the validation is later selectively enabled only for
certain devices as it causes a regression (the commit fe773b8711
"ALSA: usb-audio: workaround for iface reset issue"), and now the
inconsistency surfaced as a fatal error while it worked in the past as
is, as reported for FiiO M3K DAC.

For recovering from the regression, change set_sample_rate_v2v3()
again to ignore the sample rate difference as non-error.

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1182633
Fixes: 93db51d06b ("ALSA: usb-audio: Check valid altsetting at parsing rates for UAC2/3")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210227082002.21185-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-27 09:26:05 +01:00
David S. Miller
cf64c2a905 Merge branch 'work.sparc32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-02-26 16:16:24 -08:00
David S. Miller
b9d6243311 Merge branch 'work.sparc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-02-26 16:15:20 -08:00
Jakub Kicinski
eee7ede695 Merge branch 'bnxt_en-error-recovery-bug-fixes'
Michael Chan says:

====================
bnxt_en: Error recovery bug fixes.

Two error recovery related bug fixes for 2 corner cases.

Please queue patch #2 for -stable.  Thanks.
====================

Link: https://lore.kernel.org/r/1614332590-17865-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:50:25 -08:00
Edwin Peer
20d7d1c5c9 bnxt_en: reliably allocate IRQ table on reset to avoid crash
The following trace excerpt corresponds with a NULL pointer dereference
of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
device resets:

    Unable to handle kernel NULL pointer dereference at ... 000000d
    ...
    pc : string+0x3c/0x80
    lr : vsnprintf+0x294/0x7e0
    sp : ffff00000f61ba70 pstate : 20000145
    x29: ffff00000f61ba70 x28: 000000000000000d
    x27: ffff0000009c8b5a x26: ffff00000f61bb80
    x25: ffff0000009c8b5a x24: 0000000000000012
    x23: 00000000ffffffe0 x22: ffff000008990428
    x21: ffff00000f61bb80 x20: 000000000000000d
    x19: 000000000000001f x18: 0000000000000000
    x17: 0000000000000000 x16: ffff800b6d0fb400
    x15: 0000000000000000 x14: ffff800b7fe31ae8
    x13: 00001ed16472c920 x12: ffff000008c6b1c9
    x11: ffff000008cf0580 x10: ffff00000f61bb80
    x9 : 00000000ffffffd8 x8 : 000000000000000c
    x7 : ffff800b684b8000 x6 : 0000000000000000
    x5 : 0000000000000065 x4 : 0000000000000001
    x3 : ffff0a00ffffff04 x2 : 000000000000001f
    x1 : 0000000000000000 x0 : 000000000000000d
    Call trace:
    string+0x3c/0x80
    vsnprintf+0x294/0x7e0
    snprintf+0x44/0x50
    __bnxt_open_nic+0x34c/0x928 [bnxt_en]
    bnxt_open+0xe8/0x238 [bnxt_en]
    __dev_open+0xbc/0x130
    __dev_change_flags+0x12c/0x168
    dev_change_flags+0x20/0x60
    ...

Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
would not be expected on a system supporting MSIX at all. However, if
bnxt_init_int_mode() does not end up being called after the call to
bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver will
think that only INTA is supported and bp->irq_tbl will be NULL,
causing the above crash.

In the error recovery scenario, we call bnxt_clear_int_mode() in
bnxt_fw_reset_close() early in the sequence. Ordinarily, we will
call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
reestablish communication with the firmware after reset.  However,
if the sequence has to abort before we call bnxt_init_int_mode() and
if the user later attempts to re-open the device, then it will cause
the crash above.

We fix it in 2 ways:

1. Check for bp->irq_tbl in bnxt_setup_int_mode(). If it is NULL, call
bnxt_init_init_mode().

2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag.  This
will cause more drastic recovery at the next attempt to re-open the
device, including a call to bnxt_init_int_mode().

Fixes: 3bc7d4a352 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:50:23 -08:00
Vasundhara Volam
d20cd74521 bnxt_en: Fix race between firmware reset and driver remove.
The driver's error recovery reset sequence can take many seconds to
complete and only the critical sections are protected by rtnl_lock.
A recent change has introduced a regression in this sequence.

bnxt_remove_one() may be called while the recovery is in progress.
Normally, unregister_netdev() would cause bnxt_close_nic() to be
called and this would cause the error recovery to safely abort
with the BNXT_STATE_ABORT_ERR flag set in bnxt_close_nic().

Recently, we added bnxt_reinit_after_abort() to allow the user to
reopen the device after an aborted recovery.  This causes the
regression in the scenario described above because we would
attempt to re-open even after the netdev has been unregistered.

Fix it by checking the netdev reg_state in
bnxt_reinit_after_abort() and abort if it is unregistered.

Fixes: 6882c36cf8 ("bnxt_en: attempt to reinitialize after aborted reset")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:50:23 -08:00
Jakub Kicinski
82adc457ac Merge branch 'mlxsw-various-fixes'
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains various fixes for mlxsw.

Patch #1 fixes a race condition in a selftest. The race and fix are
explained in detail in the changelog.

Patch #2 re-adds a link mode that was wrongly removed, resulting in a
regression in some setups.

Patch #3 fixes a race condition in route installation with nexthop
objects.

Please consider patches #2 and #3 for stable.
====================

Link: https://lore.kernel.org/r/20210225165721.1322424-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:47:56 -08:00
Ido Schimmel
dc860b88ce mlxsw: spectrum_router: Ignore routes using a deleted nexthop object
Routes are currently processed from a workqueue whereas nexthop objects
are processed in system call context. This can result in the driver not
finding a suitable nexthop group for a route and issuing a warning [1].

Fix this by ignoring such routes earlier in the process. The subsequent
deletion notification will be ignored as well.

[1]
 WARNING: CPU: 2 PID: 7754 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4853 mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
 [...]
 CPU: 2 PID: 7754 Comm: kworker/u8:0 Not tainted 5.11.0-rc6-cq-20210207-1 #16
 Hardware name: Mellanox Technologies Ltd. MSN2100/SA001390, BIOS 5.6.5 05/24/2018
 Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib_event_work [mlxsw_spectrum]
 RIP: 0010:mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]

Fixes: cdd6cfc54c ("mlxsw: spectrum_router: Allow programming routes with nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Alex Veber <alexve@nvidia.com>
Tested-by: Alex Veber <alexve@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:47:53 -08:00
Danielle Ratson
ae9b24ddb6 mlxsw: spectrum_ethtool: Add an external speed to PTYS register
Currently, only external bits are added to the PTYS register, whereas
there is one external bit that is wrongly marked as internal, and so was
recently removed from the register.

Add that bit to the PTYS register again, as this bit is no longer
internal.

Its removal resulted in '100000baseLR4_ER4/Full' link mode no longer
being supported, causing a regression on some setups.

Fixes: 5bf01b571c ("mlxsw: spectrum_ethtool: Remove internal speeds from PTYS register")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Eddie Shklaer <eddies@nvidia.com>
Tested-by: Eddie Shklaer <eddies@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:47:53 -08:00
Danielle Ratson
edcbf5137f selftests: forwarding: Fix race condition in mirror installation
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.

If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.

Fix this by setting the neighbor's state to permanent, so that it is
always valid.

Fixes: b5b029399f ("selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:47:52 -08:00
Arjun Roy
2107d45f17 tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a
user-provided "len" field of type signed int, and then compare the
value to the result of an "offsetofend" operation, which is unsigned.

Negative values provided by the user will be promoted to large
positive numbers; thus checking that len < offsetofend() will return
false when the intention was that it return true.

Note that while len is originally checked for negative values earlier
on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the
value from userspace which may have changed in the meantime.

Therefore, re-add the check for negative values after the call to
get_user in the handler code for TCP_ZEROCOPY_RECEIVE.

Fixes: c8856c0514 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:47:15 -08:00
Heiko Thiery
6a4d7234ae net: fec: ptp: avoid register access when ipg clock is disabled
When accessing the timecounter register on an i.MX8MQ the kernel hangs.
This is only the case when the interface is down. This can be reproduced
by reading with 'phc_ctrl eth0 get'.

Like described in the change in 91c0d987a9
the igp clock is disabled when the interface is down and leads to a
system hang.

So we check if the ptp clock status before reading the timecounter
register.

Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20210225211514.9115-1-heiko.thiery@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:45:18 -08:00
Arnd Bergmann
7f654157f0 net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
When CONFIG_PM_SLEEP is disabled, the compiler warns about unused
functions:

drivers/net/phy/phy_device.c:273:12: error: unused function 'mdio_bus_phy_suspend' [-Werror,-Wunused-function]
static int mdio_bus_phy_suspend(struct device *dev)
drivers/net/phy/phy_device.c:293:12: error: unused function 'mdio_bus_phy_resume' [-Werror,-Wunused-function]
static int mdio_bus_phy_resume(struct device *dev)

The logic is intentional, so just mark these two as __maybe_unused
and remove the incorrect #ifdef.

Fixes: 4c0d2e96ba ("net: phy: consider that suspend2ram may cut off PHY power")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210225145748.404410-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:43:25 -08:00
DENG Qingfang
63c75c053b net: dsa: mt7530: don't build GPIO support if !GPIOLIB
The new GPIO support may be optional at runtime, but it requires
building against gpiolib:

ERROR: modpost: "gpiochip_get_data" [drivers/net/dsa/mt7530.ko]
undefined!
ERROR: modpost: "devm_gpiochip_add_data_with_key"
[drivers/net/dsa/mt7530.ko] undefined!

Add #ifdef to exclude GPIO support if GPIOLIB is not enabled.

Fixes: 429a0edeef ("net: dsa: mt7530: MT7530 optional GPIO support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Link: https://lore.kernel.org/r/20210226063226.8474-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:31:28 -08:00
Arnd Bergmann
01c2c1ad8f net: dsa: tag_ocelot_8021q: fix driver dependency
When the ocelot driver code is in a library, the dsa tag
code cannot be built-in:

ld.lld: error: undefined symbol: ocelot_can_inject
>>> referenced by tag_ocelot_8021q.c
>>>               dsa/tag_ocelot_8021q.o:(ocelot_xmit) in archive net/built-in.a

ld.lld: error: undefined symbol: ocelot_port_inject_frame
>>> referenced by tag_ocelot_8021q.c
>>>               dsa/tag_ocelot_8021q.o:(ocelot_xmit) in archive net/built-in.a

Building the tag support only really makes sense for compile-testing
when the driver is available, so add a Kconfig dependency that prevents
the broken configuration while allowing COMPILE_TEST alternative when
MSCC_OCELOT_SWITCH_LIB is disabled entirely.  This case is handled
through the #ifdef check in include/soc/mscc/ocelot.h.

Fixes: 0a6f17c6ae ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210225143910.3964364-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:29:53 -08:00
Arnd Bergmann
907310ceb2 net: mscc: ocelot: select NET_DEVLINK
Without this option, the driver fails to link:

ld.lld: error: undefined symbol: devlink_sb_register
>>> referenced by ocelot_devlink.c
>>>               net/ethernet/mscc/ocelot_devlink.o:(ocelot_devlink_sb_register) in archive drivers/built-in.a
>>> referenced by ocelot_devlink.c
>>>               net/ethernet/mscc/ocelot_devlink.o:(ocelot_devlink_sb_register) in archive drivers/built-in.a

Fixes: f59fd9cab7 ("net: mscc: ocelot: configure watermarks using devlink-sb")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210225143910.3964364-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:29:53 -08:00
Jakub Kicinski
7ae845d767 Merge branch 'ethernet-fixes-for-stmmac-driver'
Joakim Zhang says:

====================
ethernet: fixes for stmmac driver

Fixes for stmmac driver.
---
ChangeLogs:
V1->V2:
	* subject prefix: ethernet: stmmac: -> net: stmmac:
	* use dma_addr_t instead of unsigned int for physical address
	* use cpu_to_le32()
V2->V3:
	* fix the build issue pointed out by kbuild bot.
	* add error handling for stmmac_reinit_rx_buffers() function.
V3->V4:
	* remove patch (net: stmmac: remove redundant null check for ptp clock),
	  reviewer thinks it should target net-next.
V4->V5:
	* use %pad format to print dma_addr_t.
	* extend dwmac4_display_ring() to support all descriptor types.
	* while() -> do-while()
====================

Link: https://lore.kernel.org/r/20210225090114.17562-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:14 -08:00
Joakim Zhang
9c63faaa93 net: stmmac: re-init rx buffers when mac resume back
During suspend/resume stress test, we found descriptor write back by DMA
could exhibit unusual behavior, e.g.:
	003 [0xc4310030]: 0x0 0x40 0x0 0xb5010040

We can see that desc3 write back is 0xb5010040, it is still ownd by DMA,
so application would not recycle this buffer. It will trigger fatal bus
error when DMA try to use this descriptor again. To fix this issue, we
should re-init all rx buffers when mac resume back.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:12 -08:00
Joakim Zhang
396e13e115 net: stmmac: fix wrongly set buffer2 valid when sph unsupport
In current driver, buffer2 available only when hardware supports split
header. Wrongly set buffer2 valid in stmmac_rx_refill when refill buffer
address. You can see that desc3 is 0x81000000 after initialization, but
turn out to be 0x83000000 after refill.

Fixes: 67afd6d1cf ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:11 -08:00
Joakim Zhang
bfaf91ca84 net: stmmac: fix dma physical address of descriptor when display ring
Driver uses dma_alloc_coherent to allocate dma memory for descriptors,
dma_alloc_coherent will return both the virtual address and physical
address. AFAIK, virt_to_phys could not convert virtual address to
physical address, for which memory is allocated by dma_alloc_coherent.

dwmac4_display_ring() function is broken for various descriptor, it only
support normal descriptor(struct dma_desc) now, this patch also extends to
support all descriptor types.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:11 -08:00
Joakim Zhang
c511819d13 net: stmmac: fix watchdog timeout during suspend/resume stress test
stmmac_xmit() call stmmac_tx_timer_arm() at the end to modify tx timer to
do the transmission cleanup work. Imagine such a situation, stmmac enters
suspend immediately after tx timer modified, it's expire callback
stmmac_tx_clean() would not be invoked. This could affect BQL, since
netdev_tx_sent_queue() has been called, but netdev_tx_completed_queue()
have not been involved, as a result, dql_avail(&dev_queue->dql) finally
always return a negative value.

__dev_queue_xmit->__dev_xmit_skb->qdisc_run->__qdisc_run->qdisc_restart->dequeue_skb:
	if ((q->flags & TCQ_F_ONETXQUEUE) &&
		netif_xmit_frozen_or_stopped(txq)) // __QUEUE_STATE_STACK_XOFF is set

Net core will stop transmitting any more. Finillay, net watchdong would timeout.
To fix this issue, we should call netdev_tx_reset_queue() in stmmac_resume().

Fixes: 54139cf3bb ("net: stmmac: adding multiple buffers for rx")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:11 -08:00
Joakim Zhang
a3e860a833 net: stmmac: stop each tx channel independently
If clear GMAC_CONFIG_TE bit, it would stop all tx channels, but users
may only want to stop specific tx channel.

Fixes: 48863ce594 ("stmmac: add DMA support for GMAC 4.xx")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 15:17:11 -08:00
Jakub Kicinski
0d1bf7a5e2 Merge tag 'wireless-drivers-2021-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:

====================
wireless-drivers fixes for v5.12

First set of fixes for v5.12. One iwlwifi kernel crash fix and smaller
fixes to multiple drivers.

ath9k
 * fix Spatial Multiplexing Power Save (SMPS) handling to improve thoughtput

mt76
 * error handling fixes
 * memory leax fixes

iwlwifi
 * don't crash during debug collection on DVM devices

MAINTAINERS
 * email address update

ath11k
 * fix GCC warning about DMA address debug messages
 * fix regression which broke QCA6390 AP mode

* tag 'wireless-drivers-2021-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
  mt76: mt7915: fix unused 'mode' variable
  mt76: dma: do not report truncated frames to mac80211
  mt76: mt7921: remove incorrect error handling
  iwlwifi: pcie: fix iwl_so_trans_cfg link error when CONFIG_IWLMVM is disabled
  ath11k: fix AP mode for QCA6390
  ath11k: qmi: use %pad to format dma_addr_t
  MAINTAINERS: update for mwifiex driver maintainers
  iwlwifi: avoid crash on unsupported debug collection
  mt76: mt7915: only modify tx buffer list after allocating tx token id
  mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
  ath9k: fix transmitting to stations in dynamic SMPS mode
====================

Link: https://lore.kernel.org/r/20210226164411.CDD03C433CA@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 13:17:44 -08:00
Jakub Kicinski
9e8e714f2d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-02-26

1) Fix for bpf atomic insns with src_reg=r0, from Brendan.

2) Fix use after free due to bpf_prog_clone, from Cong.

3) Drop imprecise verifier log message, from Dmitrii.

4) Remove incorrect blank line in bpf helper description, from Hangbin.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: No need to drop the packet when there is no geneve opt
  bpf: Remove blank line in bpf helper description comment
  tools/resolve_btfids: Fix build error with older host toolchains
  selftests/bpf: Fix a compiler warning in global func test
  bpf: Drop imprecise log message
  bpf: Clear percpu pointers in bpf_prog_clone_free()
  bpf: Fix a warning message in mark_ptr_not_null_reg()
  bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src
====================

Link: https://lore.kernel.org/r/20210226193737.57004-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-26 13:16:31 -08:00
Arnd Bergmann
c490492f15 mt76: mt7915: fix unused 'mode' variable
clang points out a possible corner case in the mt7915_tm_set_tx_cont()
function if called with invalid arguments:

drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:593:2: warning: variable 'mode' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
        default:
        ^~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:597:13: note: uninitialized use occurs here
        rateval =  mode << 6 | rate_idx;
                   ^~~~
drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:506:37: note: initialize the variable 'mode' to silence this warning
        u8 rate_idx = td->tx_rate_idx, mode;
                                           ^

Change it to return an error instead of continuing with invalid data
here.

Fixes: 3f0caa3cbf ("mt76: mt7915: add support for continuous tx in testmode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2021-02-26 17:35:15 +02:00
Lorenzo Bianconi
d0bd52c591 mt76: dma: do not report truncated frames to mac80211
Commit b102f0c522 ("mt76: fix array overflow on receiving too many
fragments for a packet") fixes a possible OOB access but it introduces a
memory leak since the pending frame is not released to page_frag_cache
if the frag array of skb_shared_info is full. Commit 93a1d4791c
("mt76: dma: fix a possible memory leak in mt76_add_fragment()") fixes
the issue but does not free the truncated skb that is forwarded to
mac80211 layer. Fix the leftover issue discarding even truncated skbs.

Fixes: 93a1d4791c ("mt76: dma: fix a possible memory leak in mt76_add_fragment()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/a03166fcc8214644333c68674a781836e0f57576.1612697217.git.lorenzo@kernel.org
2021-02-26 13:49:57 +02:00
Ira Weiny
80cc838423 btrfs: use copy_highpage() instead of 2 kmaps()
There are many places where kmap/memove/kunmap patterns occur.

This pattern exists in the core common function copy_highpage().

Use copy_highpage to avoid open coding the use of kmap and leverages the
core functions use of kmap_local_page().

Development of this patch was aided by the following coccinelle script:

// <smpl>
// SPDX-License-Identifier: GPL-2.0-only
// Find kmap/copypage/kunmap pattern and replace with copy_highpage calls
//
// NOTE: The expressions in the copy page version of this kmap pattern are
// overly complex and so these all need individual attention.
//
// Confidence: Low
// Copyright: (C) 2021 Intel Corporation
// URL: http://coccinelle.lip6.fr/
// Comments:
// Options:

//
// Then a copy_page where we have 2 pages involved.
//
@ copy_page_rule @
expression page, page2, To, From, Size;
identifier ptr, ptr2;
type VP, VP2;
@@

/* kmap */
(
-VP ptr = kmap(page);
...
-VP2 ptr2 = kmap(page2);
|
-VP ptr = kmap_atomic(page);
...
-VP2 ptr2 = kmap_atomic(page2);
|
-ptr = kmap(page);
...
-ptr2 = kmap(page2);
|
-ptr = kmap_atomic(page);
...
-ptr2 = kmap_atomic(page2);
)

// 1 or more copy versions of the entire page
<+...
(
-copy_page(To, From);
+copy_highpage(To, From);
|
-memmove(To, From, Size);
+memmoveExtra(To, From, Size);
)
...+>

/* kunmap */
(
-kunmap(page2);
...
-kunmap(page);
|
-kunmap(page);
...
-kunmap(page2);
|
-kmap_atomic(ptr2);
...
-kmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on copy_page_rule
@
identifier copy_page_rule.ptr;
identifier copy_page_rule.ptr2;
type VP, VP1;
type VP2, VP21;
@@

-VP ptr;
	... when != ptr;
? VP1 ptr;
-VP2 ptr2;
	... when != ptr2;
? VP21 ptr2;

// </smpl>

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-26 12:45:15 +01:00
Ira Weiny
3590ec5899 btrfs: use memcpy_[to|from]_page() and kmap_local_page()
There are many places where the pattern kmap/memcpy/kunmap occurs.

This pattern was lifted to the core common functions
memcpy_[to|from]_page().

Use these new functions to reduce the code, eliminate direct uses of
kmap, and leverage the new core functions use of kmap_local_page().

Also, there is 1 place where a kmap/memcpy is followed by an
optional memset.  Here we leave the kmap open coded to avoid remapping
the page but use kmap_local_page() directly.

Development of this patch was aided by the coccinelle script:

// <smpl>
// SPDX-License-Identifier: GPL-2.0-only
// Find kmap/memcpy/kunmap pattern and replace with memcpy*page calls
//
// NOTE: Offsets and other expressions may be more complex than what the script
// will automatically generate.  Therefore a catchall rule is provided to find
// the pattern which then must be evaluated by hand.
//
// Confidence: Low
// Copyright: (C) 2021 Intel Corporation
// URL: http://coccinelle.lip6.fr/
// Comments:
// Options:

//
// simple memcpy version
//
@ memcpy_rule1 @
expression page, T, F, B, Off;
identifier ptr;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
-memcpy(ptr + Off, F, B);
+memcpy_to_page(page, Off, F, B);
|
-memcpy(ptr, F, B);
+memcpy_to_page(page, 0, F, B);
|
-memcpy(T, ptr + Off, B);
+memcpy_from_page(T, page, Off, B);
|
-memcpy(T, ptr, B);
+memcpy_from_page(T, page, 0, B);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memcpy_rule1
@
identifier memcpy_rule1.ptr;
type VP, VP1;
@@

-VP ptr;
	... when != ptr;
? VP1 ptr;

//
// Some callers kmap without a temp pointer
//
@ memcpy_rule2 @
expression page, T, Off, F, B;
@@

<+...
(
-memcpy(kmap(page) + Off, F, B);
+memcpy_to_page(page, Off, F, B);
|
-memcpy(kmap(page), F, B);
+memcpy_to_page(page, 0, F, B);
|
-memcpy(T, kmap(page) + Off, B);
+memcpy_from_page(T, page, Off, B);
|
-memcpy(T, kmap(page), B);
+memcpy_from_page(T, page, 0, B);
)
...+>
-kunmap(page);
// No need for the ptr variable removal

//
// Catch all
//
@ memcpy_rule3 @
expression page;
expression GenTo, GenFrom, GenSize;
identifier ptr;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
//
// Some call sites have complex expressions within the memcpy
// match a catch all to be evaluated by hand.
//
-memcpy(GenTo, GenFrom, GenSize);
+memcpy_to_pageExtra(page, GenTo, GenFrom, GenSize);
+memcpy_from_pageExtra(GenTo, page, GenFrom, GenSize);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memcpy_rule3
@
identifier memcpy_rule3.ptr;
type VP, VP1;
@@

-VP ptr;
	... when != ptr;
? VP1 ptr;

// <smpl>

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-26 12:45:15 +01:00
Paolo Bonzini
c462f859f8 KVM: xen: flush deferred static key before checking it
A missing flush would cause the static branch to trigger incorrectly.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 05:13:02 -05:00
Sean Christopherson
44ac5958a6 KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
Check that PML is actually enabled before setting the mask to force a
SPTE to be write-protected.  The bits used for the !AD_ENABLED case are
in the upper half of the SPTE.  With 64-bit paging and EPT, these bits
are ignored, but with 32-bit PAE paging they are reserved.  Setting them
for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.

Fixes: 1f4e5fc83a ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 04:43:29 -05:00
Arnd Bergmann
fb5fabb192 mt76: mt7921: remove incorrect error handling
Clang points out a mistake in the error handling in
mt7921_mcu_tx_rate_report(), which tries to dereference a pointer that
cannot be initialized because of the error that is being handled:

drivers/net/wireless/mediatek/mt76/mt7921/mcu.c:409:3: warning: variable 'stats' is uninitialized when used here [-Wuninitialized]
                stats->tx_rate = rate;
                ^~~~~
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c:401:32: note: initialize the variable 'stats' to silence this warning
        struct mt7921_sta_stats *stats;
                                      ^
Just remove the obviously incorrect line.

Fixes: 1c099ab447 ("mt76: mt7921: add MCU support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210225145953.404859-2-arnd@kernel.org
2021-02-26 11:22:09 +02:00
Wanpeng Li
919f4ebc59 KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref
Reported by syzkaller:

    KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147]
    CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0
    RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline]
    RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline]
    RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498
    Call Trace:
     kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223
     kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959
     vfs_ioctl fs/ioctl.c:48 [inline]
     __do_sys_ioctl fs/ioctl.c:753 [inline]
     __se_sys_ioctl fs/ioctl.c:739 [inline]
     __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
     do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed
or SynIC is enabled. However, the syzkaller testcase sets irq routing table
directly w/o enabling SynIC. This results in null-ptr-deref when accessing
SynIC Hyper-V context. This patch fixes it.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000

Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com
Fixes: 8f014550df ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 03:16:50 -05:00
Dongli Zhang
ffe76c24c5 KVM: x86: remove misplaced comment on active_mmu_pages
The 'mmu_page_hash' is used as hash table while 'active_mmu_pages' is a
list. Remove the misplaced comment as it's mostly stating the obvious
anyways.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210226061945.1222-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 03:03:29 -05:00
Chenyi Qiang
96564d7773 KVM: Documentation: rectify rst markup in kvm_run->flags
Commit c32b1b896d ("KVM: X86: Add the Document for
KVM_CAP_X86_BUS_LOCK_EXIT") added a new flag in kvm_run->flags
documentation, and caused warning in make htmldocs:

  Documentation/virt/kvm/api.rst:5004: WARNING: Unexpected indentation
  Documentation/virt/kvm/api.rst:5004: WARNING: Inline emphasis start-string without end-string

Fix this rst markup issue.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210226075541.27179-1-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 03:03:12 -05:00
Paolo Bonzini
e2a0fcac6b Documentation: kvm: fix messy conversion from .txt to .rst
Building the documentation gives a warning that the KVM_PPC_RESIZE_HPT_PREPARE
label is defined twice.  The root cause is that the KVM_PPC_RESIZE_HPT_PREPARE
API is present twice, the second being a mix of the prepare and commit APIs.
Fix it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26 03:02:22 -05:00
Kalle Valo
62541e2667 iwlwifi: pcie: fix iwl_so_trans_cfg link error when CONFIG_IWLMVM is disabled
Randy reported an error on his randconfig builds:

ERROR: modpost: "iwl_so_trans_cfg" [drivers/net/wireless/intel/iwlwifi/iwlwifi.ko] undefined!

The problem was that when CONFIG_IWLMVM was disabled we were still accessing
iwl_so_trans_cfg. Fix it by moving IS_ENABLED() check before the access.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 930be4e76f ("iwlwifi: add support for SnJ with Jf devices")
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1614236661-20274-1-git-send-email-kvalo@codeaurora.org
2021-02-26 09:34:20 +02:00
Wei Yongjun
c88fb897c1 ALSA: n64: Fix return value check in n64audio_probe()
In case of error, the function devm_platform_ioremap_resource()
returns ERR_PTR() and never returns NULL. The NULL test in the
return value check should be replaced with IS_ERR().

Fixes: 1448f8acf4 ("sound: Add n64 driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Lauri Kasanen <cand@gmx.com>
Link: https://lore.kernel.org/r/20210224013803.2146953-1-weiyongjun1@huawei.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-02-25 10:29:13 +01:00
Hangbin Liu
557c223b64 selftests/bpf: No need to drop the packet when there is no geneve opt
In bpf geneve tunnel test we set geneve option on tx side. On rx side we
only call bpf_skb_get_tunnel_opt(). Since commit 9c2e14b481 ("ip_tunnels:
Set tunnel option flag when tunnel metadata is present") geneve_rx() will
not add TUNNEL_GENEVE_OPT flag if there is no geneve option, which cause
bpf_skb_get_tunnel_opt() return ENOENT and _geneve_get_tunnel() in
test_tunnel_kern.c drop the packet.

As it should be valid that bpf_skb_get_tunnel_opt() return error when
there is not tunnel option, there is no need to drop the packet and
break all geneve rx traffic. Just set opt_class to 0 in this test and
keep returning TC_ACT_OK.

Fixes: 933a741e3b ("selftests/bpf: bpf tunnel test.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/bpf/20210224081403.1425474-1-liuhangbin@gmail.com
2021-02-24 21:28:30 +01:00
Lukasz Majczak
d53a6adfb5 tpm, tpm_tis: Decorate tpm_tis_gen_interrupt() with request_locality()
This is shown with Samsung Chromebook Pro (Caroline) with TPM 1.2
(SLB 9670):

[    4.324298] TPM returned invalid status
[    4.324806] WARNING: CPU: 2 PID: 1 at drivers/char/tpm/tpm_tis_core.c:275 tpm_tis_status+0x86/0x8f

Background
==========

TCG PC Client Platform TPM Profile (PTP) Specification, paragraph 6.1 FIFO
Interface Locality Usage per Register, Table 39 Register Behavior Based on
Locality Setting for FIFO - a read attempt to TPM_STS_x Registers returns
0xFF in case of lack of locality.

The fix
=======

Decorate tpm_tis_gen_interrupt() with request_locality() and
release_locality().

Cc: Laurent Bigonville <bigon@debian.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Fixes: a3fbfae82b ("tpm: take TPM chip power gating out of tpm_transmit()")
Signed-off-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-02-24 18:53:22 +02:00
Jarkko Sakkinen
a5665ec2af tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality()
This is shown with Samsung Chromebook Pro (Caroline) with TPM 1.2
(SLB 9670):

[    4.324298] TPM returned invalid status
[    4.324806] WARNING: CPU: 2 PID: 1 at drivers/char/tpm/tpm_tis_core.c:275 tpm_tis_status+0x86/0x8f

Background
==========

TCG PC Client Platform TPM Profile (PTP) Specification, paragraph 6.1 FIFO
Interface Locality Usage per Register, Table 39 Register Behavior Based on
Locality Setting for FIFO - a read attempt to TPM_STS_x Registers returns
0xFF in case of lack of locality.

The fix
=======

Decorate tpm_get_timeouts() with request_locality() and release_locality().

Fixes: a3fbfae82b ("tpm: take TPM chip power gating out of tpm_transmit()")
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Laurent Bigonville <bigon@debian.org>
Cc: stable@vger.kernel.org
Reported-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-02-24 18:53:22 +02:00
Hangbin Liu
a7c9c25a99 bpf: Remove blank line in bpf helper description comment
Commit 34b2021cc6 ("bpf: Add BPF-helper for MTU checking") added an extra
blank line in bpf helper description. This will make bpf_helpers_doc.py stop
building bpf_helper_defs.h immediately after bpf_check_mtu(), which will
affect future added functions.

Fixes: 34b2021cc6 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
2021-02-24 17:20:21 +01:00
Kun-Chuan Hsieh
41462c6e73 tools/resolve_btfids: Fix build error with older host toolchains
Older libelf.h and glibc elf.h might not yet define the ELF compression
types.

Checking and defining SHF_COMPRESSED fix the build error when compiling
with older toolchains. Also, the tool resolve_btfids is compiled with host
toolchain. The host toolchain is more likely to be older than the cross
compile toolchain.

Fixes: 51f6463aac ("tools/resolve_btfids: Fix sections with wrong alignment")
Signed-off-by: Kun-Chuan Hsieh <jetswayss@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20210224052752.5284-1-jetswayss@gmail.com
2021-02-24 17:19:11 +01:00
Dmitrii Banshchikov
c41d81bfbb selftests/bpf: Fix a compiler warning in global func test
Add an explicit 'const void *' cast to pass program ctx pointer type into
a global function that expects pointer to structure.

warning: incompatible pointer types
passing 'struct __sk_buff *' to parameter of type 'const struct S *'
[-Wincompatible-pointer-types]
        return foo(skb);
                   ^~~
progs/test_global_func11.c:10:36: note: passing argument to parameter 's' here
__noinline int foo(const struct S *s)
                                   ^

Fixes: 8b08807d03 ("selftests/bpf: Add unit tests for pointers in global functions")
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210223082211.302596-1-me@ubique.spb.ru
2021-02-24 16:48:16 +01:00
Dmitrii Banshchikov
f4eda8b6e4 bpf: Drop imprecise log message
Now it is possible for global function to have a pointer argument that
points to something different than struct. Drop the irrelevant log
message and keep the logic same.

Fixes: e5069b9c23 ("bpf: Support pointers in global func args")
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210223090416.333943-1-me@ubique.spb.ru
2021-02-24 16:43:39 +01:00
Kalle Valo
77d7e87128 ath11k: fix AP mode for QCA6390
Commit c134d1f8c4 ("ath11k: Handle errors if peer creation fails") completely
broke AP mode on QCA6390:

kernel: [  151.230734] ath11k_pci 0000:06:00.0: failed to create peer after vdev start delay: -22
wpa_supplicant[2307]: Failed to set beacon parameters
wpa_supplicant[2307]: Interface initialization failed
wpa_supplicant[2307]: wlan0: interface state UNINITIALIZED->DISABLED
wpa_supplicant[2307]: wlan0: AP-DISABLED
wpa_supplicant[2307]: wlan0: Unable to setup interface.
wpa_supplicant[2307]: Failed to initialize AP interface

This was because commit c134d1f8c4 ("ath11k: Handle errors if peer creation
fails") added error handling for ath11k_peer_create(), which had been failing
all along but was unnoticed due to the missing error handling. The actual bug
was introduced already in commit aa44b2f3ec ("ath11k: start vdev if a bss peer is
already created").

ath11k_peer_create() was failing because for AP mode the peer is created
already earlier op_add_interface() and we should skip creation here, but the
check for modes was wrong.  Fixing that makes AP mode work again.

This shouldn't affect IPQ8074 nor QCN9074 as they have hw_params.vdev_start_delay disabled.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Fixes: c134d1f8c4 ("ath11k: Handle errors if peer creation fails")
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1614006849-25764-1-git-send-email-kvalo@codeaurora.org
2021-02-24 08:38:11 +02:00
Geert Uytterhoeven
ebb9d34e07 ath11k: qmi: use %pad to format dma_addr_t
If CONFIG_ARCH_DMA_ADDR_T_64BIT=n:

    drivers/net/wireless/ath/ath11k/qmi.c: In function ‘ath11k_qmi_respond_fw_mem_request’:
    drivers/net/wireless/ath/ath11k/qmi.c:1690:8: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘dma_addr_t’ {aka ‘unsigned int’} [-Wformat=]
     1690 |        "qmi req mem_seg[%d] 0x%llx %u %u\n", i,
	  |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1691 |         ab->qmi.target_mem[i].paddr,
	  |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~
	  |                              |
	  |                              dma_addr_t {aka unsigned int}
    drivers/net/wireless/ath/ath11k/debug.h:64:30: note: in definition of macro ‘ath11k_dbg’
       64 |   __ath11k_dbg(ar, dbg_mask, fmt, ##__VA_ARGS__); \
	  |                              ^~~
    drivers/net/wireless/ath/ath11k/qmi.c:1690:34: note: format string is defined here
     1690 |        "qmi req mem_seg[%d] 0x%llx %u %u\n", i,
	  |                               ~~~^
	  |                                  |
	  |                                  long long unsigned int
	  |                               %x

Fixes: d5395a5486 ("ath11k: qmi: add debug message for allocated memory segment addresses and sizes")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210221182754.2071863-1-geert@linux-m68k.org
2021-02-24 08:37:13 +02:00
Sharvari Harisangam
9aa940047a MAINTAINERS: update for mwifiex driver maintainers
Add Sharvari Harisangam to Maintainer list.
Replace Ganapathi Bhat's email id in Maintainer list.

Signed-off-by: Rakesh Parmar <rakesh.parmar@nxp.com>
Signed-off-by: Sharvari Harisangam <sharvari.harisangam@nxp.com>
Acked-by: Ganapathi Bhat <ganapathi.bhat@nxp.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1613998184-20047-1-git-send-email-sharvari.harisangam@nxp.com
2021-02-24 08:36:23 +02:00
Michal Kubecek
497a4dc827 dts: drop dangling c6x symlink
With c6x architecture removal, scripts/dtc/include-prefixes/c6x symlink
lost its target. Drop the dangling symlink which triggers some distribution
check scripts.

Fixes: a579fcfa8e ("c6x: remove architecture")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210223204114.E7F55E0155@unicorn.suse.cz
2021-02-23 21:37:09 -06:00
Maxime Ripard
a3cb15cda1 dt-bindings: bcm2711-hdmi: Fix broken schema
For some reason, unevaluatedProperties doesn't work and
additionalProperties is required. Fix it by switching to
additionalProperties.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210218152837.1080819-1-maxime@cerno.tech
Signed-off-by: Rob Herring <robh@kernel.org>
2021-02-23 21:37:09 -06:00
Tetsuo Handa
9c7d83ae6b pstore: Fix warning in pstore_kill_sb()
syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the
assumption that pstore_sb != NULL is wrong because pstore_fill_super() will
not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL
(due to memory allocation fault injection).

Since mount_single() calls pstore_kill_sb() when pstore_fill_super()
failed, pstore_kill_sb() needs to be aware of such failure path.

[1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0

Reported-by: syzbot <syzbot+d0cf0ad6513e9a1da5df@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210214031307.57903-1-penguin-kernel@I-love.SAKURA.ne.jp
2021-02-23 09:27:20 -08:00
Johannes Thumshirn
6e37d24599 btrfs: zoned: fix deadlock on log sync
Lockdep with fstests test case btrfs/041 detected a unsafe locking
scenario when we allocate the log node on a zoned filesystem.

btrfs/041
 ============================================
 WARNING: possible recursive locking detected
 5.11.0-rc7+ #939 Not tainted
 --------------------------------------------
 xfs_io/698 is trying to acquire lock:
 ffff88810cd673a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x3d1/0xee0 [btrfs]

 but task is already holding lock:
 ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&root->log_mutex);
   lock(&root->log_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by xfs_io/698:
  #0: ffff88810cd66620 (sb_internal){.+.+}-{0:0}, at: btrfs_sync_file+0x2c3/0x570 [btrfs]
  #1: ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 stack backtrace:
 CPU: 0 PID: 698 Comm: xfs_io Not tainted 5.11.0-rc7+ #939
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
 Call Trace:
  dump_stack+0x77/0x97
  __lock_acquire.cold+0xb9/0x32a
  lock_acquire+0xb5/0x400
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  __mutex_lock+0x7b/0x8d0
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? find_first_extent_bit+0x9f/0x100 [btrfs]
  ? __mutex_unlock_slowpath+0x35/0x270
  btrfs_sync_log+0x3d1/0xee0 [btrfs]
  btrfs_sync_file+0x3a8/0x570 [btrfs]
  __x64_sys_fsync+0x34/0x60
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens, because we are taking the ->log_mutex albeit it has already
been locked.

Also while at it, fix the bogus unlock of the tree_log_mutex in the error
handling.

Fixes: 3ddebf27fc ("btrfs: zoned: reorder log node allocation on zoned filesystem")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:08:48 +01:00
Cong Wang
53f523f305 bpf: Clear percpu pointers in bpf_prog_clone_free()
Similar to bpf_prog_realloc(), bpf_prog_clone_create() also copies
the percpu pointers, but the clone still shares them with the original
prog, so we have to clear these two percpu pointers in
bpf_prog_clone_free(). Otherwise we would get a double free:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
  5.11.0-rc4.bm.1-amd64+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
 test_bpf: #1 TXA
 Workqueue: events bpf_prog_free_deferred
 RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
 Code: [...]
 RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
 RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
 RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
 R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
 FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
 CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
    refill_obj_stock+0x5e/0xd0
    free_percpu+0xee/0x550
    __bpf_prog_free+0x4d/0x60
    process_one_work+0x26a/0x590
    worker_thread+0x3c/0x390
    ? process_one_work+0x590/0x590
    kthread+0x130/0x150
    ? kthread_park+0x80/0x80
    ret_from_fork+0x1f/0x30

This bug is 100% reproducible with test_kmod.sh.

Fixes: 700d4796ef ("bpf: Optimize program stats")
Fixes: ca06f55b90 ("bpf: Add per-program recursion prevention mechanism")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210218001647.71631-1-xiyou.wangcong@gmail.com
2021-02-22 18:08:35 +01:00
Josef Bacik
95c85fba1f btrfs: avoid double put of block group when emptying cluster
It's wrong calling btrfs_put_block_group in
__btrfs_return_cluster_to_free_space if the block group passed is
different than the block group the cluster represents. As this means the
cluster doesn't have a reference to the passed block group. This results
in double put and a use-after-free bug.

Fix this by simply bailing if the block group we passed in does not
match the block group on the cluster.

Fixes: fa9c0d795f ("Btrfs: rework allocation clustering")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Filipe Manana
3660d0bcdb btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
When using the NO_HOLES feature, if we clone a file range that spans only
a hole into a range that is at or beyond the current i_size of the
destination file, we end up not setting the full sync runtime flag on the
inode. As a result, if we then fsync the destination file and have a power
failure, after log replay we can end up exposing stale data instead of
having a hole for that range.

The conditions for this to happen are the following:

1) We have a file with a size of, for example, 1280K;

2) There is a written (non-prealloc) extent for the file range from 1024K
   to 1280K with a length of 256K;

3) This particular file extent layout is durably persisted, so that the
   existing superblock persisted on disk points to a subvolume root where
   the file has that exact file extent layout and state;

4) The file is truncated to a smaller size, to an offset lower than the
   start offset of its last extent, for example to 800K. The truncate sets
   the full sync runtime flag on the inode;

6) Fsync the file to log it and clear the full sync runtime flag;

7) Clone a region that covers only a hole (implicit hole due to NO_HOLES)
   into the file with a destination offset that starts at or beyond the
   256K file extent item we had - for example to offset 1024K;

8) Since the clone operation does not find extents in the source range,
   we end up in the if branch at the bottom of btrfs_clone() where we
   punch a hole for the file range starting at offset 1024K by calling
   btrfs_replace_file_extents(). There we end up not setting the full
   sync flag on the inode, because we don't know we are being called in
   a clone context (and not fallocate's punch hole operation), and
   neither do we create an extent map to represent a hole because the
   requested range is beyond eof;

9) A further fsync to the file will be a fast fsync, since the clone
   operation did not set the full sync flag, and therefore it relies on
   modified extent maps to correctly log the file layout. But since
   it does not find any extent map marking the range from 1024K (the
   previous eof) to the new eof, it does not log a file extent item
   for that range representing the hole;

10) After a power failure no hole for the range starting at 1024K is
   punched and we end up exposing stale data from the old 256K extent.

Turning this into exact steps:

  $ mkfs.btrfs -f -O no-holes /dev/sdi
  $ mount /dev/sdi /mnt

  # Create our test file with 3 extents of 256K and a 256K hole at offset
  # 256K. The file has a size of 1280K.
  $ xfs_io -f -s \
              -c "pwrite -S 0xab -b 256K 0 256K" \
              -c "pwrite -S 0xcd -b 256K 512K 256K" \
              -c "pwrite -S 0xef -b 256K 768K 256K" \
              -c "pwrite -S 0x73 -b 256K 1024K 256K" \
              /mnt/sdi/foobar

  # Make sure it's durably persisted. We want the last committed super
  # block to point to this particular file extent layout.
  sync

  # Now truncate our file to a smaller size, falling within a position of
  # the second extent. This sets the full sync runtime flag on the inode.
  # Then fsync the file to log it and clear the full sync flag from the
  # inode. The third extent is no longer part of the file and therefore
  # it is not logged.
  $ xfs_io -c "truncate 800K" -c "fsync" /mnt/foobar

  # Now do a clone operation that only clones the hole and sets back the
  # file size to match the size it had before the truncate operation
  # (1280K).
  $ xfs_io \
        -c "reflink /mnt/foobar 256K 1024K 256K" \
        -c "fsync" \
        /mnt/foobar

  # File data before power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1310720

  <power fail>

  # Mount the fs again to replay the log tree.
  $ mount /dev/sdi /mnt

  # File data after power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1048576 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73
  *
  1310720

The range from 1024K to 1280K should correspond to a hole but instead it
points to stale data, to the 256K extent that should not exist after the
truncate operation.

The issue does not exists when not using NO_HOLES, because for that case
we use file extent items to represent holes, these are found and copied
during the loop that iterates over extents at btrfs_clone(), and that
causes btrfs_replace_file_extents() to be called with a non-NULL
extent_info argument and therefore set the full sync runtime flag on the
inode.

So fix this by making the code that deals with a trailing hole during
cloning, at btrfs_clone(), to set the full sync flag on the inode, if the
range starts at or beyond the current i_size.

A test case for fstests will follow soon.

Backporting notes: for kernel 5.4 the change goes to ioctl.c into
btrfs_clone before the last call to btrfs_punch_hole_range.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Josef Bacik
1119a72e22 btrfs: tree-checker: do not error out if extent ref hash doesn't match
The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:44 +01:00
Filipe Manana
dd0734f2a8 btrfs: fix race between swap file activation and snapshot creation
When creating a snapshot we check if the current number of swap files, in
the root, is non-zero, and if it is, we error out and warn that we can not
create the snapshot because there are active swap files.

However this is racy because when a task started activation of a swap
file, another task might have started already snapshot creation and might
have seen the counter for the number of swap files as zero. This means
that after the swap file is activated we may end up with a snapshot of the
same root successfully created, and therefore when the first write to the
swap file happens it has to fall back into COW mode, which should never
happen for active swap files.

Basically what can happen is:

1) Task A starts snapshot creation and enters ioctl.c:create_snapshot().
   There it sees that root->nr_swapfiles has a value of 0 so it continues;

2) Task B enters btrfs_swap_activate(). It is not aware that another task
   started snapshot creation but it did not finish yet. It increments
   root->nr_swapfiles from 0 to 1;

3) Task B checks that the file meets all requirements to be an active
   swap file - it has NOCOW set, there are no snapshots for the inode's
   root at the moment, no file holes, no reflinked extents, etc;

4) Task B returns success and now the file is an active swap file;

5) Task A commits the transaction to create the snapshot and finishes.
   The swap file's extents are now shared between the original root and
   the snapshot;

6) A write into an extent of the swap file is attempted - there is a
   snapshot of the file's root, so we fall back to COW mode and therefore
   the physical location of the extent changes on disk.

So fix this by taking the snapshot lock during swap file activation before
locking the extent range, as that is the order in which we lock these
during buffered writes.

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:35 +01:00
Filipe Manana
195a49eaf6 btrfs: fix race between writes to swap files and scrub
When we active a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents to be changed by operations such as balance and device
replace/resize/remove. We also call there can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.

However we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents to be
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group into RO before it processes it
and then back again to RW mode after processing it. That means an attempt
to write into a swap file extent while scrub is processing the respective
block group, will result in COWing the extent, changing its physical
location on disk.

Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode. When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:15 +01:00
Dan Carpenter
33ccec5fd7 bpf: Fix a warning message in mark_ptr_not_null_reg()
The WARN_ON() argument is a condition, not an error message.  So this
code will print a stack trace but will not print the warning message.
Fix that and also change it to only WARN_ONCE().

Fixes: 4ddb74165a ("bpf: Extract nullable reg type conversion into a helper function")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/YCzJlV3hnF%2Ft1Pk4@mwanda
2021-02-22 18:03:11 +01:00
Brendan Jackman
b29dd96b90 bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src
This code generates a CMPXCHG loop in order to implement atomic_fetch
bitwise operations. Because CMPXCHG is hard-coded to use rax (which
holds the BPF r0 value), it saves the _real_ r0 value into the
internal "ax" temporary register and restores it once the loop is
complete.

In the middle of the loop, the actual bitwise operation is performed
using src_reg. The bug occurs when src_reg is r0: as described above,
r0 has been clobbered and the real r0 value is in the ax register.

Therefore, perform this operation on the ax register instead, when
src_reg is r0.

Fixes: 981f94c3e9 ("bpf: Add bitwise atomic instructions")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210216125307.1406237-1-jackmanb@google.com
2021-02-22 18:03:11 +01:00
Filipe Manana
20903032cd btrfs: avoid checking for RO block group twice during nocow writeback
During the nocow writeback path, we currently iterate the rbtree of block
groups twice: once for checking if the target block group is RO with the
call to btrfs_extent_readonly()), and once again for getting a nocow
reference on the block group with a call to btrfs_inc_nocow_writers().

Since btrfs_inc_nocow_writers() already returns false when the target
block group is RO, remove the call to btrfs_extent_readonly(). Not only
we avoid searching the blocks group rbtree twice, it also helps reduce
contention on the lock that protects it (specially since it is a spin
lock and not a read-write lock). That may make a noticeable difference
on very large filesystems, with thousands of allocated block groups.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:55 +01:00
Nikolay Borisov
3c17916510 btrfs: fix race between extent freeing/allocation when using bitmaps
During allocation the allocator will try to allocate an extent using
cluster policy. Once the current cluster is exhausted it will remove the
entry under btrfs_free_cluster::lock and subsequently acquire
btrfs_free_space_ctl::tree_lock to dispose of the already-deleted entry
and adjust btrfs_free_space_ctl::total_bitmap. This poses a problem
because there exists a race condition between removing the entry under
one lock and doing the necessary accounting holding a different lock
since extent freeing only uses the 2nd lock. This can result in the
following situation:

T1:                                    T2:
btrfs_alloc_from_cluster               insert_into_bitmap <holds tree_lock>
 if (entry->bytes == 0)                   if (block_group && !list_empty(&block_group->cluster_list)) {
    rb_erase(entry)

 spin_unlock(&cluster->lock);
   (total_bitmaps is still 4)           spin_lock(&cluster->lock);
                                         <doesn't find entry in cluster->root>
 spin_lock(&ctl->tree_lock);             <goes to new_bitmap label, adds
<blocked since T2 holds tree_lock>       <a new entry and calls add_new_bitmap>
					    recalculate_thresholds  <crashes,
                                              due to total_bitmaps
					      becoming 5 and triggering
					      an ASSERT>

To fix this ensure that once depleted, the cluster entry is deleted when
both cluster lock and tree locks are held in the allocator (T1), this
ensures that even if there is a race with a concurrent
insert_into_bitmap call it will correctly find the entry in the cluster
and add the new space to it.

CC: <stable@vger.kernel.org> # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:31 +01:00
Qu Wenruo
04d4ba4c90 btrfs: make check_compressed_csum() to be subpage compatible
Currently check_compressed_csum() completely relies on sectorsize ==
PAGE_SIZE to do checksum verification for compressed extents.

To make it subpage compatible, this patch will:
- Do extra calculation for the csum range
  Since we have multiple sectors inside a page, we need to only hash
  the range we want, not the full page anymore.

- Do sector-by-sector hash inside the page

With this patch and previous conversion on
btrfs_submit_compressed_read(), now we can read subpage compressed
extents properly, and do proper csum verification.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:27 +01:00
Qu Wenruo
be6a13613f btrfs: make btrfs_submit_compressed_read() subpage compatible
For compressed read, we always submit page read using page size.  This
doesn't work well with subpage, as for subpage one page can contain
several sectors.  Such submission will read range out of what we want,
and cause problems.

Thankfully to make it subpage compatible, we only need to change how the
last page of the compressed extent is read.

Instead of always adding a full page to the compressed read bio, if we're
at the last page, calculate the size using compressed length, so that we
only add part of the range into the compressed read bio.

Since we are here, also change the PAGE_SIZE used in
lookup_extent_mapping() to sectorsize.
This modification won't cause any functional change, as
lookup_extent_mapping() can handle the case where the search range is
larger than found extent range.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:25 +01:00
Ira Weiny
d70cef0d46 btrfs: fix raid6 qstripe kmap
When a qstripe is required an extra page is allocated and mapped.  There
were 3 problems:

1) There is no corresponding call of kunmap() for the qstripe page.
2) There is no reason to map the qstripe page more than once if the
   number of bits set in rbio->dbitmap is greater than one.
3) There is no reason to map the parity page and unmap it each time
   through the loop.

The page memory can continue to be reused with a single mapping on each
iteration by raid6_call.gen_syndrome() without remapping.  So map the
page for the duration of the loop.

Similarly, improve the algorithm by mapping the parity page just 1 time.

Fixes: 5a6ac9eacb ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4.x: c17af96554: btrfs: raid56: simplify tracking of Q stripe presence
CC: stable@vger.kernel.org # 4.4.x
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:21 +01:00
Johannes Berg
4538c5ed0f iwlwifi: avoid crash on unsupported debug collection
If the opmode doesn't support debug collection (DVM) then don't
crash, but just skip the callback.

Fixes: d01293154c ("iwlwifi: dbg: add op_mode callback for collecting debug data.")
Reported-by: Andy Lavr <andy.lavr@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210219133506.ecabe285bc7d.I73d230d555c595fa2d9bf284f80078729fe18aa4@changeid
2021-02-22 09:22:16 +02:00
Felix Fietkau
94f0e6256c mt76: mt7915: only modify tx buffer list after allocating tx token id
Modifying the tx buffer list too early can leak DMA mappings

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210216135119.23809-2-nbd@nbd.name
2021-02-18 08:09:10 +02:00
Felix Fietkau
ae064fc0e3 mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
When running out of room in the tx queue after calling drv->tx_prepare_skb,
the buffer list will already have been modified on MT7615 and newer drivers.
This can leak a DMA mapping and will show up as swiotlb allocation failures
on x86.

Fix this by moving the queue length check further up. This is less accurate,
since it can overestimate the needed room in the queue on MT7615 and newer,
but the difference is small enough to not matter in practice.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210216135119.23809-1-nbd@nbd.name
2021-02-18 08:09:09 +02:00
Felix Fietkau
3b9ea7206d ath9k: fix transmitting to stations in dynamic SMPS mode
When transmitting to a receiver in dynamic SMPS mode, all transmissions that
use multiple spatial streams need to be sent using CTS-to-self or RTS/CTS to
give the receiver's extra chains some time to wake up.
This fixes the tx rate getting stuck at <= MCS7 for some clients, especially
Intel ones, which make aggressive use of SMPS.

Cc: stable@vger.kernel.org
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210214184911.96702-1-nbd@nbd.name
2021-02-18 08:07:25 +02:00
Ira Weiny
ca18f6ea01 mm/highmem: Add VM_BUG_ON() to mem*_page() calls
Add VM_BUG_ON bounds checks to ensure the newly lifted and created page
memory operations do not result in corrupted data in neighbor pages.[1][2]

[1] https://lore.kernel.org/lkml/20201210053502.GS1563847@iweiny-DESK2.sc.intel.com/
[2] https://lore.kernel.org/lkml/20210209110931.00f00e47d9a0529fcee2ff01@linux-foundation.org/

Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-11 19:56:15 +01:00
Ira Weiny
6a0996db68 mm/highmem: Introduce memcpy_page(), memmove_page(), and memset_page()
3 more common kmap patterns are kmap/memcpy/kunmap, kmap/memmove/kunmap.
and kmap/memset/kunmap.

Add helper functions for those patterns which use kmap_local_page().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-11 19:55:37 +01:00
Ira Weiny
61b205f579 mm/highmem: Convert memcpy_[to|from]_page() to kmap_local_page()
kmap_local_page() is more efficient and is well suited for these calls.
Convert the kmap() to kmap_local_page()

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-11 19:55:16 +01:00
Ira Weiny
bb90d4bc7b mm/highmem: Lift memcpy_[to|from]_page to core
Working through a conversion to a call kmap_local_page() instead of
kmap() revealed many places where the pattern kmap/memcpy/kunmap
occurred.

Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al
Viro all suggested putting this code into helper functions.  Al Viro
further pointed out that these functions already existed in the iov_iter
code.[1]

Various locations for the lifted functions were considered.

Headers like mm.h or string.h seem ok but don't really portray the
functionality well.  pagemap.h made some sense but is for page cache
functionality.[2]

Another alternative would be to create a new header for the promoted
memcpy functions, but it masks the fact that these are designed to copy
to/from pages using the kernel direct mappings and complicates matters
with a new header.

Placing these functions in 'highmem.h' is suboptimal especially with the
changes being proposed in the functionality of kmap.  From a caller
perspective including/using 'highmem.h' implies that the functions
defined in that header are only required when highmem is in use which is
increasingly not the case with modern processors.  However, highmem.h is
where all the current functions like this reside (zero_user(),
clear_highpage(), clear_user_highpage(), copy_user_highpage(), and
copy_highpage()).  So it makes the most sense even though it is
distasteful for some.[3]

Lift memcpy_to_page() and memcpy_from_page() to pagemap.h.

[1] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/
    https://lore.kernel.org/lkml/20201013112544.GA5249@infradead.org/

[2] https://lore.kernel.org/lkml/20201208122316.GH7338@casper.infradead.org/

[3] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/#t
    https://lore.kernel.org/lkml/20201208163814.GN1563847@iweiny-DESK2.sc.intel.com/

Cc: Boris Pismenny <borisp@mellanox.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-11 19:54:43 +01:00
Al Viro
415ddc3b10 sparc32: take ->thread.flags out
it was used for two things - dealing with unusual ->kregs for kernel
threads and "emulate unaligned userland accesses for this thread";
the former is killed off by the previous commit, the latter never
had been set in the first place.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:10:56 -05:00
Al Viro
af7652500b sparc32: get rid of fake_swapper_regs
no reason to have ->kregs of initial thread set up in a special
way - we can keep them on stack, same as for every other thread.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:10:55 -05:00
Al Viro
d17b9ec777 sparc64: get rid of fake_swapper_regs
no reason to have ->kregs of initial thread set up in a special
way - we can keep them on stack, same as for every other thread.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:10:54 -05:00
Al Viro
73686e787b Merge remote-tracking branch 'sparc/master' into work.sparc32
... and resolve a non-textual conflict in arch/sparc/lib/memset.S -
EXT(...) stuff shouldn't be reintroduced on merge.
2021-01-03 20:07:31 -05:00
Al Viro
b4edf06c8a sparc32: switch to generic extables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:18 -05:00
Al Viro
c4da8e0dc6 sparc32: switch copy_user.S away from range exception table entries
Those were the last range exception table entries, which will allow
to get rid of a lot of weirdness.  Emits the same code into .text.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:17 -05:00
Al Viro
cfd5fa7021 sparc32: get rid of range exception table entries in checksum_32.S
trivial - we don't even look at instruction offsets in the handler

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:16 -05:00
Al Viro
df06c27ebd sparc32: switch __bzero() away from range exception table entries
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:15 -05:00
Al Viro
5f99d33810 sparc32: kill lookup_fault()
No callers left.  As the result we can kill
	* lookup_fault() itself
	* the kludge in do_sparc_fault() for passing the
arguments for eventual lookup_fault() into exception handler and
labels used by it
	* the last of magical exception table entries (in __clear_user())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:14 -05:00
Al Viro
e51ea54429 sparc32: don't bother with lookup_fault() in __bzero()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-03 20:05:13 -05:00
809 changed files with 8715 additions and 5475 deletions

View File

@@ -430,13 +430,13 @@ fifo_expire_async
-----------------
This parameter is used to set the timeout of asynchronous requests. Default
value of this is 248ms.
value of this is 250ms.
fifo_expire_sync
----------------
This parameter is used to set the timeout of synchronous requests. Default
value of this is 124ms. In case to favor synchronous requests over asynchronous
value of this is 125ms. In case to favor synchronous requests over asynchronous
one, this value should be decreased relative to fifo_expire_async.
low_latency

View File

@@ -109,7 +109,7 @@ required:
- resets
- ddc
unevaluatedProperties: false
additionalProperties: false
examples:
- |

View File

@@ -23,6 +23,7 @@ properties:
- enum:
- ingenic,jz4775-intc
- ingenic,jz4770-intc
- ingenic,jz4760b-intc
- const: ingenic,jz4760-intc
- items:
- const: ingenic,x1000-intc

View File

@@ -49,10 +49,14 @@ properties:
# See ../video-interfaces.txt for more details
port:
type: object
$ref: /schemas/graph.yaml#/properties/port
additionalProperties: false
properties:
endpoint:
type: object
$ref: /schemas/media/video-interfaces.yaml#
unevaluatedProperties: false
properties:
data-lanes:
oneOf:
@@ -65,11 +69,7 @@ properties:
- const: 1
- const: 2
link-frequencies:
allOf:
- $ref: /schemas/types.yaml#/definitions/uint64-array
description:
Allowed data bus frequencies.
link-frequencies: true
required:
- data-lanes

View File

@@ -31,7 +31,8 @@ properties:
maxItems: 1
port:
$ref: /schemas/graph.yaml#/$defs/port-base
$ref: /schemas/graph.yaml#/properties/port
additionalProperties: false
properties:
endpoint:
@@ -41,8 +42,6 @@ properties:
properties:
clock-noncontinuous: true
additionalProperties: false
required:
- compatible
- reg

View File

@@ -44,19 +44,17 @@ properties:
description: Reset Pin GPIO Control (active low)
port:
type: object
description: MIPI CSI-2 transmitter port
$ref: /schemas/graph.yaml#/properties/port
additionalProperties: false
properties:
endpoint:
type: object
$ref: /schemas/media/video-interfaces.yaml#
unevaluatedProperties: false
properties:
remote-endpoint: true
link-frequencies:
$ref: /schemas/types.yaml#/definitions/uint64-array
description: Allowed MIPI CSI-2 link frequencies
link-frequencies: true
data-lanes:
minItems: 1
@@ -65,10 +63,6 @@ properties:
required:
- data-lanes
- link-frequencies
- remote-endpoint
required:
- endpoint
required:
- compatible

View File

@@ -44,19 +44,17 @@ properties:
description: Reset Pin GPIO Control (active low)
port:
type: object
description: MIPI CSI-2 transmitter port
$ref: /schemas/graph.yaml#/properties/port
additionalProperties: false
properties:
endpoint:
type: object
$ref: /schemas/media/video-interfaces.yaml#
unevaluatedProperties: false
properties:
remote-endpoint: true
link-frequencies:
$ref: /schemas/types.yaml#/definitions/uint64-array
description: Allowed MIPI CSI-2 link frequencies
link-frequencies: true
data-lanes:
minItems: 1
@@ -65,10 +63,6 @@ properties:
required:
- data-lanes
- link-frequencies
- remote-endpoint
required:
- endpoint
required:
- compatible

View File

@@ -36,18 +36,17 @@ properties:
description: Reference to the GPIO connected to the XCLR pin, if any.
port:
type: object
additionalProperties: false
$ref: /schemas/graph.yaml#/properties/port
properties:
endpoint:
type: object
$ref: /schemas/media/video-interfaces.yaml#
unevaluatedProperties: false
properties:
data-lanes:
$ref: ../video-interfaces.yaml#/properties/data-lanes
link-frequencies:
$ref: ../video-interfaces.yaml#/properties/link-frequencies
data-lanes: true
link-frequencies: true
required:
- data-lanes

View File

@@ -613,6 +613,27 @@ Some of these date from the very introduction of KMS in 2008 ...
Level: Intermediate
Remove automatic page mapping from dma-buf importing
----------------------------------------------------
When importing dma-bufs, the dma-buf and PRIME frameworks automatically map
imported pages into the importer's DMA area. drm_gem_prime_fd_to_handle() and
drm_gem_prime_handle_to_fd() require that importers call dma_buf_attach()
even if they never do actual device DMA, but only CPU access through
dma_buf_vmap(). This is a problem for USB devices, which do not support DMA
operations.
To fix the issue, automatic page mappings should be removed from the
buffer-sharing code. Fixing this is a bit more involved, since the import/export
cache is also tied to &drm_gem_object.import_attach. Meanwhile we paper over
this problem for USB devices by fishing out the USB host controller device, as
long as that supports DMA. Otherwise importing can still needlessly fail.
Contact: Thomas Zimmermann <tzimmermann@suse.de>, Daniel Vetter
Level: Advanced
Better Testing
==============

View File

@@ -1988,7 +1988,7 @@ netif_carrier.
If use_carrier is 0, then the MII monitor will first query the
device's (via ioctl) MII registers and check the link state. If that
request fails (not just that it returns carrier down), then the MII
monitor will make an ethtool ETHOOL_GLINK request to attempt to obtain
monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain
the same information. If both methods fail (i.e., the driver either
does not support or had some error in processing both the MII register
and ethtool requests), then the MII monitor will assume the link is

View File

@@ -142,73 +142,13 @@ Please send incremental versions on top of what has been merged in order to fix
the patches the way they would look like if your latest patch series was to be
merged.
How can I tell what patches are queued up for backporting to the various stable releases?
-----------------------------------------------------------------------------------------
Normally Greg Kroah-Hartman collects stable commits himself, but for
networking, Dave collects up patches he deems critical for the
networking subsystem, and then hands them off to Greg.
There is a patchworks queue that you can see here:
https://patchwork.kernel.org/bundle/netdev/stable/?state=*
It contains the patches which Dave has selected, but not yet handed off
to Greg. If Greg already has the patch, then it will be here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git
A quick way to find whether the patch is in this stable-queue is to
simply clone the repo, and then git grep the mainline commit ID, e.g.
::
stable-queue$ git grep -l 284041ef21fdf2e
releases/3.0.84/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
releases/3.4.51/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
releases/3.9.8/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
stable/stable-queue$
I see a network patch and I think it should be backported to stable. Should I request it via stable@vger.kernel.org like the references in the kernel's Documentation/process/stable-kernel-rules.rst file say?
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
No, not for networking. Check the stable queues as per above first
to see if it is already queued. If not, then send a mail to netdev,
listing the upstream commit ID and why you think it should be a stable
candidate.
Before you jump to go do the above, do note that the normal stable rules
in :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>`
still apply. So you need to explicitly indicate why it is a critical
fix and exactly what users are impacted. In addition, you need to
convince yourself that you *really* think it has been overlooked,
vs. having been considered and rejected.
Generally speaking, the longer it has had a chance to "soak" in
mainline, the better the odds that it is an OK candidate for stable. So
scrambling to request a commit be added the day after it appears should
be avoided.
I have created a network patch and I think it should be backported to stable. Should I add a Cc: stable@vger.kernel.org like the references in the kernel's Documentation/ directory say?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
No. See above answer. In short, if you think it really belongs in
stable, then ensure you write a decent commit log that describes who
gets impacted by the bug fix and how it manifests itself, and when the
bug was introduced. If you do that properly, then the commit will get
handled appropriately and most likely get put in the patchworks stable
queue if it really warrants it.
If you think there is some valid information relating to it being in
stable that does *not* belong in the commit log, then use the three dash
marker line as described in
:ref:`Documentation/process/submitting-patches.rst <the_canonical_patch_format>`
to temporarily embed that information into the patch that you send.
Are all networking bug fixes backported to all stable releases?
Are there special rules regarding stable submissions on netdev?
---------------------------------------------------------------
Due to capacity, Dave could only take care of the backports for the
last two stable releases. For earlier stable releases, each stable
branch maintainer is supposed to take care of them. If you find any
patch is missing from an earlier stable branch, please notify
stable@vger.kernel.org with either a commit ID or a formal patch
backported, and CC Dave and other relevant networking developers.
While it used to be the case that netdev submissions were not supposed
to carry explicit ``CC: stable@vger.kernel.org`` tags that is no longer
the case today. Please follow the standard stable rules in
:ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>`,
and make sure you include appropriate Fixes tags!
Is the comment style convention different for the networking content?
---------------------------------------------------------------------

View File

@@ -35,12 +35,6 @@ Rules on what kind of patches are accepted, and which ones are not, into the
Procedure for submitting patches to the -stable tree
----------------------------------------------------
- If the patch covers files in net/ or drivers/net please follow netdev stable
submission guidelines as described in
:ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>`
after first checking the stable networking queue at
https://patchwork.kernel.org/bundle/netdev/stable/?state=*
to ensure the requested patch is not already queued up.
- Security patches should not be handled (solely) by the -stable review
process but should follow the procedures in
:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>`.

View File

@@ -250,11 +250,6 @@ should also read
:ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>`
in addition to this file.
Note, however, that some subsystem maintainers want to come to their own
conclusions on which patches should go to the stable trees. The networking
maintainer, in particular, would rather not see individual developers
adding lines like the above to their patches.
If changes affect userland-kernel interfaces, please send the MAN-PAGES
maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at
least a notification of the change, so that some information makes its way

View File

@@ -182,6 +182,9 @@ is dependent on the CPU capability and the kernel configuration. The limit can
be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
ioctl() at run-time.
Creation of the VM will fail if the requested IPA size (whether it is
implicit or explicit) is unsupported on the host.
Please note that configuring the IPA size does not affect the capability
exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
size of the address translated by the stage2 level (guest physical to
@@ -3856,49 +3859,20 @@ base 2 of the page size in the bottom 6 bits.
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid,
-ENOMEM if unable to allocate the new HPT,
-ENOSPC if there was a hash collision
::
struct kvm_ppc_rmmu_info {
struct kvm_ppc_radix_geom {
__u8 page_shift;
__u8 level_bits[4];
__u8 pad[3];
} geometries[8];
__u32 ap_encodings[8];
};
The geometries[] field gives up to 8 supported geometries for the
radix page table, in terms of the log base 2 of the smallest page
size, and the number of bits indexed at each level of the tree, from
the PTE level up to the PGD level in that order. Any unused entries
will have 0 in the page_shift field.
The ap_encodings gives the supported page sizes and their AP field
encodings, encoded with the AP value in the top 3 bits and the log
base 2 of the page size in the bottom 6 bits.
4.102 KVM_PPC_RESIZE_HPT_PREPARE
--------------------------------
:Capability: KVM_CAP_SPAPR_RESIZE_HPT
:Architectures: powerpc
:Type: vm ioctl
:Parameters: struct kvm_ppc_resize_hpt (in)
:Returns: 0 on successful completion,
>0 if a new HPT is being prepared, the value is an estimated
number of milliseconds until preparation is complete,
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid,when moving existing
HPT entries to the new HPT,
-EIO on other error conditions
Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this starts, stops or monitors
the preparation of a new potential HPT for the guest, essentially
implementing the H_RESIZE_HPT_PREPARE hypercall.
::
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
If called with shift > 0 when there is no pending HPT for the guest,
this begins preparation of a new pending HPT of size 2^(shift) bytes.
It then returns a positive integer with the estimated number of
@@ -3926,14 +3900,6 @@ Normally this will be called repeatedly with the same parameters until
it returns <= 0. The first call will initiate preparation, subsequent
ones will monitor preparation until it completes or fails.
::
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
4.103 KVM_PPC_RESIZE_HPT_COMMIT
-------------------------------
@@ -3956,6 +3922,14 @@ Hashed Page Table (HPT). Specifically this requests that the guest be
transferred to working with the new HPT, essentially implementing the
H_RESIZE_HPT_COMMIT hypercall.
::
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
returned 0 with the same parameters. In other cases
KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
@@ -3971,14 +3945,6 @@ HPT and the previous HPT will be discarded.
On failure, the guest will still be operating on its previous HPT.
::
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
4.104 KVM_X86_GET_MCE_CAP_SUPPORTED
-----------------------------------
@@ -4915,6 +4881,14 @@ see KVM_XEN_HVM_SET_ATTR above.
union {
__u64 gpa;
__u64 pad[4];
struct {
__u64 state;
__u64 state_entry_time;
__u64 time_running;
__u64 time_runnable;
__u64 time_blocked;
__u64 time_offline;
} runstate;
} u;
};
@@ -4927,6 +4901,31 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Sets the guest physical address of an additional pvclock structure
for a given vCPU. This is typically used for guest vsyscall support.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR
Sets the guest physical address of the vcpu_runstate_info for a given
vCPU. This is how a Xen guest tracks CPU state such as steal time.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT
Sets the runstate (RUNSTATE_running/_runnable/_blocked/_offline) of
the given vCPU from the .u.runstate.state member of the structure.
KVM automatically accounts running and runnable time but blocked
and offline states are only entered explicitly.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA
Sets all fields of the vCPU runstate data from the .u.runstate member
of the structure, including the current runstate. The state_entry_time
must equal the sum of the other four times.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST
This *adds* the contents of the .u.runstate members of the structure
to the corresponding members of the given vCPU's runstate data, thus
permitting atomic adjustments to the runstate times. The adjustment
to the state_entry_time must equal the sum of the adjustments to the
other four times. The state field must be set to -1, or to a valid
runstate value (RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked
or RUNSTATE_offline) to set the current accounted state as of the
adjusted state_entry_time.
4.130 KVM_XEN_VCPU_GET_ATTR
---------------------------
@@ -4939,6 +4938,9 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Allows Xen vCPU attributes to be read. For the structure and types,
see KVM_XEN_VCPU_SET_ATTR above.
The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
with the KVM_XEN_VCPU_GET_ATTR ioctl.
5. The kvm_run structure
========================
@@ -5000,7 +5002,8 @@ local APIC is not used.
__u16 flags;
More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior. Current defined flags:
affect the device's behavior. Current defined flags::
/* x86, set if the VCPU is in system management mode */
#define KVM_RUN_X86_SMM (1 << 0)
/* x86, set if bus lock detected in VM */
@@ -6217,7 +6220,7 @@ the bus lock vm exit can be preempted by a higher priority VM exit, the exit
notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons.
KVM_RUN_BUS_LOCK flag is used to distinguish between them.
7.22 KVM_CAP_PPC_DAWR1
7.23 KVM_CAP_PPC_DAWR1
----------------------
:Architectures: ppc
@@ -6702,6 +6705,7 @@ PVHVM guests. Valid flags are::
#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0)
#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1)
#define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2)
#define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 2)
The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
ioctl is available, for the guest to set its hypercall page.
@@ -6716,3 +6720,7 @@ KVM_XEN_HVM_SET_ATTR, KVM_XEN_HVM_GET_ATTR, KVM_XEN_VCPU_SET_ATTR and
KVM_XEN_VCPU_GET_ATTR ioctls, as well as the delivery of exception vectors
for event channel upcalls when the evtchn_upcall_pending field of a vcpu's
vcpu_info is set.
The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related
features KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR/_CURRENT/_DATA/_ADJUST are
supported by the KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR ioctls.

View File

@@ -261,8 +261,8 @@ ABI/API
L: linux-api@vger.kernel.org
F: include/linux/syscalls.h
F: kernel/sys_ni.c
F: include/uapi/
F: arch/*/include/uapi/
X: include/uapi/
X: arch/*/include/uapi/
ABIT UGURU 1,2 HARDWARE MONITOR DRIVER
M: Hans de Goede <hdegoede@redhat.com>
@@ -5835,7 +5835,7 @@ M: David Airlie <airlied@linux.ie>
M: Daniel Vetter <daniel@ffwll.ch>
L: dri-devel@lists.freedesktop.org
S: Maintained
B: https://bugs.freedesktop.org/
B: https://gitlab.freedesktop.org/drm
C: irc://chat.freenode.net/dri-devel
T: git git://anongit.freedesktop.org/drm/drm
F: Documentation/devicetree/bindings/display/
@@ -10716,7 +10716,8 @@ F: drivers/net/ethernet/marvell/mvpp2/
MARVELL MWIFIEX WIRELESS DRIVER
M: Amitkumar Karwar <amitkarwar@gmail.com>
M: Ganapathi Bhat <ganapathi.bhat@nxp.com>
M: Ganapathi Bhat <ganapathi017@gmail.com>
M: Sharvari Harisangam <sharvari.harisangam@nxp.com>
M: Xinming Hu <huxinming820@gmail.com>
L: linux-wireless@vger.kernel.org
S: Maintained
@@ -19165,7 +19166,7 @@ S: Maintained
F: drivers/infiniband/hw/vmw_pvrdma/
VMware PVSCSI driver
M: Jim Gill <jgill@vmware.com>
M: Vishal Bhakta <vbhakta@vmware.com>
M: VMware PV-Drivers <pv-drivers@vmware.com>
L: linux-scsi@vger.kernel.org
S: Maintained

View File

@@ -2,7 +2,7 @@
VERSION = 5
PATCHLEVEL = 12
SUBLEVEL = 0
EXTRAVERSION = -rc1
EXTRAVERSION = -rc3
NAME = Frozen Wasteland
# *DOCUMENTATION*
@@ -264,7 +264,8 @@ no-dot-config-targets := $(clean-targets) \
$(version_h) headers headers_% archheaders archscripts \
%asm-generic kernelversion %src-pkg dt_binding_check \
outputmakefile
no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease
no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease \
image_name
single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/
config-build :=
@@ -478,6 +479,7 @@ USERINCLUDE := \
-I$(objtree)/arch/$(SRCARCH)/include/generated/uapi \
-I$(srctree)/include/uapi \
-I$(objtree)/include/generated/uapi \
-include $(srctree)/include/linux/compiler-version.h \
-include $(srctree)/include/linux/kconfig.h
# Use LINUXINCLUDE when you must reference the include/ directory.

View File

@@ -632,13 +632,12 @@ config HAS_LTO_CLANG
def_bool y
# Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
depends on $(success,test $(LLVM) -eq 1)
depends on $(success,test $(LLVM_IAS) -eq 1)
depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
depends on ARCH_SUPPORTS_LTO_CLANG
depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
depends on !KASAN
depends on !KASAN || KASAN_HW_TAGS
depends on !GCOV_KERNEL
help
The compiler and Kconfig options support building with Clang's

View File

@@ -348,6 +348,7 @@ config ARCH_EP93XX
select ARM_AMBA
imply ARM_PATCH_PHYS_VIRT
select ARM_VIC
select GENERIC_IRQ_MULTI_HANDLER
select AUTO_ZRELADDR
select CLKDEV_LOOKUP
select CLKSRC_MMIO

View File

@@ -11,6 +11,7 @@
#include <xen/xen.h>
#include <xen/interface/memory.h>
#include <xen/grant_table.h>
#include <xen/page.h>
#include <xen/swiotlb-xen.h>
@@ -93,17 +94,43 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
int i;
for (i = 0; i < count; i++) {
struct gnttab_unmap_grant_ref unmap;
int rc;
if (map_ops[i].status)
continue;
if (unlikely(!set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT))) {
return -ENOMEM;
}
if (likely(set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT)))
continue;
/*
* Signal an error for this slot. This in turn requires
* immediate unmapping.
*/
map_ops[i].status = GNTST_general_error;
unmap.host_addr = map_ops[i].host_addr,
unmap.handle = map_ops[i].handle;
map_ops[i].handle = INVALID_GRANT_HANDLE;
if (map_ops[i].flags & GNTMAP_device_map)
unmap.dev_bus_addr = map_ops[i].dev_bus_addr;
else
unmap.dev_bus_addr = 0;
/*
* Pre-populate the status field, to be recognizable in
* the log message below.
*/
unmap.status = 1;
rc = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
&unmap, 1);
if (rc || unmap.status != GNTST_okay)
pr_err_once("gnttab unmap failed: rc=%d st=%d\n",
rc, unmap.status);
}
return 0;
}
EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
@@ -118,7 +145,6 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
return 0;
}
EXPORT_SYMBOL_GPL(clear_foreign_p2m_mapping);
bool __set_phys_to_machine_multi(unsigned long pfn,
unsigned long mfn, unsigned long nr_pages)

View File

@@ -1055,8 +1055,6 @@ config HW_PERF_EVENTS
config SYS_SUPPORTS_HUGETLBFS
def_bool y
config ARCH_WANT_HUGE_PMD_SHARE
config ARCH_HAS_CACHE_LINE_SIZE
def_bool y
@@ -1157,8 +1155,8 @@ config XEN
config FORCE_MAX_ZONEORDER
int
default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
default "12" if (ARM64_16K_PAGES && TRANSPARENT_HUGEPAGE)
default "14" if ARM64_64K_PAGES
default "12" if ARM64_16K_PAGES
default "11"
help
The kernel memory allocator divides physically contiguous memory
@@ -1855,12 +1853,6 @@ config CMDLINE_FROM_BOOTLOADER
the boot loader doesn't provide any, the default kernel command
string provided in CMDLINE will be used.
config CMDLINE_EXTEND
bool "Extend bootloader kernel arguments"
help
The command-line arguments provided by the boot loader will be
appended to the default kernel command string.
config CMDLINE_FORCE
bool "Always use the default kernel command string"
help

View File

@@ -47,10 +47,10 @@
#define __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context 2
#define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa 3
#define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid 4
#define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_local_vmid 5
#define __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context 5
#define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff 6
#define __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs 7
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_ich_vtr_el2 8
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config 8
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr 9
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr 10
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs 11
@@ -183,16 +183,16 @@ DECLARE_KVM_HYP_SYM(__bp_harden_hyp_vecs);
#define __bp_harden_hyp_vecs CHOOSE_HYP_SYM(__bp_harden_hyp_vecs)
extern void __kvm_flush_vm_context(void);
extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
int level);
extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
extern void __kvm_timer_set_cntvoff(u64 cntvoff);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern u64 __vgic_v3_get_ich_vtr_el2(void);
extern u64 __vgic_v3_get_gic_config(void);
extern u64 __vgic_v3_read_vmcr(void);
extern void __vgic_v3_write_vmcr(u32 vmcr);
extern void __vgic_v3_init_lrs(void);

View File

@@ -83,6 +83,11 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
void __debug_switch_to_host(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__
void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
#endif
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
@@ -97,7 +102,8 @@ bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
void __noreturn hyp_panic(void);
#ifdef __KVM_NVHE_HYPERVISOR__
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
u64 elr, u64 par);
#endif
#endif /* __ARM64_KVM_HYP_H__ */

View File

@@ -328,6 +328,11 @@ static inline void *phys_to_virt(phys_addr_t x)
#define ARCH_PFN_OFFSET ((unsigned long)PHYS_PFN_OFFSET)
#if !defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_DEBUG_VIRTUAL)
#define page_to_virt(x) ({ \
__typeof__(x) __page = x; \
void *__addr = __va(page_to_phys(__page)); \
(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));\
})
#define virt_to_page(x) pfn_to_page(virt_to_pfn(x))
#else
#define page_to_virt(x) ({ \

View File

@@ -63,23 +63,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
extern u64 idmap_t0sz;
extern u64 idmap_ptrs_per_pgd;
static inline bool __cpu_uses_extended_idmap(void)
{
if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52))
return false;
return unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS));
}
/*
* True if the extended ID map requires an extra level of translation table
* to be configured.
*/
static inline bool __cpu_uses_extended_idmap_level(void)
{
return ARM64_HW_PGTABLE_LEVELS(64 - idmap_t0sz) > CONFIG_PGTABLE_LEVELS;
}
/*
* Ensure TCR.T0SZ is set to the provided value.
*/

View File

@@ -66,7 +66,6 @@ extern bool arm64_use_ng_mappings;
#define _PAGE_DEFAULT (_PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
#define PAGE_KERNEL __pgprot(PROT_NORMAL)
#define PAGE_KERNEL_TAGGED __pgprot(PROT_NORMAL_TAGGED)
#define PAGE_KERNEL_RO __pgprot((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
#define PAGE_KERNEL_ROX __pgprot((PROT_NORMAL & ~(PTE_WRITE | PTE_PXN)) | PTE_RDONLY)
#define PAGE_KERNEL_EXEC __pgprot(PROT_NORMAL & ~PTE_PXN)

View File

@@ -486,6 +486,9 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
#define pgprot_device(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) | PTE_PXN | PTE_UXN)
#define pgprot_tagged(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_TAGGED))
#define pgprot_mhp pgprot_tagged
/*
* DMA allocations for non-coherent devices use what the Arm architecture calls
* "Normal non-cacheable" memory, which permits speculation, unaligned accesses

View File

@@ -796,6 +796,11 @@
#define ID_AA64MMFR0_PARANGE_48 0x5
#define ID_AA64MMFR0_PARANGE_52 0x6
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE 0x1
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7
#ifdef CONFIG_ARM64_PA_BITS_52
#define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52
#else
@@ -961,14 +966,17 @@
#define ID_PFR1_PROGMOD_SHIFT 0
#if defined(CONFIG_ARM64_4K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN4_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
#elif defined(CONFIG_ARM64_16K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN16_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN16_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0xF
#elif defined(CONFIG_ARM64_64K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN64_SUPPORTED
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
#endif
#define MVFR2_FPMISC_SHIFT 4

View File

@@ -319,7 +319,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
*/
adrp x5, __idmap_text_end
clz x5, x5
cmp x5, TCR_T0SZ(VA_BITS) // default T0SZ small enough?
cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
b.ge 1f // .. then skip VA range extension
adr_l x6, idmap_t0sz
@@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
ubfx x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
b.ne __no_granule_support
cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
b.lt __no_granule_support
cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
b.gt __no_granule_support
update_early_cpu_boot_status 0, x2, x3
adrp x2, idmap_pg_dir
phys_to_ttbr x1, x1

View File

@@ -163,33 +163,36 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
} while (1);
}
static __init const u8 *get_bootargs_cmdline(void)
{
const u8 *prop;
void *fdt;
int node;
fdt = get_early_fdt_ptr();
if (!fdt)
return NULL;
node = fdt_path_offset(fdt, "/chosen");
if (node < 0)
return NULL;
prop = fdt_getprop(fdt, node, "bootargs", NULL);
if (!prop)
return NULL;
return strlen(prop) ? prop : NULL;
}
static __init void parse_cmdline(void)
{
if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
const u8 *prop;
void *fdt;
int node;
const u8 *prop = get_bootargs_cmdline();
fdt = get_early_fdt_ptr();
if (!fdt)
goto out;
node = fdt_path_offset(fdt, "/chosen");
if (node < 0)
goto out;
prop = fdt_getprop(fdt, node, "bootargs", NULL);
if (!prop)
goto out;
if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
__parse_cmdline(CONFIG_CMDLINE, true);
if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && prop)
__parse_cmdline(prop, true);
if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND))
return;
}
out:
__parse_cmdline(CONFIG_CMDLINE, true);
}
/* Keep checkers quiet */

View File

@@ -101,6 +101,9 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
/* Array containing bases of nVHE per-CPU memory regions. */
KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
/* PMU available static key */
KVM_NVHE_ALIAS(kvm_arm_pmu_available);
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */

View File

@@ -460,7 +460,7 @@ static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
}
static inline u32 armv8pmu_read_evcntr(int idx)
static inline u64 armv8pmu_read_evcntr(int idx)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);

View File

@@ -385,11 +385,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
/*
* We guarantee that both TLBs and I-cache are private to each
* vcpu. If detecting that a vcpu from the same VM has
* previously run on the same physical CPU, call into the
* hypervisor code to nuke the relevant contexts.
*
* We might get preempted before the vCPU actually runs, but
* over-invalidation doesn't affect correctness.
*/
if (*last_ran != vcpu->vcpu_id) {
kvm_call_hyp(__kvm_tlb_flush_local_vmid, mmu);
kvm_call_hyp(__kvm_flush_cpu_context, mmu);
*last_ran = vcpu->vcpu_id;
}

View File

@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
// If the hyp context is loaded, go straight to hyp_panic
get_loaded_vcpu x0, x1
cbz x0, hyp_panic
cbnz x0, 1f
b hyp_panic
1:
// The hyp context is saved so make sure it is restored to allow
// hyp_panic to run at hyp and, subsequently, panic to run in the host.
// This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
// current state is saved to the guest context but it will only be
// accurate if the guest had been completely restored.
adr_this_cpu x0, kvm_hyp_ctxt, x1
adr x1, hyp_panic
adr_l x1, hyp_panic
str x1, [x0, #CPU_XREG_OFFSET(30)]
get_vcpu_ptr x1, x0
@@ -146,7 +148,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// Now restore the hyp regs
restore_callee_saved_regs x2
set_loaded_vcpu xzr, x1, x2
set_loaded_vcpu xzr, x2, x3
alternative_if ARM64_HAS_RAS_EXTN
// If we have the RAS extensions we can consume a pending error

View File

@@ -90,15 +90,18 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
* EL1 instead of being trapped to EL2.
*/
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
if (kvm_arm_support_pmu_v3()) {
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
}
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
}
static inline void __deactivate_traps_common(void)
{
write_sysreg(0, hstr_el2);
write_sysreg(0, pmuserenr_el0);
if (kvm_arm_support_pmu_v3())
write_sysreg(0, pmuserenr_el0);
}
static inline void ___activate_traps(struct kvm_vcpu *vcpu)

View File

@@ -58,16 +58,24 @@ static void __debug_restore_spe(u64 pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
}
void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
}
void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
{
__debug_switch_to_guest_common(vcpu);
}
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
}
void __debug_switch_to_host(struct kvm_vcpu *vcpu)
{
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
__debug_switch_to_host_common(vcpu);
}

View File

@@ -71,7 +71,8 @@ SYM_FUNC_START(__host_enter)
SYM_FUNC_END(__host_enter)
/*
* void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
* void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
* u64 elr, u64 par);
*/
SYM_FUNC_START(__hyp_do_panic)
/* Prepare and exit to the host's panic funciton. */
@@ -82,9 +83,11 @@ SYM_FUNC_START(__hyp_do_panic)
hyp_kimg_va lr, x6
msr elr_el2, lr
/* Set the panic format string. Use the, now free, LR as scratch. */
ldr lr, =__hyp_panic_string
hyp_kimg_va lr, x6
mov x29, x0
/* Load the format string into x0 and arguments into x1-7 */
ldr x0, =__hyp_panic_string
hyp_kimg_va x0, x6
/* Load the format arguments into x1-7. */
mov x6, x3
@@ -94,9 +97,7 @@ SYM_FUNC_START(__hyp_do_panic)
mrs x5, hpfar_el2
/* Enter the host, conditionally restoring the host context. */
cmp x0, xzr
mov x0, lr
b.eq __host_enter_without_restoring
cbz x29, __host_enter_without_restoring
b __host_enter_for_panic
SYM_FUNC_END(__hyp_do_panic)

View File

@@ -46,11 +46,11 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
__kvm_tlb_flush_vmid(kern_hyp_va(mmu));
}
static void handle___kvm_tlb_flush_local_vmid(struct kvm_cpu_context *host_ctxt)
static void handle___kvm_flush_cpu_context(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
__kvm_tlb_flush_local_vmid(kern_hyp_va(mmu));
__kvm_flush_cpu_context(kern_hyp_va(mmu));
}
static void handle___kvm_timer_set_cntvoff(struct kvm_cpu_context *host_ctxt)
@@ -67,9 +67,9 @@ static void handle___kvm_enable_ssbs(struct kvm_cpu_context *host_ctxt)
write_sysreg_el2(tmp, SYS_SCTLR);
}
static void handle___vgic_v3_get_ich_vtr_el2(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_get_ich_vtr_el2();
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
}
static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
@@ -115,10 +115,10 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_flush_vm_context),
HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
HANDLE_FUNC(__kvm_tlb_flush_vmid),
HANDLE_FUNC(__kvm_tlb_flush_local_vmid),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__kvm_enable_ssbs),
HANDLE_FUNC(__vgic_v3_get_ich_vtr_el2),
HANDLE_FUNC(__vgic_v3_get_gic_config),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_init_lrs),

View File

@@ -192,6 +192,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
__sysreg_save_state_nvhe(host_ctxt);
/*
* We must flush and disable the SPE buffer for nVHE, as
* the translation regime(EL1&0) is going to be loaded with
* that of the guest. And we must do this before we change the
* translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
* before we load guest Stage1.
*/
__debug_save_host_buffers_nvhe(vcpu);
__adjust_pc(vcpu);
@@ -234,11 +242,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
/*
* This must come after restoring the host sysregs, since a non-VHE
* system may enable SPE here and make use of the TTBRs.
*/
__debug_switch_to_host(vcpu);
__debug_restore_host_buffers_nvhe(vcpu);
if (pmu_switch_needed)
__pmu_switch_to_host(host_ctxt);
@@ -257,7 +266,6 @@ void __noreturn hyp_panic(void)
u64 spsr = read_sysreg_el2(SYS_SPSR);
u64 elr = read_sysreg_el2(SYS_ELR);
u64 par = read_sysreg_par();
bool restore_host = true;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
@@ -271,7 +279,7 @@ void __noreturn hyp_panic(void)
__sysreg_restore_state_nvhe(host_ctxt);
}
__hyp_do_panic(restore_host, spsr, elr, par);
__hyp_do_panic(host_ctxt, spsr, elr, par);
unreachable();
}

View File

@@ -123,7 +123,7 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
__tlb_switch_to_host(&cxt);
}
void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
{
struct tlb_inv_context cxt;
@@ -131,6 +131,7 @@ void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
__tlb_switch_to_guest(mmu, &cxt);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();

View File

@@ -223,6 +223,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
goto out;
if (!table) {
data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
data->addr += kvm_granule_size(level);
goto out;
}

View File

@@ -405,9 +405,45 @@ void __vgic_v3_init_lrs(void)
__gic_v3_set_lr(0, i);
}
u64 __vgic_v3_get_ich_vtr_el2(void)
/*
* Return the GIC CPU configuration:
* - [31:0] ICH_VTR_EL2
* - [62:32] RES0
* - [63] MMIO (GICv2) capable
*/
u64 __vgic_v3_get_gic_config(void)
{
return read_gicreg(ICH_VTR_EL2);
u64 val, sre = read_gicreg(ICC_SRE_EL1);
unsigned long flags = 0;
/*
* To check whether we have a MMIO-based (GICv2 compatible)
* CPU interface, we need to disable the system register
* view. To do that safely, we have to prevent any interrupt
* from firing (which would be deadly).
*
* Note that this only makes sense on VHE, as interrupts are
* already masked for nVHE as part of the exception entry to
* EL2.
*/
if (has_vhe())
flags = local_daif_save();
write_gicreg(0, ICC_SRE_EL1);
isb();
val = read_gicreg(ICC_SRE_EL1);
write_gicreg(sre, ICC_SRE_EL1);
isb();
if (has_vhe())
local_daif_restore(flags);
val = (val & ICC_SRE_EL1_SRE) ? 0 : (1ULL << 63);
val |= read_gicreg(ICH_VTR_EL2);
return val;
}
u64 __vgic_v3_read_vmcr(void)

View File

@@ -127,7 +127,7 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
__tlb_switch_to_host(&cxt);
}
void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
{
struct tlb_inv_context cxt;
@@ -135,6 +135,7 @@ void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
__tlb_switch_to_guest(mmu, &cxt);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();

View File

@@ -1312,8 +1312,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
* Prevent userspace from creating a memory region outside of the IPA
* space addressable by the KVM guest IPA space.
*/
if (memslot->base_gfn + memslot->npages >=
(kvm_phys_size(kvm) >> PAGE_SHIFT))
if ((memslot->base_gfn + memslot->npages) > (kvm_phys_size(kvm) >> PAGE_SHIFT))
return -EFAULT;
mmap_read_lock(current->mm);

View File

@@ -11,6 +11,8 @@
#include <asm/kvm_emulate.h>
DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
static int kvm_is_in_guest(void)
{
return kvm_get_running_vcpu() != NULL;
@@ -48,6 +50,14 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
int kvm_perf_init(void)
{
/*
* Check if HW_PERF_EVENTS are supported by checking the number of
* hardware performance counters. This could ensure the presence of
* a physical PMU and CONFIG_PERF_EVENT is selected.
*/
if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0)
static_branch_enable(&kvm_arm_pmu_available);
return perf_register_guest_info_callbacks(&kvm_guest_cbs);
}

View File

@@ -823,16 +823,6 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
return val & mask;
}
bool kvm_arm_support_pmu_v3(void)
{
/*
* Check if HW_PERF_EVENTS are supported by checking the number of
* hardware performance counters. This could ensure the presence of
* a physical PMU and CONFIG_PERF_EVENT is selected.
*/
return (perf_num_counters() > 0);
}
int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
{
if (!kvm_vcpu_has_pmu(vcpu))

View File

@@ -311,23 +311,24 @@ int kvm_set_ipa_limit(void)
}
switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
default:
case 1:
case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
return -EINVAL;
case 0:
case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
break;
case 2:
case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
break;
default:
kvm_err("Unsupported value for TGRAN_2, giving up\n");
return -EINVAL;
}
kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
WARN(kvm_ipa_limit < KVM_PHYS_SHIFT,
"KVM IPA Size Limit (%d bits) is smaller than default size\n",
kvm_ipa_limit);
kvm_info("IPA Size Limit: %d bits\n", kvm_ipa_limit);
kvm_info("IPA Size Limit: %d bits%s\n", kvm_ipa_limit,
((kvm_ipa_limit < KVM_PHYS_SHIFT) ?
" (Reduced IPA size, limited VM/VMM compatibility)" : ""));
return 0;
}
@@ -356,6 +357,11 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
return -EINVAL;
} else {
phys_shift = KVM_PHYS_SHIFT;
if (phys_shift > kvm_ipa_limit) {
pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
current->comm);
return -EINVAL;
}
}
mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);

View File

@@ -574,9 +574,13 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
*/
int vgic_v3_probe(const struct gic_kvm_info *info)
{
u32 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_ich_vtr_el2);
u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
bool has_v2;
int ret;
has_v2 = ich_vtr_el2 >> 63;
ich_vtr_el2 = (u32)ich_vtr_el2;
/*
* The ListRegs field is 5 bits, but there is an architectural
* maximum of 16 list registers. Just ignore bit 4...
@@ -594,13 +598,15 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
gicv4_enable ? "en" : "dis");
}
kvm_vgic_global_state.vcpu_base = 0;
if (!info->vcpu.start) {
kvm_info("GICv3: no GICV resource entry\n");
kvm_vgic_global_state.vcpu_base = 0;
} else if (!has_v2) {
pr_warn(FW_BUG "CPU interface incapable of MMIO access\n");
} else if (!PAGE_ALIGNED(info->vcpu.start)) {
pr_warn("GICV physical address 0x%llx not page aligned\n",
(unsigned long long)info->vcpu.start);
kvm_vgic_global_state.vcpu_base = 0;
} else {
kvm_vgic_global_state.vcpu_base = info->vcpu.start;
kvm_vgic_global_state.can_emulate_gicv2 = true;

View File

@@ -219,17 +219,40 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
int pfn_valid(unsigned long pfn)
{
phys_addr_t addr = pfn << PAGE_SHIFT;
phys_addr_t addr = PFN_PHYS(pfn);
if ((addr >> PAGE_SHIFT) != pfn)
/*
* Ensure the upper PAGE_SHIFT bits are clear in the
* pfn. Else it might lead to false positives when
* some of the upper bits are set, but the lower bits
* match a valid pfn.
*/
if (PHYS_PFN(addr) != pfn)
return 0;
#ifdef CONFIG_SPARSEMEM
{
struct mem_section *ms;
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
if (!valid_section(__pfn_to_section(pfn)))
ms = __pfn_to_section(pfn);
if (!valid_section(ms))
return 0;
/*
* ZONE_DEVICE memory does not have the memblock entries.
* memblock_is_map_memory() check for ZONE_DEVICE based
* addresses will always fail. Even the normal hotplugged
* memory will never have MEMBLOCK_NOMAP flag set in their
* memblock entries. Skip memblock search for all non early
* memory sections covering all of hotplug memory including
* both normal and ZONE_DEVICE based.
*/
if (!early_section(ms))
return pfn_section_valid(ms, pfn);
}
#endif
return memblock_is_map_memory(addr);
}

View File

@@ -40,7 +40,7 @@
#define NO_BLOCK_MAPPINGS BIT(0)
#define NO_CONT_MAPPINGS BIT(1)
u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
u64 __section(".mmuoff.data.write") vabits_actual;
@@ -512,7 +512,8 @@ static void __init map_mem(pgd_t *pgdp)
* if MTE is present. Otherwise, it has the same attributes as
* PAGE_KERNEL.
*/
__map_memblock(pgdp, start, end, PAGE_KERNEL_TAGGED, flags);
__map_memblock(pgdp, start, end, pgprot_tagged(PAGE_KERNEL),
flags);
}
/*

View File

@@ -32,7 +32,7 @@ static inline void syscall_rollback(struct task_struct *task,
static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs)
{
return regs->r10 == -1 ? regs->r8:0;
return regs->r10 == -1 ? -regs->r8:0;
}
static inline long syscall_get_return_value(struct task_struct *task,

View File

@@ -2013,27 +2013,39 @@ static void syscall_get_set_args_cb(struct unw_frame_info *info, void *data)
{
struct syscall_get_set_args *args = data;
struct pt_regs *pt = args->regs;
unsigned long *krbs, cfm, ndirty;
unsigned long *krbs, cfm, ndirty, nlocals, nouts;
int i, count;
if (unw_unwind_to_user(info) < 0)
return;
/*
* We get here via a few paths:
* - break instruction: cfm is shared with caller.
* syscall args are in out= regs, locals are non-empty.
* - epsinstruction: cfm is set by br.call
* locals don't exist.
*
* For both cases argguments are reachable in cfm.sof - cfm.sol.
* CFM: [ ... | sor: 17..14 | sol : 13..7 | sof : 6..0 ]
*/
cfm = pt->cr_ifs;
nlocals = (cfm >> 7) & 0x7f; /* aka sol */
nouts = (cfm & 0x7f) - nlocals; /* aka sof - sol */
krbs = (unsigned long *)info->task + IA64_RBS_OFFSET/8;
ndirty = ia64_rse_num_regs(krbs, krbs + (pt->loadrs >> 19));
count = 0;
if (in_syscall(pt))
count = min_t(int, args->n, cfm & 0x7f);
count = min_t(int, args->n, nouts);
/* Iterate over outs. */
for (i = 0; i < count; i++) {
int j = ndirty + nlocals + i + args->i;
if (args->rw)
*ia64_rse_skip_regs(krbs, ndirty + i + args->i) =
args->args[i];
*ia64_rse_skip_regs(krbs, j) = args->args[i];
else
args->args[i] = *ia64_rse_skip_regs(krbs,
ndirty + i + args->i);
args->args[i] = *ia64_rse_skip_regs(krbs, j);
}
if (!args->rw) {

View File

@@ -341,7 +341,8 @@ ia64_do_signal (struct sigscratch *scr, long in_syscall)
* need to push through a forced SIGSEGV.
*/
while (1) {
get_signal(&ksig);
if (!get_signal(&ksig))
break;
/*
* get_signal() may have run a debugger (via notify_parent())

View File

@@ -171,7 +171,7 @@ static inline __attribute_const__ int __virt_to_node_shift(void)
#include <asm-generic/memory_model.h>
#endif
#define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
#define virt_addr_valid(kaddr) ((unsigned long)(kaddr) >= PAGE_OFFSET && (unsigned long)(kaddr) < (unsigned long)high_memory)
#define pfn_valid(pfn) virt_addr_valid(pfn_to_virt(pfn))
#endif /* __ASSEMBLY__ */

View File

@@ -30,8 +30,8 @@ extern unsigned long memory_end;
#define page_to_pfn(page) virt_to_pfn(page_to_virt(page))
#define pfn_valid(pfn) ((pfn) < max_mapnr)
#define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
((void *)(kaddr) < (void *)memory_end))
#define virt_addr_valid(kaddr) (((unsigned long)(kaddr) >= PAGE_OFFSET) && \
((unsigned long)(kaddr) < memory_end))
#endif /* __ASSEMBLY__ */

View File

@@ -14,6 +14,7 @@
#include <asm/addrspace.h>
#include <asm/unaligned.h>
#include <asm-generic/vmlinux.lds.h>
/*
* These two variables specify the free mem region
@@ -120,6 +121,13 @@ void decompress_kernel(unsigned long boot_heap_start)
/* last four bytes is always image size in little endian */
image_size = get_unaligned_le32((void *)&__image_end - 4);
/* The device tree's address must be properly aligned */
image_size = ALIGN(image_size, STRUCT_ALIGNMENT);
puts("Copy device tree to address ");
puthex(VMLINUX_LOAD_ADDRESS_ULL + image_size);
puts("\n");
/* copy dtb to where the booted kernel will expect it */
memcpy((void *)VMLINUX_LOAD_ADDRESS_ULL + image_size,
__appended_dtb, dtb_size);

View File

@@ -12,8 +12,8 @@ AFLAGS_chacha-core.o += -O2 # needed to fill branch delay slots
obj-$(CONFIG_CRYPTO_POLY1305_MIPS) += poly1305-mips.o
poly1305-mips-y := poly1305-core.o poly1305-glue.o
perlasm-flavour-$(CONFIG_CPU_MIPS32) := o32
perlasm-flavour-$(CONFIG_CPU_MIPS64) := 64
perlasm-flavour-$(CONFIG_32BIT) := o32
perlasm-flavour-$(CONFIG_64BIT) := 64
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) $(perlasm-flavour-y) $(@)

View File

@@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
extern void (*board_cache_error_setup)(void);
extern int register_nmi_notifier(struct notifier_block *nb);
extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
extern char except_vec_nmi[];
#define VECTORSPACING 0x100 /* for EI/VI mode */
#define nmi_notifier(fn, pri) \
({ \
static struct notifier_block fn##_nb = { \

View File

@@ -26,6 +26,7 @@
#include <asm/elf.h>
#include <asm/pgtable-bits.h>
#include <asm/spram.h>
#include <asm/traps.h>
#include <linux/uaccess.h>
#include "fpu-probe.h"
@@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
reserve_exception_space(0x400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
reserve_exception_space(0x400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
reserve_exception_space(0x1000, VECTORSPACING * 64);
break;
}
}
@@ -2133,6 +2137,8 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
#endif
reserve_exception_space(0, 0x1000);
}
void cpu_report(void)

View File

@@ -21,6 +21,7 @@
#include <asm/fpu.h>
#include <asm/mipsregs.h>
#include <asm/elf.h>
#include <asm/traps.h>
#include "fpu-probe.h"
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
reserve_exception_space(0, 0x400);
}
void cpu_report(void)

View File

@@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs)
nmi_exit();
}
#define VECTORSPACING 0x100 /* for EI/VI mode */
unsigned long ebase;
EXPORT_SYMBOL_GPL(ebase);
unsigned long exception_handlers[32];
unsigned long vi_handlers[64];
void reserve_exception_space(phys_addr_t addr, unsigned long size)
{
memblock_reserve(addr, size);
}
void __init *set_except_vector(int n, void *addr)
{
unsigned long handler = (unsigned long) addr;
@@ -2367,10 +2370,7 @@ void __init trap_init(void)
if (!cpu_has_mips_r2_r6) {
ebase = CAC_BASE;
ebase_pa = virt_to_phys((void *)ebase);
vec_size = 0x400;
memblock_reserve(ebase_pa, vec_size);
} else {
if (cpu_has_veic || cpu_has_vint)
vec_size = 0x200 + VECTORSPACING*64;

View File

@@ -145,6 +145,7 @@ SECTIONS
}
#ifdef CONFIG_MIPS_ELF_APPENDED_DTB
STRUCT_ALIGN();
.appended_dtb : AT(ADDR(.appended_dtb) - LOAD_OFFSET) {
*(.appended_dtb)
KEEP(*(.appended_dtb))
@@ -172,6 +173,11 @@ SECTIONS
#endif
#ifdef CONFIG_MIPS_RAW_APPENDED_DTB
.fill : {
FILL(0);
BYTE(0);
. = ALIGN(8);
}
__appended_dtb = .;
/* leave space for appended DTB */
. += 0x100000;

View File

@@ -203,9 +203,12 @@ config PREFETCH
def_bool y
depends on PA8X00 || PA7200
config PARISC_HUGE_KERNEL
def_bool y if !MODULES || UBSAN || FTRACE || COMPILE_TEST
config MLONGCALLS
def_bool y if !MODULES || UBSAN || FTRACE
bool "Enable the -mlong-calls compiler option for big kernels" if MODULES && !UBSAN && !FTRACE
def_bool y if PARISC_HUGE_KERNEL
bool "Enable the -mlong-calls compiler option for big kernels" if !PARISC_HUGE_KERNEL
depends on PA8X00
help
If you configure the kernel to include many drivers built-in instead

View File

@@ -567,8 +567,6 @@ static const struct user_regset_view user_parisc_native_view = {
};
#ifdef CONFIG_64BIT
#include <linux/compat.h>
static int gpr32_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)

View File

@@ -73,9 +73,10 @@ void __patch_exception(int exc, unsigned long addr);
#endif
#define OP_RT_RA_MASK 0xffff0000UL
#define LIS_R2 0x3c020000UL
#define ADDIS_R2_R12 0x3c4c0000UL
#define ADDI_R2_R2 0x38420000UL
#define LIS_R2 (PPC_INST_ADDIS | __PPC_RT(R2))
#define ADDIS_R2_R12 (PPC_INST_ADDIS | __PPC_RT(R2) | __PPC_RA(R12))
#define ADDI_R2_R2 (PPC_INST_ADDI | __PPC_RT(R2) | __PPC_RA(R2))
static inline unsigned long ppc_function_entry(void *func)
{

View File

@@ -53,8 +53,8 @@ static inline void mtdcrx(unsigned int reg, unsigned int val)
#define mfdcr(rn) \
({unsigned int rval; \
if (__builtin_constant_p(rn) && rn < 1024) \
asm volatile("mfdcr %0," __stringify(rn) \
: "=r" (rval)); \
asm volatile("mfdcr %0, %1" : "=r" (rval) \
: "n" (rn)); \
else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR))) \
rval = mfdcrx(rn); \
else \
@@ -64,8 +64,8 @@ static inline void mtdcrx(unsigned int reg, unsigned int val)
#define mtdcr(rn, v) \
do { \
if (__builtin_constant_p(rn) && rn < 1024) \
asm volatile("mtdcr " __stringify(rn) ",%0" \
: : "r" (v)); \
asm volatile("mtdcr %0, %1" \
: : "n" (rn), "r" (v)); \
else if (likely(cpu_has_feature(CPU_FTR_INDEXED_DCR))) \
mtdcrx(rn, v); \
else \

View File

@@ -410,7 +410,6 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
DECLARE_INTERRUPT_HANDLER(CacheLockingException);
DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
DECLARE_INTERRUPT_HANDLER(WatchdogException);
DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
@@ -437,6 +436,8 @@ DECLARE_INTERRUPT_HANDLER_NMI(hmi_exception_realmode);
DECLARE_INTERRUPT_HANDLER_ASYNC(TAUException);
void unrecoverable_exception(struct pt_regs *regs);
void replay_system_reset(void);
void replay_soft_interrupts(void);

View File

@@ -228,7 +228,7 @@ enum {
#define MMU_FTRS_ALWAYS 0
#endif
static inline bool early_mmu_has_feature(unsigned long feature)
static __always_inline bool early_mmu_has_feature(unsigned long feature)
{
if (MMU_FTRS_ALWAYS & feature)
return true;
@@ -286,7 +286,7 @@ static inline void mmu_feature_keys_init(void)
}
static inline bool mmu_has_feature(unsigned long feature)
static __always_inline bool mmu_has_feature(unsigned long feature)
{
return early_mmu_has_feature(feature);
}

View File

@@ -195,7 +195,7 @@ static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
#define TRAP_FLAGS_MASK 0x11
#define TRAP(regs) ((regs)->trap & ~TRAP_FLAGS_MASK)
#define FULL_REGS(regs) (((regs)->trap & 1) == 0)
#define SET_FULL_REGS(regs) ((regs)->trap |= 1)
#define SET_FULL_REGS(regs) ((regs)->trap &= ~1)
#endif
#define CHECK_FULL_REGS(regs) BUG_ON(!FULL_REGS(regs))
#define NV_REG_POISON 0xdeadbeefdeadbeefUL
@@ -210,7 +210,7 @@ static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
#define TRAP_FLAGS_MASK 0x1F
#define TRAP(regs) ((regs)->trap & ~TRAP_FLAGS_MASK)
#define FULL_REGS(regs) (((regs)->trap & 1) == 0)
#define SET_FULL_REGS(regs) ((regs)->trap |= 1)
#define SET_FULL_REGS(regs) ((regs)->trap &= ~1)
#define IS_CRITICAL_EXC(regs) (((regs)->trap & 2) != 0)
#define IS_MCHECK_EXC(regs) (((regs)->trap & 4) != 0)
#define IS_DEBUG_EXC(regs) (((regs)->trap & 8) != 0)

View File

@@ -71,6 +71,16 @@ static inline void disable_kernel_vsx(void)
{
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
}
#else
static inline void enable_kernel_vsx(void)
{
BUILD_BUG();
}
static inline void disable_kernel_vsx(void)
{
BUILD_BUG();
}
#endif
#ifdef CONFIG_SPE

View File

@@ -113,7 +113,7 @@ struct vio_driver {
const char *name;
const struct vio_device_id *id_table;
int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
int (*remove)(struct vio_dev *dev);
void (*remove)(struct vio_dev *dev);
/* A driver must have a get_desired_dma() function to
* be loaded in a CMO environment if it uses DMA.
*/

View File

@@ -466,7 +466,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
ld r10,PACAKMSR(r13) /* get MSR value for kernel */
/* MSR[RI] is clear iff using SRR regs */
.if IHSRR == EXC_HV_OR_STD
.if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
xori r10,r10,MSR_RI
END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)

View File

@@ -457,11 +457,12 @@ InstructionTLBMiss:
cmplw 0,r1,r3
#endif
mfspr r2, SPRN_SDR1
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
#ifdef CONFIG_MODULES
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
#endif
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
@@ -520,10 +521,11 @@ DataLoadTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SDR1
li r1, _PAGE_PRESENT | _PAGE_ACCESSED
li r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
li r1, _PAGE_PRESENT | _PAGE_ACCESSED
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
lwz r2,0(r2) /* get pmd entry */
@@ -597,10 +599,11 @@ DataStoreTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SDR1
li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
lwz r2,0(r2) /* get pmd entry */

View File

@@ -149,7 +149,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
* enabled when the interrupt handler returns (indicating a process-context /
* synchronous interrupt) then irqs_enabled should be true.
*/
static notrace inline bool __prep_irq_for_enabled_exit(bool clear_ri)
static notrace __always_inline bool __prep_irq_for_enabled_exit(bool clear_ri)
{
/* This must be done with RI=1 because tracing may touch vmaps */
trace_hardirqs_on();
@@ -436,7 +436,6 @@ again:
return ret;
}
void unrecoverable_exception(struct pt_regs *regs);
void preempt_schedule_irq(void);
notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)

View File

@@ -2170,7 +2170,7 @@ DEFINE_INTERRUPT_HANDLER(SPEFloatingPointRoundException)
* in the MSR is 0. This indicates that SRR0/1 are live, and that
* we therefore lost state by taking this exception.
*/
DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
void unrecoverable_exception(struct pt_regs *regs)
{
pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
regs->trap, regs->nip, regs->msr);

View File

@@ -904,7 +904,7 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
return -EFAULT;
nr_vsx_regs = size / sizeof(__vector128);
nr_vsx_regs = max(1ul, size / sizeof(__vector128));
emulate_vsx_load(op, buf, mem, cross_endian);
preempt_disable();
if (reg < 32) {
@@ -951,7 +951,7 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
if (!address_ok(regs, ea, size))
return -EFAULT;
nr_vsx_regs = size / sizeof(__vector128);
nr_vsx_regs = max(1ul, size / sizeof(__vector128));
preempt_disable();
if (reg < 32) {
/* FP regs + extensions */

View File

@@ -222,7 +222,7 @@ static inline void perf_get_data_addr(struct perf_event *event, struct pt_regs *
if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
*addrp = mfspr(SPRN_SDAR);
if (is_kernel_addr(mfspr(SPRN_SDAR)) && perf_allow_kernel(&event->attr) != 0)
if (is_kernel_addr(mfspr(SPRN_SDAR)) && event->attr.exclude_kernel)
*addrp = 0;
}
@@ -507,7 +507,7 @@ static void power_pmu_bhrb_read(struct perf_event *event, struct cpu_hw_events *
* addresses, hence include a check before filtering code
*/
if (!(ppmu->flags & PPMU_ARCH_31) &&
is_kernel_addr(addr) && perf_allow_kernel(&event->attr) != 0)
is_kernel_addr(addr) && event->attr.exclude_kernel)
continue;
/* Branches are read most recent first (ie. mfbhrb 0 is

View File

@@ -4,6 +4,7 @@
* Copyright 2006-2007 Michael Ellerman, IBM Corp.
*/
#include <linux/crash_dump.h>
#include <linux/device.h>
#include <linux/irq.h>
#include <linux/msi.h>
@@ -458,8 +459,28 @@ again:
return hwirq;
}
virq = irq_create_mapping_affinity(NULL, hwirq,
entry->affinity);
/*
* Depending on the number of online CPUs in the original
* kernel, it is likely for CPU #0 to be offline in a kdump
* kernel. The associated IRQs in the affinity mappings
* provided by irq_create_affinity_masks() are thus not
* started by irq_startup(), as per-design for managed IRQs.
* This can be a problem with multi-queue block devices driven
* by blk-mq : such a non-started IRQ is very likely paired
* with the single queue enforced by blk-mq during kdump (see
* blk_mq_alloc_tag_set()). This causes the device to remain
* silent and likely hangs the guest at some point.
*
* We don't really care for fine-grained affinity when doing
* kdump actually : simply ignore the pre-computed affinity
* masks in this case and let the default mask with all CPUs
* be used when creating the IRQ mappings.
*/
if (is_kdump_kernel())
virq = irq_create_mapping(NULL, hwirq);
else
virq = irq_create_mapping_affinity(NULL, hwirq,
entry->affinity);
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);

View File

@@ -1261,7 +1261,6 @@ static int vio_bus_remove(struct device *dev)
struct vio_dev *viodev = to_vio_dev(dev);
struct vio_driver *viodrv = to_vio_driver(dev->driver);
struct device *devptr;
int ret = 1;
/*
* Hold a reference to the device after the remove function is called
@@ -1270,13 +1269,13 @@ static int vio_bus_remove(struct device *dev)
devptr = get_device(dev);
if (viodrv->remove)
ret = viodrv->remove(viodev);
viodrv->remove(viodev);
if (!ret && firmware_has_feature(FW_FEATURE_CMO))
if (firmware_has_feature(FW_FEATURE_CMO))
vio_cmo_bus_remove(viodev);
put_device(devptr);
return ret;
return 0;
}
/**

View File

@@ -275,9 +275,9 @@ CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_PE_SIP=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_IP_NF_IPTABLES=m
@@ -298,7 +298,6 @@ CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_FIB_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
@@ -481,7 +480,6 @@ CONFIG_NLMON=m
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CADENCE is not set
@@ -581,7 +579,6 @@ CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_INPUT=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_VSOCK=m
# CONFIG_SURFACE_PLATFORMS is not set
CONFIG_S390_CCW_IOMMU=y
CONFIG_S390_AP_IOMMU=y
CONFIG_EXT4_FS=y
@@ -635,6 +632,7 @@ CONFIG_NTFS_RW=y
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_INODE64=y
CONFIG_HUGETLBFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_ECRYPT_FS=m
@@ -714,12 +712,8 @@ CONFIG_CRYPTO_VMAC=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_ANUBIS=m
@@ -731,7 +725,6 @@ CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
@@ -796,12 +789,9 @@ CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
@@ -838,6 +828,7 @@ CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_HIST_TRIGGERS=y
CONFIG_FTRACE_STARTUP_TEST=y
# CONFIG_EVENT_TRACE_STARTUP_TEST is not set
CONFIG_DEBUG_ENTRY=y
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_NETDEV_NOTIFIER_ERROR_INJECT=m
CONFIG_FAULT_INJECTION=y
@@ -861,4 +852,3 @@ CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_TEST_BITOPS=m
CONFIG_TEST_BPF=m
CONFIG_DEBUG_ENTRY=y

View File

@@ -266,9 +266,9 @@ CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_PE_SIP=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_IP_NF_IPTABLES=m
@@ -289,7 +289,6 @@ CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_FIB_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
@@ -473,7 +472,6 @@ CONFIG_NLMON=m
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CADENCE is not set
@@ -573,7 +571,6 @@ CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_INPUT=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_VSOCK=m
# CONFIG_SURFACE_PLATFORMS is not set
CONFIG_S390_CCW_IOMMU=y
CONFIG_S390_AP_IOMMU=y
CONFIG_EXT4_FS=y
@@ -623,6 +620,7 @@ CONFIG_NTFS_RW=y
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_INODE64=y
CONFIG_HUGETLBFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_ECRYPT_FS=m
@@ -703,12 +701,8 @@ CONFIG_CRYPTO_VMAC=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_ANUBIS=m
@@ -720,7 +714,6 @@ CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m

View File

@@ -26,7 +26,6 @@ CONFIG_CRASH_DUMP=y
# CONFIG_SECCOMP is not set
# CONFIG_GCC_PLUGINS is not set
CONFIG_PARTITION_ADVANCED=y
CONFIG_IBM_PARTITION=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
@@ -61,11 +60,9 @@ CONFIG_RAW_DRIVER=y
# CONFIG_HID is not set
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set
# CONFIG_SURFACE_PLATFORMS is not set
# CONFIG_IOMMU_SUPPORT is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
CONFIG_CONFIGFS_FS=y
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity"

View File

@@ -14,12 +14,12 @@
struct s390_idle_data {
seqcount_t seqcount;
unsigned long long idle_count;
unsigned long long idle_time;
unsigned long long clock_idle_enter;
unsigned long long clock_idle_exit;
unsigned long long timer_idle_enter;
unsigned long long timer_idle_exit;
unsigned long idle_count;
unsigned long idle_time;
unsigned long clock_idle_enter;
unsigned long clock_idle_exit;
unsigned long timer_idle_enter;
unsigned long timer_idle_exit;
unsigned long mt_cycles_enter[8];
};

View File

@@ -98,10 +98,10 @@ extern unsigned char ptff_function_mask[16];
/* Query TOD offset result */
struct ptff_qto {
unsigned long long physical_clock;
unsigned long long tod_offset;
unsigned long long logical_tod_offset;
unsigned long long tod_epoch_difference;
unsigned long physical_clock;
unsigned long tod_offset;
unsigned long logical_tod_offset;
unsigned long tod_epoch_difference;
} __packed;
static inline int ptff_query(unsigned int nr)
@@ -151,9 +151,9 @@ struct ptff_qui {
rc; \
})
static inline unsigned long long local_tick_disable(void)
static inline unsigned long local_tick_disable(void)
{
unsigned long long old;
unsigned long old;
old = S390_lowcore.clock_comparator;
S390_lowcore.clock_comparator = clock_comparator_max;
@@ -161,7 +161,7 @@ static inline unsigned long long local_tick_disable(void)
return old;
}
static inline void local_tick_enable(unsigned long long comp)
static inline void local_tick_enable(unsigned long comp)
{
S390_lowcore.clock_comparator = comp;
set_clock_comparator(S390_lowcore.clock_comparator);
@@ -169,9 +169,9 @@ static inline void local_tick_enable(unsigned long long comp)
#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */
typedef unsigned long long cycles_t;
typedef unsigned long cycles_t;
static inline unsigned long long get_tod_clock(void)
static inline unsigned long get_tod_clock(void)
{
union tod_clock clk;
@@ -179,10 +179,10 @@ static inline unsigned long long get_tod_clock(void)
return clk.tod;
}
static inline unsigned long long get_tod_clock_fast(void)
static inline unsigned long get_tod_clock_fast(void)
{
#ifdef CONFIG_HAVE_MARCH_Z9_109_FEATURES
unsigned long long clk;
unsigned long clk;
asm volatile("stckf %0" : "=Q" (clk) : : "cc");
return clk;
@@ -208,9 +208,9 @@ extern union tod_clock tod_clock_base;
* Therefore preemption must be disabled, otherwise the returned
* value is not guaranteed to be monotonic.
*/
static inline unsigned long long get_tod_clock_monotonic(void)
static inline unsigned long get_tod_clock_monotonic(void)
{
unsigned long long tod;
unsigned long tod;
preempt_disable_notrace();
tod = get_tod_clock() - tod_clock_base.tod;
@@ -237,7 +237,7 @@ static inline unsigned long long get_tod_clock_monotonic(void)
* -> ns = (th * 125) + ((tl * 125) >> 9);
*
*/
static inline unsigned long long tod_to_ns(unsigned long long todval)
static inline unsigned long tod_to_ns(unsigned long todval)
{
return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
}
@@ -249,10 +249,10 @@ static inline unsigned long long tod_to_ns(unsigned long long todval)
*
* Returns: true if a is later than b
*/
static inline int tod_after(unsigned long long a, unsigned long long b)
static inline int tod_after(unsigned long a, unsigned long b)
{
if (MACHINE_HAS_SCC)
return (long long) a > (long long) b;
return (long) a > (long) b;
return a > b;
}
@@ -263,10 +263,10 @@ static inline int tod_after(unsigned long long a, unsigned long long b)
*
* Returns: true if a is later than b
*/
static inline int tod_after_eq(unsigned long long a, unsigned long long b)
static inline int tod_after_eq(unsigned long a, unsigned long b)
{
if (MACHINE_HAS_SCC)
return (long long) a >= (long long) b;
return (long) a >= (long) b;
return a >= b;
}

View File

@@ -47,7 +47,7 @@ void account_idle_time_irq(void)
void arch_cpu_idle(void)
{
struct s390_idle_data *idle = this_cpu_ptr(&s390_idle);
unsigned long long idle_time;
unsigned long idle_time;
unsigned long psw_mask;
/* Wait for external, I/O or machine check interrupt. */
@@ -73,7 +73,7 @@ static ssize_t show_idle_count(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct s390_idle_data *idle = &per_cpu(s390_idle, dev->id);
unsigned long long idle_count;
unsigned long idle_count;
unsigned int seq;
do {
@@ -82,14 +82,14 @@ static ssize_t show_idle_count(struct device *dev,
if (READ_ONCE(idle->clock_idle_enter))
idle_count++;
} while (read_seqcount_retry(&idle->seqcount, seq));
return sprintf(buf, "%llu\n", idle_count);
return sprintf(buf, "%lu\n", idle_count);
}
DEVICE_ATTR(idle_count, 0444, show_idle_count, NULL);
static ssize_t show_idle_time(struct device *dev,
struct device_attribute *attr, char *buf)
{
unsigned long long now, idle_time, idle_enter, idle_exit, in_idle;
unsigned long now, idle_time, idle_enter, idle_exit, in_idle;
struct s390_idle_data *idle = &per_cpu(s390_idle, dev->id);
unsigned int seq;
@@ -109,14 +109,14 @@ static ssize_t show_idle_time(struct device *dev,
}
}
idle_time += in_idle;
return sprintf(buf, "%llu\n", idle_time >> 12);
return sprintf(buf, "%lu\n", idle_time >> 12);
}
DEVICE_ATTR(idle_time_us, 0444, show_idle_time, NULL);
u64 arch_cpu_idle_time(int cpu)
{
struct s390_idle_data *idle = &per_cpu(s390_idle, cpu);
unsigned long long now, idle_enter, idle_exit, in_idle;
unsigned long now, idle_enter, idle_exit, in_idle;
unsigned int seq;
do {

View File

@@ -269,7 +269,7 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type)
case CPUMF_CTR_SET_MAX:
/* The counter could not be associated to a counter set */
return -EINVAL;
};
}
/* Initialize for using the CPU-measurement counter facility */
if (!atomic_inc_not_zero(&num_events)) {

View File

@@ -26,12 +26,10 @@
#include <asm/timex.h>
#include <asm/debug.h>
#include <asm/perf_cpum_cf_diag.h>
#include <asm/hwctrset.h>
#define CF_DIAG_CTRSET_DEF 0xfeef /* Counter set header mark */
#define CF_DIAG_MIN_INTERVAL 60 /* Minimum counter set read */
/* interval in seconds */
static unsigned long cf_diag_interval = CF_DIAG_MIN_INTERVAL;
static unsigned int cf_diag_cpu_speed;
static debug_info_t *cf_diag_dbg;
@@ -729,7 +727,6 @@ static DEFINE_MUTEX(cf_diag_ctrset_mutex);
static struct cf_diag_ctrset {
unsigned long ctrset; /* Bit mask of counter set to read */
cpumask_t mask; /* CPU mask to read from */
time64_t lastread; /* Epoch counter set last read */
} cf_diag_ctrset;
static void cf_diag_ctrset_clear(void)
@@ -866,27 +863,16 @@ static int cf_diag_all_read(unsigned long arg)
{
struct cf_diag_call_on_cpu_parm p;
cpumask_var_t mask;
time64_t now;
int rc = 0;
int rc;
debug_sprintf_event(cf_diag_dbg, 5, "%s\n", __func__);
if (!alloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;
now = ktime_get_seconds();
if (cf_diag_ctrset.lastread + cf_diag_interval > now) {
debug_sprintf_event(cf_diag_dbg, 5, "%s now %lld "
" lastread %lld\n", __func__, now,
cf_diag_ctrset.lastread);
rc = -EAGAIN;
goto out;
} else {
cf_diag_ctrset.lastread = now;
}
p.sets = cf_diag_ctrset.ctrset;
cpumask_and(mask, &cf_diag_ctrset.mask, cpu_online_mask);
on_each_cpu_mask(mask, cf_diag_cpu_read, &p, 1);
rc = cf_diag_all_copy(arg, mask);
out:
free_cpumask_var(mask);
debug_sprintf_event(cf_diag_dbg, 5, "%s rc %d\n", __func__, rc);
return rc;

View File

@@ -68,10 +68,10 @@ EXPORT_SYMBOL(s390_epoch_delta_notifier);
unsigned char ptff_function_mask[16];
static unsigned long long lpar_offset;
static unsigned long long initial_leap_seconds;
static unsigned long long tod_steering_end;
static long long tod_steering_delta;
static unsigned long lpar_offset;
static unsigned long initial_leap_seconds;
static unsigned long tod_steering_end;
static long tod_steering_delta;
/*
* Get time offsets with PTFF
@@ -96,7 +96,7 @@ void __init time_early_init(void)
/* get initial leap seconds */
if (ptff_query(PTFF_QUI) && ptff(&qui, sizeof(qui), PTFF_QUI) == 0)
initial_leap_seconds = (unsigned long long)
initial_leap_seconds = (unsigned long)
((long) qui.old_leap * 4096000000L);
}
@@ -222,7 +222,7 @@ void __init read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
static u64 read_tod_clock(struct clocksource *cs)
{
unsigned long long now, adj;
unsigned long now, adj;
preempt_disable(); /* protect from changes to steering parameters */
now = get_tod_clock();
@@ -362,7 +362,7 @@ static inline int check_sync_clock(void)
* Apply clock delta to the global data structures.
* This is called once on the CPU that performed the clock sync.
*/
static void clock_sync_global(unsigned long long delta)
static void clock_sync_global(unsigned long delta)
{
unsigned long now, adj;
struct ptff_qto qto;
@@ -378,7 +378,7 @@ static void clock_sync_global(unsigned long long delta)
-(adj >> 15) : (adj >> 15);
tod_steering_delta += delta;
if ((abs(tod_steering_delta) >> 48) != 0)
panic("TOD clock sync offset %lli is too large to drift\n",
panic("TOD clock sync offset %li is too large to drift\n",
tod_steering_delta);
tod_steering_end = now + (abs(tod_steering_delta) << 15);
vdso_data->arch_data.tod_steering_end = tod_steering_end;
@@ -394,7 +394,7 @@ static void clock_sync_global(unsigned long long delta)
* Apply clock delta to the per-CPU data structures of this CPU.
* This is called for each online CPU after the call to clock_sync_global.
*/
static void clock_sync_local(unsigned long long delta)
static void clock_sync_local(unsigned long delta)
{
/* Add the delta to the clock comparator. */
if (S390_lowcore.clock_comparator != clock_comparator_max) {
@@ -418,7 +418,7 @@ static void __init time_init_wq(void)
struct clock_sync_data {
atomic_t cpus;
int in_sync;
unsigned long long clock_delta;
unsigned long clock_delta;
};
/*
@@ -538,7 +538,7 @@ static int stpinfo_valid(void)
static int stp_sync_clock(void *data)
{
struct clock_sync_data *sync = data;
unsigned long long clock_delta, flags;
u64 clock_delta, flags;
static int first;
int rc;
@@ -720,8 +720,8 @@ static ssize_t ctn_id_show(struct device *dev,
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sprintf(buf, "%016llx\n",
*(unsigned long long *) stp_info.ctnid);
ret = sprintf(buf, "%016lx\n",
*(unsigned long *) stp_info.ctnid);
mutex_unlock(&stp_mutex);
return ret;
}
@@ -794,7 +794,7 @@ static ssize_t leap_seconds_scheduled_show(struct device *dev,
if (!stzi.lsoib.p)
return sprintf(buf, "0,0\n");
return sprintf(buf, "%llu,%d\n",
return sprintf(buf, "%lu,%d\n",
tod_to_ns(stzi.lsoib.nlsout - TOD_UNIX_EPOCH) / NSEC_PER_SEC,
stzi.lsoib.nlso - stzi.lsoib.also);
}

View File

@@ -76,8 +76,6 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
}
info = info->next;
}
if (cpumask_empty(&mask))
cpumask_copy(&mask, cpumask_of(cpu));
break;
case TOPOLOGY_MODE_PACKAGE:
cpumask_copy(&mask, cpu_present_mask);

View File

@@ -1287,7 +1287,7 @@ static u64 __calculate_sltime(struct kvm_vcpu *vcpu)
/* already expired? */
if (cputm >> 63)
return 0;
return min(sltime, tod_to_ns(cputm));
return min_t(u64, sltime, tod_to_ns(cputm));
}
} else if (cpu_timer_interrupts_enabled(vcpu)) {
sltime = kvm_s390_get_cpu_timer(vcpu);

View File

@@ -93,7 +93,7 @@ CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
CONFIG_SUNLANCE=m
CONFIG_HAPPYMEAL=m
CONFIG_HAPPYMEAL=y
CONFIG_SUNGEM=m
CONFIG_SUNVNET=m
CONFIG_LDMVSW=m
@@ -234,9 +234,7 @@ CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRC16=m
CONFIG_LIBCRC32C=m
CONFIG_VCC=m
CONFIG_ATA=y
CONFIG_PATA_CMD64X=y
CONFIG_HAPPYMEAL=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_DEVTMPFS=y

View File

@@ -8,7 +8,6 @@
#include <asm/ptrace.h>
#include <asm/processor.h>
#include <asm/extable_64.h>
#include <asm/spitfire.h>
#include <asm/adi.h>

View File

@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_EXTABLE64_H
#define __ASM_EXTABLE64_H
#ifndef __ASM_EXTABLE_H
#define __ASM_EXTABLE_H
/*
* The exception table consists of pairs of addresses: the first is the
* address of an instruction that is allowed to fault, and the second is

View File

@@ -50,16 +50,12 @@ struct thread_struct {
unsigned long fsr;
unsigned long fpqdepth;
struct fpq fpqueue[16];
unsigned long flags;
mm_segment_t current_ds;
};
#define SPARC_FLAG_KTHREAD 0x1 /* task is a kernel thread */
#define SPARC_FLAG_UNALIGNED 0x2 /* is allowed to do unaligned accesses */
#define INIT_THREAD { \
.flags = SPARC_FLAG_KTHREAD, \
.current_ds = KERNEL_DS, \
.kregs = (struct pt_regs *)(init_stack+THREAD_SIZE)-1 \
}
/* Do necessary setup to start up a newly executed thread. */

View File

@@ -118,6 +118,7 @@ struct thread_info {
.task = &tsk, \
.current_ds = ASI_P, \
.preempt_count = INIT_PREEMPT_COUNT, \
.kregs = (struct pt_regs *)(init_stack+THREAD_SIZE)-1 \
}
/* how to get the thread information struct from C */

View File

@@ -1,6 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef ___ASM_SPARC_UACCESS_H
#define ___ASM_SPARC_UACCESS_H
#include <asm/extable.h>
#if defined(__sparc__) && defined(__arch64__)
#include <asm/uaccess_64.h>
#else

View File

@@ -13,9 +13,6 @@
#include <asm/processor.h>
#define ARCH_HAS_SORT_EXTABLE
#define ARCH_HAS_SEARCH_EXTABLE
/* Sparc is not segmented, however we need to be able to fool access_ok()
* when doing system calls from kernel mode legitimately.
*
@@ -40,36 +37,6 @@
#define __access_ok(addr, size) (__user_ok((addr) & get_fs().seg, (size)))
#define access_ok(addr, size) __access_ok((unsigned long)(addr), size)
/*
* The exception table consists of pairs of addresses: the first is the
* address of an instruction that is allowed to fault, and the second is
* the address at which the program should continue. No registers are
* modified, so it is entirely up to the continuation code to figure out
* what to do.
*
* All the routines below use bits of fixup code that are out of line
* with the main instruction path. This means when everything is well,
* we don't even have to jump over them. Further, they do not intrude
* on our cache or tlb entries.
*
* There is a special way how to put a range of potentially faulting
* insns (like twenty ldd/std's with now intervening other instructions)
* You specify address of first in insn and 0 in fixup and in the next
* exception_table_entry you specify last potentially faulting insn + 1
* and in fixup the routine which should handle the fault.
* That fixup code will get
* (faulting_insn_address - first_insn_in_the_range_address)/4
* in %g2 (ie. index of the faulting instruction in the range).
*/
struct exception_table_entry
{
unsigned long insn, fixup;
};
/* Returns 0 if exception not found and fixup otherwise. */
unsigned long search_extables_range(unsigned long addr, unsigned long *g2);
/* Uh, these should become the main single-value transfer routines..
* They automatically use the right size if we just have the right
* pointer type..
@@ -252,12 +219,7 @@ static inline unsigned long __clear_user(void __user *addr, unsigned long size)
unsigned long ret;
__asm__ __volatile__ (
".section __ex_table,#alloc\n\t"
".align 4\n\t"
".word 1f,3\n\t"
".previous\n\t"
"mov %2, %%o1\n"
"1:\n\t"
"call __bzero\n\t"
" mov %1, %%o0\n\t"
"mov %%o0, %0\n"

View File

@@ -10,7 +10,6 @@
#include <linux/string.h>
#include <asm/asi.h>
#include <asm/spitfire.h>
#include <asm/extable_64.h>
#include <asm/processor.h>

View File

@@ -515,7 +515,7 @@ continue_boot:
/* I want a kernel stack NOW! */
set init_thread_union, %g1
set (THREAD_SIZE - STACKFRAME_SZ), %g2
set (THREAD_SIZE - STACKFRAME_SZ - TRACEREG_SZ), %g2
add %g1, %g2, %sp
mov 0, %fp /* And for good luck */

View File

@@ -706,7 +706,7 @@ tlb_fixup_done:
wr %g0, ASI_P, %asi
mov 1, %g1
sllx %g1, THREAD_SHIFT, %g1
sub %g1, (STACKFRAME_SZ + STACK_BIAS), %g1
sub %g1, (STACKFRAME_SZ + STACK_BIAS + TRACEREG_SZ), %g1
add %g6, %g1, %sp
/* Set per-cpu pointer initially to zero, this makes

Some files were not shown because too many files have changed in this diff Show More