Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Remove lockdown from tracefs itself and moved it to the trace
directory. Have the open functions there do the lockdown checks.
- Fix a few races with opening an instance file and the instance
being deleted (Discovered during the lockdown updates). Kept
separate from the clean up code such that they can be backported to
stable easier.
- Clean up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fix a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq"
* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
tracing/hwlat: Report total time spent in all NMIs during the sample
recordmcount: Fix nop_mcount() function
tracing: Do not create tracefs files if tracefs lockdown is in effect
tracing: Add locked_down checks to the open calls of files created for tracefs
tracing: Add tracing_check_open_get_tr()
tracing: Have trace events system open call tracing_open_generic_tr()
tracing: Get trace_array reference for available_tracers files
ftrace: Get a reference counter for the trace_array on filter files
tracefs: Revert ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
Pull hwmon fixes from Guenter Roeck:
- Update/fix inspur-ipsps1 and k10temp Documentation
- Fix nct7904 driver
- Fix HWMON_P_MIN_ALARM mask in hwmon core
* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: docs: Extend inspur-ipsps1 title underline
hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
docs: hwmon: Include 'inspur-ipsps1.rst' into docs
hwmon: Fix HWMON_P_MIN_ALARM mask
hwmon: (k10temp) Update documentation and add temp2_input info
hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
Pull MTD fixes from Richard Weinberger:
"Two fixes for MTD:
- spi-nor: Fix for a regression in write_sr()
- rawnand: Regression fix for the au1550nd driver"
* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
mtd: spi-nor: Fix direction of the write_sr() transfer
Pull io_uring fix from Jens Axboe:
"Single small fix for a regression in the sequence logic for linked
commands"
* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
io_uring: fix sequence logic for timeout requests
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.
Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c862501 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The removal of the longjmp code in recordmcount.c mistakenly made the return
of make_nop() being negative an exit of nop_mcount(). It should not exit the
routine, but instead just not process that part of the code. By exiting with
an error code, it would cause the update of recordmcount to fail some files
which would fail the build if ftrace function tracing was enabled.
Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 3f1df12019 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.
Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.
Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.
As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).
Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having the trace events system open call open code the taking of
the trace_array descriptor (with trace_array_get()) and then calling
trace_open_generic(), have it use the tracing_open_generic_tr() that does
the combination of the two. This requires making tracing_open_generic_tr()
global.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.
Cc: stable@vger.kernel.org
Fixes: 607e2ea167 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.
It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.
Cc: stable@vger.kernel.org
Fixes: 673feb9d76 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Running the latest kernel through my "make instances" stress tests, I
triggered the following bug (with KASAN and kmemleak enabled):
mkdir invoked oom-killer:
gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
oom_score_adj=0
CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
dump_stack+0x64/0x8c
dump_header+0x43/0x3b7
? trace_hardirqs_on+0x48/0x4a
oom_kill_process+0x68/0x2d5
out_of_memory+0x2aa/0x2d0
__alloc_pages_nodemask+0x96d/0xb67
__alloc_pages_node+0x19/0x1e
alloc_slab_page+0x17/0x45
new_slab+0xd0/0x234
___slab_alloc.constprop.86+0x18f/0x336
? alloc_inode+0x2c/0x74
? irq_trace+0x12/0x1e
? tracer_hardirqs_off+0x1d/0xd7
? __slab_alloc.constprop.85+0x21/0x53
__slab_alloc.constprop.85+0x31/0x53
? __slab_alloc.constprop.85+0x31/0x53
? alloc_inode+0x2c/0x74
kmem_cache_alloc+0x50/0x179
? alloc_inode+0x2c/0x74
alloc_inode+0x2c/0x74
new_inode_pseudo+0xf/0x48
new_inode+0x15/0x25
tracefs_get_inode+0x23/0x7c
? lookup_one_len+0x54/0x6c
tracefs_create_file+0x53/0x11d
trace_create_file+0x15/0x33
event_create_dir+0x2a3/0x34b
__trace_add_new_event+0x1c/0x26
event_trace_add_tracer+0x56/0x86
trace_array_create+0x13e/0x1e1
instance_mkdir+0x8/0x17
tracefs_syscall_mkdir+0x39/0x50
? get_dname+0x31/0x31
vfs_mkdir+0x78/0xa3
do_mkdirat+0x71/0xb0
sys_mkdir+0x19/0x1b
do_fast_syscall_32+0xb0/0xed
I bisected this down to the addition of the proxy_ops into tracefs for
lockdown. It appears that the allocation of the proxy_ops and then freeing
it in the destroy_inode callback, is causing havoc with the memory system.
Reading the documentation about destroy_inode and talking with Linus about
this, this is buggy and wrong. When defining the destroy_inode() method, it
is expected that the destroy_inode() will also free the inode, and not just
the extra allocations done in the creation of the inode. The faulty commit
causes a memory leak of the inode data structure when they are deleted.
Instead of allocating the proxy_ops (and then having to free it) the checks
should be done by the open functions themselves, and not hack into the
tracefs directory. First revert the tracefs updates for locked_down and then
later we can add the locked_down checks in the kernel/trace files.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Fixes: ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.4-rc3.
Nothing huge here. Some binder driver fixes (although it is still
being discussed if these all fix the reported issues or not, so more
might be coming later), some mei device ids and fixes, and a google
firmware driver bugfix that fixes a regression, as well as some other
tiny fixes.
All have been in linux-next with no reported issues"
* tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
firmware: google: increment VPD key_len properly
w1: ds250x: Fix build error without CRC16
virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr
binder: Fix comment headers on binder_alloc_prepare_to_free()
binder: prevent UAF read in print_binder_transaction_log_entry()
misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
mei: avoid FW version request on Ibex Peak and earlier
mei: me: add comet point (lake) LP device ids
Pull staging/IIO driver fixes from Greg KH:
"Here are some staging and IIO driver fixes for 5.4-rc3.
The "biggest" thing here is a removal of the fbtft device and flexfb
code as they have been abandoned by their authors and are no longer
needed for that hardware.
Other than that, the usual amount of staging driver and iio driver
fixes for reported issues, and some speakup sysfs file documentation,
which has been long awaited for.
All have been in linux-next with no reported issues"
* tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
iio: Fix an undefied reference error in noa1305_probe
iio: light: opt3001: fix mutex unlock race
iio: adc: ad799x: fix probe error handling
iio: light: add missing vcnl4040 of_compatible
iio: light: fix vcnl4000 devicetree hooks
iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
iio: adc: axp288: Override TS pin bias current for some models
iio: imu: adis16400: fix memory leak
iio: imu: adis16400: release allocated memory on failure
iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
iio: adc: stm32-adc: move registers definitions
iio: accel: adxl372: Perform a reset at start up
iio: accel: adxl372: Fix push to buffers lost samples
iio: accel: adxl372: Fix/remove limitation for FIFO samples
iio: adc: hx711: fix bug in sampling of data
staging: vt6655: Fix memory leak in vt6655_probe
staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info
Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
staging: speakup: document sysfs attributes
staging: rtl8188eu: fix HighestRate check in odm_ARFBRefresh_8188E()
...
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.4-rc3 that
resolve a number of reported issues and regressions.
None of these are huge, full details are in the shortlog. There's also
a MAINTAINERS update that I think you might have already taken in your
tree already, but git should handle that merge easily.
All have been in linux-next with no reported issues"
* tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
serial: fix kernel-doc warning in comments
serial: 8250_omap: Fix gpio check for auto RTS/CTS
serial: mctrl_gpio: Check for NULL pointer
tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
tty: serial: Fix PORT_LINFLEXUART definition
tty: n_hdlc: fix build on SPARC
serial: uartps: Fix uartps_major handling
serial: uartlite: fix exit path null pointer
tty: serial: linflexuart: Fix magic SysRq handling
serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
dt-bindings: serial: sh-sci: Document r8a774b1 bindings
serial/sifive: select SERIAL_EARLYCON
tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
Pull USB fixes from Greg KH:
"Here are a lot of small USB driver fixes for 5.4-rc3.
syzbot has stepped up its testing of the USB driver stack, now able to
trigger fun race conditions between disconnect and probe functions.
Because of that we have a lot of fixes in here from Johan and others
fixing these reported issues that have been around since almost all
time.
We also are just deleting the rio500 driver, making all of the syzbot
bugs found in it moot as it turns out no one has been using it for
years as there is a userspace version that is being used instead.
There are also a number of other small fixes in here, all resolving
reported issues or regressions.
All have been in linux-next without any reported issues"
* tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
USB: yurex: fix NULL-derefs on disconnect
USB: iowarrior: use pr_err()
USB: iowarrior: drop redundant iowarrior mutex
USB: iowarrior: drop redundant disconnect mutex
USB: iowarrior: fix use-after-free after driver unbind
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: chaoskey: fix use-after-free on release
USB: adutux: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
USB: legousbtower: fix use-after-free on release
usb: cdns3: Fix for incorrect DMA mask.
usb: cdns3: fix cdns3_core_init_role()
usb: cdns3: gadget: Fix full-speed mode
USB: usb-skeleton: drop redundant in-urb check
USB: usb-skeleton: fix use-after-free after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
usb:cdns3: Fix for CV CH9 running with g_zero driver.
usb: dwc3: Remove dev_err() on platform_get_irq() failure
usb: dwc3: Switch to platform_get_irq_byname_optional()
...
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a guest-cputime accounting fix, and a cgroup bandwidth
quota precision fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/vtime: Fix guest/system mis-accounting on task switch
sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also a couple of updates for new Intel
models (which are technically hw-enablement, but to users it's a fix
to perf behavior on those new CPUs - hope this is fine), an AUX
inheritance fix, event time-sharing fix, and a fix for lost non-perf
NMI events on AMD systems"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf/x86/cstate: Add Tiger Lake CPU support
perf/x86/msr: Add Tiger Lake CPU support
perf/x86/intel: Add Tiger Lake CPU support
perf/x86/cstate: Update C-state counters for Ice Lake
perf/x86/msr: Add new CPU model numbers for Ice Lake
perf/x86/cstate: Add Comet Lake CPU support
perf/x86/msr: Add Comet Lake CPU support
perf/x86/intel: Add Comet Lake CPU support
perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
perf/core: Fix corner case in perf_rotate_context()
perf/core: Rework memory accounting in perf_mmap()
perf/core: Fix inheritance of aux_output groups
perf annotate: Don't return -1 for error when doing BPF disassembly
perf annotate: Return appropriate error code for allocation failures
perf annotate: Fix arch specific ->init() failure errors
perf annotate: Propagate the symbol__annotate() error return
perf annotate: Fix the signedness of failure returns
perf annotate: Propagate perf_env__arch() error
perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
perf tools: Propagate get_cpuid() error
...
Pull EFI fixes from Ingo Molnar:
"Misc EFI fixes all across the map: CPER error report fixes, fixes to
TPM event log parsing, fix for a kexec hang, a Sparse fix and other
fixes"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
efi/x86: Do not clean dummy variable in kexec path
efi: Make unexported efi_rci2_sysfs_init() static
efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
efi/tpm: Don't traverse an event log with no events
efi/tpm: Don't access event->count when it isn't mapped
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
efi/cper: Fix endianness of PCIe class code
Pull x86 fixes from Ingo Molnar:
"A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
guest support fix when built under Clang, and new CPU model number
definitions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Comet Lake to the Intel CPU models header
lib/string: Make memzero_explicit() inline instead of external
x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
x86/asm: Fix MWAITX C-state hint value
Pull x86 license tag fixlets from Ingo Molnar:
"Fix a couple of SPDX tags in x86 headers to follow the canonical
pattern"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Use the correct SPDX License Identifier in headers
Pull RISC-V fixes from Paul Walmsley:
- Fix several bugs in the breakpoint trap handler
- Drop an unnecessary loop around calls to preempt_schedule_irq()
* tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: entry: Remove unneeded need_resched() loop
riscv: Correct the handling of unexpected ebreak in do_trap_break()
riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
riscv: avoid kernel hangs when trapped in BUG()
Pull MIPS fixes from Paul Burton:
- Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
compiler may choose not to inline __xchg() & __cmpxchg().
- A build fix for Loongson configurations with GCC 9.x.
- Expose some extra HWCAP bits to indicate support for various
instruction set extensions to userland.
- Fix bad stack access in firmware handling code for old SNI
RM200/300/400 machines.
* tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Disable Loongson MMI instructions for kernel build
MIPS: elf_hwcap: Export userspace ASEs
MIPS: fw: sni: Fix out of bounds init of o32 stack
MIPS: include: Mark __xchg as __always_inline
MIPS: include: Mark __cmpxchg as __always_inline
Pull powerpc fixes from Michael Ellerman:
"Fix a kernel crash in spufs_create_root() on Cell machines, since the
new mount API went in.
Fix a regression in our KVM code caused by our recent PCR changes.
Avoid a warning message about a failing hypervisor API on systems that
don't have that API.
A couple of minor build fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Desnes A. Nunes do
Rosario, Emmanuel Nicolet, Jordan Niethe, Laurent Dufour, Stephen
Rothwell"
* tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
spufs: fix a crash in spufs_create_root()
powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
powerpc/pseries: Remove confusing warning message.
powerpc/64s/radix: Fix build failure with RADIX_MMU=n
Pull xen fixes from Juergen Gross:
- correct panic handling when running as a Xen guest
- cleanup the Xen grant driver to remove printing a pointer being
always NULL
- remove a soon to be wrong call of of_dma_configure()
* tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Stop abusing DT of_dma_configure API
xen/grant-table: remove unnecessary printing
x86/xen: Return from panic notifier
Comet Lake is the new 10th Gen Intel processor. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Kaby Lake.
Share hswult_cstates with Kaby Lake.
Update the comments for Comet Lake.
Kaby Lake is missed in the comments for some Residency Counters. Update
the comments for Kaby Lake as well.
The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.
The patch has been tested on real hardware.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
Pull cifs fixes from Steve French:
"Eight small SMB3 fixes, four for stable, and important fix for the
recent regression introduced by filesystem timestamp range patches"
* tag '5.4-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Force reval dentry if LOOKUP_REVAL flag is set
CIFS: Force revalidate inode when dentry is stale
smb3: Fix regression in time handling
smb3: remove noisy debug message and minor cleanup
CIFS: Gracefully handle QueryInfo errors during open
cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
fs: cifs: mute -Wunused-const-variable message
smb3: cleanup some recent endian errors spotted by updated sparse
Pull module fixes from Jessica Yu:
"Code cleanups and kbuild/namespace related fixups from Masahiro.
Most importantly, it fixes a namespace-related modpost issue for
external module builds
- Fix broken external module builds due to a modpost bug in
read_dump(), where the namespace was not being strdup'd and
sym->namespace would be set to bogus data.
- Various namespace-related kbuild fixes and cleanups thanks to
Masahiro Yamada"
* tag 'modules-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
doc: move namespaces.rst from kbuild/ to core-api/
nsdeps: make generated patches independent of locale
nsdeps: fix hashbang of scripts/nsdeps
kbuild: fix build error of 'make nsdeps' in clean tree
module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict
modpost: fix broken sym->namespace for external module builds
module: swap the order of symbol.namespace
scripts: add_namespace: Fix coccicheck failed
Pull Hyper-V fixes from Sasha Levin:
"Two fixes from Dexuan Cui:
- Fix a (harmless) warning when building vmbus without
CONFIG_PM_SLEEP
- Fix for a memory leak (and optimization) in the hyperv mouse code"
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
HID: hyperv: Use in-place iterator API in the channel callback
Describe the fallthrough pseudo-keyword.
Convert the coding-style.rst example to the keyword style.
Add description and links to deprecated.rst.
Miguel Ojeda comments on the eventual [[fallthrough]] syntax:
"Note that C17/C18 does not have [[fallthrough]].
C++17 introduced it, as it is mentioned above. I would keep the
__attribute__((fallthrough)) -> [[fallthrough]] change you did,
though, since that is indeed the standard syntax (given the paragraph
references C++17).
I was told by Aaron Ballman (who is proposing them for C) that it is
more or less likely that it becomes standardized in C2x. However, it
is still not added to the draft (other attributes are already,
though). See N2268 and N2269:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf (fallthrough)
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2269.pdf (attributes in general)"
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reserve the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to appear to be an
actual reserved word with the same gcc case block missing fallthrough
warning capability.
All switch/case blocks now should end in one of:
break;
fallthrough;
goto <label>;
return [expression];
continue;
In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.
fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"The regular fixes pull for rc3. The i915 team found some fixes they
(or I) missed for rc1, which is why this is a bit bigger than usual,
otherwise there is a single amdgpu fix, some spi panel aliases, and a
bridge fix.
i915:
- execlist access fixes
- list deletion fix
- CML display fix
- HSW workaround extension to GT2
- chicken bit whitelist
- GGTT resume issue
- SKL GPU hangs for Vulkan compute
amdgpu:
- memory leak fix
panel:
- spi aliases
tc358767:
- bridge artifacts fix"
* tag 'drm-fixes-2019-10-11' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/bridge: tc358767: fix max_tu_symbol value
drm/i915/gt: execlists->active is serialised by the tasklet
drm/i915/execlists: Protect peeking at execlists->active
drm/i915: Fixup preempt-to-busy vs reset of a virtual request
drm/i915: Only enqueue already completed requests
drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)
drm/i915/cml: Add second PCH ID for CMP
drm/amdgpu: fix memory leak
drm/panel: tpo-td043mtea1: Fix SPI alias
drm/panel: tpo-td028ttec1: Fix SPI alias
drm/panel: sony-acx565akm: Fix SPI alias
drm/panel: nec-nl8048hl11: Fix SPI alias
drm/panel: lg-lb035q02: Fix SPI alias
drm/i915: Mark contents as dirty on a write fault
drm/i915: Prevent bonded requests from overtaking each other on preemption
drm/i915: Bump skl+ max plane width to 5k for linear/x-tiled
drm/i915: Verify the engine after acquiring the active.lock
drm/i915: Extend Haswell GT1 PSMI workaround to all
drm/i915: Don't mix srcu tag and negative error codes
drm/i915: Whitelist COMMON_SLICE_CHICKEN2
...
Pull block fixes from Jens Axboe:
- Fix wbt performance regression introduced with the blk-rq-qos
refactoring (Harshad)
- Fix io_uring fileset removal inadvertently killing the workqueue (me)
- Fix io_uring typo in linked command nonblock submission (Pavel)
- Remove spurious io_uring wakeups on request free (Pavel)
- Fix null_blk zoned command error return (Keith)
- Don't use freezable workqueues for backing_dev, also means we can
revert a previous libata hack (Mika)
- Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)
* tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
nbd: fix possible sysfs duplicate warning
null_blk: Fix zoned command return code
io_uring: only flush workqueues on fileset removal
io_uring: remove wait loop spurious wakeups
blk-wbt: fix performance regression in wbt scale_up/scale_down
Revert "libata, freezer: avoid block device removal while system is frozen"
bdi: Do not use freezable workqueue
io_uring: fix reversed nonblock flag for link submission
Depending on inlining decisions by the compiler, __get/put_user_fn
might become out of line. Then the compiler is no longer able to tell
that size can only be 1,2,4 or 8 due to the check in __get/put_user
resulting in false positives like
./arch/s390/include/asm/uaccess.h: In function ‘__put_user_fn’:
./arch/s390/include/asm/uaccess.h:113:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
113 | return rc;
| ^~
./arch/s390/include/asm/uaccess.h: In function ‘__get_user_fn’:
./arch/s390/include/asm/uaccess.h:143:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
143 | return rc;
| ^~
These functions are supposed to be always inlined. Mark it as such.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit 4b708b7b1a ("firmware: google: check if size is valid when
decoding VPD data") adds length checks, but the new vpd_decode_entry()
function botched the logic -- it adds the key length twice, instead of
adding the key and value lengths separately.
On my local system, this means vpd.c's vpd_section_create_attribs() hits
an error case after the first attribute it parses, since it's no longer
looking at the correct offset. With this patch, I'm back to seeing all
the correct attributes in /sys/firmware/vpd/...
Fixes: 4b708b7b1a ("firmware: google: check if size is valid when decoding VPD data")
Cc: <stable@vger.kernel.org>
Cc: Hung-Te Lin <hungte@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20190930214522.240680-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have two ways a request can be deferred:
1) It's a regular request that depends on another one
2) It's a timeout that tracks completions
We have a shared helper to determine whether to defer, and that
attempts to make the right decision based on the request. But we
only have some of this information in the caller. Un-share the
two timeout/defer helpers so the caller can use the right one.
Fixes: 5262f56798 ("io_uring: IORING_OP_TIMEOUT support")
Reported-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Fix CML display by adding a missing ID.
- Drop redundant list_del_init
- Only enqueue already completed requests to avoid races
- Fixup preempt-to-busy vs reset of a virtual request
- Protect peeking at execlists->active
- execlists->active is serialised by the tasklet
drm-intel-next-fixes-2019-09-19:
- Extend old HSW workaround to fix some GPU hangs on Haswell GT2
- Fix return error code on GEM mmap.
- White list a chicken bit register for push constants legacy mode on Mesa
- Fix resume issue related to GGTT restore
- Remove incorrect BUG_ON on execlist's schedule-out
- Fix unrecoverable GPU hangs with Vulkan compute workloads on SKL
drm-intel-next-fixes-2019-09-26:
- Fix concurrence on cases where requests where getting retired at same time as resubmitted to HW
- Fix gen9 display resolutions by setting the right max plane width
- Fix GPU hang on preemption
- Mark contents as dirty on a write fault. This was breaking cursor sprite with dumb buffers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010143039.GA15313@intel.com
Since commit 4f8943f808 ("SUNRPC: Replace direct task wakeups from
softirq context") there has been a race to the value of the sk_err if both
XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case,
we may end up losing the sk_err value that existed when xs_error_report was
called.
Fix this by reverting to the previous behavior: instead of using SO_ERROR
to retrieve the value at a later time (which might also return sk_err_soft),
copy the sk_err value onto struct sock_xprt, and use that value to wake
pending tasks.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 4f8943f808 ("SUNRPC: Replace direct task wakeups from softirq context")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
GCC 9.x automatically enables support for Loongson MMI instructions when
using some -march= flags, and then errors out when -msoft-float is
specified with:
cc1: error: ‘-mloongson-mmi’ must be used with ‘-mhard-float’
The kernel shouldn't be using these MMI instructions anyway, just as it
doesn't use floating point instructions. Explicitly disable them in
order to fix the build with GCC 9.x.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 3702bba5eb ("MIPS: Loongson: Add GCC 4.4 support for Loongson2E")
Fixes: 6f7a251a25 ("MIPS: Loongson: Add basic Loongson 2F support")
Fixes: 5188129b8c ("MIPS: Loongson-3: Improve -march option and move it to Platform")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: stable@vger.kernel.org # v2.6.32+
Cc: linux-mips@vger.kernel.org
Pull xfs fixes from Darrick Wong:
"A couple of small code cleanups and bug fixes for rounding errors,
metadata logging errors, and an extra layer of safeguards against
leaking memory contents.
- Fix a rounding error in the fallocate code
- Minor code cleanups
- Make sure to zero memory buffers before formatting metadata blocks
- Fix a few places where we forgot to log an inode metadata update
- Remove broken error handling that tried to clean up after a failure
but still got it wrong"
* tag 'xfs-5.4-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: move local to extent inode logging into bmap helper
xfs: remove broken error handling on failed attr sf to leaf change
xfs: log the inode on directory sf to block format change
xfs: assure zeroed memory buffers for certain kmem allocations
xfs: removed unused error variable from xchk_refcountbt_rec
xfs: remove unused flags arg from xfs_get_aghdr_buf()
xfs: Fix tail rounding in xfs_alloc_file_space()
1. nbd_put takes the mutex and drops nbd->ref to 0. It then does
idr_remove and drops the mutex.
2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails
to find an existing device, so it does nbd_dev_add.
3. just before the nbd_put could call nbd_dev_remove or not finished
totally, but if nbd_dev_add try to add_disk, we can hit:
debugfs: Directory 'nbd1' with parent 'block' already present!
This patch will make sure all the disk add/remove stuff are done
by holding the nbd_index_mutex lock.
Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull btrfs fixes from David Sterba:
"A few more stabitly fixes, one build warning fix.
- fix inode allocation under NOFS context
- fix leak in fiemap due to concurrent append writes
- fix log-root tree updates
- fix balance convert of single profile on 32bit architectures
- silence false positive warning on old GCCs (code moved in rc1)"
* tag 'for-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: silence maybe-uninitialized warning in clone_range
btrfs: fix uninitialized ret in ref-verify
btrfs: allocate new inode in NOFS context
btrfs: fix balance convert to single on 32-bit host CPUs
btrfs: fix incorrect updating of log root tree
Btrfs: fix memory leak due to concurrent append writes with fiemap
Pull dcache_readdir() fixes from Al Viro:
"The couple of patches you'd been OK with; no hlist conversion yet, and
cursors are still in the list of children"
[ Al is referring to future work to avoid some nasty O(n**2) behavior
with the readdir cursors when you have lots of concurrent readdirs.
This is just a fix for a race with a trivial cleanup - Linus ]
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
libfs: take cursors out of list when moving past the end of directory
Fix the locking in dcache_readdir() and friends
Pull mount fixes from Al Viro:
"A couple of regressions from the mount series"
* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add missing blkdev_put() in get_tree_bdev()
shmem: fix LSM options parsing
At the end of the v5.3 upstream kernel development cycle, Simon stepped
down from his role as Renesas SoC maintainer.
Remove his maintainership, git repository, and branch from the
MAINTAINERS file, and add an entry to the CREDITS file to honor his
work.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen_auto_xlat_grant_frames.vaddr is definitely NULL in this case.
So the address printing is unnecessary.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
If CRC16 is not set, building will fails:
drivers/w1/slaves/w1_ds250x.o: In function `w1_ds2505_read_page':
w1_ds250x.c:(.text+0x82f): undefined reference to `crc16'
w1_ds250x.c:(.text+0x90a): undefined reference to `crc16'
w1_ds250x.c:(.text+0x91a): undefined reference to `crc16'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 25ec8710d9 ("w1: add DS2501, DS2502, DS2505 EPROM device driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190920060318.35020-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In hgcm_call_preprocess_linaddr memory is allocated for bounce_buf but
is not released if copy_form_user fails. In order to prevent memory leak
in case of failure, the assignment to bounce_buf_ret is moved before the
error check. This way the allocated bounce_buf will be released by the
caller.
Fixes: 579db9d45c ("virt: Add vboxguest VMMDEV communication code")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20190930204223.3660-1-navid.emamdoost@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL without making sure all
code paths that used it were done with it.
Before commit ef61eb43ad ("USB: yurex: Fix protection fault after
device removal") this included the interrupt-in completion handler, but
there are further accesses in dev_err and dev_dbg statements in
yurex_write() and the driver-data destructor (sic!).
Fix this by unconditionally stopping also the control URB at disconnect
and by using a dedicated disconnected flag.
Note that we need to take a reference to the struct usb_interface to
avoid a use-after-free in the destructor whenever the device was
disconnected while the character device was still open.
Fixes: aadd6472d9 ("USB: yurex.c: remove dbg() usage")
Fixes: 45714104b9 ("USB: yurex.c: remove err() usage")
Cc: stable <stable@vger.kernel.org> # 3.5: ef61eb43ad
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-6-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All i.MX SoCs except i.MX1 have ONLY one necessary IRQ, use
platform_get_irq_optional() to get second/third IRQ which are
optional to avoid below error message during probe:
[ 0.726219] imx-uart 30860000.serial: IRQ index 1 not found
[ 0.731329] imx-uart 30860000.serial: IRQ index 2 not found
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Link: https://lore.kernel.org/r/1570614559-11900-1-git-send-email-Anson.Huang@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drop the redundant disconnect mutex which was introduced after the
open-disconnect race had been addressed generally in USB core by commit
d4ead16f50 ("USB: prevent char device open/deregister race").
Specifically, the rw-semaphore in core guarantees that all calls to
open() will have completed and that no new calls to open() will occur
after usb_deregister_dev() returns. Hence there is no need use the
driver data as an inverted disconnected flag.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009104846.5925-5-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent fix addressing a deadlock on disconnect introduced a new bug
by moving the present flag out of the critical section protected by the
driver-data mutex. This could lead to a racing release() freeing the
driver data before disconnect() is done with it.
Due to insufficient locking a related use-after-free could be triggered
also before the above mentioned commit. Specifically, the driver needs
to hold the driver-data mutex also while checking the opened flag at
disconnect().
Fixes: c468a8aa79 ("usb: iowarrior: fix deadlock on disconnect")
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Reported-by: syzbot+0761012cebf7bdb38137@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009104846.5925-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg, dev_warn and dev_err statements in
the completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 2824bd250f ("[PATCH] USB: add ldusb driver")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At startup we should trigger the HW state machine
only if it is OTG mode. Otherwise we should just
start the respective role.
Initialize idle role by default. If we don't do this then
cdns3_idle_role_stop() is not called when switching to
host/device role and so lane switch mechanism
doesn't work. This results to super-speed device not working
in one orientation if it was plugged before driver probe.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Link: https://lore.kernel.org/r/20191007121601.25996-2-rogerq@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_interface pointer as an inverted
disconnected flag and was setting it to NULL before making sure all
completion handlers had run. This could lead to NULL-pointer
dereferences in the dev_err() statements in the completion handlers
which relies on said pointer.
Fix this by using a dedicated disconnected flag.
Note that this is also addresses a NULL-pointer dereference at release()
and a struct usb_interface reference leak introduced by a recent runtime
PM fix, which depends on and should have been submitted together with
this patch.
Fixes: 4212cd74ca ("USB: usb-skeleton.c: remove err() usage")
Fixes: 5c290a5e42 ("USB: usb-skeleton: fix runtime PM after driver unbind")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009170944.30057-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
First set of IIO fixes for the 5.4 cycle.
* adis16400
- Make sure to free memory on a few failure paths.
* adxl372
- Fix wrong fifo depth
- Fix wrong indexing of data from the fifo.
- Perform a reset at startup to avoid a problem with inconsistent state.
* axp288
- This is a fix for a fix. The original fix made sure we kept the
configuration from some firmwares to preserve a bias current.
Unfortunately it appears the previous behaviour was working around
a buggy firmware by overwriting the wrong value it had. Hence
a regression was seen.
* bmc150
- Fix the centre temperature. This was due to an error in one of the
datasheets.
* hx711
- Fix an issue where a badly timed interrupt could lead to a control
line being high long enough to put the device into a low power state.
* meson_sar_adc
- Fix a case where the irq was enabled before everything it uses was
allocated.
* st_lsm6dsx
- Ensure we don't set the sensor sensitivity to 0 as it will force
all readings to 0.
- Fix a wait time for the slave i2c controller when the accelerometer
is not enabled.
* stm32-adc
- Precursor for fix. Move a set of register definitions to a header.
- Fix a race when several ADCs are in use with some using interrupts
to control the dataflow and some using DMA.
* vcnl4000
- Fix a garbage of_match_table in which a string was passed instead
of the intended enum.
* tag 'iio-fixes-for-5.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: Fix an undefied reference error in noa1305_probe
iio: light: opt3001: fix mutex unlock race
iio: adc: ad799x: fix probe error handling
iio: light: add missing vcnl4040 of_compatible
iio: light: fix vcnl4000 devicetree hooks
iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
iio: adc: axp288: Override TS pin bias current for some models
iio: imu: adis16400: fix memory leak
iio: imu: adis16400: release allocated memory on failure
iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
iio: adc: stm32-adc: move registers definitions
iio: accel: adxl372: Perform a reset at start up
iio: accel: adxl372: Fix push to buffers lost samples
iio: accel: adxl372: Fix/remove limitation for FIFO samples
iio: adc: hx711: fix bug in sampling of data
iio: fix center temperature of bmc150-accel-core
iio: imu: st_lsm6dsx: forbid 0 sensor sensitivity
iio: adc: meson_saradc: Fix memory allocation order
max_tu_symbol was programmed to TU_SIZE_RECOMMENDED - 1, which is not
what the spec says. The spec says:
roundup ((input active video bandwidth in bytes/output active video
bandwidth in bytes) * tu_size)
It is not quite clear what the above means, but calculating
max_tu_symbol = (input Bps / output Bps) * tu_size seems to work and
fixes the issues seen.
This fixes artifacts in some videomodes (e.g. 1024x768@60 on 2-lanes &
1.62Gbps was pretty bad for me).
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924131702.9988-1-tomi.valkeinen@ti.com
Commit 37db8985b2 ("s390/cio: add basic protected virtualization
support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non
Protected Virtualization (PV) guests. The problem is that the dma_mask
of the ccw device, which is used by virtio core, gets changed from 64 to
31 bit, because some of the DMA allocations do require 31 bit
addressable memory. For PV the only drawback is that some of the virtio
structures must end up in ZONE_DMA because we have the bounce the
buffers mapped via DMA API anyway.
But for non PV guests we have a problem: because of the 31 bit mask
guests bigger than 2G are likely to try bouncing buffers. The swiotlb
however is only initialized for PV guests, because we don't want to
bounce anything for non PV guests. The first such map kills the guest.
Since the DMA API won't allow us to specify for each allocation whether
we need memory from ZONE_DMA (31 bit addressable) or any DMA capable
memory will do, let us use coherent_dma_mask (which is used for
allocations) to force allocating form ZONE_DMA while changing dma_mask
to DMA_BIT_MASK(64) so that at least the streaming API will regard
the whole memory DMA capable.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Fixes: 37db8985b2 ("s390/cio: add basic protected virtualization support")
Link: https://lore.kernel.org/lkml/20190930153803.7958-1-pasic@linux.ibm.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The return code from null_handle_zoned() sets the cmd->error value.
Returning OK status when an error occured overwrites the intended
cmd->error. Return the appropriate error code instead of setting the
error in the cmd.
Fixes: fceb5d1b19 ("null_blk: create a helper for zoned devices")
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
(cherry picked from commit cb2377a919)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We should not remove the workqueue, we just need to ensure that the
workqueues are synced. The workqueues are torn down on ctx removal.
Cc: stable@vger.kernel.org
Fixes: 6b06314c47 ("io_uring: add file set registration")
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.
v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
(cherry picked from commit c0bb487dc1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
I hit the following error when compile the kernel.
drivers/iio/light/noa1305.o: In function `noa1305_probe':
noa1305.c:(.text+0x65): undefined reference to `__devm_regmap_init_i2c'
make: *** [vmlinux] Error 1
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When an end-of-conversion interrupt is received after performing a
single-shot reading of the light sensor, the driver was waking up the
result ready queue before checking opt->ok_to_ignore_lock to determine
if it should unlock the mutex. The problem occurred in the case where
the other thread woke up and changed the value of opt->ok_to_ignore_lock
to false prior to the interrupt thread performing its read of the
variable. In this case, the mutex would be unlocked twice.
Signed-off-by: David Frey <dpfrey@gmail.com>
Reviewed-by: Andreas Dannenberg <dannenberg@ti.com>
Fixes: 94a9b7b180 ("iio: light: add support for TI's opt3001 light sensor")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit 0f7ddcc1bf ("iio:adc:ad799x: Write default config on probe
and reset alert status on probe") the error path is wrong since it
leaves the vref regulator on. Fix this by disabling both regulators.
Fixes: 0f7ddcc1bf ("iio:adc:ad799x: Write default config on probe and reset alert status on probe")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Commit 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040
proximity and light sensor") added the support for the vcnl4040 but
forgot to add the of_compatible. Fix this by adding it now.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Fixes: 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040 proximity and light sensor")
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
the of_match_table is supported but the data shouldn't be a string.
Instead it shall be one of 'enum vcnl4000_device_ids'. Also the matching
logic for the vcnl4020 was wrong. Since the data retrieve mechanism is
still based on the i2c_device_id no failures did appeared till now.
Fixes: ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
i2c controller available in st_lsm6dsx series performs i2c slave
configuration using accel clock as trigger.
st_lsm6dsx_shub_wait_complete routine is used to wait the controller has
carried out the requested configuration. However if the accel sensor is not
enabled we should not use its configured odr to estimate a proper timeout
Fixes: c91c1c844e ("iio: imu: st_lsm6dsx: add i2c embedded controller support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling") we
preserve the bias current set by the firmware at boot. This fixes issues
we were seeing on various models, but it seems our old hardcoded 80ųA bias
current was working around a firmware bug on at least one model laptop.
In order to both have our cake and eat it, this commit adds a dmi based
list of models where we need to override the firmware set bias current and
adds the one model we now know needs this to it: The Lenovo Ideapad 100S
(11 inch version).
Fixes: 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203829
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
End of conversion may be handled by using IRQ or DMA. There may be a
race when two conversions complete at the same time on several ADCs.
EOC can be read as 'set' for several ADCs, with:
- an ADC configured to use IRQs. EOCIE bit is set. The handler is normally
called in this case.
- an ADC configured to use DMA. EOCIE bit isn't set. EOC triggers the DMA
request instead. It's then automatically cleared by DMA read. But the
handler gets called due to status bit is temporarily set (IRQ triggered
by the other ADC).
So both EOC status bit in CSR and EOCIE control bit must be checked
before invoking the interrupt handler (e.g. call ISR only for
IRQ-enabled ADCs).
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Move STM32 ADC registers definitions to common header.
This is precursor patch to:
- iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
It keeps registers definitions as a whole block, to ease readability and
allow simple access path to EOC bits (readl) in stm32-adc-core driver.
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
We need to perform a reset a start up to make sure that the chip is in a
consistent state. This reset also disables all the interrupts which
should only be enabled together with the iio buffer. Not doing this, was
sometimes causing unwanted interrupts to trigger.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
One in two sample sets was lost by multiplying fifo_set_size with
sizeof(u16). Also, the double number of available samples were pushed to
the iio buffers.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Currently, the driver sets the FIFO_SAMPLES register with the number of
sample sets (maximum of 170 for 3 axis data, 256 for 2-axis and 512 for
single axis). However, the FIFO_SAMPLES register should store the number
of samples, regardless of how the FIFO format is configured.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fix bug in sampling function hx711_cycle() when interrupt occures while
PD_SCK is high. If PD_SCK is high for at least 60 us power down mode of
the sensor is entered which in turn leads to a wrong measurement.
Switch off interrupts during a PD_SCK high period and move query of DOUT
to the latest point of time which is at the end of PD_SCK low period.
This bug exists in the driver since it's initial addition. The more
interrupts on the system the higher is the probability that it happens.
Fixes: c3b2fdd0ea ("iio: adc: hx711: Add IIO driver for AVIA HX711")
Signed-off-by: Andreas Klinger <ak@it-klinger.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull rdma fixes from Jason Gunthorpe:
"The usual collection of driver bug fixes, and a few regressions from
the merge window. Nothing particularly worrisome.
- Various missed memory frees and error unwind bugs
- Fix regressions in a few iwarp drivers from 5.4 patches
- A few regressions added in past kernels
- Squash a number of races in mlx5 ODP code"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Add missing synchronize_srcu() for MW cases
RDMA/mlx5: Put live in the correct place for ODP MRs
RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()
RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR
RDMA/mlx5: Do not allow rereg of a ODP MR
IB/core: Fix wrong iterating on ports
RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
RDMA/cxgb4: Do not dma memory off of the stack
RDMA/cm: Fix memory leak in cm_add/remove_one
RDMA/core: Fix an error handling path in 'res_get_common_doit()'
RDMA/i40iw: Associate ibdev to netdev before IB device registration
RDMA/iwcm: Fix a lock inversion issue
RDMA/iw_cxgb4: fix SRQ access from dump_qp()
RDMA/hfi1: Prevent memory leak in sdma_init
RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port
RDMA/siw: Fix serialization issue in write_space()
RDMA/vmw_pvrdma: Free SRQ only once
cleanup error handling code and make sure temporary info array
with the handles are freed by amdgpu_bo_list_put() on
idr_replace()'s failure.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull arm64 fixes from Will Deacon:
"A larger-than-usual batch of arm64 fixes for -rc3.
The bulk of the fixes are dealing with a bunch of issues with the
build system from the compat vDSO, which unfortunately led to some
significant Makefile rework to manage the horrible combinations of
toolchains that we can end up needing to drive simultaneously.
We came close to disabling the thing entirely, but Vincenzo was quick
to spin up some patches and I ended up picking up most of the bits
that were left [*]. Future work will look at disentangling the header
files properly.
Other than that, we have some important fixes all over, including one
papering over the miscompilation fallout from forcing
CONFIG_OPTIMIZE_INLINING=y, which I'm still unhappy about. Harumph.
We've still got a couple of open issues, so I'm expecting to have some
more fixes later this cycle.
Summary:
- Numerous fixes to the compat vDSO build system, especially when
combining gcc and clang
- Fix parsing of PAR_EL1 in spurious kernel fault detection
- Partial workaround for Neoverse-N1 erratum #1542419
- Fix IRQ priority masking on entry from compat syscalls
- Fix advertisment of FRINT HWCAP to userspace
- Attempt to workaround inlining breakage with '__always_inline'
- Fix accidental freeing of parent SVE state on fork() error path
- Add some missing NULL pointer checks in instruction emulation init
- Some formatting and comment fixes"
[*] Will's final fixes were
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
but they were already in linux-next by then and he didn't rebase
just to add those.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (21 commits)
arm64: armv8_deprecated: Checking return value for memory allocation
arm64: Kconfig: Make CONFIG_COMPAT_VDSO a proper Kconfig option
arm64: vdso32: Rename COMPATCC to CC_COMPAT
arm64: vdso32: Pass '--target' option to clang via VDSO_CAFLAGS
arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
arm64: vdso32: Move definition of COMPATCC into vdso32/Makefile
arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
lib: vdso: Remove CROSS_COMPILE_COMPAT_VDSO
arm64: vdso32: Remove jump label config option in Makefile
arm64: vdso32: Detect binutils support for dmb ishld
arm64: vdso: Remove stale files from old assembly implementation
arm64: vdso32: Fix broken compat vDSO build warnings
arm64: mm: fix spurious fault detection
arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
arm64: Fix incorrect irqflag restore for priority masking for compat
arm64: mm: avoid virt_to_phys(init_mm.pgd)
arm64: cpufeature: Effectively expose FRINT capability to userspace
arm64: Mark functions using explicit register variables as '__always_inline'
docs: arm64: Fix indentation and doc formatting
arm64/sve: Fix wrong free for task->thread.sve_state
...
The callers of xfs_bmap_local_to_extents_empty() log the inode
external to the function, yet this function is where the on-disk
format value is updated. Push the inode logging down into the
function itself to help prevent future mistakes.
Note that internal bmap callers track the inode logging flags
independently and thus may log the inode core twice due to this
change. This is harmless, so leave this code around for consistency
with the other attr fork conversion functions.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_attr_shortform_to_leaf() attempts to put the shortform fork back
together after a failed attempt to convert from shortform to leaf
format. While this code reallocates and copies back the shortform
attr fork data, it never resets the inode format field back to local
format. Further, now that the inode is properly logged after the
initial switch from local format, any error that triggers the
recovery code will eventually abort the transaction and shutdown the
fs. Therefore, remove the broken and unnecessary error handling
code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When a directory changes from shortform (sf) to block format, the sf
format is copied to a temporary buffer, the inode format is modified
and the updated format filled with the dentries from the temporary
buffer. If the inode format is modified and attempt to grow the
inode fails (due to I/O error, for example), it is possible to
return an error while leaving the directory in an inconsistent state
and with an otherwise clean transaction. This results in corruption
of the associated directory and leads to xfs_dabuf_map() errors as
subsequent lookups cannot accurately determine the format of the
directory. This problem is reproduced occasionally by generic/475.
The fundamental problem is that xfs_dir2_sf_to_block() changes the
on-disk inode format without logging the inode. The inode is
eventually logged by the bmapi layer in the common case, but error
checking introduces the possibility of failing the high level
request before this happens.
Update both of the dir2 and attr callers of
xfs_bmap_local_to_extents_empty() to log the inode core as
consistent with the bmap local to extent format change codepath.
This ensures that any subsequent errors after the format has changed
cause the transaction to abort.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We no longer need the extra mirror length tracking in the O_DIRECT code,
as we are able to track the maximum contiguous length in dreq->max_count.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.
Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.
Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
It turns out that the NMI latency workaround from commit:
6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
ends up being too conservative and results in the perf NMI handler claiming
NMIs too easily on AMD hardware when the NMI watchdog is active.
This has an impact, for example, on the hpwdt (HPE watchdog timer) module.
This module can produce an NMI that is used to reset the system. It
registers an NMI handler for the NMI_UNKNOWN type and relies on the fact
that nothing has claimed an NMI so that its handler will be invoked when
the watchdog device produces an NMI. After the referenced commit, the
hpwdt module is unable to process its generated NMI if the NMI watchdog is
active, because the current NMI latency mitigation results in the NMI
being claimed by the perf NMI handler.
Update the AMD perf NMI latency mitigation workaround to, instead, use a
window of time. Whenever a PMC is handled in the perf NMI handler, set a
timestamp which will act as a perf NMI window. Any NMIs arriving within
that window will be claimed by perf. Anything outside that window will
not be claimed by perf. The value for the NMI window is set to 100 msecs.
This is a conservative value that easily covers any NMI latency in the
hardware. While this still results in a window in which the hpwdt module
will not receive its NMI, the window is now much, much smaller.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6d3edaae16 ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In perf_rotate_context(), when the first cpu flexible event fail to
schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
perf_event_sched_in() will skip all cpu flexible events because of the
EVENT_FLEXIBLE bit.
In the next call of perf_rotate_context(), cpu_rotate stays 1, and
cpu_event stays NULL, so this process repeats. The end result is, flexible
events on this cpu will not be scheduled (until another event being added
to the cpuctx).
Here is an easy repro of this issue. On Intel CPUs, where ref-cycles
could only use one counter, run one pinned event for ref-cycles, one
flexible event for ref-cycles, and one flexible event for cycles. The
flexible ref-cycles is never scheduled, which is expected. However,
because of this issue, the cycles event is never scheduled either.
$ perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000
time counts unit events
1.000152973 15,412,480 ref-cycles:D
1.000152973 <not counted> ref-cycles (0.00%)
1.000152973 <not counted> cycles (0.00%)
2.000486957 18,263,120 ref-cycles:D
2.000486957 <not counted> ref-cycles (0.00%)
2.000486957 <not counted> cycles (0.00%)
To fix this, when the flexible_active list is empty, try rotate the
first event in the flexible_groups. Also, rename ctx_first_active() to
ctx_event_to_rotate(), which is more accurate.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8d5bce0c37 ("perf/core: Optimize perf_rotate_context() event scheduling")
Link: https://lkml.kernel.org/r/20191008165949.920548-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
vtime_account_system() assumes that the target task to account cputime
to is always the current task. This is most often true indeed except on
task switch where we call:
vtime_common_task_switch(prev)
vtime_account_system(prev)
Here prev is the scheduling-out task where we account the cputime to. It
doesn't match current that is already the scheduling-in task at this
stage of the context switch.
So we end up checking the wrong task flags to determine if we are
accounting guest or system time to the previous task.
As a result the wrong task is used to check if the target is running in
guest mode. We may then spuriously account or leak either system or
guest time on task switch.
Fix this assumption and also turn vtime_guest_enter/exit() to use the
task passed in parameter as well to avoid future similar issues.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Fixes: 2a42eb9594 ("sched/cputime: Accumulate vtime on top of nsec clocksource")
Link: https://lkml.kernel.org/r/20190925214242.21873-1-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:
normalized_cfs_quota() = [(quota_us << 20) / period_us]
If the quota/period ratio was changed during this scaling due to
precision loss, it will cause inconsistency between parent and child
task groups.
See below example:
A userspace container manager (kubelet) does three operations:
1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
2) Create a few children cgroups.
3) Set quota to 1,000us and period to 10,000us on a child cgroup.
These operations are expected to succeed. However, if the scaling of
147/128 happens before step 3, quota and period of the parent cgroup
will be changed:
new_quota: 1148437ns, 1148us
new_period: 11484375ns, 11484us
And when step 3 comes in, the ratio of the child cgroup will be
104857, which will be larger than the parent cgroup ratio (104821),
and will fail.
Scaling them by a factor of 2 will fix the problem.
Tested-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 2e8e192263 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S
needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to
continue. This happens after resetting the PCR. Before commit
13c7bb3c57 ("powerpc/64s: Set reserved PCR bits"), r0 would always
be 0 before it was stored to kvmppc_vcore->in_guest. However because
of this change in the commit:
/* Reset PCR */
ld r0, VCORE_PCR(r5)
- cmpdi r0, 0
+ LOAD_REG_IMMEDIATE(r6, PCR_MASK)
+ cmpld r0, r6
beq 18f
- li r0, 0
- mtspr SPRN_PCR, r0
+ mtspr SPRN_PCR, r6
18:
/* Signal secondary CPUs to continue */
stb r0,VCORE_IN_GUEST(r5)
We are no longer comparing r0 against 0 and loading it with 0 if it
contains something else. Hence when we store r0 to
kvmppc_vcore->in_guest, it might not be 0. This means that secondary
CPUs will not be signalled to continue. Those CPUs get stuck and
errors like the following are logged:
KVM: CPU 1 seems to be stuck
KVM: CPU 2 seems to be stuck
KVM: CPU 3 seems to be stuck
KVM: CPU 4 seems to be stuck
KVM: CPU 5 seems to be stuck
KVM: CPU 6 seems to be stuck
KVM: CPU 7 seems to be stuck
This can be reproduced with:
$ for i in `seq 1 7` ; do chcpu -d $i ; done ;
$ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \
-M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \
-kernel vmlinux -initrd initrd.cpio.xz
Fix by making sure r0 is 0 before storing it to
kvmppc_vcore->in_guest.
Fixes: 13c7bb3c57 ("powerpc/64s: Set reserved PCR bits")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
Newer versions of GCC (>= 9) demand that the size of the string to be
copied must be explicitly smaller than the size of the destination.
Thus, the NULL char has to be taken into account on strncpy.
This will avoid the following compiling error:
tlbie_test.c: In function 'main':
tlbie_test.c:639:4: error: 'strncpy' specified bound 100 equals destination size
strncpy(logdir, optarg, LOGDIR_NAME_SIZE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191003211010.9711-1-desnesn@linux.ibm.com
Since commit 1211ee61b4 ("powerpc/pseries: Read TLB Block Invalidate
Characteristics"), a warning message is displayed when booting a guest
on top of KVM:
lpar: arch/powerpc/platforms/pseries/lpar.c pseries_lpar_read_hblkrm_characteristics Error calling get-system-parameter (0xfffffffd)
This message is displayed because this hypervisor is not supporting
the H_BLOCK_REMOVE hcall and thus is not exposing the corresponding
feature.
Reading the TLB Block Invalidate Characteristics should not be done if
the feature is not exposed.
Fixes: 1211ee61b4 ("powerpc/pseries: Read TLB Block Invalidate Characteristics")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001132928.72555-1-ldufour@linux.ibm.com
After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:
arch/powerpc/mm/book3s64/pgtable.c:216:3:
error: implicit declaration of function 'radix__flush_all_lpid_guest'
radix__flush_all_lpid_guest() is only declared for
CONFIG_PPC_RADIX_MMU which is not set for this build.
Fix it by adding an empty version for the RADIX_MMU=n case, which
should never be called.
Fixes: 99161de3a2 ("powerpc/64s/radix: tidy up TLB flushing code")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Munge change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190930101342.36c1afa0@canb.auug.org.au
Mark inode for force revalidation if LOOKUP_REVAL flag is set.
This tells the client to actually send a QueryInfo request to
the server to obtain the latest metadata in case a directory
or a file were changed remotely. Only do that if the client
doesn't have a lease for the file to avoid unneeded round
trips to the server.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently the client indicates that a dentry is stale when inode
numbers or type types between a local inode and a remote file
don't match. If this is the case attributes is not being copied
from remote to local, so, it is already known that the local copy
has stale metadata. That's why the inode needs to be marked for
revalidation in order to tell the VFS to lookup the dentry again
before openning a file. This prevents unexpected stale errors
to be returned to the user space when openning a file.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fixes: cb7a69e605 ("cifs: Initialize filesystem timestamp ranges")
Only very old servers (e.g. OS/2 and DOS) did not support
DCE TIME (100 nanosecond granularity). Fix the checks used
to set minimum and maximum times.
Fixes xfstest generic/258 (on 5.4-rc1 and later)
CC: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Message was intended only for developer temporary build
In addition cleanup two minor warnings noticed by Coverity
and a trivial change to workaround a sparse warning
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Pull LED fixes from Jacek Anaszewski:
- fix a leftover from earlier stage of development in the documentation
of recently added led_compose_name() and fix old mistake in the
documentation of led_set_brightness_sync() parameter name.
- MAINTAINERS: add pointer to Pavel Machek's linux-leds.git tree.
Pavel is going to take over LED tree maintainership from myself.
* tag 'led-fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
Add my linux-leds branch to MAINTAINERS
leds: core: Fix leds.h structure documentation
Add pointer to my git tree to MAINTAINERS. I'd like to maintain
linux-leds for-next branch for 5.5.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Update the leds.h structure documentation to define the
correct arguments.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Pull GPIO fixes from Linus Walleij:
- don't clear FLAG_IS_OUT when emulating open drain/source in gpiolib
- fix up the usage of nonexclusive GPIO descriptors from device trees
- fix the incorrect IEC offset when toggling trigger edge in the
Spreadtrum driver
- use the correct unit for debounce settings in the MAX77620 driver
* tag 'gpio-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: max77620: Use correct unit for debounce times
gpio: eic: sprd: Fix the incorrect EIC offset when toggling
gpio: fix getting nonexclusive gpiods from DT
gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
Pull selinuxfix from Paul Moore:
"One patch to ensure we don't copy bad memory up into userspace"
* tag 'selinux-pr-20191007' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix context string corruption in convert_context()
Pull Kselftest fixes from Shuah Khan:
"Fixes for existing tests and the framework.
Cristian Marussi's patches add the ability to skip targets (tests) and
exclude tests that didn't build from run-list. These patches improve
the Kselftest results. Ability to skip targets helps avoid running
tests that aren't supported in certain environments. As an example,
bpf tests from mainline aren't supported on stable kernels and have
dependency on bleeding edge llvm. Being able to skip bpf on systems
that can't meet this llvm dependency will be helpful.
Kselftest can be built and installed from the main Makefile. This
change help simplify Kselftest use-cases which addresses request from
users.
Kees Cook added per test timeout support to limit individual test
run-time"
* tag 'linux-kselftest-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: watchdog: Add command line option to show watchdog_info
selftests: watchdog: Validate optional file argument
selftests/kselftest/runner.sh: Add 45 second timeout per test
kselftest: exclude failed TARGETS from runlist
kselftest: add capability to skip chosen TARGETS
selftests: Add kselftest-all and kselftest-install targets
We discussed a better location for this file, and agreed that
core-api/ is a good fit. Rename it to symbol-namespaces.rst
for disambiguation, and also add it to index.rst and MAINTAINERS.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
There are no return value checking when using kzalloc() and kcalloc() for
memory allocation. so add it.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
GCC throws warning message as below:
‘clone_src_i_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
^
fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
u64 clone_src_i_size;
^
The clone_src_i_size is only used as call-by-reference
in a call to get_inode_info().
Silence the warning by initializing clone_src_i_size to 0.
Note that the warning is a false positive and reported by older versions
of GCC (eg. 7.x) but not eg 9.x. As there have been numerous people, the
patch is applied. Setting clone_src_i_size to 0 does not otherwise make
sense and would not do any action in case the code changes in the future.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
Any changes interesting to tasks waiting in io_cqring_wait() are
commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
also tries to do that but with no reason, that means spurious wakeups
every io_free_req() and io_uring_enter().
Just use percpu_ref_put() instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge misc fixes from Andrew Morton:
"The usual shower of hotfixes.
Chris's memcg patches aren't actually fixes - they're mature but a few
niggling review issues were late to arrive.
The ocfs2 fixes are quite old - those took some time to get reviewer
attention.
Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
mm/slab-generic"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
mm, sl[ou]b: improve memory accounting
mm, memcg: make scan aggression always exclude protection
mm, memcg: make memory.emin the baseline for utilisation determination
mm, memcg: proportional memory.{low,min} reclaim
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
mm/page_alloc.c: fix a crash in free_pages_prepare()
mm/z3fold.c: claim page in the beginning of free
kernel/sysctl.c: do not override max_threads provided by userspace
memcg: only record foreign writebacks with dirty pages when memcg is not disabled
mm: fix -Wmissing-prototypes warnings
writeback: fix use-after-free in finish_writeback_work()
mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
panic: ensure preemption is disabled during panic()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
ocfs2: clear zero in unaligned direct IO
In most configurations, kmalloc() happens to return naturally aligned
(i.e. aligned to the block size itself) blocks for power of two sizes.
That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].
The topic has been discussed at LSF/MM 2019 [3]. Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment. For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it. That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).
Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations. What this means for the three available allocators?
* SLAB object layout happens to be mostly unchanged by the patch. The
implicitly provided alignment could be compromised with
CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
caches with alignment larger than unsigned long long. Practically on at
least x86 this includes kmalloc caches as they use cache line alignment,
which is larger than that. Still, this patch ensures alignment on all
arches and cache sizes.
* SLUB layout is also unchanged unless redzoning is enabled through
CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
With this patch, explicit alignment is guaranteed with redzoning as
well. This will result in more memory being wasted, but that should be
acceptable in a debugging scenario.
* SLOB has no implicit alignment so this patch adds it explicitly for
kmalloc(). The potential downside is increased fragmentation. While
pathological allocation scenarios are certainly possible, in my testing,
after booting a x86_64 kernel+userspace with virtme, around 16MB memory
was consumed by slab pages both before and after the patch, with
difference in the noise.
[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/
[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "guarantee natural alignment for kmalloc()", v2.
This patch (of 2):
SLOB currently doesn't account its pages at all, so in /proc/meminfo the
Slab field shows zero. Modifying a counter on page allocation and
freeing should be acceptable even for the small system scenarios SLOB is
intended for. Since reclaimable caches are not separated in SLOB,
account everything as unreclaimable.
SLUB currently doesn't account kmalloc() and kmalloc_node() allocations
larger than order-1 page, that are passed directly to the page
allocator. As they also don't appear in /proc/slabinfo, it might look
like a memory leak. For consistency, account them as well. (SLAB
doesn't actually use page allocator directly, so no change there).
Ideally SLOB and SLUB would be handled in separate patches, but due to
the shared kmalloc_order() function and different kfree()
implementations, it's easier to patch both at once to prevent
inconsistencies.
Link: http://lkml.kernel.org/r/20190826111627.7505-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is an incremental improvement on the existing
memory.{low,min} relative reclaim work to base its scan pressure
calculations on how much protection is available compared to the current
usage, rather than how much the current usage is over some protection
threshold.
This change doesn't change the experience for the user in the normal
case too much. One benefit is that it replaces the (somewhat arbitrary)
100% cutoff with an indefinite slope, which makes it easier to ballpark
a memory.low value.
As well as this, the old methodology doesn't quite apply generically to
machines with varying amounts of physical memory. Let's say we have a
top level cgroup, workload.slice, and another top level cgroup,
system-management.slice. We want to roughly give 12G to
system-management.slice, so on a 32GB machine we set memory.low to 20GB
in workload.slice, and on a 64GB machine we set memory.low to 52GB.
However, because these are relative amounts to the total machine size,
while the amount of memory we want to generally be willing to yield to
system.slice is absolute (12G), we end up putting more pressure on
system.slice just because we have a larger machine and a larger workload
to fill it, which seems fairly unintuitive. With this new behaviour, we
don't end up with this unintended side effect.
Previously the way that memory.low protection works is that if you are
50% over a certain baseline, you get 50% of your normal scan pressure.
This is certainly better than the previous cliff-edge behaviour, but it
can be improved even further by always considering memory under the
currently enforced protection threshold to be out of bounds. This means
that we can set relatively low memory.low thresholds for variable or
bursty workloads while still getting a reasonable level of protection,
whereas with the previous version we may still trivially hit the 100%
clamp. The previous 100% clamp is also somewhat arbitrary, whereas this
one is more concretely based on the currently enforced protection
threshold, which is likely easier to reason about.
There is also a subtle issue with the way that proportional reclaim
worked previously -- it promotes having no memory.low, since it makes
pressure higher during low reclaim. This happens because we base our
scan pressure modulation on how far memory.current is between memory.min
and memory.low, but if memory.low is unset, we only use the overage
method. In most cromulent configurations, this then means that we end
up with *more* pressure than with no memory.low at all when we're in low
reclaim, which is not really very usable or expected.
With this patch, memory.low and memory.min affect reclaim pressure in a
more understandable and composable way. For example, from a user
standpoint, "protected" memory now remains untouchable from a reclaim
aggression standpoint, and users can also have more confidence that
bursty workloads will still receive some amount of guaranteed
protection.
Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman points out that when when we do the low reclaim pass, we scale the
reclaim pressure relative to position between 0 and the maximum
protection threshold.
However, if the maximum protection is based on memory.elow, and
memory.emin is above zero, this means we still may get binary behaviour
on second-pass low reclaim. This is because we scale starting at 0, not
starting at memory.emin, and since we don't scan at all below emin, we
end up with cliff behaviour.
This should be a fairly uncommon case since usually we don't go into the
second pass, but it makes sense to scale our low reclaim pressure
starting at emin.
You can test this by catting two large sparse files, one in a cgroup
with emin set to some moderate size compared to physical RAM, and
another cgroup without any emin. In both cgroups, set an elow larger
than 50% of physical RAM. The one with emin will have less page
scanning, as reclaim pressure is lower.
Rebase on top of and apply the same idea as what was applied to handle
cgroup_memory=disable properly for the original proportional patch
http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name ("mm,
memcg: Handle cgroup_disable=memory when getting memcg protection").
Link: http://lkml.kernel.org/r/20190201051810.GA18895@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Suggested-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cgroup v2 introduces two memory protection thresholds: memory.low
(best-effort) and memory.min (hard protection). While they generally do
what they say on the tin, there is a limitation in their implementation
that makes them difficult to use effectively: that cliff behaviour often
manifests when they become eligible for reclaim. This patch implements
more intuitive and usable behaviour, where we gradually mount more
reclaim pressure as cgroups further and further exceed their protection
thresholds.
This cliff edge behaviour happens because we only choose whether or not
to reclaim based on whether the memcg is within its protection limits
(see the use of mem_cgroup_protected in shrink_node), but we don't vary
our reclaim behaviour based on this information. Imagine the following
timeline, with the numbers the lruvec size in this zone:
1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
scanned. (?!)
* Of course, we won't usually scan all available pages in the zone even
without this patch because of scan control priority, over-reclaim
protection, etc. However, as shown by the tests at the end, these
techniques don't sufficiently throttle such an extreme change in input,
so cliff-like behaviour isn't really averted by their existence alone.
Here's an example of how this plays out in practice. At Facebook, we are
trying to protect various workloads from "system" software, like
configuration management tools, metric collectors, etc (see this[0] case
study). In order to find a suitable memory.low value, we start by
determining the expected memory range within which the workload will be
comfortable operating. This isn't an exact science -- memory usage deemed
"comfortable" will vary over time due to user behaviour, differences in
composition of work, etc, etc. As such we need to ballpark memory.low,
but doing this is currently problematic:
1. If we end up setting it too low for the workload, it won't have
*any* effect (see discussion above). The group will receive the full
weight of reclaim and won't have any priority while competing with the
less important system software, as if we had no memory.low configured
at all.
2. Because of this behaviour, we end up erring on the side of setting
it too high, such that the comfort range is reliably covered. However,
protected memory is completely unavailable to the rest of the system,
so we might cause undue memory and IO pressure there when we *know* we
have some elasticity in the workload.
3. Even if we get the value totally right, smack in the middle of the
comfort zone, we get extreme jumps between no pressure and full
pressure that cause unpredictable pressure spikes in the workload due
to the current binary reclaim behaviour.
With this patch, we can set it to our ballpark estimation without too much
worry. Any undesirable behaviour, such as too much or too little reclaim
pressure on the workload or system will be proportional to how far our
estimation is off. This means we can set memory.low much more
conservatively and thus waste less resources *without* the risk of the
workload falling off a cliff if we overshoot.
As a more abstract technical description, this unintuitive behaviour
results in having to give high-priority workloads a large protection
buffer on top of their expected usage to function reliably, as otherwise
we have abrupt periods of dramatically increased memory pressure which
hamper performance. Having to set these thresholds so high wastes
resources and generally works against the principle of work conservation.
In addition, having proportional memory reclaim behaviour has other
benefits. Most notably, before this patch it's basically mandatory to set
memory.low to a higher than desirable value because otherwise as soon as
you exceed memory.low, all protection is lost, and all pages are eligible
to scan again. By contrast, having a gradual ramp in reclaim pressure
means that you now still get some protection when thresholds are exceeded,
which means that one can now be more comfortable setting memory.low to
lower values without worrying that all protection will be lost. This is
important because workingset size is really hard to know exactly,
especially with variable workloads, so at least getting *some* protection
if your workingset size grows larger than you expect increases user
confidence in setting memory.low without a huge buffer on top being
needed.
Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
assistance in thinking about how to make this work better.
In testing these changes, I intended to verify that:
1. Changes in page scanning become gradual and proportional instead of
binary.
To test this, I experimented stepping further and further down
memory.low protection on a workload that floats around 19G workingset
when under memory.low protection, watching page scan rates for the
workload cgroup:
+------------+-----------------+--------------------+--------------+
| memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
+------------+-----------------+--------------------+--------------+
| 21G | 0 | 0 | N/A |
| 17G | 867 | 3799 | 23% |
| 12G | 1203 | 3543 | 34% |
| 8G | 2534 | 3979 | 64% |
| 4G | 3980 | 4147 | 96% |
| 0 | 3799 | 3980 | 95% |
+------------+-----------------+--------------------+--------------+
As you can see, the test kernel (with a kernel containing this
patch) ramps up page scanning significantly more gradually than the
control kernel (without this patch).
2. More gradual ramp up in reclaim aggression doesn't result in
premature OOMs.
To test this, I wrote a script that slowly increments the number of
pages held by stress(1)'s --vm-keep mode until a production system
entered severe overall memory contention. This script runs in a highly
protected slice taking up the majority of available system memory.
Watching vmstat revealed that page scanning continued essentially
nominally between test and control, without causing forward reclaim
progress to become arrested.
[0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project
[akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
[chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.
set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.
Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.
This became more severe when we switched x86 from 4k to 8k kernel
stacks. Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.
In the particular case
3.12
kernel.threads-max = 515561
4.4
kernel.threads-max = 200000
Neither of the two values is really insane on 32GB machine.
I am not sure we want/need to tune the max_thread value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
for saving memory. Now kdump kernel will always panic when dump vmcore
to local disk:
BUG: kernel NULL pointer dereference, address: 0000000000000ab8
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
Call Trace:
__set_page_dirty+0x52/0xc0
iomap_set_page_dirty+0x50/0x90
iomap_write_end+0x6e/0x270
iomap_write_actor+0xce/0x170
iomap_apply+0xba/0x11e
iomap_file_buffered_write+0x62/0x90
xfs_file_buffered_aio_write+0xca/0x320 [xfs]
new_sync_write+0x12d/0x1d0
vfs_write+0xa5/0x1a0
ksys_write+0x59/0xd0
do_syscall_64+0x59/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.
Via the trace and with debugging, it is pointing to commit 97b27821b4
("writeback, memcg: Implement foreign dirty flushing") which introduced
this regression. Disabling memcg causes the null pointer dereference at
uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().
Fix it by returning directly if memcg is disabled, but not trying to
record the foreign writebacks with dirty pages.
Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
Fixes: 97b27821b4 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SECTION_SIZE and SECTION_MASK macros are not getting used anymore. But
they do conflict with existing definitions on arm64 platform causing
following warning during build. Lets drop these unused macros.
mm/memremap.c:16: warning: "SECTION_MASK" redefined
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
arch/arm64/include/asm/pgtable-hwdef.h:79: note: this is the location of the previous definition
#define SECTION_MASK (~(SECTION_SIZE-1))
mm/memremap.c:17: warning: "SECTION_SIZE" redefined
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
arch/arm64/include/asm/pgtable-hwdef.h:78: note: this is the location of the previous definition
#define SECTION_SIZE (_AC(1, UL) << SECTION_SHIFT)
Link: http://lkml.kernel.org/r/1569312010-31313-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.
This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:
| sysrq: Trigger a crash
| Kernel panic - not syncing: sysrq triggered crash
| CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xa0/0xc4
| panic+0x140/0x32c
| sysrq_handle_reboot+0x0/0x20
| __handle_sysrq+0x124/0x190
| write_sysrq_trigger+0x64/0x88
| proc_reg_write+0x60/0xa8
| __vfs_write+0x18/0x40
| vfs_write+0xa4/0x1b8
| ksys_write+0x64/0xf0
| __arm64_sys_write+0x14/0x20
| el0_svc_common.constprop.0+0xb0/0x168
| el0_svc_handler+0x28/0x78
| el0_svc+0x8/0xc
| Kernel Offset: disabled
| CPU features: 0x0002,24002004
| Memory Limit: none
| ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
| Ping 2!
| Ping 1!
| Ping 1!
| Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:
if (loc->xl_entry)
When loc->xl_entry is NULL, it is used on line 2158:
ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
and line 2164:
ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;
Thus, possible null-pointer dereferences may occur.
To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.
These bugs are found by a static analysis tool STCheck written by us.
[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unused portion of a part-written fs-block-sized block is not set to zero
in unaligned append direct write.This can lead to serious data
inconsistencies.
Ocfs2 manage disk with cluster size(for example, 1M), part-written in
one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
VFS(function dio_zero_block) doesn't do the cleaning because bh's state
is not set to NEW in function ocfs2_dio_wr_get_block when we write a
WRITTEN cluster. For example, the cluster size is 1M, file size is 8k
and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
contain dirty data.
We have to deal with two cases:
1.The starting position of direct write is outside the file.
2.The starting position of direct write is located in the file.
We need set bh's state to NEW in the first case. In the second case, we
need mapped twice because bh's state of area out file should be set to
NEW while area in file not.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.
This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.
There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.
Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.
Reported-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
For the kernel space, all ebreak instructions are determined at compile
time because the kernel space debugging module is currently unsupported.
Hence, it should be treated as a bug if an ebreak instruction which does
not belong to BUG_TRAP_TYPE_WARN or BUG_TRAP_TYPE_BUG is executed in
kernel space. For the userspace, debugging module or user problem may
intentionally insert an ebreak instruction to trigger a SIGTRAP signal.
To approach the above two situations, the do_trap_break() will direct
the BUG_TRAP_TYPE_NONE ebreak exception issued in kernel space to die()
and will send a SIGTRAP to the trapped process only when the ebreak is
in userspace.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed checkpatch issue]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
On RISC-V, when the kernel runs code on behalf of a user thread, and the
kernel executes a WARN() or WARN_ON(), the user thread will be sent
a bogus SIGTRAP. Fix the RISC-V kernel code to not send a SIGTRAP when
a WARN()/WARN_ON() is executed.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed subject]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
When the CONFIG_GENERIC_BUG is disabled by disabling CONFIG_BUG, if a
kernel thread is trapped by BUG(), the whole system will be in the
loop that infinitely handles the ebreak exception instead of entering the
die function. To fix this problem, the do_trap_break() will always call
the die() to deal with the break exception as the type of break is
BUG_TRAP_TYPE_BUG.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
In commit 9f79b78ef7 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.
But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.
Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally. Which
makes that "copy using put_user() in a loop" not work for them at all.
I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.
So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).
Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().
That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).
It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.
Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.
Fixes: 9f79b78ef7 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]
We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
(cherry picked from commit e2144503bf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The officially validated plane width limit is 4k on skl+, however
we already had people using 5k displays before we started to enforce
the limit. Also it seems Windows allows 5k resolutions as well
(though not sure if they do it with one plane or two).
According to hw folks 5k should work with the possible
exception of the following features:
- Ytile (already limited to 4k)
- FP16 (already limited to 4k)
- render compression (already limited to 4k)
- KVMR sprite and cursor (don't care)
- horizontal panning (need to verify this)
- pipe and plane scaling (need to verify this)
So apart from last two items on that list we are already
fine. We should really verify what happens with those last
two items but I don't have a 5k display on hand atm so it'll
have to wait.
In the meantime let's just bump the limit back up to 5k since
several users have already been using it without apparent issues.
At least we'll be no worse off than we were prior to lowering
the limits.
Cc: stable@vger.kernel.org
Cc: Sean Paul <sean@poorly.run>
Cc: José Roberto de Souza <jose.souza@intel.com>
Tested-by: Leho Kraav <leho@kraav.com>
Fixes: 372b9ffb57 ("drm/i915: Fix skl+ max plane width")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111501
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905135044.2001-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit bed34ef544)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)
Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."
Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk
(cherry picked from commit d810583fc2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In cace of cmpxchg this would cause to reference function
__cmpxchg_called_with_bad_pointer, which is a error case
for catching bugs and will not happen for correct code, if
__cmpxchg is inlined.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
[paul.burton@mips.com: s/__cmpxchd/__cmpxchg in subject]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
scripts/nsdeps automatically generates a patch to add MODULE_IMPORT_NS
tags, and what is nicer, it sorts the lines alphabetically with the
'sort' command. However, the output from the 'sort' command depends on
locale.
For example, I got this:
$ { echo usbstorage; echo usb_storage; } | LANG=en_US.UTF-8 sort
usbstorage
usb_storage
$ { echo usbstorage; echo usb_storage; } | LANG=C sort
usb_storage
usbstorage
So, this means people might potentially send different patches.
This kind of issue was reported in the past, for example,
commit f55f2328bb ("kbuild: make sorting initramfs contents
independent of locale").
Adding 'LANG=C' is a conventional way of fixing when a deterministic
result is desirable.
I added 'LANG=C' very close to the 'sort' command since changing
locale affects the language of error messages etc. We should respect
users' choice as much as possible.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
This script does not use bash-extension. I am guessing this hashbang
was copied from scripts/coccicheck, which really uses bash-extension.
/bin/sh is enough for this script.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Running 'make nsdeps' in a clean source tree fails as follows:
$ make -s clean; make -s defconfig; make nsdeps
[ snip ]
awk: fatal: cannot open file `init/modules.order' for reading (No such file or directory)
make: *** [Makefile;1307: modules.order] Error 2
make: *** Deleting file 'modules.order'
make: *** Waiting for unfinished jobs....
The cause of the error is 'make nsdeps' does not build modules at all.
Set KBUILD_MODULES to fix it.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The module namespace produces __strtab_ns_<sym> symbols to store
namespace strings, but it does not guarantee the name uniqueness.
This is a potential problem because we have exported symbols starting
with "ns_".
For example, kernel/capability.c exports the following symbols:
EXPORT_SYMBOL(ns_capable);
EXPORT_SYMBOL(capable);
Assume a situation where those are converted as follows:
EXPORT_SYMBOL_NS(ns_capable, some_namespace);
EXPORT_SYMBOL_NS(capable, some_namespace);
The former expands to "__kstrtab_ns_capable" and "__kstrtab_ns_ns_capable",
and the latter to "__kstrtab_capable" and "__kstrtab_ns_capable".
Then, we have the duplicated "__kstrtab_ns_capable".
To ensure the uniqueness, rename "__kstrtab_ns_*" to "__kstrtabns_*".
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Currently, external module builds produce tons of false-positives:
WARNING: module <mod> uses symbol <sym> from namespace <ns>, but does not import it.
Here, the <ns> part shows a random string.
When you build external modules, the symbol info of vmlinux and
in-kernel modules are read from $(objtree)/Module.symvers, but
read_dump() is buggy in multiple ways:
[1] When the modpost is run for vmlinux and in-kernel modules,
sym_extract_namespace() allocates memory for the namespace. On the
other hand, read_dump() does not, then sym->namespace will point to
somewhere in the line buffer of get_next_line(). The data in the
buffer will be replaced soon, and sym->namespace will end up with
pointing to unrelated data. As a result, check_exports() will show
random strings in the warning messages.
[2] When there is no namespace, sym_extract_namespace() returns NULL.
On the other hand, read_dump() sets namespace to an empty string "".
(but, it will be later replaced with unrelated data due to bug [1].)
The check_exports() shows a warning unless exp->namespace is NULL,
so every symbol read from read_dump() emits the warning, which is
mostly false positive.
To address [1], sym_add_exported() calls strdup() for s->namespace.
The namespace from sym_extract_namespace() must be freed to avoid
memory leak.
For [2], I changed the if-conditional in check_exports().
This commit also fixes sym_add_exported() to set s->namespace correctly
when the symbol is preloaded.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Currently, EXPORT_SYMBOL_NS(_GPL) constructs the kernel symbol as
follows:
__ksymtab_SYMBOL.NAMESPACE
The sym_extract_namespace() in modpost allocates memory for the part
SYMBOL.NAMESPACE when '.' is contained. One problem is that the pointer
returned by strdup() is lost because the symbol name will be copied to
malloc'ed memory by alloc_symbol(). No one will keep track of the
pointer of strdup'ed memory.
sym->namespace still points to the NAMESPACE part. So, you can free it
with complicated code like this:
free(sym->namespace - strlen(sym->name) - 1);
It complicates memory free.
To fix it elegantly, I swapped the order of the symbol and the
namespace as follows:
__ksymtab_NAMESPACE.SYMBOL
then, simplified sym_extract_namespace() so that it allocates memory
only for the NAMESPACE part.
I prefer this order because it is intuitive and also matches to major
languages. For example, NAMESPACE::NAME in C++, MODULE.NAME in Python.
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Now all scripts in scripts/coccinelle to be automatically called
by coccicheck. However new adding add_namespace.cocci does not
support report mode, which make coccicheck failed.
This add "virtual report" to make the coccicheck go ahead smoothly.
Fixes: eb8305aecb ("scripts: Coccinelle script for namespace dependencies.")
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Matthias Maennich <maennich@google.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
If __calc_tpm2_event_size() fails to parse an event it will return 0,
resulting tpm2_calc_event_log_size() returning -1. Currently there is
no check of this return value, and 'efi_tpm_final_log_size' can end up
being set to this negative value resulting in a crash like this one:
BUG: unable to handle page fault for address: ffffbc8fc00866ad
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:memcpy_erms+0x6/0x10
Call Trace:
tpm_read_log_efi()
tpm_bios_log_setup()
tpm_chip_register()
tpm_tis_core_init.cold.9+0x28c/0x466
tpm_tis_plat_probe()
platform_drv_probe()
...
Also __calc_tpm2_event_size() returns a size of 0 when it fails
to parse an event, so update function documentation to reflect this.
The root cause of the issue that caused the failure of event parsing
in this case is resolved by Peter Jone's patchset dealing with large
event logs where crossing over a page boundary causes the page with
the event count to be unmapped.
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: c46f340569 ("tpm: Reserve the TPM final events table")
Link: https://lkml.kernel.org/r/20191002165904.8819-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf script:
Andi Kleen:
- Fix recovery from LBR/binary mismatch in the "brstackinsn" --field.
perf annotate:
Arnaldo Carvalho de Melo:
- Propagate errors so that meaningful messages can be presented to the
user in case of problems.
perf map:
Steve MacLean:
- Fix handling of maps partially overlapped, resolving symbols in the
ranges not replaced by new mmaps.
perf tests:
Ian Rogers:
- Use raise() instead of NULL derefs to avoid causing a SIGILL rather than a
SIGSEGV for optimized builds that turn NULL derefs into ud2 instructions.
perf LLVM:
Ian Rogers:
- Don't access out-of-scope array.
perf inject:
Steve MacLean:
- Fix JIT_CODE_MOVE filename, that was having a u64 truncaded into a 32-bit
snprintf format and also a missing ".so" suffix in another case.
libsubcmd:
Ian Rogers:
- Make _FORTIFY_SOURCE defines dependent on the feature, avoiding
false positives with with memory sanitizers such as LLVM's ASan.
Vendor specific events:
Intel:
Andi Kleen:
- Fix period for Intel fixed counters.
s390:
Thomas Richter (2):
- Fix some event details transaction for machine type 8561.
tools headers UAPI:
Arnaldo Carvalho de Melo:
- Sync headers with the kernel, catching new usbdevfs ioctls and
madvise behaviours to properly decode in 'perf trace' output.
Documentation:
Steve MacLean:
- Correct and clarify jitdump spec.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sphinx is generating a build warning as the title underline
of this file is too short.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
CONFIG_COMPAT_VDSO is defined by passing '-DCONFIG_COMPAT_VDSO' to the
compiler when the generic compat vDSO code is in use. It's much cleaner
and simpler to expose this as a proper Kconfig option (like x86 does),
so do that and remove the bodge.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
For consistency with CROSS_COMPILE_COMPAT, mechanically rename COMPATCC
to CC_COMPAT so that specifying aspects of the compat vDSO toolchain in
the environment isn't needlessly confusing.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Directly passing the '--target' option to clang by appending to
COMPATCC does not work if COMPATCC has been specified explicitly as
an argument to Make unless the 'override' directive is used, which is
ugly and different to what is done in the top-level Makefile.
Move the '--target' option for clang out of COMPATCC and into
VDSO_CAFLAGS, where it will be picked up when compiling and assembling
the 32-bit vDSO under clang.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
KBUILD_CPPFLAGS is defined differently depending on whether the main
compiler is clang or not. This means that it is not possible to build
the compat vDSO with GCC if the rest of the kernel is built with clang.
Define VDSO_CPPFLAGS directly to break this dependency and allow a clang
kernel to build a compat vDSO with GCC:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- CC=clang \
COMPATCC=arm-linux-gnueabihf-gcc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
There's no need to export COMPATCC, so just define it locally in the
vdso32/Makefile, which is the only place where it is used.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Rather than force the use of GCC for the compat cross-compiler, instead
extract the target from CROSS_COMPILE_COMPAT and pass it to clang if the
main compiler is clang.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
There are two checks to see if the manual gpio is configured, but
these the check is seeing if the structure is NULL instead it
should check to see if there are CTS and/or RTS pins defined.
This patch uses checks for those individual pins instead of
checking for the structure itself to restore auto RTS/CTS.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20191006163314.23191-2-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch fixes issue with Halt Endnpoint Test observed during using g_zero
driver as DUT. Bug occurred only on some testing board.
Endpoint can defer transition to Halted state if endpoint has pending
requests.
Patch add additional condition that allows to return correct endpoint
status during Get Endpoint Status request even if the halting endpoint
is in progress.
Reported-by: Rahul Kumar <kurahul@cadence.com>
Signed-off-by: Rahul Kumar <kurahul@cadence.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Tested-by: Roger Quadros <rogerq@ti.com>
Link: https://lore.kernel.org/r/1570430355-26118-1-git-send-email-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arm64 was the last architecture using CROSS_COMPILE_COMPAT_VDSO config
option. With this patch series the dependency in the architecture has
been removed.
Remove CROSS_COMPILE_COMPAT_VDSO from the Unified vDSO library code.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Older versions of binutils (prior to 2.24) do not support the "ISHLD"
option for memory barrier instructions, which leads to a build failure
when assembling the vdso32 library.
Add a compilation time mechanism that detects if binutils supports those
instructions and configure the kernel accordingly.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The .config file and the generated include/config/auto.conf can
end up out of sync after a set of commands since
CONFIG_CROSS_COMPILE_COMPAT_VDSO is not updated correctly.
The sequence can be reproduced as follows:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
[...]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- menuconfig
[set CONFIG_CROSS_COMPILE_COMPAT_VDSO="arm-linux-gnueabihf-"]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Which results in:
arch/arm64/Makefile:62: CROSS_COMPILE_COMPAT not defined or empty,
the compat vDSO will not be built
even though the compat vDSO has been built:
$ file arch/arm64/kernel/vdso32/vdso.so
arch/arm64/kernel/vdso32/vdso.so: ELF 32-bit LSB pie executable, ARM,
EABI5 version 1 (SYSV), dynamically linked,
BuildID[sha1]=c67f6c786f2d2d6f86c71f708595594aa25247f6, stripped
A similar case that involves changing the configuration parameter
multiple times can be reconducted to the same family of problems.
Remove the use of CONFIG_CROSS_COMPILE_COMPAT_VDSO altogether and
instead rely on the cross-compiler prefix coming from the environment
via CROSS_COMPILE_COMPAT, much like we do for the rest of the kernel.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When detecting a spurious EL1 translation fault, we attempt to compare
ESR_EL1.DFSC with PAR_EL1.FST. We erroneously use FIELD_PREP() to
extract PAR_EL1.FST, when we should be using FIELD_GET().
In the wise words of Robin Murphy:
| FIELD_GET() is a UBFX, FIELD_PREP() is a BFI
Using FIELD_PREP() means that that dfsc & ESR_ELx_FSC_TYPE is always
zero, and hence not equal to ESR_ELx_FSC_FAULT. Thus we detect any
unhandled translation fault as spurious.
... so let's use FIELD_GET() to ensure we don't decide all translation
faults are spurious. ESR_EL1.DFSC occupies bits [5:0], and requires no
shifting.
Fixes: 42f91093b0 ("arm64: mm: Ignore spurious translation faults taken from the kernel")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Commit 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to
chip->read_xxx() hooks") modified the prototype of the struct nand_chip
read_buf function pointer. In the au1550nd driver we have 2
implementations of read_buf. The previously mentioned commit modified
the au_read_buf() implementation to match the function pointer, but not
au_read_buf16(). This results in a compiler warning for MIPS
db1xxx_defconfig builds:
drivers/mtd/nand/raw/au1550nd.c:443:57:
warning: pointer type mismatch in conditional expression
Fix this by updating the prototype of au_read_buf16() to take a struct
nand_chip pointer as its first argument, as is expected after commit
7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx()
hooks").
Note that this shouldn't have caused any functional issues at runtime,
since the offset of the struct mtd_info within struct nand_chip is 0
making mtd_to_nand() effectively a type-cast.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks")
Cc: stable@vger.kernel.org # v4.20+
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Currently if the client identifies problems when processing
metadata returned in CREATE response, the open handle is being
leaked. This causes multiple problems like a file missing a lease
break by that client which causes high latencies to other clients
accessing the file. Another side-effect of this is that the file
can't be deleted.
Fix this by closing the file after the client hits an error after
the file was opened and the open descriptor wasn't returned to
the user space. Also convert -ESTALE to -EOPENSTALE to allow
the VFS to revalidate a dentry and retry the open.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After 'Initial git repository build' commit,
'mapping_table_ERRHRD' variable has not been used.
So 'mapping_table_ERRHRD' const variable could be removed
to mute below warning message:
fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable]
static const struct smb_to_posix_error mapping_table_ERRHRD[] = {
^
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now that sparse has been fixed, it spotted a couple recent minor
endian errors (and removed one additional sparse warning).
Thanks to Luc Van Oostenryck for his help fixing sparse.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Guarantee zeroed memory buffers for cases where potential memory
leak to disk can occur. In these cases, kmem_alloc is used and
doesn't zero the buffer, opening the possibility of information
leakage to disk.
Use existing infrastucture (xfs_buf_allocate_memory) to obtain
the already zeroed buffer from kernel memory.
This solution avoids the performance issue that would occur if a
wholesale change to replace kmem_alloc with kmem_zalloc was done.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
[darrick: fix bitwise complaint about kmflag_mask]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The flags arg is always passed as zero, so remove it.
(xfs_buf_get_uncached takes flags to support XBF_NO_IOACCT for
the sb, but that should never be relevant for xfs_get_aghdr_buf)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
To ensure that all blocks touched by the range [offset, offset + count)
are allocated, we need to calculate the block count from the difference
of the range end (rounded up) and the range start (rounded down).
Before this patch, we just round up the byte count, which may lead to
unaligned ranges not being fully allocated:
$ touch test_file
$ block_size=$(stat -fc '%S' test_file)
$ fallocate -o $((block_size / 2)) -l $block_size test_file
$ xfs_bmap test_file
test_file:
0: [0..7]: 1396264..1396271
1: [8..15]: hole
There should not be a hole there. Instead, the first two blocks should
be fully allocated.
With this patch applied, the result is something like this:
$ touch test_file
$ block_size=$(stat -fc '%S' test_file)
$ fallocate -o $((block_size / 2)) -l $block_size test_file
$ xfs_bmap test_file
test_file:
0: [0..15]: 11024..11039
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In commit 4ed2863951 ("fs, elf: drop MAP_FIXED usage from elf_map") we
changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
executable mappings.
Then, people reported that it broke some binaries that had overlapping
segments from the same file, and commit ad55eac74f ("elf: enforce
MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
overlaying elf segment cases. But only some - despite the summary line
of that commit, it only did it when it also does a temporary brk vma for
one obvious overlapping case.
Now Russell King reports another overlapping case with old 32-bit x86
binaries, which doesn't trigger that limited case. End result: we had
better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
Yes, it's a sign of old binaries generated with old tool-chains, but we
do pride ourselves on not breaking existing setups.
This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
and the old load_elf_library() use-cases, because nobody has reported
breakage for those. Yet.
Note that in all the cases seen so far, the overlapping elf sections
seem to be just re-mapping of the same executable with different section
attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE
flag or similar, which acts like NOREPLACE, but allows just remapping
the same executable file using different protection flags.
It's not clear that would make a huge difference to anything, but if
people really hate that "elf remaps over previous maps" behavior, maybe
at least a more limited form of remapping would alleviate some concerns.
Alternatively, we should take a look at our elf_map() logic to see if we
end up not mapping things properly the first time.
In the meantime, this is the minimal "don't do that then" patch while
people hopefully think about it more.
Reported-by: Russell King <linux@armlinux.org.uk>
Fixes: 4ed2863951 ("fs, elf: drop MAP_FIXED usage from elf_map")
Fixes: ad55eac74f ("elf: enforce MAP_FIXED on overlaying elf segments")
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull dma-mapping regression fix from Christoph Hellwig:
"Revert an incorret hunk from a patch that caused problems on various
arm boards (Andrey Smirnov)"
* tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix false positive warnings in dma_common_free_remap()
scale_up wakes up waiters after scaling up. But after scaling max, it
should not wake up more waiters as waiters will not have anything to
do. This patch fixes this by making scale_up (and also scale_down)
return when threshold is reached.
This bug causes increased fdatasync latency when fdatasync and dd
conv=sync are performed in parallel on 4.19 compared to 4.14. This
bug was introduced during refactoring of blk-wbt code.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 85fbd722ad.
The commit was added as a quick band-aid for a hang that happened when a
block device was removed during system suspend. Now that bdi_wq is not
freezable anymore the hang should not be possible and we can get rid of
this hack by reverting it.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A removable block device, such as NVMe or SSD connected over Thunderbolt
can be hot-removed any time including when the system is suspended. When
device is hot-removed during suspend and the system gets resumed, kernel
first resumes devices and then thaws the userspace including freezable
workqueues. What happens in that case is that the NVMe driver notices
that the device is unplugged and removes it from the system. This ends
up calling bdi_unregister() for the gendisk which then schedules
wb_workfn() to be run one more time.
However, since the bdi_wq is still frozen flush_delayed_work() call in
wb_shutdown() blocks forever halting system resume process. User sees
this as hang as nothing is happening anymore.
Triggering sysrq-w reveals this:
Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
Call Trace:
? __schedule+0x2c5/0x630
? wait_for_completion+0xa4/0x120
schedule+0x3e/0xc0
schedule_timeout+0x1c9/0x320
? resched_curr+0x1f/0xd0
? wait_for_completion+0xa4/0x120
wait_for_completion+0xc3/0x120
? wake_up_q+0x60/0x60
__flush_work+0x131/0x1e0
? flush_workqueue_prep_pwqs+0x130/0x130
bdi_unregister+0xb9/0x130
del_gendisk+0x2d2/0x2e0
nvme_ns_remove+0xed/0x110 [nvme_core]
nvme_remove_namespaces+0x96/0xd0 [nvme_core]
nvme_remove+0x5b/0x160 [nvme]
pci_device_remove+0x36/0x90
device_release_driver_internal+0xdf/0x1c0
nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
process_one_work+0x1c2/0x3f0
worker_thread+0x48/0x3e0
kthread+0x100/0x140
? current_work+0x30/0x30
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
This is not limited to NVMes so exactly same issue can be reproduced by
hot-removing SSD (over Thunderbolt) while the system is suspended.
Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
Reported-by: AceLan Kao <acelan.kao@canonical.com>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ARM SoC fixes from Olof Johansson:
"A few fixes this time around:
- Fixup of some clock specifications for DRA7 (device-tree fix)
- Removal of some dead/legacy CPU OPP/PM code for OMAP that throws
warnings at boot
- A few more minor fixups for OMAPs, most around display
- Enable STM32 QSPI as =y since their rootfs sometimes comes from
there
- Switch CONFIG_REMOTEPROC to =y since it went from tristate to bool
- Fix of thermal zone definition for ux500 (5.4 regression)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: multi_v7_defconfig: Fix SPI_STM32_QSPI support
ARM: dts: ux500: Fix up the CPU thermal zone
arm64/ARM: configs: Change CONFIG_REMOTEPROC from m to y
ARM: dts: am4372: Set memory bandwidth limit for DISPC
ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
ARM: OMAP2+: Add missing LCDC midlemode for am335x
ARM: OMAP2+: Fix missing reset done flag for am3 and am43
ARM: dts: Fix gpio0 flags for am335x-icev2
ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
ARM: dts: Fix wrong clocks for dra7 mcasp
clk: ti: dra7: Fix mcasp8 clock bits
Pull Kbuild fixes from Masahiro Yamada:
- remove unneeded ar-option and KBUILD_ARFLAGS
- remove long-deprecated SUBDIRS
- fix modpost to suppress false-positive warnings for UML builds
- fix namespace.pl to handle relative paths to ${objtree}, ${srctree}
- make setlocalversion work for /bin/sh
- make header archive reproducible
- fix some Makefiles and documents
* tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: make headers archive reproducible
kbuild: update compile-test header list for v5.4-rc2
kbuild: two minor updates for Documentation/kbuild/modules.rst
scripts/setlocalversion: clear local variable to make it work for sh
namespace: fix namespace.pl script to support relative paths
video/logo: do not generate unneeded logo C files
video/logo: remove unneeded *.o pattern from clean-files
integrity: remove pointless subdir-$(CONFIG_...)
integrity: remove unneeded, broken attempt to add -fshort-wchar
modpost: fix static EXPORT_SYMBOL warnings for UML build
kbuild: correct formatting of header in kbuild module docs
kbuild: remove SUBDIRS support
kbuild: remove ar-option and KBUILD_ARFLAGS
Pull SCSI fixes from James Bottomley:
"Twelve patches mostly small but obvious fixes or cosmetic but small
updates"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix Nport ID display value
scsi: qla2xxx: Fix N2N link up fail
scsi: qla2xxx: Fix N2N link reset
scsi: qla2xxx: Optimize NPIV tear down process
scsi: qla2xxx: Fix stale mem access on driver unload
scsi: qla2xxx: Fix unbound sleep in fcport delete path.
scsi: qla2xxx: Silence fwdump template message
scsi: hisi_sas: Make three functions static
scsi: megaraid: disable device when probe failed after enabled device
scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
scsi: qedf: Remove always false 'tmp_prio < 0' statement
scsi: ufs: skip shutdown if hba is not powered
scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
This makes getdents() and getdents64() do sanity checking on the
pathname that it gives to user space. And to mitigate the performance
impact of that, it first cleans up the way it does the user copying, so
that the code avoids doing the SMAP/PAN updates between each part of the
dirent structure write.
I really wanted to do this during the merge window, but didn't have
time. The conversion of filldir to unsafe_put_user() is something I've
had around for years now in a private branch, but the extra pathname
checking finally made me clean it up to the point where it is mergable.
It's worth noting that the filename validity checking really should be a
bit smarter: it would be much better to delay the error reporting until
the end of the readdir, so that non-corrupted filenames are still
returned. But that involves bigger changes, so let's see if anybody
actually hits the corrupt directory entry case before worrying about it
further.
* branch 'readdir':
Make filldir[64]() verify the directory entry filename is valid
Convert filldir[64]() from __put_user() to unsafe_put_user()
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Fix ieeeu02154 atusb driver use-after-free, from Johan Hovold.
2) Need to validate TCA_CBQ_WRROPT netlink attributes, from Eric
Dumazet.
3) txq null deref in mac80211, from Miaoqing Pan.
4) ionic driver needs to select NET_DEVLINK, from Arnd Bergmann.
5) Need to disable bh during nft_connlimit GC, from Pablo Neira Ayuso.
6) Avoid division by zero in taprio scheduler, from Vladimir Oltean.
7) Various xgmac fixes in stmmac driver from Jose Abreu.
8) Avoid 64-bit division in mlx5 leading to link errors on 32-bit from
Michal Kubecek.
9) Fix bad VLAN check in rtl8366 DSA driver, from Linus Walleij.
10) Fix sleep while atomic in sja1105, from Vladimir Oltean.
11) Suspend/resume deadlock in stmmac, from Thierry Reding.
12) Various UDP GSO fixes from Josh Hunt.
13) Fix slab out of bounds access in tcp_zerocopy_receive(), from Eric
Dumazet.
14) Fix OOPS in __ipv6_ifa_notify(), from David Ahern.
15) Memory leak in NFC's llcp_sock_bind, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
selftests/net: add nettest to .gitignore
net: qlogic: Fix memory leak in ql_alloc_large_buffers
nfc: fix memory leak in llcp_sock_bind()
sch_dsmark: fix potential NULL deref in dsmark_init()
net: phy: at803x: use operating parameters from PHY-specific status
net: phy: extract pause mode
net: phy: extract link partner advertisement reading
net: phy: fix write to mii-ctrl1000 register
ipv6: Handle missing host route in __ipv6_ifa_notify
net: phy: allow for reset line to be tied to a sleepy GPIO controller
net: ipv4: avoid mixed n_redirects and rate_tokens usage
r8152: Set macpassthru in reset_resume callback
cxgb4:Fix out-of-bounds MSI-X info array access
Revert "ipv6: Handle race in addrconf_dad_work"
net: make sock_prot_memory_pressure() return "const char *"
rxrpc: Fix rxrpc_recvmsg tracepoint
qmi_wwan: add support for Cinterion CLS8 devices
tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()
lib: textsearch: fix escapes in example code
udp: only do GSO if # of segs > 1
...
Pull s390 fixes from Vasily Gorbik:
- defconfig updates
- Fix build errors with CC_OPTIMIZE_FOR_SIZE due to usage of "i"
constraint for function arguments. Two kvm changes acked-by Christian
Borntraeger.
- Fix -Wunused-but-set-variable warnings in mm code.
- Avoid a constant misuse in qdio.
- Handle a case when cpumf is temporarily unavailable.
* tag 's390-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
KVM: s390: mark __insn32_query() as __always_inline
KVM: s390: fix __insn32_query() inline assembly
s390: update defconfigs
s390/pci: mark function(s) __always_inline
s390/mm: mark function(s) __always_inline
s390/jump_label: mark function(s) __always_inline
s390/cpu_mf: mark function(s) __always_inline
s390/atomic,bitops: mark function(s) __always_inline
s390/mm: fix -Wunused-but-set-variable warnings
s390: mark __cpacf_query() as __always_inline
s390/qdio: clarify size of the QIB parm area
s390/cpumf: Fix indentation in sampling device driver
s390/cpumsf: Check for CPU Measurement sampling
s390/cpumf: Use consistant debug print format
__insn32_query() will not compile if the compiler decides to not
inline it, since it contains an inline assembly with an "i" constraint
with variable contents.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The inline assembly constraints of __insn32_query() tell the compiler
that only the first byte of "query" is being written to. Intended was
probably that 32 bytes are written to.
Fix and simplify the code and just use a "memory" clobber.
Fixes: d668139718 ("KVM: s390: provide query function for instructions returning 32 byte")
Cc: stable@vger.kernel.org # v5.2+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In commit 43d8ce9d65 ("Provide in-kernel headers to make
extending kernel easier") a new mechanism was introduced, for kernels
>=5.2, which embeds the kernel headers in the kernel image or a module
and exposes them in procfs for use by userland tools.
The archive containing the header files has nondeterminism caused by
header files metadata. This patch normalizes the metadata and utilizes
KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
default behaviour.
In commit f7b101d330 ("kheaders: Move from proc to sysfs") it was
modified to use sysfs and the script for generation of the archive was
renamed to what is being patched.
Signed-off-by: Dmitry Goldin <dgoldin+lkml@protonmail.ch>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit 6dc280ebee ("coda: remove uapi/linux/coda_psdev.h") removed
a header in question. Some more build errors were fixed. Add more
headers into the test coverage.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Capitalize the first word in the sentence.
Use obj-m instead of obj-y. obj-y still works, but we have no built-in
objects in external module builds. So, obj-m is better IMHO.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Geert Uytterhoeven reports a strange side-effect of commit 858805b336
("kbuild: add $(BASH) to run scripts with bash-extension"), which
inserts the contents of a localversion file in the build directory twice.
[Steps to Reproduce]
$ echo bar > localversion
$ mkdir build
$ cd build/
$ echo foo > localversion
$ make -s -f ../Makefile defconfig include/config/kernel.release
$ cat include/config/kernel.release
5.4.0-rc1foofoobar
This comes down to the behavior change of local variables.
The 'man sh' on my Ubuntu machine, where sh is an alias to dash,
explains as follows:
When a variable is made local, it inherits the initial value and
exported and readonly flags from the variable with the same name
in the surrounding scope, if there is one. Otherwise, the variable
is initially unset.
[Test Code]
foo ()
{
local res
echo "res: $res"
}
res=1
foo
[Result]
$ sh test.sh
res: 1
$ bash test.sh
res:
So, scripts/setlocalversion correctly works only for bash in spite of
its hashbang being #!/bin/sh. Nobody had noticed it before because
CONFIG_SHELL was previously set to bash almost all the time.
Now that CONFIG_SHELL is set to sh, we must write portable and correct
code. I gave the Fixes tag to the commit that uncovered the issue.
Clear the variable 'res' in collect_files() to make it work for sh
(and it also works on distributions where sh is an alias to bash).
Fixes: 858805b336 ("kbuild: add $(BASH) to run scripts with bash-extension")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
The namespace.pl script does not work properly if objtree is not set to
an absolute path. The do_nm function is run from within the find
function, which changes directories.
Because of this, appending objtree, $File::Find::dir, and $source, will
return a path which is not valid from the current directory.
This used to work when objtree was set to an absolute path when using
"make namespacecheck". It appears to have not worked when calling
./scripts/namespace.pl directly.
This behavior was changed in 7e1c04779e ("kbuild: Use relative path
for $(objtree)", 2014-05-14)
Rather than fixing the Makefile to set objtree to an absolute path, just
fix namespace.pl to work when srctree and objtree are relative. Also fix
the script to use an absolute path for these by default.
Use the File::Spec module for this purpose. It's been part of perl
5 since 5.005.
The curdir() function is used to get the current directory when the
objtree and srctree aren't set in the environment.
rel2abs() is used to convert possibly relative objtree and srctree
environment variables to absolute paths.
Finally, the catfile() function is used instead of string appending
paths together, since this is more robust when joining paths together.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Currently, all the logo C files are generated irrespective of the
CONFIG options. Adding them to extra-y is wrong. What we need to do
here is to add them to 'targets' so that if_changed works properly.
Files listed in 'targets' are cleaned, so clean-files is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The ima/ and evm/ sub-directories contain built-in objects, so
obj-$(CONFIG_...) is the correct way to descend into them.
subdir-$(CONFIG_...) is redundant.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
I guess commit 15ea0e1e3e ("efi: Import certificates from UEFI Secure
Boot") attempted to add -fshort-wchar for building load_uefi.o, but it
has never worked as intended.
load_uefi.o is created in the platform_certs/ sub-directory. If you
really want to add -fshort-wchar, the correct code is:
$(obj)/platform_certs/load_uefi.o: KBUILD_CFLAGS += -fshort-wchar
But, you do not need to fix it.
Commit 8c97023cf0 ("Kbuild: use -fshort-wchar globally") had already
added -fshort-wchar globally. This code was unneeded in the first place.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
This skb should be released if pci_dma_mapping_error fails.
Fixes: 0f8ab89e82 ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
Fix regression with AR8035 speed downgrade
The following series attempts to address an issue spotted by tinywrkb
with the AR8035 on the Cubox-i2 in a situation where the PHY downgrades
the negotiated link.
This is version 2, not much has changed other than rebasing on the
current net tree. Changes have happend to patch 2 due to conflicts,
so I dropped Andrew's reviewed-by. Minor context changes to patch 4
which I don't consider important enough to warrant dropping the
reviewed-by.
Before commit 5502b218e0 ("net: phy: use phy_resolve_aneg_linkmode in
genphy_read_status"), we would read not only the link partner's
advertisement, but also our own advertisement from the PHY registers,
and use both to derive the PHYs current link mode. This works when the
AR8035 downgrades the speed, because it appears that the AR8035 clears
link mode bits in the advertisement registers as part of the downgrade.
Commentary: what is not yet known is whether the AR8035 restores the
advertisement register when the link goes down to the
previous state.
However, since the above referenced commit, we no longer use the PHYs
advertisement registers, instead converting the link partner's
advertisement to the ethtool link mode array, and combine that with
phylib's cached version of our advertisement - which is not updated on
speed downgrade.
This results in phylib disagreeing with the actual operating mode of
the PHY.
Commentary: I wonder how many more PHY drivers are broken by this
commit, but have yet to be discovered.
The obvious way to address this would be to disable the downgrade
feature, and indeed this does fix the problem in tinywrkb's case - his
link partner instead downgrades the speed by reducing its
advertisement, resulting in phylib correctly evaluating a slower speed.
However, it has a serious drawback - the gigabit control register (MII
register 9) appears to become read only. It seems the only way to
update the register is to re-enable the downgrade feature, reset the
PHY, changing register 9, disable the downgrade feature, and reset the
PHY again.
This series attempts to address the problem using a different approach,
similar to the approach taken with Marvell PHYs. The AR8031, AR8033
and AR8035 have a PHY-Specific Status register which reports the
actual operating mode of the PHY - both speed and duplex. This
register correctly reports the operating mode irrespective of whether
autoneg is enabled or not. We use this register to fill in phylib's
speed and duplex parameters.
In detail:
Patch 1 fixes a bug where writing to register 9 does not update
phylib's advertisement mask in the same way that writing register 4
does; this looks like an omission from when gigabit PHY support came
into being.
Patch 2 seperates the generic phylib code which reads the link partners
advertisement from the PHY, so that we can re-use this in the Atheros
PHY driver.
Patch 3 seperates the generic phylib pause mode; phylib provides no
help for MAC drivers to ascertain the negotiated pause mode, it merely
copies the link partner's pause mode bits into its own variables.
Commentary: Both the aforementioned Atheros PHYs and Marvell PHYs
provide the resolved pause modes in terms of whether
we should transmit pause frames, or whether we should
allow reception of pause frames. Surely the resolution
of this should be in phylib?
Patch 4 provides the Atheros PHY driver with a private "read_status"
implementation that fills in phylib's speed and duplex settings
depending on the PHY-Specific status register. This ensures that
phylib and the MAC driver match the operating mode that the PHY has
decided to use. Since the register also gives us MDIX status, we
can trivially fill that status in as well.
Note that, although the bits mentioned in this patch for this register
match those in th Marvell PHY driver, and it is located at the same
address, the meaning of other register bits varies between the PHYs.
Therefore, I do not feel that it would be appropriate to make this some
kind of generic function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the PHY-specific status register for the current operating mode
(speed and duplex) of the PHY. This register reflects the actual
mode that the PHY has resolved depending on either the advertisements
of autoneg is enabled, or the forced mode if autoneg is disabled.
This ensures that phylib's software state always tracks the hardware
state.
It seems both AR8033 (which uses the AR8031 ID) and AR8035 support
this status register. AR8030 is not known at the present time.
This patch depends on "net: phy: extract pause mode" and "net: phy:
extract link partner advertisement reading".
Reported-by: tinywrkb <tinywrkb@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: tinywrkb <tinywrkb@gmail.com>
Fixes: 5502b218e0 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the update of phylib's software pause mode state from
genphy_read_status(), so that we can re-use this functionality with
PHYs that have alternative ways to read the negotiation results.
Tested-by: tinywrkb <tinywrkb@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move reading the link partner advertisement out of genphy_read_status()
into its own separate function. This will allow re-use of this code by
PHY drivers that are able to read the resolved status from the PHY.
Tested-by: tinywrkb <tinywrkb@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When userspace writes to the MII_ADVERTISE register, we update phylib's
advertising mask and trigger a renegotiation. However, writing to the
MII_CTRL1000 register, which contains the gigabit advertisement, does
neither. This can lead to phylib's copy of the advertisement becoming
de-synced with the values in the PHY register set, which can result in
incorrect negotiation resolution.
Fixes: 5502b218e0 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in __ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
Since the DAD sequence can not be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mdio_device_reset() makes use of the atomic-pretending API flavor for
handling the PHY reset GPIO line.
I found no hint that mdio_device_reset() is called from atomic context
and indeed it uses usleep_range() since long time, so I would assume that
it is OK to sleep there.
This patch switch to gpiod_set_value_cansleep() in mdio_device_reset().
This is relevant if e.g. the PHY reset line is tied to a I2C GPIO
controller.
This has been tested on a ZynqMP board running an upstream 4.19 kernel and
then hand-ported on current kernel tree.
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit c09551c6ff ("net: ipv4: use a dedicated counter
for icmp_v4 redirect packets") we use 'n_redirects' to account
for redirect packets, but we still use 'rate_tokens' to compute
the redirect packets exponential backoff.
If the device sent to the relevant peer any ICMP error packet
after sending a redirect, it will also update 'rate_token' according
to the leaking bucket schema; typically 'rate_token' will raise
above BITS_PER_LONG and the redirect packets backoff algorithm
will produce undefined behavior.
Fix the issue using 'n_redirects' to compute the exponential backoff
in ip_rt_send_redirect().
Note that we still clear rate_tokens after a redirect silence period,
to avoid changing an established behaviour.
The root cause predates git history; before the mentioned commit in
the critical scenario, the kernel stopped sending redirects, after
the mentioned commit the behavior more randomic.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: c09551c6ff ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
r8152 may fail to establish network connection after resume from system
suspend.
If the USB port connects to r8152 lost its power during system suspend,
the MAC address was written before is lost. The reason is that The MAC
address doesn't get written again in its reset_resume callback.
So let's set MAC address again in reset_resume callback. Also remove
unnecessary lock as no other locking attempt will happen during
reset_resume.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fetching free MSI-X vectors for ULDs, check for the error code
before accessing MSI-X info array. Otherwise, an out-of-bounds access is
attempted, which results in kernel panic.
Fixes: 94cdb8bb99 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a3ce2a21bb.
Eric reported tests failings with commit. After digging into it,
the bottom line is that the DAD sequence is not to be messed with.
There are too many cases that are expected to proceed regardless
of whether a device is up.
Revert the patch and I will send a different solution for the
problem Rajendra reported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function returns string literals which are "const char *".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
parameter.
Fixes: a25e21f0bc ("rxrpc, afs: Use debug_ids rather than pointers in traces")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS fixes from Paul Burton:
- Build fixes for Cavium Octeon & PMC-Sierra MSP systems, as well as
all pre-MIPSr6 configurations built with binutils < 2.25.
- Boot fixes for 64-bit Loongson systems & SGI IP28 systems.
- Wire up the new clone3 syscall.
- Clean ups for a few build-time warnings.
* tag 'mips_fixes_5.4_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fw/arc: Remove unused addr variable
MIPS: pmcs-msp71xx: Remove unused addr variable
MIPS: pmcs-msp71xx: Add missing MAX_PROM_MEM definition
mips: Loongson: Fix the link time qualifier of 'serial_exit()'
MIPS: init: Prevent adding memory before PHYS_OFFSET
MIPS: init: Fix reservation of memory between PHYS_OFFSET and mem start
MIPS: VDSO: Fix build for binutils < 2.25
MIPS: VDSO: Remove unused gettimeofday.c
MIPS: Wire up clone3 syscall
MIPS: octeon: Include required header; fix octeon ethernet build
MIPS: cpu-bugs64: Mark inline functions as __always_inline
MIPS: dts: ar9331: fix interrupt-controller size
MIPS: Loongson64: Fix boot failure after dropping boot_mem_map
Pull RISC-V fixes from Paul Walmsley:
- Ensure that exclusive-load reservations are terminated after system
call or exception handling. This primarily affects QEMU, which does
not expire load reservations.
- Fix an issue primarily affecting RV32 platforms that can cause the DT
header to be corrupted, causing boot failures.
* tag 'riscv/for-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix memblock reservation for device tree blob
RISC-V: Clear load reservations while restoring hart contexts
Pull DeviceTree fixes from Rob Herring:
"Fix several 'dt_binding_check' build failures"
* tag 'devicetree-fixes-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: phy: lantiq: Fix Property Name
dt-bindings: iio: ad7192: Fix DTC warning in the example
dt-bindings: iio: ad7192: Fix Regulator Properties
dt-bindings: media: rc: Fix redundant string
dt-bindings: dsp: Fix fsl,dsp example
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:
CPU0 CPU1
mlx5_ib_alloc_implicit_mr()
implicit_mr_alloc()
live = 1
imr->umem = umem
num_pending_prefetch_inc()
if (live)
atomic_inc(num_pending_prefetch)
atomic_set(num_pending_prefetch,0) // Overwrites other thread's store
Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.
Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.
Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().
The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:
CPU0 CPU1
dereg_mr()
mlx5_ib_advise_mr_prefetch()
srcu_read_lock()
num_pending_prefetch_inc()
if (!live)
live = 0
atomic_read() == 0
// skip flush_workqueue()
atomic_inc()
queue_work();
srcu_read_unlock()
WARN_ON(atomic_read()) // Fails
Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f1 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This fixes a race of the form:
CPU0 CPU1
mlx5_ib_invalidate_range() mlx5_ib_invalidate_range()
// This one actually makes npages == 0
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
// This one does nothing
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
dying = 1;
dying = 1;
schedule_work(&umem_odp->work);
// Double schedule of the same work
schedule_work(&umem_odp->work); // BOOM
npages and dying must be read and written under the umem_mutex lock.
Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.
This avoids an expensive double critical section in
mlx5_ib_invalidate_range().
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:
CPU0 CPU1
mlx5_ib_update_xlt()
mlx5_odp_populate_klm()
odp_lookup() == NULL
pklm = ZAP
implicit_mr_get_data()
implicit_mr_alloc()
<update interval tree>
mlx5_ib_update_xlt
mlx5_odp_populate_klm()
odp_lookup() != NULL
pklm = VALID
mlx5_ib_post_send_wait()
mlx5_ib_post_send_wait() // Replaces VALID with ZAP
This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.
The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.
The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The addr variable in prom_free_prom_memory() has been unused since
commit 0df1007677 ("MIPS: fw: Record prom memory"), leading to a
compiler warning:
arch/mips/fw/arc/memory.c:163:16:
warning: unused variable 'addr' [-Wunused-variable]
Fix this by removing the unused variable.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 0df1007677 ("MIPS: fw: Record prom memory")
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
Pull KVM fixes from Paolo Bonzini:
"ARM and x86 bugfixes of all kinds.
The most visible one is that migrating a nested hypervisor has always
been busted on Broadwell and newer processors, and that has finally
been fixed"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: x86: omit "impossible" pmu MSRs from MSR list
KVM: nVMX: Fix consistency check on injected exception error code
KVM: x86: omit absent pmu MSRs from MSR list
selftests: kvm: Fix libkvm build error
kvm: vmx: Limit guest PMCs to those supported on the host
kvm: x86, powerpc: do not allow clearing largepages debugfs entry
KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure
KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF
selftests: kvm: add test for dirty logging inside nested guests
KVM: x86: fix nested guest live migration with PML
KVM: x86: assign two bits to track SPTE kinds
KVM: x86: Expose XSAVEERPTR to the guest
kvm: x86: Enumerate support for CLZERO instruction
kvm: x86: Use AMD CPUID semantics for AMD vCPUs
kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH
KVM: X86: Fix userspace set invalid CR4
kvm: x86: Fix a spurious -E2BIG in __do_cpuid_func
KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_ns
KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH
arm64: KVM: Kill hyp_alternate_select()
...
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing. On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.
Resolve this by allocating the memory for a message off of the heap
instead of the stack. kmalloc() always will give us a proper memory
location that DMA will work correctly from.
Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull xen fixes and cleanups from Juergen Gross:
- a fix in the Xen balloon driver avoiding hitting a BUG_ON() in some
cases, plus a follow-on cleanup series for that driver
- a patch for introducing non-blocking EFI callbacks in Xen's EFI
driver, plu a cleanup patch for Xen EFI handling merging the x86 and
ARM arch specific initialization into the Xen EFI driver
- a fix of the Xen xenbus driver avoiding a self-deadlock when cleaning
up after a user process has died
- a fix for Xen on ARM after removal of ZONE_DMA
- a cleanup patch for avoiding build warnings for Xen on ARM
* tag 'for-linus-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: fix self-deadlock after killing user process
xen/efi: have a common runtime setup function
arm: xen: mm: use __GPF_DMA32 for arm64
xen/balloon: Clear PG_offline in balloon_retrieve()
xen/balloon: Mark pages PG_offline in balloon_append()
xen/balloon: Drop __balloon_append()
xen/balloon: Set pages PageOffline() in balloon_add_region()
ARM: xen: unexport HYPERVISOR_platform_op function
xen/efi: Set nonblocking callbacks
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.
This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().
Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().
Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.
Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().
Note: the kfree call in the first chunk below (in the cm_add_one error
flow) fixes an old, undetected memory leak.
Fixes: c87e65cfb9 ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull copy_struct_from_user() helper from Christian Brauner:
"This contains the copy_struct_from_user() helper which got split out
from the openat2() patchset. It is a generic interface designed to
copy a struct from userspace.
The helper will be especially useful for structs versioned by size of
which we have quite a few. This allows for backwards compatibility,
i.e. an extended struct can be passed to an older kernel, or a legacy
struct can be passed to a newer kernel. For the first case (extended
struct, older kernel) the new fields in an extended struct can be set
to zero and the struct safely passed to an older kernel.
The most obvious benefit is that this helper lets us get rid of
duplicate code present in at least sched_setattr(), perf_event_open(),
and clone3(). More importantly it will also help to ensure that users
implementing versioning-by-size end up with the same core semantics.
This point is especially crucial since we have at least one case where
versioning-by-size is used but with slighly different semantics:
sched_setattr(), perf_event_open(), and clone3() all do do similar
checks to copy_struct_from_user() while rt_sigprocmask(2) always
rejects differently-sized struct arguments.
With this pull request we also switch over sched_setattr(),
perf_event_open(), and clone3() to use the new helper"
* tag 'copy-struct-from-user-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
usercopy: Add parentheses around assignment in test_copy_struct_from_user
perf_event_open: switch to copy_struct_from_user()
sched_setattr: switch to copy_struct_from_user()
clone3: switch to copy_struct_from_user()
lib: introduce copy_struct_from_user() helper
i40iw IB device registration fails with ENODEV.
ib_register_device
setup_device/setup_port_data
i40iw_port_immutable
ib_query_port
iw_query_port
ib_device_get_netdev(ENODEV)
ib_device_get_netdev() does not have a netdev associated
with the ibdev and thus fails.
Use ib_device_set_netdev() to associate netdev to ibdev
in i40iw before IB device registration.
Fixes: 4929116bdf ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull clone3/pidfd fixes from Christian Brauner:
"This contains a couple of fixes:
- Fix pidfd selftest compilation (Shuah Kahn)
Due to a false linking instruction in the Makefile compilation for
the pidfd selftests would fail on some systems.
- Fix compilation for glibc on RISC-V systems (Seth Forshee)
In some scenarios linux/uapi/linux/sched.h is included where
__ASSEMBLY__ is defined causing a build failure because struct
clone_args was not guarded by an #ifndef __ASSEMBLY__.
- Add missing clone3() and struct clone_args kernel-doc (Christian Brauner)
clone3() and struct clone_args were missing kernel-docs. (The goal
is to use kernel-doc for any function or type where it's worth it.)
For struct clone_args this also contains a comment about the fact
that it's versioned by size"
* tag 'for-linus-20191003' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
sched: add kernel-doc for struct clone_args
fork: add kernel-doc for clone3
selftests: pidfd: Fix undefined reference to pthread_create()
sched: Add __ASSEMBLY__ guards around struct clone_args
Pull drm fixes from Dave Airlie:
"Been offline for 3 days, got back and had some fixes queued up.
Nothing too major, the i915 dp-mst fix is important, and amdgpu has a
bulk move speedup fix and some regressions, but nothing too insane for
an rc2 pull. The intel fixes are also 2 weeks worth, they missed the
boat last week.
core:
- writeback fixes
i915:
- Fix DP-MST crtc_mask
- Fix dsc dpp calculations
- Fix g4x sprite scaling stride check with GTT remapping
- Fix concurrence on cases where requests where getting retired at
same time as resubmitted to HW
- Fix gen9 display resolutions by setting the right max plane width
- Fix GPU hang on preemption
- Mark contents as dirty on a write fault. This was breaking cursor
sprite with dumb buffers.
komeda:
- memory leak fix
tilcdc:
- include fix
amdgpu:
- Enable bulk moves
- Power metrics fixes for Navi
- Fix S4 regression
- Add query for tcc disabled mask
- Fix several leaks in error paths
- randconfig fixes
- clang fixes"
* tag 'drm-fixes-2019-10-04' of git://anongit.freedesktop.org/drm/drm: (21 commits)
Revert "drm/i915: Fix DP-MST crtc_mask"
drm/omap: fix max fclk divider for omap36xx
drm/i915: Fix g4x sprite scaling stride check with GTT remapping
drm/i915/dp: Fix dsc bpp calculations, v5.
drm/amd/display: fix dcn21 Makefile for clang
drm/amd/display: hide an unused variable
drm/amdgpu: display_mode_vba_21: remove uint typedef
drm/amdgpu: hide another #warning
drm/amdgpu: make pmu support optional, again
drm/amd/display: memory leak
drm/amdgpu: fix multiple memory leaks in acp_hw_init
drm/amdgpu: return tcc_disabled_mask to userspace
drm/amdgpu: don't increment vram lost if we are in hibernation
Revert "drm/amdgpu: disable stutter mode for renoir"
drm/amd/powerplay: add sensor lock support for smu
drm/amd/powerplay: change metrics update period from 1ms to 100ms
drm/amdgpu: revert "disable bulk moves for now"
drm/tilcdc: include linux/pinctrl/consumer.h again
drm/komeda: prevent memory leak in komeda_wb_connector_add
drm: Clear the fence pointer when writeback job signaled
...
Pull block fixes from Jens Axboe:
- Mandate timespec64 for the io_uring timeout ABI (Arnd)
- Set of NVMe changes via Sagi:
- controller removal race fix from Balbir
- quirk additions from Gabriel and Jian-Hong
- nvme-pci power state save fix from Mario
- Add 64bit user commands (for 64bit registers) from Marta
- nvme-rdma/nvme-tcp fixes from Max, Mark and Me
- Minor cleanups and nits from James, Dan and John
- Two s390 dasd fixes (Jan, Stefan)
- Have loop change block size in DIO mode (Martijn)
- paride pg header ifdef guard (Masahiro)
- Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned
devices and suboptimal performance on others (Ming)
* tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits)
block: sed-opal: fix sparse warning: convert __be64 data
block: sed-opal: fix sparse warning: obsolete array init.
block: pg: add header include guard
Revert "s390/dasd: Add discard support for ESE volumes"
s390/dasd: Fix error handling during online processing
io_uring: use __kernel_timespec in timeout ABI
loop: change queue block size to match when using DIO
blk-mq: apply normal plugging for HDD
blk-mq: honor IO scheduler for multiqueue devices
nvme-rdma: fix possible use-after-free in connect timeout
nvme: Move ctrl sqsize to generic space
nvme: Add ctrl attributes for queue_count and sqsize
nvme: allow 64-bit results in passthru commands
nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
nvmet-tcp: remove superflous check on request sgl
Added QUIRKs for ADATA XPG SX8200 Pro 512GB
nvme-rdma: Fix max_hw_sectors calculation
nvme: fix an error code in nvme_init_subsystem()
nvme-pci: Save PCI state before putting drive into deepest state
nvme-tcp: fix wrong stop condition in io_work
...
The fixed MKHI client on PCH 6 gen platforms
does not support fw version retrieval.
The error is not fatal, but it fills up the kernel logs and
slows down the driver start.
This patch disables requesting FW version on GEN6 and earlier platforms.
Fixes warning:
[ 15.964298] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: Could not read FW version
[ 15.964301] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: version command failed -5
Cc: <stable@vger.kernel.org> +v4.18
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20191004181722.31374-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert two functions to static inline to get ride of W=1 GCC warnings
like,
mm/gup.c: In function 'gup_pte_range':
mm/gup.c:1816:16: warning: variable 'ptem' set but not used
[-Wunused-but-set-variable]
pte_t *ptep, *ptem;
^~~~
mm/mmap.c: In function 'acct_stack_growth':
mm/mmap.c:2322:16: warning: variable 'new_start' set but not used
[-Wunused-but-set-variable]
unsigned long new_start;
^~~~~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/lkml/1570138596-11913-1-git-send-email-cai@lca.pw/
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
arch/s390/kvm/kvm-s390.c calls on several places __cpacf_query() directly,
which makes it impossible to meet the "i" constraint for the asm operands
(opcode in this case).
As we are now force-enabling CONFIG_OPTIMIZE_INLINING on all
architectures, this causes a build failure on s390:
In file included from arch/s390/kvm/kvm-s390.c:44:
./arch/s390/include/asm/cpacf.h: In function '__cpacf_query':
./arch/s390/include/asm/cpacf.h:179:2: warning: asm operand 3 probably doesn't match constraints
179 | asm volatile(
| ^~~
./arch/s390/include/asm/cpacf.h:179:2: error: impossible constraint in 'asm'
Mark __cpacf_query() as __always_inline in order to fix that, analogically
how we fixes __cpacf_check_opcode(), cpacf_query_func() and scpacf_query()
already.
Reported-and-tested-by: Michal Kubecek <mkubecek@suse.cz>
Fixes: d83623c5ea ("s390: mark __cpacf_check_opcode() and cpacf_query_func() as __always_inline")
Fixes: e60fb8bf68 ("s390/cpacf: mark scpacf_query() as __always_inline")
Fixes: ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
Fixes: 9012d01166 ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/lkml/nycvar.YFH.7.76.1910012203010.13160@cbobk.fhfr.pm
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Johan writes:
USB-serial fixes for 5.4-rc2
Here's a fix for a long-standing issue in the keyspan driver which could
lead to NULL-pointer dereferences when a device had unexpected endpoint
descriptors.
Included are also some new device IDs.
All but the last two commits have been in linux-next with no reported
issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.4-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: keyspan: fix NULL-derefs on open() and write()
USB: serial: option: add support for Cinterion CLS8 devices
USB: serial: option: add Telit FN980 compositions
USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
Fix tty driver build on SPARC by not using __exitdata.
It appears that SPARC does not support section .exit.data.
Fixes these build errors:
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 063246641d ("format-security: move static strings to const")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/675e7bd9-955b-3ff3-1101-a973b58b5b75@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Call uart_unregister_driver() conditionally instead of
unconditionally, only if it has been previously registered.
This uses driver.state, just as the sh-sci.c driver does.
Fixes this null pointer dereference in tty_unregister_driver(),
since the 'driver' argument is null:
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
RIP: 0010:tty_unregister_driver+0x25/0x1d0
Fixes: 238b8721a5 ("[PATCH] serial uartlite driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Peter Korsgaard <jacmet@sunsite.dk>
Link: https://lore.kernel.org/r/9c8e6581-6fcc-a595-0897-4d90f5d710df@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As platform_get_irq() now prints an error when the interrupt does not
exist, scary warnings may be printed for optional interrupts:
sh-sci e6550000.serial: IRQ index 1 not found
sh-sci e6550000.serial: IRQ index 2 not found
sh-sci e6550000.serial: IRQ index 3 not found
sh-sci e6550000.serial: IRQ index 4 not found
sh-sci e6550000.serial: IRQ index 5 not found
Fix this by calling platform_get_irq_optional() instead for all but the
first interrupts, which are optional.
Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191001180743.1041-1-geert+renesas@glider.be
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Note that runtime PM has never actually been enabled for this driver
since the support_autosuspend flag in its usb_driver struct is not set.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-5-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the original bindings that were superseeded by the YAML schemas
didn't mention that phy-names was needed, it turns out that phy-names is
required if phys is set according to phy/phy-bindings.txt.
Let's add back those properties.
Fixes: 14ec072a19 ("dt-bindings: usb: Convert USB HCD generic binding to YAML")
Fixes: c93bcace10 ("dt-bindings: usb: Convert the generic OHCI binding to YAML")
Fixes: c3e2485d5f ("dt-bindings: usb: Convert the generic EHCI binding to YAML")
Reported-by: Emmanuel Vadot <manu@bidouilliste.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20191002112651.100504-2-mripard@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commits 3d109bdca9 ("ARM: dts: sunxi: Remove useless
phy-names from EHCI and OHCI"), 0a3df8bb6d ("ARM: dts: sunxi: h3/h5:
Remove useless phy-names from EHCI and OHCI") and 3c7ab90aaa ("arm64:
dts: allwinner: Remove useless phy-names from EHCI and OHCI").
It turns out that while the USB bindings were not mentionning it, the PHY
client bindings were mandating that phy-names is set when phys is. Let's
add it back.
Fixes: 3d109bdca9 ("ARM: dts: sunxi: Remove useless phy-names from EHCI and OHCI")
Fixes: 0a3df8bb6d ("ARM: dts: sunxi: h3/h5: Remove useless phy-names from EHCI and OHCI")
Fixes: 3c7ab90aaa ("arm64: dts: allwinner: Remove useless phy-names from EHCI and OHCI")
Reported-by: Emmanuel Vadot <manu@bidouilliste.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20191002112651.100504-1-mripard@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc points out a suspicious cast from a pointer to an 'int' when
compile-testing on 64-bit architectures.
drivers/usb/gadget/udc/lpc32xx_udc.c: In function ‘udc_pop_fifo’:
drivers/usb/gadget/udc/lpc32xx_udc.c:1156:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
drivers/usb/gadget/udc/lpc32xx_udc.c: In function ‘udc_stuff_fifo’:
drivers/usb/gadget/udc/lpc32xx_udc.c:1257:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
The code works find, but it's easy enough to change the cast to
a uintptr_t to shut up that warning.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190918200201.2292008-1-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The system can hit a deadlock if an xhci adapter breaks while initializing.
The deadlock is between two threads: thread 1 is tearing down the
adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
hcd->handwidth_mutex. Thread 2 is holding this mutex (while still trying
to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
stop or config command to complete. A reboot is required to resolve.
It turns out when calling xhci_queue_stop_endpoint and
xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
not checked for errors. If the timing is right and the adapter dies just
before either of these commands get issued, we hang indefinitely waiting
for a completion on a command that didn't get issued.
This wasn't a problem before the following fix because we didn't send
commands in xhci_endpoint_reset:
commit f5249461b5 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
With the patch I am submitting, a duration test which breaks adapters
during initialization (and which deadlocks with the standard kernel) runs
without issue.
Fixes: f5249461b5 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
Cc: <stable@vger.kernel.org> # v4.17+
Cc: Torez Smith <torez@redhat.com>
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Signed-off-by: Torez Smith <torez@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-7-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The check printing out the "WARN Wrong bounce buffer write length:"
uses incorrect values when comparing bytes written from scatterlist
to bounce buffer. Actual copied lengths are fine.
The used seg->bounce_len will be set to equal new_buf_len a few lines later
in the code, but is incorrect when doing the comparison.
The patch which added this false warning was backported to 4.8+ kernels
so this should be backported as far as well.
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 597c56e372 ("xhci: update bounce buffer with correct sg num")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-2-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver is using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg and dev_err statements in the
completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 9d974b2a06 ("USB: legousbtower.c: remove err() usage")
Fixes: fef526cae7 ("USB: legousbtower: remove custom debug macro")
Fixes: 4dae996380 ("USB: legotower: remove custom debug macro and module parameter")
Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190919083039.30898-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
contiguous MSR indices reserved by Intel for event selectors.
Since some machines actually have MSRs past the reserved range,
filtering them against x86_pmu.num_counters_gp may have false
positives. Cut the list to 18 entries to avoid this.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jamttson@google.com>
Fixes: e2ada66ec4 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The "run_isr" flag is used for preventing the driver from
calling the interrupt service routine in its runtime resume
callback when the driver is expecting completion to a
command, but what that basically does is that it hides the
real problem. The real problem is that the controller is
allowed to suspend in the middle of command execution.
As a more appropriate fix for the problem, using autosuspend
delay time that matches UCSI_TIMEOUT_MS (5s). That prevents
the controller from suspending while still in the middle of
executing a command.
This fixes a potential deadlock. Both ccg_read() and
ccg_write() are called with the mutex already taken at least
from ccg_send_command(). In ccg_read() and ccg_write, the
mutex is only acquired so that run_isr flag can be set.
Fixes: f0e4cd948b ("usb: typec: ucsi: ccg: add runtime pm workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20191004100219.71152-2-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CPUs affected by Neoverse-N1 #1542419 may execute a stale instruction if
it was recently modified. The affected sequence requires freshly written
instructions to be executable before a branch to them is updated.
There are very few places in the kernel that modify executable text,
all but one come with sufficient synchronisation:
* The module loader's flush_module_icache() calls flush_icache_range(),
which does a kick_all_cpus_sync()
* bpf_int_jit_compile() calls flush_icache_range().
* Kprobes calls aarch64_insn_patch_text(), which does its work in
stop_machine().
* static keys and ftrace both patch between nops and branches to
existing kernel code (not generated code).
The affected sequence is the interaction between ftrace and modules.
The module PLT is cleaned using __flush_icache_range() as the trampoline
shouldn't be executable until we update the branch to it.
Drop the double-underscore so that this path runs kick_all_cpus_sync()
too.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority
masking") added a macro to the entry.S call paths that leave the
PSTATE.I bit set. This tells the pPNMI masking logic that interrupts
are masked by the CPU, not by the PMR. This value is read back by
local_daif_save().
Commit bd82d4bd21 added this call to el0_svc, as el0_svc_handler
is called with interrupts masked. el0_svc_compat was missed, but should
be covered in the same way as both of these paths end up in
el0_svc_common(), which expects to unmask interrupts.
Fixes: bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority masking")
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
If we take an unhandled fault in the kernel, we call show_pte() to dump
the {PGDP,PGD,PUD,PMD,PTE} values for the corresponding page table walk,
where the PGDP value is virt_to_phys(mm->pgd).
The boot-time and runtime kernel page tables, init_pg_dir and
swapper_pg_dir respectively, are kernel symbols. Thus, it is not valid
to call virt_to_phys() on either of these, though we'll do so if we take
a fault on a TTBR1 address.
When CONFIG_DEBUG_VIRTUAL is not selected, virt_to_phys() will silently
fix this up. However, when CONFIG_DEBUG_VIRTUAL is selected, this
results in splats as below. Depending on when these occur, they can
happen to suppress information needed to debug the original unhandled
fault, such as the backtrace:
| Unable to handle kernel paging request at virtual address ffff7fffec73cf0f
| Mem abort info:
| ESR = 0x96000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 00000000102c9dbe (swapper_pg_dir+0x0/0x1000)
| WARNING: CPU: 1 PID: 7558 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0xe0/0x170 arch/arm64/mm/physaddr.c:12
| Kernel panic - not syncing: panic_on_warn set ...
| SMP: stopping secondary CPUs
| Dumping ftrace buffer:
| (ftrace buffer empty)
| Kernel Offset: disabled
| CPU features: 0x0002,23000438
| Memory Limit: none
| Rebooting in 1 seconds..
We can avoid this by ensuring that we call __pa_symbol() for
init_mm.pgd, as this will always be a kernel symbol. As the dumped
{PGD,PUD,PMD,PTE} values are the raw values from the relevant entries we
don't need to handle these specially.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The HWCAP framework will detect a new capability based on the sanitized
version of the ID registers.
Sanitization is based on a whitelist, so any field not described will end
up to be zeroed.
At the moment, ID_AA64ISAR1_EL1.FRINTTS is not described in
ftr_id_aa64isar1. This means the field will be zeroed and therefore the
userspace will not be able to see the HWCAP even if the hardware
supports the feature.
This can be fixed by describing the field in ftr_id_aa64isar1.
Fixes: ca9503fc9e ("arm64: Expose FRINT capabilities to userspace")
Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: mark.brown@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
As of ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"),
inline functions are no longer annotated with '__always_inline', which
allows the compiler to decide whether inlining is really a good idea or
not. Although this is a great idea on paper, the reality is that AArch64
GCC prior to 9.1 has been shown to get confused when creating an
out-of-line copy of a function passing explicit 'register' variables
into an inline assembly block:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111
It's not clear whether this is specific to arm64 or not but, for now,
ensure that all of our functions using 'register' variables are marked
as '__always_inline' so that the old behaviour is effectively preserved.
Hopefully other architectures are luckier with their compilers.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Drop the redundant disconnect mutex which was introduced after the
open-disconnect race had been addressed generally in USB core by commit
d4ead16f50 ("USB: prevent char device open/deregister race").
Specifically, the rw-semaphore in core guarantees that all calls to
open() will have completed and that no new calls to open() will occur
after usb_deregister_dev() returns. Hence there is no need use the
driver data as an inverted disconnected flag.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190926091228.24634-8-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to Greg KH, it has been generally agreed that when a USB
driver encounters an unknown error (or one it can't handle directly),
it should just give up instead of going into a potentially infinite
retry loop.
The three codes -EPROTO, -EILSEQ, and -ETIME fall into this category.
They can be caused by bus errors such as packet loss or corruption,
attempting to communicate with a disconnected device, or by malicious
firmware. Nowadays the extent of packet loss or corruption is
negligible, so it should be safe for a driver to give up whenever one
of these errors occurs.
Although the yurex driver handles -EILSEQ errors in this way, it
doesn't do the same for -EPROTO (as discovered by the syzbot fuzzer)
or other unrecognized errors. This patch adjusts the driver so that
it doesn't log an error message for -EPROTO or -ETIME, and it doesn't
retry after any errors.
Reported-and-tested-by: syzbot+b24d736f18a1541ad550@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1909171245410.1590-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver was using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg statements in the completion handlers
which relies on said pointer.
The pointer was also dereferenced unconditionally in a dev_dbg statement
release() something which would lead to a NULL-deref whenever a device
was disconnected before the final character-device close if debugging
was enabled.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 1ef37c6047 ("USB: adutux: remove custom debug macro and module parameter")
Fixes: 66d4bc30d1 ("USB: adutux: remove custom debug macro")
Cc: stable <stable@vger.kernel.org> # 3.12
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190925092913.8608-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix NULL-pointer dereferences on open() and write() which can be
triggered by a malicious USB device.
The current URB allocation helper would fail to initialise the newly
allocated URB if the device has unexpected endpoint descriptors,
something which could lead NULL-pointer dereferences in a number of
open() and write() paths when accessing the URB. For example:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:usb_clear_halt+0x11/0xc0
...
Call Trace:
? tty_port_open+0x4d/0xd0
keyspan_open+0x70/0x160 [keyspan]
serial_port_activate+0x5b/0x80 [usbserial]
tty_port_open+0x7b/0xd0
? check_tty_count+0x43/0xa0
tty_open+0xf1/0x490
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:keyspan_write+0x14e/0x1f3 [keyspan]
...
Call Trace:
serial_write+0x43/0xa0 [usbserial]
n_tty_write+0x1af/0x4f0
? do_wait_intr_irq+0x80/0x80
? process_echoes+0x60/0x60
tty_write+0x13f/0x2f0
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:keyspan_usa26_send_setup+0x298/0x305 [keyspan]
...
Call Trace:
keyspan_open+0x10f/0x160 [keyspan]
serial_port_activate+0x5b/0x80 [usbserial]
tty_port_open+0x7b/0xd0
? check_tty_count+0x43/0xa0
tty_open+0xf1/0x490
Fixes: fdcba53e2d ("fix for bugzilla #7544 (keyspan USB-to-serial converter)")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
The addr variable in prom_free_prom_memory() has been unused since
commit b3c948e2c0 ("MIPS: msp: Record prom memory"), causing a warning
& build failure due to -Werror. Remove the unused variable.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b3c948e2c0 ("MIPS: msp: Record prom memory")
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
Commit b3c948e2c0 ("MIPS: msp: Record prom memory") introduced use of
a MAX_PROM_MEM value but didn't define it. A bounds check in
prom_meminit() suggests its value was supposed to be 5, so define it as
such & adjust the bounds check to use the macro rather than a magic
number.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b3c948e2c0 ("MIPS: msp: Record prom memory")
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
Today, put_compat_statfs64() disallows nearly any field value over
2^32 if f_bsize is only 32 bits, but that makes no sense.
compat_statfs64 is there for the explicit purpose of providing 64-bit
fields for f_files, f_ffree, etc. And f_bsize is always only 32 bits.
As a result, 32-bit userspace gets -EOVERFLOW for i.e. large file
counts even with -D_FILE_OFFSET_BITS=64 set.
In reality, only f_bsize and f_frsize can legitimately overflow
(fields like f_type and f_namelen should never be large), so test
only those fields.
This bug was discussed at length some time ago, and this is the proposal
Al suggested at https://lkml.org/lkml/2018/8/6/640. It seemed to get
dropped amid the discussion of other related changes, but this
part seems obviously correct on its own, so I've picked it up and
sent it, for expediency.
Fixes: 64d2ab32ef ("vfs: fix put_compat_statfs64() does not handle errors")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparse warns about incorrect type when using __be64 data.
It is not being converted to CPU-endian but it should be.
Fixes these sparse warnings:
../block/sed-opal.c:375:20: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:375:20: expected unsigned long long [usertype] align
../block/sed-opal.c:375:20: got restricted __be64 const [usertype] alignment_granularity
../block/sed-opal.c:376:25: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:376:25: expected unsigned long long [usertype] lowest_lba
../block/sed-opal.c:376:25: got restricted __be64 const [usertype] lowest_aligned_lba
Fixes: 455a7b238c ("block: Add Sed-opal library")
Cc: Scott Bauer <scott.bauer@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: linux-block@vger.kernel.org
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Clang warns:
lib/test_user_copy.c:96:10: warning: using the result of an assignment
as a condition without parentheses [-Wparentheses]
if (ret |= test(umem_src == NULL, "kmalloc failed"))
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/test_user_copy.c:96:10: note: place parentheses around the
assignment to silence this warning
if (ret |= test(umem_src == NULL, "kmalloc failed"))
^
( )
lib/test_user_copy.c:96:10: note: use '!=' to turn this compound
assignment into an inequality comparison
if (ret |= test(umem_src == NULL, "kmalloc failed"))
^~
!=
Add the parentheses as it suggests because this is intentional.
Fixes: f5a1a536fa ("lib: introduce copy_struct_from_user() helper")
Link: https://github.com/ClangBuiltLinux/linux/issues/731
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191003171121.2723619-1-natechancellor@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Apparently a refactoring patch brought a bug, that was caught
by syzbot [1]
Original code was correct, do not try to be smarter than the
compiler :/
[1]
BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
Read of size 4 at addr ffff8880943cf188 by task syz-executor.2/17508
CPU: 0 PID: 17508 Comm: syz-executor.2 Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
__kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
kasan_report+0x12/0x17 mm/kasan/common.c:618
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
tcp_getsockopt+0xbf/0xe0 net/ipv4/tcp.c:3680
sock_common_getsockopt+0x94/0xd0 net/core/sock.c:3098
__sys_getsockopt+0x16d/0x310 net/socket.c:2129
__do_sys_getsockopt net/socket.c:2144 [inline]
__se_sys_getsockopt net/socket.c:2141 [inline]
__x64_sys_getsockopt+0xbe/0x150 net/socket.c:2141
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
Fixes: d8e18a516f ("net: Use skb accessors in network core")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull kgdb update from Daniel Thompson:
"This is just a single patch adding a new reviewer for kgdb. New
reviewers will be a big help so I decided to consider this to be a
fix!
I'm looking forward to working more closely with Doug"
* tag 'kgdb-5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
string_to_context_struct() may garble the context string, so we need to
copy back the contents again from the old context struct to avoid
storing the corrupted context.
Since string_to_context_struct() tokenizes (and therefore truncates) the
context string and we are later potentially copying it with kstrdup(),
this may eventually cause pieces of uninitialized kernel memory to be
disclosed to userspace (when copying to userspace based on the stored
length and not the null character).
How to reproduce on Fedora and similar:
# dnf install -y memcached
# systemctl start memcached
# semodule -d memcached
# load_policy
# load_policy
# systemctl stop memcached
# ausearch -m AVC
type=AVC msg=audit(1570090572.648:313): avc: denied { signal } for pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=process permissive=0 trawcon=73797374656D5F75007400000000000070BE6E847296FFFF726F6D000096FFFF76
Cc: stable@vger.kernel.org
Reported-by: Milos Malik <mmalik@redhat.com>
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Fixes for omaps for v5.4-rc cycle
Here are fixes for omaps to deal with few regressions, and to fix
more boot time errors and warnings:
- The recent ti-sysc interconnect target module driver changes had
incorrect clock bits for both clocks and dts that cause warnings
- For omap3-gta04, gpio changes caused the LCD to break a while back,
and after discussing things the right fix is to set spi-cs-high
- Recent omapdrm changes to use generic panels caused tfp410 to be
disabled as we now must enable the generic support for it in
defconfig
- Recent omapdrm and backlight changes also finally made droid4 LCD
to work, so let's enable it in the defconfig it can be used out
of the box. This is not strictly a fix, but we still also have the
older CONFIG_MFD_TI_LMU options available so this cuts down the
confusion for trying to guess which display and which backlight
is needed
- Recent ti-sysc interconnect target module changes need the gpio
module disabled on some boards, but this now needs to happen at
the module level, not at the gpio driver level
- Recent changes to probe system timers with ti-sysc caused warnings
about mismatch in syconfig registers, so let's configure the option
for RESET_STATUS as available in the TRMs
- Recent changes to probe LCDC with ti-sysc caused warnings about
mismatch in sysconfig registers, so let's configure the missing
idlemodes for both platform data and dts as documented in TRMs
- Since we moved mach-omap2 to probe with device tree, we've been
getting voltage controller warnings. Turns out this code is no
longer needed, so let's just remove omap2_set_init_voltage() to
get rid of the pointless warnings
- Configure am4372 dispc memory bandwidth to avoid underflow errors
* tag 'omap-for-v5.4/fixes-rc1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am4372: Set memory bandwidth limit for DISPC
ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
ARM: OMAP2+: Add missing LCDC midlemode for am335x
ARM: OMAP2+: Fix missing reset done flag for am3 and am43
ARM: dts: Fix gpio0 flags for am335x-icev2
ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
ARM: dts: Fix wrong clocks for dra7 mcasp
clk: ti: dra7: Fix mcasp8 clock bits
Link: https://lore.kernel.org/r/pull-1570040410-308159@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
This textsearch code example does not need the '\' escapes and they can
be misleading to someone reading the example. Also, gcc and sparse warn
that the "\%d" is an unknown escape sequence.
Fixes: 5968a70d7a ("textsearch: fix kernel-doc warnings and add kernel-api section")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this change an application sending <= 1MSS worth of data and
enabling UDP GSO would fail if the system had SW GSO enabled, but the
same send would succeed if HW GSO offload is enabled. In addition to this
inconsistency the error in the SW GSO case does not get back to the
application if sending out of a real device so the user is unaware of this
failure.
With this change we only perform GSO if the # of segments is > 1 even
if the application has enabled segmentation. I've also updated the
relevant udpgso selftests.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
added gso_segs calculation, but incorrectly got sizeof() the pointer and
not the underlying data type. In addition let's fix the v6 case.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Fixes: dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm interested in kdb / kgdb and have sent various fixes over the
years. I'd like to get CCed on patches so I can be aware of them and
also help review.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
This began with a syzbot report. syzkaller was injecting
IPv6 TCP SYN packets having a v4mapped source address.
After an unsuccessful 4-tuple lookup, TCP creates a request
socket (SYN_RECV) and calls reqsk_queue_hash_req()
reqsk_queue_hash_req() calls sk_ehashfn(sk)
At this point we have AF_INET6 sockets, and the heuristic
used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
For the particular spoofed packet, we end up hashing V4 addresses
which were not initialized by the TCP IPv6 stack, so KMSAN fired
a warning.
I first fixed sk_ehashfn() to test both source and destination addresses,
but then faced various problems, including user-space programs
like packetdrill that had similar assumptions.
Instead of trying to fix the whole ecosystem, it is better
to admit that we have a dual stack behavior, and that we
can not build linux kernels without V4 stack anyway.
The dual stack API automatically forces the traffic to be IPv4
if v4mapped addresses are used at bind() or connect(), so it makes
no sense to allow IPv6 traffic to use the same v4mapped class.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coverity caught a case where we could return with a uninitialized value
in ret in process_leaf. This is actually pretty likely because we could
very easily run into a block group item key and have a garbage value in
ret and think there was an errror. Fix this by initializing ret to 0.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: fd708b81d9 ("Btrfs: add a extent ref verify tool")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current versions of Intel's SDM incorrectly state that "bits 31:15 of
the VM-Entry exception error-code field" must be zero. In reality, bits
31:16 must be zero, i.e. error codes are 16-bit values.
The bogus error code check manifests as an unexpected VM-Entry failure
due to an invalid code field (error number 7) in L1, e.g. when injecting
a #GP with error_code=0x9f00.
Nadav previously reported the bug[*], both to KVM and Intel, and fixed
the associated kvm-unit-test.
[*] https://patchwork.kernel.org/patch/11124749/
Reported-by: Nadav Amit <namit@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous
MSR indices reserved by Intel for event selectors. Since some machines
actually have MSRs past the reserved range, these may survive the
filtering of msrs_to_save array and would be rejected by KVM_GET/SET_MSR.
To avoid this, cut the list to whatever CPUID reports for the host's
architectural PMU.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Fixes: e2ada66ec4 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the following build error from "make TARGETS=kvm kselftest":
libkvm.a(assert.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
This error is seen when build is done from the main Makefile using
kselftest target. In this case KBUILD_CPPFLAGS and CC_OPTION_CFLAGS
are defined.
When build is invoked using:
"make -C tools/testing/selftests/kvm" KBUILD_CPPFLAGS and CC_OPTION_CFLAGS
aren't defined.
There is no need to pass in KBUILD_CPPFLAGS and CC_OPTION_CFLAGS for the
check to determine if --no-pie is necessary, which is the case when these
two aren't defined when "make -C tools/testing/selftests/kvm" runs.
Fix it by simplifying the no-pie-option logic. With this change, both
build variations work.
"make TARGETS=kvm kselftest"
"make -C tools/testing/selftests/kvm"
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One tilcdc fix was left out in drm-misc-next-fixes and didn't make it
during the merge window. Let's bring it into drm-misc-fixes.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
The OMAP36xx and AM/DM37x TRMs say that the maximum divider for DSS fclk
(in CM_CLKSEL_DSS) is 32. Experimentation shows that this is not
correct, and using divider of 32 breaks DSS with a flood or underflows
and sync losts. Dividers up to 31 seem to work fine.
There is another patch to the DT files to limit the divider correctly,
but as the DSS driver also needs to know the maximum divider to be able
to iteratively find good rates, we also need to do the fix in the DSS
driver.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Adam Ford <aford173@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191002122542.8449-1-tomi.valkeinen@ti.com
Tested-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
There was a integer wraparound when mode_clock became too high,
and we didn't correct for the FEC overhead factor when dividing,
with the calculations breaking at HBR3.
As a result our calculated bpp was way too high, and the link width
limitation never came into effect.
Print out the resulting bpp calcululations as a sanity check, just
in case we ever have to debug it later on again.
We also used the wrong factor for FEC. While bspec mentions 2.4%,
all the calculations use 1/0.972261, and the same ratio should be
applied to data M/N as well, so use it there when FEC is enabled.
This fixes the FIFO underrun we are seeing with FEC enabled.
Changes since v2:
- Handle fec_enable in intel_link_compute_m_n, so only data M/N is adjusted. (Ville)
- Fix initial hardware readout for FEC. (Ville)
Changes since v3:
- Remove bogus fec_to_mode_clock. (Ville)
Changes since v4:
- Use the correct register for icl. (Ville)
- Split hw readout to a separate patch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: d9218c8f6c ("drm/i915/dp: Add helpers for Compressed BPP and Slice Count for DSC")
Cc: <stable@vger.kernel.org> # v5.0+
Cc: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190925082110.17439-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit ed06efb801)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The stmmac driver will try to acquire its private mutex during suspend
via phylink_resolve() -> stmmac_mac_link_down() -> stmmac_eee_init().
However, the phylink configuration is updated with the private mutex
held already, which causes a deadlock during suspend.
Fix this by moving the phylink configuration updates out of the region
of code protected by the private mutex.
Fixes: 19e13cb27b ("net: stmmac: Hold rtnl lock in suspend/resume callbacks")
Suggested-by: Bitan Biswas <bbiswas@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap reports on the sparse list that sparse warns about this
expression:
of_irq->percpu ? free_percpu_irq(of_irq->irq, clkevt) :
free_irq(of_irq->irq, clkevt);
and honestly, sparse is correct to warn. The return type of
free_percpu_irq() is 'void', while free_irq() returns a 'const void *'
that is the devname argument passed in to the request_irq().
You can't mix a void type with a non-void types in a conditional
expression according to the C standard. It so happens that gcc seems to
accept it - and the resulting type of the expression is void - but
there's really no reason for the kernel to have this kind of
non-standard expression with no real upside.
The natural way to write that expression is with an if-statement:
if (of_irq->percpu)
free_percpu_irq(of_irq->irq, clkevt);
else
free_irq(of_irq->irq, clkevt);
which is more legible anyway.
I'm not sure why that timer-of code seems to have this odd pattern. It
does the same at allocation time, but at least there the types match,
and it makes sense as an expression.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fix from Ingo Molnar:
"Fix a broadcast-timer handling race that can result in spuriously and
indefinitely delayed hrtimers and even RCU stalls if the system is
otherwise quiet"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: broadcast-hrtimer: Fix a race in bc_set_next
In function mdio_sc_cfg_reg_write(), variable "reg_value" could be
uninitialized if regmap_read() fails. However, "reg_value" is used
to decide the control flow later in the if statement, which is
potentially unsafe.
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some SGI machines (IP28 and IP30) a small region of memory is mirrored
to pyhsical address 0 for exception vectors while rest of the memory
is reachable at a higher physical address. ARC PROM marks this
region as reserved, but with commit a94e4f24ec ("MIPS: init: Drop
boot_mem_map") this chunk is used, when searching for start of ram,
which breaks at least IP28 and IP30 machines. To fix this
add_region_memory() checks for start address < PHYS_OFFSET and ignores
these chunks.
Fixes: a94e4f24ec ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Versions of binutils prior to 2.25 are unable to link our VDSO due to an
unsupported R_MIPS_PC32 relocation generated by the ".word _start - ."
line of the inline asm in get_vdso_base(). As such, the intent is that
when building with binutils older than 2.25 we don't build code for
gettimeofday() & friends in the VDSO that rely upon get_vdso_base().
Commit 24640f233b ("mips: Add support for generic vDSO") converted us
to using generic VDSO infrastructure, and as part of that the
gettimeofday() functionality moved to a new vgettimeofday.c file. The
check for binutils < 2.25 wasn't updated to handle this new filename,
and so it continues trying to remove the old unused filename from the
build. The end result is that we try to include the gettimeofday() code
in builds that will fail to link.
Fix this by updating the binutils < 2.25 case to remove vgettimeofday.c
from obj-vdso-y, rather than gettimeofday.c.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 24640f233b ("mips: Add support for generic vDSO")
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: linux-mips@vger.kernel.org
Pull erofs fixes from Gao Xiang:
"Three patches to address regressions due to recent cleanups, mainly
found by stress test on latest mainline kernel (no more regression out
compared with older kernels for more than a week)
One additional patch updates sub-entries in MAINTAINERS.
Summary:
- Fix error handling in erofs_read_superblock
- Fix locking in erofs_get_meta_page
- Fix inplace behavior due to decompression frontend cleanup
- Update sub-entries in MAINTAINERS in order to better blame"
* tag 'erofs-for-5.4-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix mis-inplace determination related with noio chain
erofs: fix erofs_get_meta_page locking due to a cleanup
MAINTAINERS: erofs: complete sub-entries for erofs
erofs: fix return value check in erofs_read_superblock()
On Tue, Oct 01, 2019 at 10:14:40AM -0700, Linus Torvalds wrote:
> The previous state of the file didn't have that 0xa at the end, so you get that
>
>
> -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> \ No newline at end of file
> +EXPORT_SYMBOL_GPL(add_bootloader_randomness);
>
> which is "the '-' line doesn't have a newline, the '+' line does" marker.
Aaha, that makes total sense, thanks for explaining. Oh well, let's fix
it then so that people don't scratch heads like me.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case a user process using xenbus has open transactions and is killed
e.g. via ctrl-C the following cleanup of the allocated resources might
result in a deadlock due to trying to end a transaction in the xenbus
worker thread:
[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40
Fix this by doing the cleanup via a workqueue instead.
Reported-by: James Dingwall <james@dingwall.me.uk>
Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: <stable@vger.kernel.org> # 4.11
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Remove the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct(). Patch from Florian Westphal.
2) Fix deadlock in nft_connlimit between packet path updates and
the garbage collector.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added optional file argument does not validate if the
file is indeed a watchdog, e.g.:
./watchdog-test -f /dev/zero
Watchdog Ticking Away!
Fix it by confirming that the WDIOC_GETSUPPORT ioctl succeeds.
Fixes: a4864a33f5 ("selftests: watchdog: Add optional file argument")
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The binding has a typo where resets-names should read reset-names, which in
turn leads to a warning when the example is validated, since reset-names is
being used, and the binding prevent the usage of any property that isn't
described.
Fixes: 088e88be5a ("dt-bindings: phy: add binding for the Lantiq VRX200 and ARX300 PCIe PHYs")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
The example contains an SPI bus and device, but doesn't have the
appropriate size and address cells size.
This creates a DTC warning when the example is compiled since the default
ones will not match what the device uses. Let's add them to remove that
warning.
Fixes: f7356e4703 ("dt-bindings: iio: adc: ad7192: Add binding documentation for AD7192")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
The AD7192 binding describes two regulator properties, avdd-supply and
dvdd-supply, but describes it as a constant string that must be avdd and
dvdd. This is wrong since a *-supply property is actually a phandle, and
results in warnings when the example is validated (or any device tree using
that device, for that matter).
Let's remove that requirement.
Fixes: f7356e4703 ("dt-bindings: iio: adc: ad7192: Add binding documentation for AD7192")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
The linux,rc-map-name property is described using an enum, yet a value has
been put in that enum twice, resulting in a warning. Let's fix that.
Fixes: 7c31b9d673 ("media: dt-bindings: media: Add YAML schemas for the generic RC bindings")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
The fsl,dsp binding requires a memory-region, yet its example doesn't have
one which results in a warning. Let's add a memory-region phandle to the
example.
Fixes: 7db2f2dfc7 ("dt-bindings: dsp: fsl: Add DSP core binding support")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Just like all the other variants, this one passes invalid
compile-time options with clang after the new code got
merged:
clang: error: unknown argument: '-mpreferred-stack-boundary=4'
scripts/Makefile.build:265: recipe for target 'drivers/gpu/drm/amd/amdgpu/../display/dc/dcn21/dcn21_resource.o' failed
Use the same variant that we have for dcn20 to fix compilation.
Fixes: eced51f9ba ("drm/amd/display: Add hubp block for Renoir (v2)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Without CONFIG_DEBUG_FS, we get a warning for an unused
variable:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6020:33: error: unused variable 'source' [-Werror,-Wunused-variable]
Hide the variable in an #ifdef like its only users.
Fixes: 14b2584636 ("drm/amd/display: add functionality to grab DPRX CRC entries.")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The type definition for 'uint' clashes with the generic kernel
headers:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:43:22: error: redefinition of typedef 'uint' is a C11 feature [-Werror,-Wtypedef-redefinition]
include/linux/types.h:92:23: note: previous definition is here
Just remove this type and use plain 'unsigned int' consistently,
as it is already use almost everywhere in this file.
Fixes: b04641a3f4 ("drm/amd/display: Add Renoir DML")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
An earlier patch of mine disabled some #warning statements
that get in the way of build testing, but then another
instance was added around the same time.
Remove that as well.
Fixes: b5203d16ae ("drm/amd/amdgpu: hide #warning for missing DC config")
Fixes: e1c14c4339 ("drm/amdgpu: Enable DC on Renoir")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When CONFIG_PERF_EVENTS is disabled, we cannot compile the pmu
portion of the amdgpu driver:
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:48:38: error: no member named 'hw' in 'struct perf_event'
struct hw_perf_event *hwc = &event->hw;
~~~~~ ^
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:51:13: error: no member named 'attr' in 'struct perf_event'
if (event->attr.type != event->pmu->type)
~~~~~ ^
...
The same bug was already fixed by commit d155bef063 ("amdgpu: make pmu
support optional") but broken again by what looks like an incorrectly
rebased patch.
Fixes: 64f55e6292 ("drm/amdgpu: Add RAS EEPROM table.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In acp_hw_init there are some allocations that needs to be released in
case of failure:
1- adev->acp.acp_genpd should be released if any allocation attemp for
adev->acp.acp_cell, adev->acp.acp_res or i2s_pdata fails.
2- all of those allocations should be released if
mfd_add_hotplug_devices or pm_genpd_add_device fail.
3- Release is needed in case of time out values expire.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when multithreading access sysfs of amdgpu_pm_info at the sametime.
the swsmu driver cause smu firmware hang.
eg:
single thread access:
Message A + Param A ==> right
Message B + Param B ==> right
Message C + Param C ==> right
multithreading access:
Message A + Param B ==> error
Message B + Param A ==> error
Message C + Param C ==> right
the patch will add sensor lock(mutex) to avoid this error.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.3.x
v2:
change period from 10ms to 100ms (typo error)
too high frequence to update mertrics table will cause smu firmware
error,so change mertrics table update period from 1ms to 100ms
(navi10, 12, 14)
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.3.x
Commit 171a9bae68 ("staging/octeon: Allow test build on !MIPS") moved
the inclusion of a bunch of headers by various files in the Octeon
ethernet driver into a common header, but in doing so it changed the
order in which those headers are included.
Prior to the referenced commit drivers/staging/octeon/ethernet.c
included asm/octeon/cvmx-pip.h before asm/octeon/cvmx-ipd.h, which makes
use of the CVMX_PIP_SFT_RST definition pulled in by the former. After
commit 171a9bae68 ("staging/octeon: Allow test build on !MIPS") we
pull in asm/octeon/cvmx-ipd.h first & builds fail with:
In file included from drivers/staging/octeon/octeon-ethernet.h:27,
from drivers/staging/octeon/ethernet.c:22:
arch/mips/include/asm/octeon/cvmx-ipd.h: In function 'cvmx_ipd_free_ptr':
arch/mips/include/asm/octeon/cvmx-ipd.h:330:27: error: storage size of
'pip_sft_rst' isn't known
union cvmx_pip_sft_rst pip_sft_rst;
^~~~~~~~~~~
arch/mips/include/asm/octeon/cvmx-ipd.h:331:36: error: 'CVMX_PIP_SFT_RST'
undeclared (first use in this function); did you mean 'CVMX_CIU_SOFT_RST'?
pip_sft_rst.u64 = cvmx_read_csr(CVMX_PIP_SFT_RST);
^~~~~~~~~~~~~~~~
CVMX_CIU_SOFT_RST
arch/mips/include/asm/octeon/cvmx-ipd.h:331:36: note: each undeclared
identifier is reported only once for each function it appears in
arch/mips/include/asm/octeon/cvmx-ipd.h:330:27: warning: unused variable
'pip_sft_rst' [-Wunused-variable]
union cvmx_pip_sft_rst pip_sft_rst;
^~~~~~~~~~~
make[4]: *** [scripts/Makefile.build:266: drivers/staging/octeon/ethernet.o]
Error 1
make[3]: *** [scripts/Makefile.build:509: drivers/staging/octeon] Error 2
Fix this by having asm/octeon/cvmx-ipd.h include the
asm/octeon/cvmx-pip-defs.h header that it is reliant upon, rather than
requiring its users to pull in that header before it.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 171a9bae68 ("staging/octeon: Allow test build on !MIPS")
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: linux-mips@vger.kernel.org
Cc: David S . Miller <davem@davemloft.net>
Cc: Matthew Wilcox <willy@infradead.org>
Vladimir Oltean says:
====================
SJA1105 DSA locking fixes for PTP
This series fixes the locking API usage problems spotted when compiling
the kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_DEBUG_SPINLOCK=y.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y:
[ 41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l
[ 41.583212] INFO: lockdep is turned off.
[ 41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827
[ 41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14)
[ 41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100)
[ 41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4)
[ 41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8)
[ 41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24)
[ 41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c)
[ 41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc)
[ 41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330)
[ 41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8)
[ 41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8)
[ 41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10)
[ 41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58)
[ 41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[ 41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0)
[ 41.707796] 5fa0: beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80
[ 41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000
[ 41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c
[ 41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002
[ 41.735682] INFO: lockdep is turned off.
Enabling RX timestamping will logically disturb the fastpath (processing
of meta frames). Replace bool hwts_rx_en with a bit that is checked
atomically from the fastpath and temporarily unset from the sleepable
context during a change of the RX timestamping process (a destructive
operation anyways, requires switch reset).
If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any
received meta frame and not take the meta_lock at all.
Fixes: a602afd200 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
after allocation some resources such as protection domain.
If allocation of such resources fail, then these uninitialized
variables are accessed in rds_ib_dev_free() in failure path. This
can potentially crash the system. The code has been updated to
initialize these variables very early in the function.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There has been some confusion between the port number and
the VLAN ID in this driver. What we need to check for
validity is the VLAN ID, nothing else.
The current confusion came from assigning a few default
VLANs for default routing and we need to rewrite that
properly.
Instead of checking if the port number is a valid VLAN
ID, check the actual VLAN IDs passed in to the callback
one by one as expected.
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently added code introduces 64-bit division in dr_icm_pool_mr_create()
so that build on 32-bit architectures fails with
ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
As the divisor is always a power of 2, we can use bitwise operation
instead.
Fixes: 29cf8febd1 ("net/mlx5: DR, ICM pool memory allocator")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have identified a problem with the "oversubscription" policy in the
link transmission code.
When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.
This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.
We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.
By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;
When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...
With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.
Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...
We will solve this issue radically by another patch.
Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today the EFI runtime functions are setup in architecture specific
code (x86 and arm), with the functions themselves living in drivers/xen
as they are not architecture dependent.
As the setup is exactly the same for arm and x86 move the setup to
drivers/xen, too. This at once removes the need to make the single
functions global visible.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[boris: "Dropped EXPORT_SYMBOL_GPL(xen_efi_runtime_setup)"]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
gpio: fixes for v5.4-rc2
- fix a bug with emulated open-drain/source where lines' values can no longer
be changed
- fix getting nonexclusive gpiods from DT
- fix an incorrect offset for the level trigger in gpio-eic-sprd
SMI# interrupt for fan and voltage is Two-Times Interrupt Mode.
Fan or voltage exceeds high limit or going below low limit,
it will causes an interrupt if the previous interrupt has been
reset by reading all the interrupt Status Register. Thus, add the
array fan_alarm and vsen_alarm to store the alarms for all of the
fan and voltage sensors.
Signed-off-by: amy.shih <amy.shih@advantech.com.tw>
Link: https://lore.kernel.org/r/20190919030205.11440-1-Amy.Shih@advantech.com.tw
Fixes: 486842db3b ("hwmon: (nct7904) Add extra sysfs support for fan, voltage and temperature.")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
It's been a while since the k10temp documentation has been updated.
There are new CPU families supported as well as Tdie temp was added.
This patch adds all missing families which I was able to find from git
history and provides more info about Tctl vs Tdie exported temps.
Signed-off-by: Lukas Zapletal <lzap+git@redhat.com>
Link: https://lore.kernel.org/r/20190923105931.27881-1-lzap+git@redhat.com
[groeck: Formatting]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
arm64 shares some code under arch/arm/xen, including mm.c.
However ZONE_DMA is removed by commit
ad67f5a6545("arm64: replace ZONE_DMA with ZONE_DMA32").
So add a check if CONFIG_ZONE_DMA32 is enabled use __GFP_DMA32.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
to cache extra fragments. This is incorrect because the return type of
xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
ring buffer index.
In the situation when the rsp_cons is approaching 0xffffffff, the return
value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
caller) would regard as error. As a result, queue->rx.rsp_cons is set
incorrectly because it is updated only when there is error. If there is no
error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
This leads to NULL pointer access in the next iteration to process rx ring
buffer entries.
The symptom is similar to the one fixed in
commit 00b368502d ("xen-netfront: do not assume sk_buff_head list is
empty in error handling").
This patch changes the return type of xennet_fill_frags() to indicate
whether it is successful or failed. The queue->rx.rsp_cons will be
always updated inside this function.
Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
This scenario does not occur when the ipv6 address is not kept
(net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
state of the ifp to DEAD. Handle when the addresses are kept by checking
IF_READY which is reset by addrconf_ifdown.
The 'dead' flag for an inet6_addr is set only under rtnl, in
addrconf_ifdown and it means the device is getting removed (or IPv6 is
disabled). The interesting cases for changing the idev flag are
addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
(reset the flag). The former does not have the idev lock - only rtnl;
the latter has both. Based on that the existing dead + IF_READY check
can be moved to right after the rtnl_lock in addrconf_dad_work.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)
This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.
Fixes: a41e8a88b0 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lockdep is unhappy if two locks from the same class are held.
Fix the below warning for hyperv and virtio sockets (vmci socket code
doesn't have the issue) by using lock_sock_nested() when __vsock_release()
is called recursively:
============================================
WARNING: possible recursive locking detected
5.3.0+ #1 Not tainted
--------------------------------------------
server/1795 is trying to acquire lock:
ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
but task is already holding lock:
ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_VSOCK);
lock(sk_lock-AF_VSOCK);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by server/1795:
#0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
#1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
stack backtrace:
CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.67+0xd2/0x20b
lock_acquire+0xb5/0x1c0
lock_sock_nested+0x6d/0x90
hvs_release+0x10/0x120 [hv_sock]
__vsock_release+0x24/0xf0 [vsock]
__vsock_release+0xa0/0xf0 [vsock]
vsock_release+0x12/0x30 [vsock]
__sock_release+0x37/0xa0
sock_close+0x14/0x20
__fput+0xc1/0x250
task_work_run+0x98/0xc0
do_exit+0x344/0xc60
do_group_exit+0x47/0xb0
get_signal+0x15c/0xc50
do_signal+0x30/0x720
exit_to_usermode_loop+0x50/0xa0
do_syscall_64+0x24e/0x270
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4184e85f31
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:
BUG: kernel NULL pointer dereference, address: 0000000000000006
...
RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
...
Call Trace:
hso_start_serial_device+0xdc/0x140 [hso]
hso_serial_open+0x118/0x1b0 [hso]
tty_open+0xf1/0x490
Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an error with how the FDT blob is reserved in memblock.
An incorrect physical address calculation exposed the FDT header to
unintended corruption, which typically manifested with of_fdt_raw_init()
faulting during late boot after fdt_totalsize() returned a wrong value.
Systems with smaller physical memory sizes more frequently trigger this
issue, as the kernel is more likely to allocate from the DMA32 zone
where bbl places the DTB after the kernel image.
Commit 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
changed the mapping of the DTB to reside in the fixmap area.
Consequently, early_init_fdt_reserve_self() cannot be used anymore in
setup_bootmem() since it relies on __pa() to derive a physical address,
which does not work with dtb_early_va that is no longer a valid kernel
logical address.
The reserved[0x1] region shows the effect of the pointer underflow
resulting from the __pa(initial_boot_params) offset subtraction:
[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[ 0.000000] memory.cnt = 0x1
[ 0.000000] memory[0x0] [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[ 0.000000] reserved.cnt = 0x2
[ 0.000000] reserved[0x0] [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[ 0.000000] reserved[0x1] [0xfffffff080100000-0xfffffff080100527], 0x0000000000000528 bytes flags: 0x0
With the fix applied:
[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[ 0.000000] memory.cnt = 0x1
[ 0.000000] memory[0x0] [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[ 0.000000] reserved.cnt = 0x2
[ 0.000000] reserved[0x0] [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[ 0.000000] reserved[0x1] [0x0000000080e00000-0x0000000080e00527], 0x0000000000000528 bytes flags: 0x0
Fixes: 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
Signed-off-by: Albert Ou <aou@eecs.berkeley.edu>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
This is almost entirely a comment. The bug is unlikely to manifest on
existing hardware because there is a timeout on load reservations, but
manifests on QEMU because there is no timeout.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
Leading to section mismatch in this case.
Since we're using const variables to pass assembly flags, 'inline's
can't be dropped. So we simply mark them as __always_inline.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
[paul.burton@mips.com:
- Annotate these functions with __init, even if it only serves to
inform human readers when the code can be used.
- Drop the __always_inline from check_daddi() & check_daddiu() which
don't use arguments as immediates in inline asm.
- Rewrap the commit message.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
We are missing a __SetPageOffline(), which is why we can get
!PageOffline() pages onto the balloon list, where
alloc_xenballooned_pages() will complain:
page:ffffea0003e7ffc0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xffffe00001000(reserved)
raw: 000ffffe00001000 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:744!
invalid opcode: 0000 [#1] SMP NOPTI
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: 77c4adf6a6 ("xen/balloon: mark inflated pages PG_offline")
Cc: stable@vger.kernel.org # v5.1+
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
HYPERVISOR_platform_op() is an inline function and should not
be exported. Since commit 15bfc2348d ("modpost: check for
static EXPORT_SYMBOL* functions"), this causes a warning:
WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
Instead, export the underlying function called by the static inline:
HYPERVISOR_platform_op_raw.
Fixes: 15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Commit a745f7af3c ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.
To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".
Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Let the user specify an optional TARGETS skiplist through the new optional
SKIP_TARGETS Makefile variable.
It is easier to skip at will using a reduced and well defined list of
possibly problematic targets with SKIP_TARGETS than to provide a partially
stripped down list of good targets using the usual TARGETS variable.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
the below warnings:
drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]
Fixes: 271b2224d4 ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e328 ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Simplify the ring buffer handling with the in-place API.
Also avoid the dynamic allocation and the memory leak in the channel
callback function.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the command:
btrfs balance start -dconvert=single,soft .
on a Raspberry Pi produces the following kernel message:
BTRFS error (device mmcblk0p2): balance: invalid convert data profile single
This fails because we use is_power_of_2(unsigned long) to validate
the new data profile, the constant for 'single' profile uses bit 48,
and there are only 32 bits in a long on ARM.
Fix by open-coding the check using u64 variables.
Tested by completing the original balance command on several Raspberry
Pis.
Fixes: 818255feec ("btrfs: use common helper instead of open coding a bit test")
CC: stable@vger.kernel.org # 4.20+
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch is syncing driver with actual devicetree documentation:
Documentation/devicetree/bindings/net/qca,ar71xx.txt
|Optional subnodes:
|- mdio : specifies the mdio bus, used as a container for phy nodes
| according to phy.txt in the same directory
The driver was working with fixed phy without any noticeable issues. This bug
was uncovered by introducing dsa ar9331-switch driver.
Since no one reported this bug until now, I assume no body is using it
and this patch should not brake existing system.
Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu says:
====================
net: stmmac: Fixes for -net
Misc fixes for -net tree. More info in commit logs.
v2 is just a rebase of v1 against -net and we added a new patch (09/09) to
fix RSS feature.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b6b6cc9acd, changed the call to dwxgmac2_rss_write_reg()
passing it the variable cfg->key[i].
As key is an u8 but we write 32 bits at a time we need to cast it into
an u32 so that the correct key values are written. Notice that the for
loop already takes this into account so we don't try to write past the
keys size.
Fixes: b6b6cc9acd ("net: stmmac: selftest: avoid large stack usage")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't use it anyway as XGMAC only supports polling for timestamp (in
current SW implementation). This greatly reduces the system load by
reducing the number of interrupts.
Fixes: 2142754f8b ("net: stmmac: Add MAC related callbacks for XGMAC2")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If WoL is enabled we can't really stop the PHY, otherwise we will not
receive the WoL packet. Fix this by telling phylink that only the MAC is
down and only stop the PHY if WoL is not enabled.
Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The case for PTPV2_EVENT requires event packets to be captured so add
this setting to the list of enabled captures.
Fixes: 891434b18e ("stmmac: add IEEE PTPv1 and PTPv2 support.")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to always update the MAC Hash Filter so that previous entries
are invalidated.
Found out while running stmmac selftests.
Fixes: b8ef7020d6 ("net: stmmac: add support for hash table size 128/256 in dwmac4")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although some XGMAC setups support frames larger than DMA size, some of
them may not. As we can't know before-hand which ones support let's use
the maximum DMA buffer size in the Jumbo Tests.
User can always reconfigure the MTU to achieve larger frames.
Fixes: 427849e8c3 ("net: stmmac: selftests: Add Jumbo Frame tests")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit b8ef7020d6 ("net: stmmac: add support for hash table size
128/256 in dwmac4"), we can detect the Hash Table dinamically.
Let's implement this feature in XGMAC cores and fix possible setups that
don't support the maximum size for Hash Table.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some setups may not have all Unicast addresses filters available. Let's
check this before trying to setup filters.
Fixes: 0efedbf11f ("net: stmmac: xgmac: Fix XGMAC selftests")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If any of the param or info_get op returns error, dumpit cb is
skipping to dump remaining params or info_get ops for all the
drivers.
Fix to not return if any of the param/info_get op returns error
as not supported and continue to dump remaining information.
v2: Modify the patch to return error, except for params/info_get
op that return -EOPNOTSUPP as suggested by Andrew Lunn. Also, modify
commit message to reflect the same.
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in the "net: sched: taprio: Avoid division by zero on
invalid link speed" commit, it is legal for the ethtool API to return
zero as a link speed. So guard against it to ensure we don't perform a
division by zero in kernel.
Fixes: e0a7683d30 ("net/sched: cbs: fix port_rate miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check in taprio_set_picos_per_byte is currently not robust enough
and will trigger this division by zero, due to e.g. PHYLINK not setting
kset->base.speed when there is no PHY connected:
[ 27.109992] Division by zero in kernel.
[ 27.113842] CPU: 1 PID: 198 Comm: tc Not tainted 5.3.0-rc5-01246-gc4006b8c2637-dirty #212
[ 27.121974] Hardware name: Freescale LS1021A
[ 27.126234] [<c03132e0>] (unwind_backtrace) from [<c030d8b8>] (show_stack+0x10/0x14)
[ 27.133938] [<c030d8b8>] (show_stack) from [<c10b21b0>] (dump_stack+0xb0/0xc4)
[ 27.141124] [<c10b21b0>] (dump_stack) from [<c10af97c>] (Ldiv0_64+0x8/0x18)
[ 27.148052] [<c10af97c>] (Ldiv0_64) from [<c0700260>] (div64_u64+0xcc/0xf0)
[ 27.154978] [<c0700260>] (div64_u64) from [<c07002d0>] (div64_s64+0x4c/0x68)
[ 27.161993] [<c07002d0>] (div64_s64) from [<c0f3d890>] (taprio_set_picos_per_byte+0xe8/0xf4)
[ 27.170388] [<c0f3d890>] (taprio_set_picos_per_byte) from [<c0f3f614>] (taprio_change+0x668/0xcec)
[ 27.179302] [<c0f3f614>] (taprio_change) from [<c0f2bc24>] (qdisc_create+0x1fc/0x4f4)
[ 27.187091] [<c0f2bc24>] (qdisc_create) from [<c0f2c0c8>] (tc_modify_qdisc+0x1ac/0x6f8)
[ 27.195055] [<c0f2c0c8>] (tc_modify_qdisc) from [<c0ee9604>] (rtnetlink_rcv_msg+0x268/0x2dc)
[ 27.203449] [<c0ee9604>] (rtnetlink_rcv_msg) from [<c0f4fef0>] (netlink_rcv_skb+0xe0/0x114)
[ 27.211756] [<c0f4fef0>] (netlink_rcv_skb) from [<c0f4f6cc>] (netlink_unicast+0x1b4/0x22c)
[ 27.219977] [<c0f4f6cc>] (netlink_unicast) from [<c0f4fa84>] (netlink_sendmsg+0x284/0x340)
[ 27.228198] [<c0f4fa84>] (netlink_sendmsg) from [<c0eae5fc>] (sock_sendmsg+0x14/0x24)
[ 27.235988] [<c0eae5fc>] (sock_sendmsg) from [<c0eaedf8>] (___sys_sendmsg+0x214/0x228)
[ 27.243863] [<c0eaedf8>] (___sys_sendmsg) from [<c0eb015c>] (__sys_sendmsg+0x50/0x8c)
[ 27.251652] [<c0eb015c>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[ 27.259524] Exception stack(0xe8045fa8 to 0xe8045ff0)
[ 27.264546] 5fa0: b6f608c8 000000f8 00000003 bed7e2f0 00000000 00000000
[ 27.272681] 5fc0: b6f608c8 000000f8 004ce54c 00000128 5d3ce8c7 00000000 00000026 00505c9c
[ 27.280812] 5fe0: 00000070 bed7e298 004ddd64 b6dd1e64
Russell King points out that the ethtool API says zero is a valid return
value of __ethtool_get_link_ksettings:
* If it is enabled then they are read-only; if the link
* is up they represent the negotiated link mode; if the link is down,
* the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
* @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
So, it seems that taprio is not following the API... I'd suggest either
fixing taprio, or getting agreement to change the ethtool API.
The chosen path was to fix taprio.
Fixes: 7b9eba7ba0 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 174e23810c
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.
Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.
This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().
In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.
I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We've historically had reports of being unable to mount file systems
because the tree log root couldn't be read. Usually this is the "parent
transid failure", but could be any of the related errors, including
"fsid mismatch" or "bad tree block", depending on which block got
allocated.
The modification of the individual log root items are serialized on the
per-log root root_mutex. This means that any modification to the
per-subvol log root_item is completely protected.
However we update the root item in the log root tree outside of the log
root tree log_mutex. We do this in order to allow multiple subvolumes
to be updated in each log transaction.
This is problematic however because when we are writing the log root
tree out we update the super block with the _current_ log root node
information. Since these two operations happen independently of each
other, you can end up updating the log root tree in between writing out
the dirty blocks and setting the super block to point at the current
root.
This means we'll point at the new root node that hasn't been written
out, instead of the one we should be pointing at. Thus whatever garbage
or old block we end up pointing at complains when we mount the file
system later and try to replay the log.
Fix this by copying the log's root item into a local root item copy.
Then once we're safely under the log_root_tree->log_mutex we update the
root item in the log_root_tree. This way we do not modify the
log_root_tree while we're committing it, fixing the problem.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.
Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.
CPU 1 CPU 2
extent_fiemap()
--> it's a full ranged fiemap
range from 0 to LLONG_MAX - 1
(9223372036854775807)
--> locks range in the inode's
io tree
--> after this we have 2 extent
states in the io tree:
--> 1 for range [0, 1Mb[ with
the bits EXTENT_LOCKED and
EXTENT_DELALLOC_BITS set
--> 1 for the range
[1Mb, LLONG_MAX[ with
the EXTENT_LOCKED bit set
--> start buffered write at offset
1Mb with a length of 4Kb
btrfs_file_write_iter()
btrfs_buffered_write()
--> cached_state is NULL
lock_and_cleanup_extent_if_need()
--> returns 0 and does not lock
range because it starts
at current i_size / eof
--> cached_state remains NULL
btrfs_dirty_pages()
btrfs_set_extent_delalloc()
(...)
__set_extent_bit()
--> splits extent state for range
[1Mb, LLONG_MAX[ and now we
have 2 extent states:
--> one for the range
[1Mb, 1Mb + 4Kb[ with
EXTENT_LOCKED set
--> another one for the range
[1Mb + 4Kb, LLONG_MAX[ with
EXTENT_LOCKED set as well
--> sets EXTENT_DELALLOC on the
extent state for the range
[1Mb, 1Mb + 4Kb[
--> caches extent state
[1Mb, 1Mb + 4Kb[ into
@cached_state because it has
the bit EXTENT_LOCKED set
--> btrfs_buffered_write() ends up
with a non-NULL cached_state and
never calls anything to release its
reference on it, resulting in a
memory leak
Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().
The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.
This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:
[77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1
Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.
CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Berg says:
====================
A small list of fixes this time:
* two null pointer dereference fixes
* a fix for preempt-enabled/BHs-enabled (lockdep) splats
(that correctly pointed out a bug)
* a fix for multi-BSSID ordering assumptions
* a fix for the EDMG support, on-stack chandefs need to
be initialized properly (now that they're bigger)
* beacon (head) data from userspace should be validated
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When no other driver selects the devlink library code, ionic
produces a link failure:
drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_alloc':
ionic_devlink.c:(.text+0xd): undefined reference to `devlink_alloc'
drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_register':
ionic_devlink.c:(.text+0x71): undefined reference to `devlink_register'
Add the same 'select' statement that the other drivers use here.
Fixes: fbfb803153 ("ionic: Add hardware init and device commands")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolving a couple of Sphinx documentation warnings
that are generated in the networking section.
- WARNING: document isn't included in any toctree
- WARNING: Title underline too short.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add kselftest-all target to build tests from the top level
Makefile. This is to simplify kselftest use-cases for CI and
distributions where build and test systems are different.
Current kselftest target builds and runs tests on a development
system which is a developer use-case.
Add kselftest-install target to install tests from the top level
Makefile. This is to simplify kselftest use-cases for CI and
distributions where build and test systems are different.
This change addresses requests from developers and testers to add
support for installing kselftest from the main Makefile.
In addition, make the install directory the same when install is
run using "make kselftest-install" or by running kselftest_install.sh.
Also fix the INSTALL_PATH variable conflict between main Makefile and
selftests Makefile.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Always acquire tx descriptor spinlock even if a xdp program is not loaded
on the netsec device since ndo_xdp_xmit can run concurrently with
netsec_netdev_start_xmit and netsec_clean_tx_dring. This can happen
loading a xdp program on a different device (e.g virtio-net) and
xdp_do_redirect_map/xdp_do_redirect_slow can redirect to netsec even if
we do not have a xdp program on it.
Fixes: ba2b232108 ("net: netsec: add XDP support")
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the interface type is P2P_DEVICE or NAN, read the file of
'/sys/kernel/debug/ieee80211/phyx/netdev:wlanx/aqm' will get a
NULL pointer dereference. As for those interface type, the
pointer sdata->vif.txq is NULL.
Unable to handle kernel NULL pointer dereference at virtual address 00000011
CPU: 1 PID: 30936 Comm: cat Not tainted 4.14.104 #1
task: ffffffc0337e4880 task.stack: ffffff800cd20000
PC is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
LR is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
[...]
Process cat (pid: 30936, stack limit = 0xffffff800cd20000)
[...]
[<ffffff8000b7cd00>] ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
[<ffffff8000b7c414>] ieee80211_if_read+0x60/0xbc [mac80211]
[<ffffff8000b7ccc4>] ieee80211_if_read_aqm+0x28/0x30 [mac80211]
[<ffffff80082eff94>] full_proxy_read+0x2c/0x48
[<ffffff80081eef00>] __vfs_read+0x2c/0xd4
[<ffffff80081ef084>] vfs_read+0x8c/0x108
[<ffffff80081ef494>] SyS_read+0x40/0x7c
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/1569549796-8223-1-git-send-email-miaoqing@codeaurora.org
[trim useless data from commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the interface is not in MESH mode, the command 'iw wlanx mpath del'
will cause kernel panic.
The root cause is null pointer access in mpp_flush_by_proxy(), as the
pointer 'sdata->u.mesh.mpp_paths' is NULL for non MESH interface.
Unable to handle kernel NULL pointer dereference at virtual address 00000068
[...]
PC is at _raw_spin_lock_bh+0x20/0x5c
LR is at mesh_path_del+0x1c/0x17c [mac80211]
[...]
Process iw (pid: 4537, stack limit = 0xd83e0238)
[...]
[<c021211c>] (_raw_spin_lock_bh) from [<bf8c7648>] (mesh_path_del+0x1c/0x17c [mac80211])
[<bf8c7648>] (mesh_path_del [mac80211]) from [<bf6cdb7c>] (extack_doit+0x20/0x68 [compat])
[<bf6cdb7c>] (extack_doit [compat]) from [<c05c309c>] (genl_rcv_msg+0x274/0x30c)
[<c05c309c>] (genl_rcv_msg) from [<c05c25d8>] (netlink_rcv_skb+0x58/0xac)
[<c05c25d8>] (netlink_rcv_skb) from [<c05c2e14>] (genl_rcv+0x20/0x34)
[<c05c2e14>] (genl_rcv) from [<c05c1f90>] (netlink_unicast+0x11c/0x204)
[<c05c1f90>] (netlink_unicast) from [<c05c2420>] (netlink_sendmsg+0x30c/0x370)
[<c05c2420>] (netlink_sendmsg) from [<c05886d0>] (sock_sendmsg+0x70/0x84)
[<c05886d0>] (sock_sendmsg) from [<c0589f4c>] (___sys_sendmsg.part.3+0x188/0x228)
[<c0589f4c>] (___sys_sendmsg.part.3) from [<c058add4>] (__sys_sendmsg+0x4c/0x70)
[<c058add4>] (__sys_sendmsg) from [<c0208c80>] (ret_fast_syscall+0x0/0x44)
Code: e2822c02 e2822001 e5832004 f590f000 (e1902f9f)
---[ end trace bbd717600f8f884d ]---
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Link: https://lore.kernel.org/r/1569485810-761-1-git-send-email-miaoqing@codeaurora.org
[trim useless data from commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The code copying the data assumes that the SSID element is
before the MBSSID element, but since the data is untrusted
from the AP, this cannot be guaranteed.
Validate that this is indeed the case and ignore the MBSSID
otherwise, to avoid having to deal with both cases for the
copy of data that should be between them.
Cc: stable@vger.kernel.org
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Link: https://lore.kernel.org/r/1569009255-I1673911f5eae02964e21bdc11b2bf58e5e207e59@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We currently don't validate the beacon head, i.e. the header,
fixed part and elements that are to go in front of the TIM
element. This means that the variable elements there can be
malformed, e.g. have a length exceeding the buffer size, but
most downstream code from this assumes that this has already
been checked.
Add the necessary checks to the netlink policy.
Cc: stable@vger.kernel.org
Fixes: ed1b6cc7f8 ("cfg80211/nl80211: add beacon settings")
Link: https://lore.kernel.org/r/1569009255-I7ac7fbe9436e9d8733439eab8acbbd35e55c74ef@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 7e64db1597.
The thin provisioning feature introduces an IOCTL and the discard support
to allow userspace tools and filesystems to release unused and previously
allocated space respectively.
During some internal performance improvements and further tests, the
release of allocated space revealed some issues that may lead to data
corruption in some configurations when filesystems are mounted with
discard support enabled.
While we're working on a fix and trying to clarify the situation,
this commit reverts the discard support for ESE volumes to prevent
potential data corruption.
Cc: <stable@vger.kernel.org> # 5.3
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is possible that the CCW commands for reading volume and extent pool
information are not supported, either by the storage server (for
dedicated DASDs) or by z/VM (for virtual devices, such as MDISKs).
As a command reject will occur in such a case, the current error
handling leads to a failing online processing and thus the DASD can't be
used at all.
Since the data being read is not essential for an fully operational
DASD, the error handling can be removed. Information about the failing
command is sent to the s390dbf debug feature.
Fixes: c729696bcf ("s390/dasd: Recognise data for ESE volumes")
Cc: <stable@vger.kernel.org> # 5.3
Reported-by: Frank Heimes <frank.heimes@canonical.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All system calls use struct __kernel_timespec instead of the old struct
timespec, but this one was just added with the old-style ABI. Change it
now to enforce the use of __kernel_timespec, avoiding ABI confusion and
the need for compat handlers on 32-bit architectures.
Any user space caller will have to use __kernel_timespec now, but this
is unambiguous and works for any C library regardless of the time_t
definition. A nicer way to specify the timeout would have been a less
ambiguous 64-bit nanosecond value, but I suppose it's too late now to
change that as this would impact both 32-bit and 64-bit users.
Fixes: 5262f56798 ("io_uring: IORING_OP_TIMEOUT support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The loop driver assumes that if the passed in fd is opened with
O_DIRECT, the caller wants to use direct I/O on the loop device.
However, if the underlying block device has a different block size than
the loop block queue, direct I/O can't be enabled. Instead of requiring
userspace to manually change the blocksize and re-enable direct I/O,
just change the queue block sizes to match, as well as the io_min size.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch fixes the lock inversion complaint:
============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u16:6/171:
#0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
#1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
#2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
dump_stack+0x8a/0xd6
__lock_acquire.cold+0xe1/0x24d
lock_acquire+0x106/0x240
__mutex_lock+0x12e/0xcb0
mutex_lock_nested+0x1f/0x30
rdma_destroy_id+0x78/0x4a0 [rdma_cm]
iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
cm_work_handler+0xe62/0x1100 [iw_cm]
process_one_work+0x56d/0xac0
worker_thread+0x7a/0x5d0
kthread+0x1bc/0x210
ret_from_fork+0x24/0x30
This is not a bug as there are actually two lock classes here.
Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd921 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Switch sched_setattr() syscall from it's own copying struct sched_attr
from userspace to the new dedicated copy_struct_from_user() helper.
The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls. Ideally we could also
unify sched_getattr(2)-style syscalls as well, but unfortunately the
correct semantics for such syscalls are much less clear (see [1] for
more detail). In future we could come up with a more sane idea for how
the syscall interface should look.
[1]: commit 1251201c0d ("sched/core: Fix uclamp ABI bug, clean up and
robustify sched_read_attr() ABI logic and code")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
[christian.brauner@ubuntu.com: improve commit message]
Link: https://lore.kernel.org/r/20191001011055.19283-4-cyphar@cyphar.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Switch clone3() syscall from it's own copying struct clone_args from
userspace to the new dedicated copy_struct_from_user() helper.
The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls. Additionally, explicitly
define CLONE_ARGS_SIZE_VER0 to match the other users of the
struct-extension pattern.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
[christian.brauner@ubuntu.com: improve commit message]
Link: https://lore.kernel.org/r/20191001011055.19283-3-cyphar@cyphar.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
A common pattern for syscall extensions is increasing the size of a
struct passed from userspace, such that the zero-value of the new fields
result in the old kernel behaviour (allowing for a mix of userspace and
kernel vintages to operate on one another in most cases).
While this interface exists for communication in both directions, only
one interface is straightforward to have reasonable semantics for
(userspace passing a struct to the kernel). For kernel returns to
userspace, what the correct semantics are (whether there should be an
error if userspace is unaware of a new extension) is very
syscall-dependent and thus probably cannot be unified between syscalls
(a good example of this problem is [1]).
Previously there was no common lib/ function that implemented
the necessary extension-checking semantics (and different syscalls
implemented them slightly differently or incompletely[2]). Future
patches replace common uses of this pattern to make use of
copy_struct_from_user().
Some in-kernel selftests that insure that the handling of alignment and
various byte patterns are all handled identically to memchr_inv() usage.
[1]: commit 1251201c0d ("sched/core: Fix uclamp ABI bug, clean up and
robustify sched_read_attr() ABI logic and code")
[2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do do
similar checks to copy_struct_from_user() while rt_sigprocmask(2)
always rejects differently-sized struct arguments.
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191001011055.19283-2-cyphar@cyphar.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
KVM can only virtualize as many PMCs as the host supports.
Limit the number of generic counters and fixed counters to the number
of corresponding counters supported on the host, rather than to
INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
contiguous MSR indices reserved by Intel for event selectors. Since
the existing code relies on a contiguous range of MSR indices for
event selectors, it can't possibly work for more than 18 general
purpose counters.
Fixes: f5132b0138 ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ARM accelerated AES driver depends on the new AES library for
its non-SIMD fallback so express this in its Kconfig declaration.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The NEON/Crypto Extensions based AES implementation for 32-bit ARM
can be built in a kernel that targets ARMv6 CPUs and higher, even
though the actual code will not be able to run on that generation,
but it allows for a portable image to be generated that can will
use the special instructions only when they are available.
Since those instructions are part of a FPU profile rather than a
CPU profile, we don't override the architecture in the assembler
code, and most of the scalar code is simple enough to be ARMv6
compatible. However, that changes with commit c61b1607ed,
which introduces calls to the movw/movt instructions, which are
v7+ only.
So override the architecture in the .S file to armv8-a, which
matches the architecture specification in the crypto-neon-fp-armv8
FPU specificier that we already using. Note that using armv7-a
here may trigger an issue with the upcoming Clang 10 release,
which no longer permits .arch/.fpu combinations it views as
incompatible.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: c61b1607ed ("crypto: arm/aes-ce - implement ciphertext stealing ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sphinx generates the following warnings for the arm64 doc
pages:
Documentation/arm64/memory.rst:158: WARNING: Unexpected indentation.
Documentation/arm64/memory.rst:162: WARNING: Unexpected indentation.
These indentations warnings can be resolved by utilising code
hightlighting instead.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
The system which has SVE feature crashed because of
the memory pointed by task->thread.sve_state was destroyed
by someone.
That is because sve_state is freed while the forking the
child process. The child process has the pointer of sve_state
which is same as the parent's because the child's task_struct
is copied from the parent's one. If the copy_process()
fails as an error on somewhere, for example, copy_creds(),
then the sve_state is freed even if the parent is alive.
The flow is as follows.
copy_process
p = dup_task_struct
=> arch_dup_task_struct
*dst = *src; // copy the entire region.
:
retval = copy_creds
if (retval < 0)
goto bad_fork_free;
:
bad_fork_free:
...
delayed_free_task(p);
=> free_task
=> arch_release_task_struct
=> fpsimd_release_task
=> __sve_free
=> kfree(task->thread.sve_state);
// free the parent's sve_state
Move child's sve_state = NULL and clearing TIF_SVE flag
to arch_dup_task_struct() so that the child doesn't free the
parent's one.
There is no need to wait until copy_process() to clear TIF_SVE for
dst, because the thread flags for dst are initialized already by
copying the src task_struct.
This change simplifies the code, so get rid of comments that are no
longer needed.
As a note, arm64 used to have thread_info on the stack. So it
would not be possible to clear TIF_SVE until the stack is initialized.
From commit c02433dd6d ("arm64: split thread_info from task stack"),
the thread_info is part of the task, so it should be valid to modify
the flag from arch_dup_task_struct().
Cc: stable@vger.kernel.org # 4.15.x-
Fixes: bc0ee47603 ("arm64/sve: Core task context handling")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reported-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Suggested-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit 73f3816609 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") renamed the caller of the install_bp_hardening_cb() function
but forgot to update a comment, which can be confusing when trying to
follow the code flow.
Fixes: 73f3816609 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
When toggling the level trigger to emulate the edge trigger, the
EIC offset is incorrect without adding the corresponding bank index,
thus fix it.
Fixes: 7bf0d7f622 ("gpio: eic: Add edge trigger emulation for EIC")
Cc: stable@vger.kernel.org
Signed-off-by: Bruce Chen <bruce.chen@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Since commit ec757001c8 ("gpio: Enable nonexclusive gpiods from DT
nodes") we are able to get GPIOD_FLAGS_BIT_NONEXCLUSIVE marked gpios.
Currently the gpiolib uses the wrong flags variable for the check. We
need to check the gpiod_flags instead of the of_gpio_flags else we
return -EBUSY for GPIOD_FLAGS_BIT_NONEXCLUSIVE marked and requested
gpiod's.
Fixes: ec757001c8 gpio: Enable nonexclusive gpiods from DT nodes
Cc: stable@vger.kernel.org
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
[Bartosz: the function was moved to gpiolib-of.c so updated the patch]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
When emulating open-drain/open-source by not actively driving the output
lines - we're simply changing their mode to input. This is wrong as it
will then make it impossible to change the value of such line - it's now
considered to actually be in input mode. If we want to still use the
direction_input() callback for simplicity then we need to set FLAG_IS_OUT
manually in gpiod_direction_output() and not clear it in
gpio_set_open_drain_value_commit() and
gpio_set_open_source_value_commit().
Fixes: c663e5f567 ("gpio: support native single-ended hardware drivers")
Cc: stable@vger.kernel.org
Reported-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The QIB parm area is 128 bytes long. Current code consistently misuses
an _entirely unrelated_ QDIO constant, merely because it has the same
value. Stop doing so.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Fix indentation in the s390 CPU Measuement Facility
sampling device dirver.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
s390 IBM z15 introduces a check if the CPU Mesurement Facility
sampling is temporarily unavailable. If this is the case return -EBUSY
and abort the setup of CPU Measuement facility sampling.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use consistant debug print format of the form variable
blank value.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Other parts of the kernel expect these nonblocking EFI callbacks to
exist and crash when running under Xen. Since the implementations of
xen_efi_set_variable() and xen_efi_query_variable_info() do not take any
locks, use them for the nonblocking callbacks too.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
If kzalloc() returns NULL, the error path doesn't stop the flow of
control from entering rtw_hal_read_chip_version() which dereferences the
null pointer. Fix this by adding a 'goto' to the error path to more
gracefully handle the issue and avoid proceeding with initialization
steps that we're no longer prepared to handle.
Also update the debug message to be more consistent with the other debug
messages in this function.
Addresses-Coverity: ("Dereference after null check")
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Link: https://lore.kernel.org/r/20190927214415.899-1-connor.kuehl@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCM draining behavior got broken since the recent refactoring, and
this turned out to be the incorrect expectation of the firmware
behavior regarding "draining". While I expected the "drain" flag at
the stop operation would do processing the queued samples, it seems
rather dropping the samples.
As a quick fix, just drop the SNDRV_PCM_INFO_DRAIN_TRIGGER flag, so
that the driver uses the normal PCM draining procedure. Also, put
some caution comment to the function for future readers not to fall
into the same pitfall.
Fixes: d7ca3a7154 ("staging: bcm2835-audio: Operate non-atomic PCM ops")
BugLink: https://github.com/raspberrypi/linux/issues/2983
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20190914152405.7416-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor
interface") removed the gpio code from fbtft_device rendering it useless.
fbtft_device is a module that was used on the Raspberry Pi to dynamically
add fbtft devices when the Pi didn't have Device Tree support.
Just remove the module since it's the responsibility of Device Tree, ACPI
or platform code to add devices.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20190917171843.10334-2-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor
interface") removed setting gpios via platform data. This means that
fbtft will now only work with Device Tree so set the dependency.
This also prevents a NULL pointer deref on non-DT platform because
fbtftops.request_gpios is not set in that case anymore.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20190917171843.10334-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 32-bit:
In file included from drivers/staging/octeon/octeon-ethernet.h:41,
from drivers/staging/octeon/ethernet-tx.c:25:
drivers/staging/octeon/octeon-stubs.h: In function ‘cvmx_phys_to_ptr’:
drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
return (void *)(physical_address);
^
drivers/staging/octeon/ethernet-tx.c: In function ‘cvm_oct_xmit’:
drivers/staging/octeon/ethernet-tx.c:264:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:268:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:276:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
XKPHYS_TO_PHYS((u64)skb_frag_address(fs));
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
drivers/staging/octeon/ethernet-tx.c:280:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)CVM_OCT_SKB_CB(skb));
^
drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro ‘XKPHYS_TO_PHYS’
#define XKPHYS_TO_PHYS(p) (p)
^
Fix this by replacing casts to "u64" by casts to "uintptr_t", which is
either 32-bit or 64-bit, and adding an intermediate cast to "uintptr_t"
where needed.
Exposed by commit 171a9bae68 ("staging/octeon: Allow test build on
!MIPS").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20190919095022.29099-1-geert@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The speed divisor is used in a context expecting an s64, but it is
evaluated using 32-bit arithmetic.
To avoid that happening, instead of multiplying by 1,000,000 in the
first place, simplify the fraction and do a standard 32 bit division
instead.
Fixes: f04b514c0c ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte")
Reported-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sja1105_static_config_upload, in two cases memory is leaked: when
static_config_buf_prepare_for_upload fails and when sja1105_inhibit_tx
fails. In both cases config_buf should be released.
Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Fixes: 1a4c69406c ("net: dsa: sja1105: Prevent PHY jabbering during switch reset")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg reports lots of modpost warnings on ARCH=um builds:
WARNING: "rename" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "lseek" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "ftruncate64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "getuid" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "lseek64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "unlink" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "pwrite64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "close" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "opendir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "pread64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "syscall" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "readdir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "readdir64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "futimes" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__lxstat" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "write" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "closedir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__xstat" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "fsync" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__lxstat64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__fxstat64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "telldir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "printf" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "readlink" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__sprintf_chk" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "link" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "rmdir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "fdatasync" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "truncate" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "statfs" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__errno_location" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__xmknod" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "open64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "truncate64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "open" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "read" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "chown" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "chmod" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "utime" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "fchmod" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "seekdir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "ioctl" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "dup2" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "statfs64" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "utimes" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "mkdir" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "fchown" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__guard" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "symlink" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "access" [vmlinux] is a static EXPORT_SYMBOL
WARNING: "__stack_smash_handler" [vmlinux] is a static EXPORT_SYMBOL
When you run "make", the modpost is run twice; before linking vmlinux,
and before building modules. All the warnings above are from the second
modpost.
The offending symbols are defined not in vmlinux, but in the C library.
The first modpost is run against the relocatable vmlinux.o, and those
warnings are nicely suppressed because the SH_UNDEF entries from the
symbol table clear the ->is_static flag.
The second modpost is run against the executable vmlinux (+ modules),
where those symbols have been resolved, but the definitions do not
exist.
This commit fixes it in a straightforward way; suppress the static
EXPORT_SYMBOL warnings from "vmlinux".
Without this commit, we see valid warnings twice anyway. For example,
ARCH=arm64 defconfig shows the following warning twice:
WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
So, it is reasonable to suppress the second one.
Fixes: 15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Denis Efremov <efremov@linux.com>
Sometimes the PTP synchronization on the switch 'jumps':
ptp4l[11241.155]: rms 8 max 16 freq -21732 +/- 11 delay 742 +/- 0
ptp4l[11243.157]: rms 7 max 17 freq -21731 +/- 10 delay 744 +/- 0
ptp4l[11245.160]: rms 33592410 max 134217731 freq +192422 +/- 8530253 delay 743 +/- 0
ptp4l[11247.163]: rms 811631 max 964131 freq +10326 +/- 557785 delay 743 +/- 0
ptp4l[11249.166]: rms 261936 max 533876 freq -304323 +/- 126371 delay 744 +/- 0
ptp4l[11251.169]: rms 48700 max 57740 freq -20218 +/- 30532 delay 744 +/- 0
ptp4l[11253.171]: rms 14570 max 30163 freq -5568 +/- 7563 delay 742 +/- 0
ptp4l[11255.174]: rms 2914 max 3440 freq -22001 +/- 1667 delay 744 +/- 1
ptp4l[11257.177]: rms 811 max 1710 freq -22653 +/- 451 delay 744 +/- 1
ptp4l[11259.180]: rms 177 max 218 freq -21695 +/- 89 delay 741 +/- 0
ptp4l[11261.182]: rms 45 max 92 freq -21677 +/- 32 delay 742 +/- 0
ptp4l[11263.186]: rms 14 max 32 freq -21733 +/- 11 delay 742 +/- 0
ptp4l[11265.188]: rms 9 max 14 freq -21725 +/- 12 delay 742 +/- 0
ptp4l[11267.191]: rms 9 max 16 freq -21727 +/- 13 delay 742 +/- 0
ptp4l[11269.194]: rms 6 max 15 freq -21726 +/- 9 delay 743 +/- 0
ptp4l[11271.197]: rms 8 max 15 freq -21728 +/- 11 delay 743 +/- 0
ptp4l[11273.200]: rms 6 max 12 freq -21727 +/- 8 delay 743 +/- 0
ptp4l[11275.202]: rms 9 max 17 freq -21720 +/- 11 delay 742 +/- 0
ptp4l[11277.205]: rms 9 max 18 freq -21725 +/- 12 delay 742 +/- 0
Background: the switch only offers partial RX timestamps (24 bits) and
it is up to the driver to read the PTP clock to fill those timestamps up
to 64 bits. But the PTP clock readout needs to happen quickly enough (in
0.135 seconds, in fact), otherwise the PTP clock will wrap around 24
bits, condition which cannot be detected.
Looking at the 'max 134217731' value on output line 3, one can see that
in hex it is 0x8000003. Because the PTP clock resolution is 8 ns,
that means 0x1000000 in ticks, which is exactly 2^24. So indeed this is
a PTP clock wraparound, but the reason might be surprising.
What is going on is that sja1105_tstamp_reconstruct(priv, now, ts)
expects a "now" time that is later than the "ts" was snapshotted at.
This, of course, is obvious: we read the PTP time _after_ the partial RX
timestamp was received. However, the workqueue is processing frames from
a skb queue and reuses the same PTP time, read once at the beginning.
Normally the skb queue only contains one frame and all goes well. But
when the skb queue contains two frames, the second frame that gets
dequeued might have been partially timestamped by the RX MAC _after_ we
had read our PTP time initially.
The code was originally like that due to concerns that SPI access for
PTP time readout is a slow process, and we are time-constrained anyway
(aka: premature optimization). But some timing analysis reveals that the
time spent until the RX timestamp is completely reconstructed is 1 order
of magnitude lower than the 0.135 s deadline even under worst-case
conditions. So we can afford to read the PTP time for each frame in the
RX timestamping queue, which of course ensures that the full PTP time is
in the partial timestamp's future.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 40df759e2b ("kbuild: Fix build with binutils <= 2.19")
introduced ar-option and KBUILD_ARFLAGS to deal with old binutils.
According to Documentation/process/changes.rst, the current minimal
supported version of binutils is 2.21 so you can assume the 'D' option
is always supported. Not only GNU ar but also llvm-ar supports it.
With the 'D' option hard-coded, there is no more user of ar-option
or KBUILD_ARFLAGS.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2019-09-28
An update from ieee802154 for your *net* tree.
Three driver fixes. Navid Emamdoost fixed a memory leak on an error
path in the ca8210 driver, Johan Hovold fixed a use-after-free found
by syzbot in the atusb driver and Christophe JAILLET makes sure
__skb_put_data is used instead of memcpy in the mcr20a driver
I switched from branches to tags here to be pulled from. So far not
annotated and not signed. Once I fixed my scripts it should contain
this messages as annotations. If you want it signed as well just tell
me. If there are any problems let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The "reuse->sock[]" array is shared by multiple sockets. The going away
sk must unpublish itself from "reuse->sock[]" before making call_rcu()
call. However, this unpublish-action is currently done after a grace
period and it may cause use-after-free.
The fix is to move reuseport_detach_sock() to sk_destruct().
Due to the above reason, any socket with sk_reuseport_cb has
to go through the rcu grace period before freeing it.
It is a rather old bug (~3 yrs). The Fixes tag is not necessary
the right commit but it is the one that introduced the SOCK_RCU_FREE
logic and this fix is depending on it.
Fixes: a4298e4522 ("net: add SOCK_RCU_FREE socket flag")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From commit a94e4f24ec ("MIPS: init: Drop boot_mem_map") onwards,
add_memory_region() is handled by memblock_add()/memblock_reserve()
directly and all bootmem API should be converted to memblock API.
Otherwise it will lead to boot failure, especially in the NUMA case
because add_memory_region lose the node_id information.
Fixes: a94e4f24ec ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
[paul.burton@mips.com:
- Invert node_id check to de-indent the switch statement & avoid lines
over 80 characters.
- Fixup commit reference in commit message.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-mips@vger.kernel.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: Huacai Chen <chenhuacai@gmail.com>
The Intel fixed counters use a special table to override the JSON
information.
During this override the period information from the JSON file got
dropped, which results in inst_retired.any and similar running with
frequency mode instead of a period.
Just specify the expected period in the table.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the LBR data and the instructions in a binary do not match the loop
printing instructions could get confused and print a long stream of
bogus <bad> instructions.
The problem was that if the instruction decoder cannot decode an
instruction it ilen wasn't initialized, so the loop going through the
basic block would continue with the previous value.
Harden the code to avoid such problems:
- Make sure ilen is always freshly initialized and is 0 for bad
instructions.
- Do not overrun the code buffer while printing instructions
- Print a warning message if the final jump is not on an instruction
boundary.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Whenever an mmap/mmap2 event occurs, the map tree must be updated to add a new
entry. If a new map overlaps a previous map, the overlapped section of the
previous map is effectively unmapped, but the non-overlapping sections are
still valid.
maps__fixup_overlappings() is responsible for creating any new map entries from
the previously overlapped map. It optionally creates a before and an after map.
When creating the after map the existing code failed to adjust the map.pgoff.
This meant the new after map would incorrectly calculate the file offset
for the ip. This results in incorrect symbol name resolution for any ip in the
after region.
Make maps__fixup_overlappings() correctly populate map.pgoff.
Add an assert that new mapping matches old mapping at the beginning of
the after map.
Committer-testing:
Validated correct parsing of libcoreclr.so symbols from .NET Core 3.0 preview9
(which didn't strip symbols).
Preparation:
~/dotnet3.0-preview9/dotnet new webapi -o perfSymbol
cd perfSymbol
~/dotnet3.0-preview9/dotnet publish
perf record ~/dotnet3.0-preview9/dotnet \
bin/Debug/netcoreapp3.0/publish/perfSymbol.dll
^C
Before:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.705249: 250000 cpu-clock: \
7fe6159a1f99 [unknown] \
(.../3.0.0-preview9-19423-09/libcoreclr.so)
After:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
All the [unknown] symbols were resolved.
Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
Tested-by: Brian Robbins <brianrob@microsoft.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: John Keeping <john@metanate.com>
Cc: John Salem <josalem@microsoft.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom McDonald <thomas.mcdonald@microsoft.com>
Link: http://lore.kernel.org/lkml/BN8PR21MB136270949F22A6A02335C238F7800@BN8PR21MB1362.namprd21.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
200824f55e ("KVM: s390: Disallow invalid bits in kvm_valid_regs and kvm_dirty_regs")
4a53d99dd0 ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
7396d337cf ("KVM: x86: Return to userspace with internal error on unexpected exit reason")
92f35b751c ("KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE")
None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Link: https://lkml.kernel.org/n/tip-akuugvvjxte26kzv23zp5d2z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
78a1b96bcf ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl")
23c688b540 ("fscrypt: allow unprivileged users to add/remove keys for v2 policies")
5dae460c22 ("fscrypt: v2 encryption policy support")
5a7e29924d ("fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl")
b1c0ec3599 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
22d94f493b ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
3b6df59bc4 ("fscrypt: use FSCRYPT_* definitions, not FS_*")
2336d0deb2 ("fscrypt: use FSCRYPT_ prefix for uapi constants")
7af0ab0d3a ("fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>")
That don't trigger any changes in tooling, as it so far is used only
for:
$ grep -l 'fs\.h' tools/perf/trace/beauty/*.sh | xargs grep regex=
tools/perf/trace/beauty/rename_flags.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+RENAME_([[:alnum:]_]+)[[:space:]]+\(1[[:space:]]*<<[[:space:]]*([[:xdigit:]]+)[[:space:]]*\)[[:space:]]*.*'
tools/perf/trace/beauty/sync_file_range.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+SYNC_FILE_RANGE_([[:alnum:]_]+)[[:space:]]+([[:xdigit:]]+)[[:space:]]*.*'
tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)(\(\w+\))?[[:space:]]+_IO[CWR]{0,2}\([[:space:]]*(_IOC_\w+,[[:space:]]*)?'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)[[:space:]]+_IO[WR]{0,2}\([[:space:]]*'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
$
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-44g48exl9br9ba0t64chqb4i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
4ed3350539 ("USB: usbfs: Add a capability flag for runtime suspend")
7794f486ed ("usbfs: Add ioctls for runtime power management")
This triggers these changes in the kernel sources, automagically
supporting these new ioctls in the 'perf trace' beautifiers.
Soon this will be used in things like filter expressions for tracepoints
in 'perf record', 'perf trace', 'perf top', i.e. filter expressions will
do a lookup to turn things like USBDEVFS_WAIT_FOR_RESUME into _IO('U',
35) before associating the tracepoint expression to tracepoint perf
event.
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > before
$ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
$ git diff
diff --git a/tools/include/uapi/linux/usbdevice_fs.h b/tools/include/uapi/linux/usbdevice_fs.h
index 78efe870c2b7..cf525cddeb94 100644
--- a/tools/include/uapi/linux/usbdevice_fs.h
+++ b/tools/include/uapi/linux/usbdevice_fs.h
@@ -158,6 +158,7 @@ struct usbdevfs_hub_portinfo {
#define USBDEVFS_CAP_MMAP 0x20
#define USBDEVFS_CAP_DROP_PRIVILEGES 0x40
#define USBDEVFS_CAP_CONNINFO_EX 0x80
+#define USBDEVFS_CAP_SUSPEND 0x100
/* USBDEVFS_DISCONNECT_CLAIM flags & struct */
@@ -223,5 +224,8 @@ struct usbdevfs_streams {
* extending size of the data returned.
*/
#define USBDEVFS_CONNINFO_EX(len) _IOC(_IOC_READ, 'U', 32, len)
+#define USBDEVFS_FORBID_SUSPEND _IO('U', 33)
+#define USBDEVFS_ALLOW_SUSPEND _IO('U', 34)
+#define USBDEVFS_WAIT_FOR_RESUME _IO('U', 35)
#endif /* _UAPI_LINUX_USBDEVICE_FS_H */
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > after
$ diff -u before after
--- before 2019-09-27 11:41:50.634867620 -0300
+++ after 2019-09-27 11:42:07.453102978 -0300
@@ -24,6 +24,9 @@
[30] = "DROP_PRIVILEGES",
[31] = "GET_SPEED",
[32] = "CONNINFO_EX",
+ [33] = "FORBID_SUSPEND",
+ [34] = "ALLOW_SUSPEND",
+ [35] = "WAIT_FOR_RESUME",
[3] = "RESETEP",
[4] = "SETINTERFACE",
[5] = "SETCONFIGURATION",
$
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x1rb109b9nfi7pukota82xhj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
1a4e58cce8 ("mm: introduce MADV_PAGEOUT")
9c276cc65a ("mm: introduce MADV_COLD")
That result in these changes in the tools:
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ git diff
diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..c160a5354eb6 100644
--- a/tools/include/uapi/asm-generic/mman-common.h
+++ b/tools/include/uapi/asm-generic/mman-common.h
@@ -67,6 +67,9 @@
#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
+#define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2019-09-27 11:29:43.346320100 -0300
+++ after 2019-09-27 11:30:03.838570439 -0300
@@ -16,6 +16,8 @@
[17] = "DODUMP",
[18] = "WIPEONFORK",
[19] = "KEEPONFORK",
+ [20] = "COLD",
+ [21] = "PAGEOUT",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, it will be able to
translate from the number to a human readable string.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 6334150e9a ("remoteproc: don't allow modular build")
changes CONFIG_REMOTEPROC to a boolean from a tristate config
option which inhibits all defconfigs marking CONFIG_REMOTEPROC as
a module in compiling the remoteproc and dependent config options.
So fix the configs to have CONFIG_REMOTEPROC built in.
Link: https://lore.kernel.org/r/20190920075946.13282-5-j-keerthy@ti.com
Fixes: 6334150e9a ("remoteproc: don't allow modular build")
Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Will Deacon <will@kernel.org>
[olof: Fixed up all 4 occurrances in this one commit]
Signed-off-by: Olof Johansson <olof@lixom.net>
erspan driver calls ether_setup(), after commit 61e84623ac
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ipgre tap device.
Tested:
Before patch:
# ip link set erspan0 mtu 1600
Error: mtu greater than device maximum.
After patch:
# ip link set erspan0 mtu 1600
# ip -d link show erspan0
21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QCA8K family supports up to 7 ports. So use the existing
QCA8K_NUM_PORTS define to allocate the switch structure and limit all
operations with the switch ports.
This was not an issue until commit 0394a63acf ("net: dsa: enable and
disable all ports") disabled all unused ports. Since the unused ports 7-11
are outside of the correct register range on this switch some registers
were rewritten with invalid content.
Fixes: 6b93fb4648 ("net-next: dsa: add new driver for qca8xxx family")
Fixes: a0c02161ec ("net: dsa: variable number of ports")
Fixes: 0394a63acf ("net: dsa: enable and disable all ports")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The largepages debugfs entry is incremented/decremented as shadow
pages are created or destroyed. Clearing it will result in an
underflow, which is harmless to KVM but ugly (and could be
misinterpreted by tools that use debugfs information), so make
this particular statistic read-only.
Cc: kvm-ppc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set memory bandwidth limit to filter out resolutions above 720p@60Hz to
avoid underflow errors due to the bandwidth needs of higher resolutions.
am43xx can not provide enough bandwidth to DISPC to correctly handle
'high' resolutions.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull NVMe changes from Sagi:
"This set consists of various fixes and cleanups:
- controller removal race fix from Balbir
- quirk additions from Gabriel and Jian-Hong
- nvme-pci power state save fix from Mario
- Add 64bit user commands (for 64bit registers) from Marta
- nvme-rdma/nvme-tcp fixes from Max, Mark and Me
- Minor cleanups and nits from James, Dan and John"
* 'nvme-5.4' of git://git.infradead.org/nvme:
nvme-rdma: fix possible use-after-free in connect timeout
nvme: Move ctrl sqsize to generic space
nvme: Add ctrl attributes for queue_count and sqsize
nvme: allow 64-bit results in passthru commands
nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
nvmet-tcp: remove superflous check on request sgl
Added QUIRKs for ADATA XPG SX8200 Pro 512GB
nvme-rdma: Fix max_hw_sectors calculation
nvme: fix an error code in nvme_init_subsystem()
nvme-pci: Save PCI state before putting drive into deepest state
nvme-tcp: fix wrong stop condition in io_work
nvme-pci: Fix a race in controller removal
nvmet: change ppl to lpp
Some HDD drive may expose multiple hardware queues, such as MegraRaid.
Let's apply the normal plugging for such devices because sequential IO
may benefit a lot from plug merging.
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a device is using multiple queues, the IO scheduler may be bypassed.
This may hurt performance for some slow MQ devices, and it also breaks
zoned devices which depend on mq-deadline for respecting the write order
in one zone.
Don't bypass io scheduler if we have one setup.
This patch can double sequential write performance basically on MQ
scsi_debug when mq-deadline is applied.
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the connect times out, we may have already destroyed the
queue in the timeout handler, so test if the queue is still
allocated in the connect error handler.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
When KVM_GET_MSRS fail the report looks like
==== Test Assertion Failure ====
lib/x86_64/processor.c:1089: r == nmsrs
pid=28775 tid=28775 - Argument list too long
1 0x000000000040a55f: vcpu_save_state at processor.c:1088 (discriminator 3)
2 0x00000000004010e3: main at state_test.c:171 (discriminator 4)
3 0x00007fb8e69223d4: ?? ??:0
4 0x0000000000401287: _start at ??:?
Unexpected result from KVM_GET_MSRS, r: 36 (failed at 194)
and it's not obvious that '194' here is the failed MSR index and that
it's printed in hex. Change that.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED
when the ARCH_CAPABILITIES MSR indicates that L1D flush is not required.
However, if the CPU is not affected by L1TF, l1tf_vmx_mitigation will
still be set to VMENTER_L1D_FLUSH_AUTO. This is certainly not the best
option for a !X86_BUG_L1TF CPU.
So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make it
more explicit in case users are checking the vmentry_l1d_flush parameter.
Signed-off-by: Waiman Long <longman@redhat.com>
[Patch rewritten accoring to Borislav Petkov's suggestion. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.
The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.
In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().
In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings
Here is a depiction of the race condition
CPU #1 (Running timer callback) CPU #2 (Enter idle
and subscribe to
tick broadcast)
--------------------- ---------------------
__run_hrtimer() tick_broadcast_enter()
bc_handler() __tick_broadcast_oneshot_control()
tick_handle_oneshot_broadcast()
raw_spin_lock(&tick_broadcast_lock);
dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock
//next_event for tick broadcast clock
set to KTIME_MAX since no other cores
subscribed to tick broadcasting
raw_spin_unlock(&tick_broadcast_lock);
if (dev->next_event == KTIME_MAX)
return HRTIMER_NORESTART
// callback function exits without
restarting the hrtimer //tick_broadcast_lock acquired
raw_spin_lock(&tick_broadcast_lock);
tick_broadcast_set_event()
clockevents_program_event()
dev->next_event = expires;
bc_set_next()
hrtimer_try_to_cancel()
//returns -1 since the timer
callback is active. Exits without
restarting the timer
cpu_base->running = NULL;
The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.
Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
An optimized build such as:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-O3
will turn the dereference operation into a ud2 instruction, raising a
SIGILL rather than a SIGSEGV. Use raise(..) for correctness and clarity.
Similar issues were addressed in Numfor Mbiziwo-Tiapo's patch:
https://lkml.org/lkml/2019/7/8/1234
Committer testing:
Before:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17092
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
After:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17909
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
Fixes: a074865e60 ("perf tools: Introduce perf hooks")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Check that accesses by nested guests are logged according to the
L1 physical addresses rather than L2.
Most of the patch is really adding EPT support to the testing
framework.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Shadow paging is fundamentally incompatible with the page-modification
log, because the GPAs in the log come from the wrong memory map.
In particular, for the EPT page-modification log, the GPAs in the log come
from L2 rather than L1. (If there was a non-EPT page-modification log,
we couldn't use it for shadow paging because it would log GVAs rather
than GPAs).
Therefore, we need to rely on write protection to record dirty pages.
This has the side effect of bypassing PML, since writes now result in an
EPT violation vmexit.
This is relatively easy to add to KVM, because pretty much the only place
that needs changing is spte_clear_dirty. The first access to the page
already goes through the page fault path and records the correct GPA;
it's only subsequent accesses that are wrong. Therefore, we can equip
set_spte (where the first access happens) to record that the SPTE will
have to be write protected, and then spte_clear_dirty will use this
information to do the right thing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, we are overloading SPTE_SPECIAL_MASK to mean both
"A/D bits unavailable" and MMIO, where the difference between the
two is determined by mio_mask and mmio_value.
However, the next patch will need two bits to distinguish
availability of A/D bits from write protection. So, while at
it give MMIO its own bit pattern, and move the two bits from
bit 62 to bits 52..53 since Intel is allocating EPT page table
bits from the top.
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I was surprised to see that the guest reported `fxsave_leak' while the
host did not. After digging deeper I noticed that the bits are simply
masked out during enumeration.
The XSAVEERPTR feature is actually a bug fix on AMD which means the
kernel can disable a workaround.
Pass XSAVEERPTR to the guest if available on the host.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CLZERO is available to the guest if it is supported on the
host. Therefore, enumerate support for the instruction in
KVM_GET_SUPPORTED_CPUID whenever it is supported on the host.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For these CPUID leaves, the EDX output is not dependent on the ECX
input (i.e. the SIGNIFCANT_INDEX flag doesn't apply to
EDX). Furthermore, the low byte of the ECX output is always identical
to the low byte of the ECX input. KVM does not produce the correct ECX
and EDX outputs for any undefined subleaves beyond the first.
Special-case these CPUID leaves in kvm_cpuid, so that the ECX and EDX
outputs are properly generated for all undefined subleaves.
Fixes: 0771671749 ("KVM: Enhance guest cpuid management")
Fixes: a87f2d3a6e ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported by syzkaller:
WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4
RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
Call Trace:
vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
vcpu_enter_guest+0x4dc/0x18d0 [kvm]
kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
do_vfs_ioctl+0xa2/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x74/0x720
entry_SYSCALL_64_after_hwframe+0x49/0xbe
When CR4.UMIP is set, guest should have UMIP cpuid flag. Current
kvm set_sregs function doesn't have such check when userspace inputs
sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP
in vmx_set_cr4 though guest doesn't have UMIP cpuid flag. The testcast
triggers handle_desc warning when executing ltr instruction since
guest architectural CR4 doesn't set UMIP. This patch fixes it by
adding valid CR4 and CPUID combination checking in __set_sregs.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't return -E2BIG from __do_cpuid_func when processing function 0BH
or 1FH and the last interesting subleaf occupies the last allocated
entry in the result array.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 831bf664e9 ("KVM: Refactor and simplify kvm_dev_ioctl_get_supported_cpuid")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5000 guest cycles delta is easy to encounter on desktop, per-vCPU
lapic_timer_advance_ns always keeps at 1000ns initial value, let's
loosen the filter a bit to let adaptive tuning make progress.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current controller interrogation requires a lot of guesswork
on how many io queues were created and what the io sq size is.
The numbers are dependent upon core/fabric defaults, connect
arguments, and target responses.
Add sysfs attributes for queue_count and sqsize.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Kingston NVME SSD with firmware version E8FK11.T has no interrupt after
resume with actions related to suspend to idle. This patch applied
NVME_QUIRK_SIMPLE_SUSPEND quirk to fix this issue.
Fixes: d916b1be94 ("nvme-pci: use host managed power state for suspend")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Booting with default_ps_max_latency_us >6000 makes the device fail.
Also SUBNQN is NULL and gives a warning on each boot/resume.
$ nvme id-ctrl /dev/nvme0 | grep ^subnqn
subnqn : (null)
I use this device with an Acer Nitro 5 (AN515-43-R8BF) Laptop.
To be sure is not a Laptop issue only, I tested the device on
my server board with the same results.
( with 2x,4x link on the board and 4x link on a PCI-E card ).
Signed-off-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
By default, the NVMe/RDMA driver should support max io_size of 1MiB (or
upto the maximum supported size by the HCA). Currently, one will see that
/sys/class/block/<bdev>/queue/max_hw_sectors_kb is 1020 instead of 1024.
A non power of 2 value can cause performance degradation due to
unnecessary splitting of IO requests and unoptimized allocation units.
The number of pages per MR has been fixed here, so there is no longer any
need to reduce max_sectors by 1.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
"ret" should be a negative error code here, but it's either success or
possibly uninitialized.
Fixes: 32fd90c407 ("nvme: change locking for the per-subsystem controller list")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
The action of saving the PCI state will cause numerous PCI configuration
space reads which depending upon the vendor implementation may cause
the drive to exit the deepest NVMe state.
In these cases ASPM will typically resolve the PCIe link state and APST
may resolve the NVMe power state. However it has also been observed
that this register access after quiesced will cause PC10 failure
on some device combinations.
To resolve this, move the PCI state saving to before SetFeatures has been
called. This has been proven to resolve the issue across a 5000 sample
test on previously failing disk/system combinations.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Allow the do/while statement to continue if current time
is not after the proposed time 'deadline'. Intent is to
allow loop to proceed for a specific time period. Currently
the loop, as coded, will exit after first pass.
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This code is currently unable to find the dts opp tables as ti-cpufreq
needs to set them up first based on speed binning.
We stopped initializing the opp tables with platform code years ago for
device tree based booting with commit 92d51856d7 ("ARM: OMAP3+: do not
register non-dt OPP tables for device tree boot"), and all of mach-omap2
is now booting using device tree.
We currently get the following errors on init:
omap2_set_init_voltage: unable to find boot up OPP for vdd_mpu
omap2_set_init_voltage: unable to set vdd_mpu
omap2_set_init_voltage: unable to find boot up OPP for vdd_core
omap2_set_init_voltage: unable to set vdd_core
omap2_set_init_voltage: unable to find boot up OPP for vdd_iva
omap2_set_init_voltage: unable to set vdd_iva
Let's just drop the unused code. Nowadays ti-cpufreq should be used to
to initialize things properly.
Cc: Adam Ford <aford173@gmail.com>
Cc: André Roth <neolynx@gmail.com>
Cc: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Tested-by: Adam Ford <aford173@gmail.com> #logicpd-torpedo-37xx-devkit
Signed-off-by: Tony Lindgren <tony@atomide.com>
TRM "Table 13-34. SYSCONFIG Register Field Descriptions" lists both
standbymode and idlemode that should be just the sidle and midle
registers where midle is currently unconfigured for lcdc_sysc. As
the dts data has been generated based on lcdc_sysc, we now have an
empty "ti,sysc-midle" property.
And so we currently get a warning for lcdc because of a difference
with dts provided configuration compared to the legacy platform
data. This is because lcdc has SYSC_HAS_MIDLEMODE configured in
the platform data without configuring the modes.
Let's fix the issue by adding the missing midlemode to lcdc_sysc,
and configuring the "ti,sysc-midle" property based on the TRM
values.
Fixes: f711c575cf ("ARM: dts: am335x: Add l4 interconnect hierarchy and ti-sysc data")
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Robert Nelson <robertcnelson@gmail.com>
Cc: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
For ti,sysc-omap4 compatible devices with no sysstatus register, we do have
reset done status available in the SOFTRESET bit that clears when the reset
is done. This is documented for example in am437x TRM for DMTIMER_TIOCP_CFG
register. The am335x TRM just says that SOFTRESET bit value 1 means reset is
ongoing, but it behaves the same way clearing after reset is done.
With the ti-sysc driver handling this automatically based on no sysstatus
register defined, we see warnings if SYSC_HAS_RESET_STATUS is missing in the
legacy platform data:
ti-sysc 48042000.target-module: sysc_flags 00000222 != 00000022
ti-sysc 48044000.target-module: sysc_flags 00000222 != 00000022
ti-sysc 48046000.target-module: sysc_flags 00000222 != 00000022
...
Let's fix these warnings by adding SYSC_HAS_RESET_STATUS. Let's also
remove the useless parentheses while at it.
If it turns out we do have ti,sysc-omap4 compatible devices without a
working SOFTRESET bit we can set up additional quirk handling for it.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Use a 'skb_put_data()' variant instead of rewritting it.
The __skb_put_data variant is safe here. It is obvious that the skb can
not overflow. It has just been allocated a few lines above with the same
'len'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Xue Liu <liuxuenetmail@gmail.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
The ti,no-idle-on-init and ti,no-reset-on-init flags need to be at
the interconnect target module level for the modules that have it
defined. Otherwise we get the following warnings:
dts flag should be at module level for ti,no-idle-on-init
dts flag should be at module level for ti,no-reset-on-init
Fixes: 87fc89ced3 ("ARM: dts: am335x: Move l4 child devices to probe them with ti-sysc")
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Reported-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix sparse warnings:
drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6:
warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5:
warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6:
warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20190923054035.19036-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
storvsc doesn't use a dedicated hardware queue for a given CPU queue. When
issuing I/O, it selects returning CPU (hardware queue) dynamically based on
vmbus channel usage across all channels.
This patch advertises num_present_cpus() as number of hardware queues. This
will have upper layer setup 1:1 mapping between hardware queue and CPU
queue and avoid unnecessary locking when issuing I/O.
Link: https://lore.kernel.org/r/1567790660-48142-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In some cases, hba may go through shutdown flow without successful
initialization and then make system hang.
For example, if ufshcd_change_power_mode() gets error and leads to
ufshcd_hba_exit() to release resources of the host, future shutdown flow
may hang the system since the host register will be accessed in unpowered
state.
To solve this issue, simply add checking to skip shutdown for above kind of
situation.
Link: https://lore.kernel.org/r/1568780438-28753-1-git-send-email-stanley.chu@mediatek.com
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qla2xxx driver had this issue as well when the newer array firmware
returned the retry_delay_timer in the fcp_rsp. The bnx2fc is not handling
the masking of the scope bits either so the retry_delay_timestamp value
lands up being a large value added to the timer timestamp delaying I/O for
up to 27 Minutes. This patch adds similar code to handle this to the
bnx2fc driver to avoid the huge delay.
Link: https://lore.kernel.org/r/1568210202-12794-1-git-send-email-loberman@redhat.com
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Reported-by: David Jeffery <djeffery@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
User space programs like udevd may try to read to partitions at the
same time the driver detects a namespace is unusable, and may deadlock
if revalidate_disk() is called while such a process is waiting to
enter the frozen queue. On detecting a dead namespace, move the disk
revalidate after unblocking dispatchers that may be holding bd_butex.
changelog Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
In nvmet_bdev_set_limits() the number of logical blocks per
physical block is calculated, but the opposite is mentioned in
the associated comment and reflected in the variable name. Correct
the comment and adjust the variable name to reflect the calculation
done.
Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Droid4 needs USB option serial driver for modem, and lm3532 for the
LCD backlight.
Note that the LCD backlight does not yet get enabled automatically,
but needs to be done manually with:
# echo 50 > /sys/class/leds/lm3532::backlight/brightness
Signed-off-by: Tony Lindgren <tony@atomide.com>
The TFP410 driver was removed but the replacement driver was
never enabled. This patch enableds the DRM_TI_TFP410
Fixes: be3143d8b2 ("drm/omap: Remove TFP410 and DVI connector drivers")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
commit 6953c57ab1 "gpio: of: Handle SPI chipselect legacy bindings"
did introduce logic to centrally handle the legacy spi-cs-high property
in combination with cs-gpios. This assumes that the polarity
of the CS has to be inverted if spi-cs-high is missing, even
and especially if non-legacy GPIO_ACTIVE_HIGH is specified.
The DTS for the GTA04 was orginally introduced under the assumption
that there is no need for spi-cs-high if the gpio is defined with
proper polarity GPIO_ACTIVE_HIGH.
This was not a problem until gpiolib changed the interpretation of
GPIO_ACTIVE_HIGH and missing spi-cs-high.
The effect is that the missing spi-cs-high is now interpreted as CS being
low (despite GPIO_ACTIVE_HIGH) which turns off the SPI interface when the
panel is to be programmed by the panel driver.
Therefore, we have to add the redundant and legacy spi-cs-high property
to properly activate CS.
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The ahclkr clkctrl clock bit 28 only exists for mcasp 1 and 2 on dra7.
This causes the following warning on beagle-x15:
ti-sysc 48468000.target-module: could not add child clock ahclkr: -19
Also the mcasp clkctrl clock bits are wrong:
For mcasp1 and 2 we have four clocks at bits 28, 24, 22 and 0:
bit 28 is ahclkr
bit 24 is ahclkx
bit 22 is auxclk
bit 0 is fck
For mcasp3 to 8 we have three clocks at bits 24, 22 and 0.
bit 24 is ahclkx
bit 22 is auxclk
bit 0 is fck
We do not have currently mapped auxclk at bit 22 for the drivers, that can
be added if needed.
Fixes: 5241ccbf28 ("ARM: dts: Add missing ranges for dra7 mcasp l3 ports")
Cc: Suman Anna <s-anna@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
There are two problems in dcache_readdir() - one is that lockless traversal
of the list needs non-trivial cooperation of d_alloc() (at least a switch
to list_add_rcu(), and probably more than just that) and another is that
it assumes that no removal will happen without the directory locked exclusive.
Said assumption had always been there, never had been stated explicitly and
is violated by several places in the kernel (devpts and selinuxfs).
* replacement of next_positive() with different calling conventions:
it returns struct list_head * instead of struct dentry *; the latter is
passed in and out by reference, grabbing the result and dropping the original
value.
* scan is under ->d_lock. If we run out of timeslice, cursor is moved
after the last position we'd reached and we reschedule; then the scan continues
from that place. To avoid livelocks between multiple lseek() (with cursors
getting moved past each other, never reaching the real entries) we always
skip the cursors, need_resched() or not.
* returned list_head * is either ->d_child of dentry we'd found or
->d_subdirs of parent (if we got to the end of the list).
* dcache_readdir() and dcache_dir_lseek() switched to new helper.
dcache_readdir() always holds a reference to dentry passed to dir_emit() now.
Cursor is moved to just before the entry where dir_emit() has failed or into
the very end of the list, if we'd run out.
* move_cursor() eliminated - it had sucky calling conventions and
after fixing that it became simply list_move() (in lseek and scan_positives)
or list_move_tail() (in readdir).
All operations with the list are under ->d_lock now, and we do not
depend upon having all file removals done with parent locked exclusive
anymore.
Cc: stable@vger.kernel.org
Reported-by: "zhengbin (A)" <zhengbin13@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 49dfe94fe5 ("KVM: arm/arm64: Fix TRACE_INCLUDE_PATH") fixes
TRACE_INCLUDE_PATH to the correct relative path to the define_trace.h
and explains why did the old one work.
The same fix should be applied to virt/kvm/arm/vgic/trace.h.
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Given that the TLB invalidation path is pretty rarely used, there
was never any advantage to using hyp_alternate_select() here.
has_vhe(), being a glorified static key, is the right tool for
the job.
Off you go.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
There is no reason for using hyp_alternate_select when checking
for ARM64_WORKAROUND_834220, as each of the capabilities is
also backed by a static key. Just replace the KVM-specific
construct with cpus_have_const_cap(ARM64_WORKAROUND_834220).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
The center temperature of the supported devices stored in the constant
BMC150_ACCEL_TEMP_CENTER_VAL is not 24 degrees but 23 degrees.
It seems that some datasheets were inconsistent on this value leading
to the error. For most usecases will only make minor difference so
not queued for stable.
Signed-off-by: Pascal Bouwmann <bouwmann@tau-tec.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Do not allow configuring null sensor gain since it will force to 0
device outputs
Fixes: c8d4066c7246 ("iio: imu: st_lsm6dsx: remove invalid gain value for LSM9DS1")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
meson_saradc's irq handler uses priv->regmap so make sure that it is
allocated before the irq get enabled.
This also fixes crash when CONFIG_DEBUG_SHIRQ is enabled, as device
managed resources are freed in the inverted order they had been
allocated, priv->regmap was freed before the spurious fake irq that
CONFIG_DEBUG_SHIRQ adds called the handler.
Fixes: 3af109131b ("iio: adc: meson-saradc: switch from polling to interrupt mode")
Reported-by: Elie Roudninski <xademax@gmail.com>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Elie ROUDNINSKI <xademax@gmail.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-09-08 12:30:32 +01:00
677 changed files with 6071 additions and 6391 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.