Pull drm fixes from Dave Airlie:
"A bunch of fixes covering i915, amdgpu, one tegra and some core DRM
ones. Nothing too strange at this point"
* tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux: (21 commits)
drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix lru size grouping v2
drm/tegra: dsi: Enhance runtime power management
drm/i915: Fix botched merge that downgrades CSR versions.
drm/i915/skl: Ensure pipes with changed wms get added to the state
drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
drm/i915/skl: Add support for the SAGV, fix underrun hangs
drm/i915/gen6+: Interpret mailbox error flags
drm/i915: Reattach comment, complete type specification
drm/i915: Unconditionally flush any chipset buffers before execbuf
drm/i915/gen9: Drop invalid WARN() during data rate calculation
drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
drm: Reject page_flip for !DRIVER_MODESET
drm/amdgpu: fix timeout value check in amd_sched_job_recovery
drm/amdgpu: fix sdma_v2_4_ring_test_ib
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/radeon: fix radeon_move_blit on 32bit systems
...
Due to assigning the 'replaced' value instead of or'ing it,
if drm_atomic_crtc_set_property() gets called multiple times,
the last call will define the color_mgmt_changed flag, so
a non-updating call to a property can reset the flag and
prevent actual hw state updates required by preceding
property updates.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Pull perf fixes from Thomas Gleixner:
"A few fixes from the perf departement
- prevent a imbalanced preemption disable in the events teardown code
- prevent out of bound acces in perf userspace
- make perf tools compile with UCLIBC again
- a fix for the userspace unwinder utility"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Use this_cpu_ptr() when stopping AUX events
perf evsel: Do not access outside hw cache name arrays
tools lib: Reinstate strlcpy() header guard with __UCLIBC__
perf unwind: Use addr_location::addr instead of ip for entries
Pull x86 fix from Thomas Gleixner:
"A single bugfix to prevent irq remapping when the ioapic is disabled"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Do not init irq remapping if ioapic is disabled
Pull irq fixes from Thomas Gleixner:
"This lot provides:
- plug a hotplug race in the new affinity infrastructure
- a fix for the trigger type of chained interrupts
- plug a potential memory leak in the core code
- a few fixes for ARM and MIPS GICs"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Implement activate op for device domain
irqchip/mips-gic: Cleanup chip and handler setup
genirq/affinity: Use get/put_online_cpus around cpumask operations
genirq: Fix potential memleak when failing to get irq pm
irqchip/gicv3-its: Disable the ITS before initializing it
irqchip/gicv3: Remove disabling redistributor and group1 non-secure interrupts
irqchip/gic: Allow self-SGIs for SMP on UP configurations
genirq: Correctly configure the trigger on chained interrupts
Pull timer fixes from Thomas Gleixner:
"A few updates for timers & co:
- prevent a livelock in the timekeeping code when debugging is
enabled
- prevent out of bounds access in the timekeeping debug code
- various fixes in clocksource drivers
- a new maintainers entry"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
drivers/clocksource/pistachio: Fix memory corruption in init
clocksource/drivers/timer-atmel-pit: Enable mck clock
clocksource/drivers/pxa: Fix include files for compilation
MAINTAINERS: Add ARM ARCHITECTED TIMER entry
timekeeping: Cap array access in timekeeping_debug
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
Pull KVM fixes from Paolo Bonzini:
"ARM:
- fixes for ITS init issues, error handling, IRQ leakage, race
conditions
- an erratum workaround for timers
- some removal of misleading use of errors and comments
- a fix for GICv3 on 32-bit guests
MIPS:
- fix for where the guest could wrongly map the first page of
physical memory
x86:
- nested virtualization fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MIPS: KVM: Check for pfn noslot case
kvm: nVMX: fix nested tsc scaling
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
arm64: KVM: report configured SRE value to 32-bit world
arm64: KVM: remove misleading comment on pmu status
KVM: arm/arm64: timer: Workaround misconfigured timer interrupt
arm64: Document workaround for Cortex-A72 erratum #853709
KVM: arm/arm64: Change misleading use of is_error_pfn
KVM: arm64: ITS: avoid re-mapping LPIs
KVM: arm64: check for ITS device on MSI injection
KVM: arm64: ITS: move ITS registration into first VCPU run
KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic
KVM: arm64: vgic-its: Plug race in vgic_put_irq
KVM: arm64: vgic-its: Handle errors from vgic_add_lpi
KVM: arm64: ITS: return 1 on successful MSI injection
Pull ARM64 fix from Catalin Marinas:
"ARM64 fix to avoid potential TLB conflict when CONFIG_RANDOMIZE_BASE
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE
Pull rdma fixes from Doug Ledford:
"Round one of 4.8 rc fixes.
This should be the bulk of the -rc fixes for 4.8. I only have a few
things that are still outstanding (two ipoib bugs for which the
solution is not yet fully known, and a few queued items that came in
after my last push and I didn't want to delay this pull request for
late comers again).
Even though the patch count is kind of high, everything is minor fixes
so the overall churn is pretty low.
Summary:
- minor fixes to cxgb4
- minor fixes to mlx4
- one minor fix each to core, rxe, isert, srpt, mlx5, ocrdma, and usnic
- six or so fixes to i40iw fixes
- the rest are hfi1 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (34 commits)
i40iw: Send last streaming mode message for loopback connections
IB/srpt: Update sport->port_guid with each port refresh
RDMA/ocrdma: Fix the max_sge reported from FW
i40iw: Avoid writing to freed memory
i40iw: Fix double free of allocated_buffer
IB/mlx5: Remove superfluous include of io-mapping.h
i40iw: Do not set self-referencing pointer to NULL after kfree
i40iw: Add missing NULL check for MPA private data
iw_cxgb4: Fix cxgb4 arm CQ logic w/IB_CQ_REPORT_MISSED_EVENTS
i40iw: Add missing check for interface already open
i40iw: Protect req_resource_num update
i40iw: Change mem_resources pointer to a u8
IB/core: Use memdup_user() rather than duplicating its implementation
IB/qib: Use memdup_user() rather than duplicating its implementation
iw_cxgb4: use the MPA initiator's IRD if < our ORD
iw_cxgb4: limit IRD/ORD advertised to ULP by device max.
IB/hfi1: Fix mm_struct use after free
IB/rdmvat: Fix double vfree() in rvt_create_qp() error path
IB/hfi1: Improve J_KEY generation
IB/hfi1: Return invalid field for non-QSFP CableInfo queries
...
Pull sound fixes from Takashi Iwai:
"Here are a bunch of fixes as you can see in diffstat.
One core change in ASoC is about the unexpected unbinding error, and
another about debugfs cleanup.
The rest are wide-spread driver-specific fixes: a series of LINE6 USB
fixes, a HD-audio quirk, and various ASoC fixes including OMAP boot
fixes and Intel SKL fixes"
* tag 'sound-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (22 commits)
ALSA: hda/realtek - fix headset mic detection for MSI MS-B120
ASoC: omap-mcpdm: Fix irq resource handling
ASoC: max98371: Add terminate entry for i2c_device_id tables
ALSA: line6: Fix POD sysfs attributes segfault
ALSA: line6: Give up on the lock while URBs are released.
ALSA: line6: Remove double line6_pcm_release() after failed acquire.
ASoC: omap-abe-twl6040: Correct dmic-codec device registration
ASoC: core: Clean up DAPM before the card debugfs
ASoC: omap-mcpdm: Drop pdmclk clock handling
ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
ASoC: compress: Fix leak of a widget list in soc_compr_open_fe
ASoC: Intel: Skylake: Fix error return code in skl_probe()
ASoC: wm2000: Fix return of uninitialised varible
ASoC: Fix leak of rtd in soc_bind_dai_link
ASoC: da7213: Default to 64 BCLKs per WCLK to support all formats
ASoC: nau8825: fix static check error about semaphone control
ASoC: nau8825: fix bug in playback when suspend
ASoC: samsung: Fix clock handling in S3C24XX_UDA134X card
ASoC: simple-card-utils: add missing MODULE_xxx()
ASoC: Intel: Skylake: Check list empty while getting module info
...
Pull btrfs fixes from Chris Mason:
"We've queued up a few different fixes in here. These range from
enospc corners to fsync and quota fixes, and a few targeted at error
handling for corrupt metadata/fuzzing"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix lockdep warning on deadlock against an inode's log mutex
Btrfs: detect corruption when non-root leaf has zero item
Btrfs: check btree node's nritems
btrfs: don't create or leak aliased root while cleaning up orphans
Btrfs: fix em leak in find_first_block_group
btrfs: do not background blkdev_put()
Btrfs: clarify do_chunk_alloc()'s return value
btrfs: fix fsfreeze hang caused by delayed iputs deal
btrfs: update btrfs_space_info's bytes_may_use timely
btrfs: divide btrfs_update_reserved_bytes() into two functions
btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
btrfs: qgroup: Fix qgroup incorrectness caused by log replay
btrfs: relocation: Fix leaking qgroups numbers on data extents
btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent()
btrfs: waiting on qgroup rescan should not always be interruptible
btrfs: properly track when rescan worker is running
btrfs: flush_space: treat return value of do_chunk_alloc properly
Btrfs: add ASSERT for block group's memory leak
btrfs: backref: Fix soft lockup in __merge_refs function
Btrfs: fix memory leak of reloc_root
Pull dlm fix from David Teigland:
"This fixes a bug introduced by recent debugfs cleanup"
* tag 'dlm-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: fix malfunction of dlm_tool caused by debugfs changes
Pull device mapper fixes from Mike Snitzer:
- another stable fix for DM flakey (that tweaks the previous fix that
didn't factor in expected 'drop_writes' behavior for read IO).
- a dm-log bio operation flags fix for the broader block changes that
were merged during the 4.8 merge window.
* tag 'dm-4.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm log: fix unitialized bio operation flags
dm flakey: fix reads to be issued if drop_writes configured
Pull IOMMU fixes from Joerg Roedel:
"Fixes from Will Deacon:
- fix a couple of thinkos in the CMDQ error handling and
short-descriptor page table code that have been there since day one
- disable stalling faults, since they may result in hardware deadlock
- fix an accidental BUG() when passing disable_bypass=1 on the
cmdline"
* tag 'iommu-fixes-v4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass
iommu/arm-smmu: Disable stalling faults for all endpoints
iommu/arm-smmu: Fix CMDQ error handling
iommu/io-pgtable-arm-v7s: Fix attributes when splitting blocks
Pull block fixes from Jens Axboe:
"Here's a set of block fixes for the current 4.8-rc release. This
contains:
- a fix for a secure erase regression, from Adrian.
- a fix for an mmc use-after-free bug regression, also from Adrian.
- potential zero pointer deference in bdev freezing, from Andrey.
- a race fix for blk_set_queue_dying() from Bart.
- a set of xen blkfront fixes from Bob Liu.
- three small fixes for bcache, from Eric and Kent.
- a fix for a potential invalid NVMe state transition, from Gabriel.
- blk-mq CPU offline fix, preventing us from issuing and completing a
request on the wrong queue. From me.
- revert two previous floppy changes, since they caused a user
visibile regression. A better fix is in the works.
- ensure that we don't send down bios that have more than 256
elements in them. Fixes a crash with bcache, for example. From
Ming.
- a fix for deferencing an error pointer with cgroup writeback.
Fixes a regression. From Vegard"
* 'for-linus' of git://git.kernel.dk/linux-block:
mmc: fix use-after-free of struct request
Revert "floppy: refactor open() flags handling"
Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
fs/block_dev: fix potential NULL ptr deref in freeze_bdev()
blk-mq: improve warning for running a queue on the wrong CPU
blk-mq: don't overwrite rq->mq_ctx
block: make sure a big bio is split into at most 256 bvecs
nvme: Fix nvme_get/set_features() with a NULL result pointer
bdev: fix NULL pointer dereference
xen-blkfront: free resources if xlvbd_alloc_gendisk fails
xen-blkfront: introduce blkif_set_queue_limits()
xen-blkfront: fix places not updated after introducing 64KB page granularity
bcache: pr_err: more meaningful error message when nr_stripes is invalid
bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
block: Fix race triggered by blk_set_queue_dying()
block: Fix secure erase
nvme: Prevent controller state invalid transition
The data offset for a dax region needs to account for a reservation in
the resource range. Otherwise, device-dax is allowing mappings directly
into the memmap or device-info-block area with crash signatures like the
following:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: get_zone_device_page+0x11/0x30
Call Trace:
follow_devmap_pmd+0x298/0x2c0
follow_page_mask+0x275/0x530
__get_user_pages+0xe3/0x750
__gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
tdp_page_fault+0x130/0x280 [kvm]
kvm_mmu_page_fault+0x5f/0xf0 [kvm]
handle_ept_violation+0x94/0x180 [kvm_intel]
vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
kvm_vcpu_ioctl+0x33c/0x620 [kvm]
do_vfs_ioctl+0xa2/0x5d0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x1a/0xa4
Fixes: ab68f26221 ("/dev/dax, pmem: direct access to persistent memory")
Link: http://lkml.kernel.org/r/147205536732.1606.8994275381938837346.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.
I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address:
ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
This seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.
2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.
Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A bugfix in v4.8-rc2 introduced a harmless warning when
CONFIG_MEMCG_SWAP is disabled but CONFIG_MEMCG is enabled:
mm/memcontrol.c:4085:27: error: 'mem_cgroup_id_get_online' defined but not used [-Werror=unused-function]
static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
This moves the function inside of the #ifdef block that hides the
calling function, to avoid the warning.
Fixes: 1f47b61fb4 ("mm: memcontrol: fix swap counter leak on swapout from offline cgroup")
Link: http://lkml.kernel.org/r/20160824113733.2776701-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit bbeddf52ad ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).
Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.
Fixes: bbeddf52ad ("printk: move braille console support into separate braille.[ch] files")
Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While adding proper userfaultfd_wp support with bits in pagetable and
swap entry to avoid false positives WP userfaults through swap/fork/
KSM/etc, I've been adding a framework that mostly mirrors soft dirty.
So I noticed in one place I had to add uffd_wp support to the pagetables
that wasn't covered by soft_dirty and I think it should have.
Example: in the THP migration code migrate_misplaced_transhuge_page()
pmd_mkdirty is called unconditionally after mk_huge_pmd.
entry = mk_huge_pmd(new_page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
That sets soft dirty too (it's a false positive for soft dirty, the soft
dirty bit could be more finegrained and transfer the bit like uffd_wp
will do.. pmd/pte_uffd_wp() enforces the invariant that when it's set
pmd/pte_write is not set).
However in the THP split there's no unconditional pmd_mkdirty after
mk_huge_pmd and pte_swp_mksoft_dirty isn't called after the migration
entry is created. The code sets the dirty bit in the struct page
instead of setting it in the pagetable (which is fully equivalent as far
as the real dirty bit is concerned, as the whole point of pagetable bits
is to be eventually flushed out of to the page, but that is not
equivalent for the soft-dirty bit that gets lost in translation).
This was found by code review only and totally untested as I'm working
to actually replace soft dirty and I don't have time to test potential
soft dirty bugfixes as well :).
Transfer the soft_dirty from pmd to pte during THP splits.
This fix avoids losing the soft_dirty bit and avoids userland memory
corruption in the checkpoint.
Fixes: eef1b3ba05 ("thp: implement split_huge_pmd()")
Link: http://lkml.kernel.org/r/1471610515-30229-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels. An entry which we write to here is
xfrm_aevent_rseqth which is u32.
echo 4294967295 > /proc/sys/net/core/xfrm_aevent_rseqth
Commit 230633d109 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs. However, this does not apply to unsigned integers.
Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX. This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax. Alternatively,
we would need to change the datatype of the entry to 64 bit.
static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
{
i = (unsigned long *) data; //This cast is causing to read beyond the size of data (u32)
vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.
Introduce a new proc handler proc_douintvec. Individual proc entries
will need to be updated to use the new handler.
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the current kernel, `dlm_tool lockdebug` fails as below:
"dlm_tool lockdebug ED0BD86DCE724393918A1AE8FDBF1EE3
can't open /sys/kernel/debug/dlm/ED0BD86DCE724393918A1AE8FDBF1EE3:
Operation not permitted"
This is because table_open() depends on file->f_op to tell which
seq_file ops should be passed down. But, the original file ops in
file->f_op is replaced by "debugfs_full_proxy_file_operations" with
commit 49d200deaa ("debugfs: prevent access to removed files'
private data").
Currently, I can think up 2 solutions: 1st, replace
debugfs_create_file() with debugfs_create_file_unsafe();
2nd, make different table_open#() accordingly. The 1st one
is neat, but I don't thoroughly understand its risk. Maybe
someone has a better one.
Signed-off-by: Eric Ren <zren@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
The bootloader (U-boot) sometimes uses this timer for various delays.
It uses it as a ongoing counter, and does comparisons on the current
counter value. The timer counter is never stopped.
In some cases when the user interacts with the bootloader, or lets
it idle for some time before loading Linux, the timer may expire,
and an interrupt will be pending. This results in an unexpected
interrupt when the timer interrupt is enabled by the kernel, at
which point the event_handler isn't set yet. This results in a NULL
pointer dereference exception, panic, and no way to reboot.
Clear any pending interrupts after we stop the timer in the probe
function to avoid this.
Cc: stable@vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Driver init code incorrectly uses the block base address and as a result
clears clocksource structure's fields instead of the hardware registers.
Commit 09a9982016 ("timekeeping: Lift clocksource cacheline
restriction") has changed the offsets within pistachio_clocksource
structure and what has previously gone unnoticed now leads to a kernel
panic during boot.
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
We call mmc_req_is_special() after having processed a request, but
it could be freed after that. Check that ahead of time, and use
the cached value.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Fixes: c2df40dfb8 ("drivers: use req op accessor")
Signed-off-by: Jens Axboe <axboe@fb.com>
i915 fixes queue.
* tag 'drm-intel-fixes-2016-08-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix botched merge that downgrades CSR versions.
drm/i915/skl: Ensure pipes with changed wms get added to the state
drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
drm/i915/skl: Add support for the SAGV, fix underrun hangs
drm/i915/gen6+: Interpret mailbox error flags
drm/i915: Reattach comment, complete type specification
drm/i915: Unconditionally flush any chipset buffers before execbuf
drm/i915/gen9: Drop invalid WARN() during data rate calculation
drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
For reasons that entirely elude me fb.h exposes all the structures,
even when it is not enabled. Except for special stuff like fb_defio.
Which means all the drivers which haven't yet switched over to the
defio support in the helpers and still roll their own, will fail
to compile when fbdev emulation is disabled. Protect just those
bits, as a gnarly reminder that conversion to the core defio helpers
would be good.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470847958-28465-6-git-send-email-daniel.vetter@ffwll.ch
Signed-off-by: Dave Airlie <airlied@redhat.com>
ASoC: Fixes for v4.8
A clutch of fixes for v4.8. These are mainly driver specific, the most
notable ones being those for OMAP which fix a series of issues that
broke boot on some platforms there when deferred probe kicked in.
There's also one core fix for an issue when unbinding a card which for
some reason had managed to not manifest until recently.
Send a zero length last streaming mode message for loopback
connections to synchronize between accepting QP and connecting QP.
This avoids data transfer to start on the accepting QP before
the connecting QP is in RTS. Also remove function i40iw_loopback_nop()
as it is no longer used.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Calling freeze_bdev() twice on the same block device without mounted
filesystem get_super() will return NULL, which will lead to NULL-ptr
dereference later in drop_super().
Check get_super() result to fix that.
Note, that this is a purely theoretical issue. We have only 3
freeze_bdev() callers. 2 of them are in filesystem code and used on a
device with mounted fs. The third one in lock_fs() has protection in
upper-layer code against freezing block device the second time without
thawing it first.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 44f714dae5 ("Btrfs: improve performance on fsync against new
inode after rename/unlink"), which landed in 4.8-rc2, introduced a
possibility for a deadlock due to double locking of an inode's log mutex
by the same task, which lockdep reports with:
[23045.433975] =============================================
[23045.434748] [ INFO: possible recursive locking detected ]
[23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted
[23045.436044] ---------------------------------------------
[23045.436044] xfs_io/3688 is trying to acquire lock:
[23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
but task is already holding lock:
[23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
other info that might help us debug this:
[23045.436044] Possible unsafe locking scenario:
[23045.436044] CPU0
[23045.436044] ----
[23045.436044] lock(&ei->log_mutex);
[23045.436044] lock(&ei->log_mutex);
[23045.436044]
*** DEADLOCK ***
[23045.436044] May be due to missing lock nesting notation
[23045.436044] 3 locks held by xfs_io/3688:
[23045.436044] #0: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs]
[23045.436044] #1: (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0
[23045.436044] #2: (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
stack backtrace:
[23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1
[23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[23045.436044] 0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
[23045.436044] ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
[23045.436044] 0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
[23045.436044] Call Trace:
[23045.436044] [<ffffffff8127074d>] dump_stack+0x67/0x90
[23045.436044] [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e
[23045.436044] [<ffffffff8109155f>] ? mark_lock+0x24/0x201
[23045.436044] [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74
[23045.436044] [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3
[23045.436044] [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3
[23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044] [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7
[23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044] [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
[23045.436044] [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044] [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465
[23045.436044] [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs]
[23045.436044] [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs]
[23045.436044] [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
[23045.436044] [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
[23045.436044] [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs]
[23045.436044] [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e
[23045.436044] [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e
[23045.436044] [<ffffffff811acc79>] do_fsync+0x31/0x4a
[23045.436044] [<ffffffff811ace99>] SyS_fsync+0x10/0x14
[23045.436044] [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[23045.436044] [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa
An example reproducer for this is:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ sync
$ mv /mnt/dir/foo /mnt/dir/bar
$ touch /mnt/dir/foo
$ xfs_io -c "fsync" /mnt/dir/bar
This is because while logging the inode of file bar we end up logging its
parent directory (since its inode has an unlink_trans field matching the
current transaction id due to the rename operation), which in turn logs
the inodes for all its new dentries, so that the new inode for the new
file named foo gets logged which in turn triggered another logging attempt
for the inode we are fsync'ing, since that inode had an old name that
corresponds to the name of the new inode.
So fix this by ensuring that when logging the inode for a new dentry that
has a name matching an old name of some other inode, we don't log again
the original inode that we are fsync'ing.
Fixes: 44f714dae5 ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Right now we treat leaf which has zero item as a valid one
because we could have an empty tree, that is, a root that is
also a leaf without any item, however, in the same case but
when the leaf is not a root, we can end up with hitting the
BUG_ON(1) in btrfs_extend_item() called by
setup_inline_extent_backref().
This makes us check the situation as a corruption if leaf is
not its own root.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When btree node (level = 1) has nritems which equals to zero,
we can end up with panic due to insert_ptr()'s
BUG_ON(slot > nritems);
where slot is 1 and nritems is 0, as copy_for_split() calls
insert_ptr(.., path->slots[1] + 1, ...);
A invalid value results in the whole mess, this adds the check
for btree's node nritems so that we stop reading block when
when something is wrong.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
commit 909c3a22da (Btrfs: fix loading of orphan roots leading to BUG_ON)
avoids the BUG_ON but can add an aliased root to the dead_roots list or
leak the root.
Since we've already been loading roots into the radix tree, we should
use it before looking the root up on disk.
Cc: <stable@vger.kernel.org> # 4.5
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
We need to call free_extent_map() on the em we look up.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
At the end of unmount/dev-delete, if the device exclusive open is not
actually closed, then there might be a race with another program in
the userland who is trying to open the device in exclusive mode and
it may fail for eg:
unmount /btrfs; fsck /dev/x
btrfs dev del /dev/x /btrfs; fsck /dev/x
so here background blkdev_put() is not a choice
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Function start_transaction() can return ERR_PTR(1) when flush is
BTRFS_RESERVE_FLUSH_LIMIT, so the call graph is
start_transaction (return ERR_PTR(1))
-> btrfs_block_rsv_add (return 1)
-> reserve_metadata_bytes (return 1)
-> flush_space (return 1)
-> do_chunk_alloc (return 1)
With BTRFS_RESERVE_FLUSH_LIMIT, if flush_space is already on the
flush_state of ALLOC_CHUNK and it successfully allocates a new
chunk, then instead of trying to reserve space again,
reserve_metadata_bytes returns 1 immediately.
Eventually the callers who call start_transaction() usually just
do the IS_ERR() check which ERR_PTR(1) can pass, then it'll get
a panic when dereferencing a pointer which is ERR_PTR(1).
The following patch fixes the above problem.
"btrfs: flush_space: treat return value of do_chunk_alloc properly"
https://patchwork.kernel.org/patch/7778651/
This add comments to clarify do_chunk_alloc()'s return value.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When running fstests generic/068, sometimes we got below deadlock:
xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080
ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
Call Trace:
[<ffffffff816a9045>] schedule+0x35/0x80
[<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
[<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
[<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
[<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
[<ffffffff810d32b5>] percpu_down_read+0x35/0x50
[<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
[<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
[<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
[<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
[<ffffffff81230a1a>] evict+0xba/0x1a0
[<ffffffff812316b6>] iput+0x196/0x200
[<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
[<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
[<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
[<ffffffff81218040>] freeze_super+0xf0/0x190
[<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
[<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
[<ffffffff81229409>] SyS_ioctl+0x79/0x90
[<ffffffff81003c12>] do_syscall_64+0x62/0x110
[<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25
>From this warning, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() will call btrfs_commit_transaction() again, if
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it'll start_transaction(), which will try to get SB_FREEZE_FS lock
again, then deadlock occurs.
The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. There still maybe some codes adding
delayed iputs, see below sample race window:
CPU1 | CPU2
|-> freeze_super() |
|-> sync_filesystem(sb); |
| |-> cleaner_kthread()
| | |-> btrfs_delete_unused_bgs()
| | |-> btrfs_remove_chunk()
| | |-> btrfs_remove_block_group()
| | |-> btrfs_add_delayed_iput()
| |
|-> sb->s_writers.frozen = SB_FREEZE_FS; |
|-> sb_wait_write(sb, SB_FREEZE_FS); |
| acquire SB_FREEZE_FS lock. |
| |
|-> btrfs_freeze() |
|-> btrfs_commit_transaction() |
|-> btrfs_run_delayed_iputs() |
| will handle delayed iputs, |
| that means start_transaction() |
| will be called, which will try |
| to get SB_FREEZE_FS lock. |
To fix this issue, introduce a "int fs_frozen" to record internally whether
fs has been frozen. If fs has been frozen, we can not handle delayed iputs.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment to btrfs_freeze ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
#!/bin/bash
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
dev=$(losetup --show -f fs.img)
mkfs.btrfs -f -M $dev
mkdir /tmp/mntpoint
mount $dev /tmp/mntpoint
cd /tmp/mntpoint
xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
| bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
|-> btrfs_reserve_extent()
|-> btrfs_add_reserved_bytes()
| alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
| change bytes_may_use, and bytes_reserved += 64M. Now
| bytes_may_use + bytes_reserved == 128M, which is greater
| than btrfs_space_info's total_bytes, false enospc occurs.
| Note, the bytes_may_use decrease operation will be done in
| end of btrfs_fallocate(), which is too late.
Here is another simple case for buffered write:
CPU 1 | CPU 2
|
|-> cow_file_range() |-> __btrfs_buffered_write()
|-> btrfs_reserve_extent() | |
| | |
| | |
| ..... | |-> btrfs_check_data_free_space()
| |
| |
|-> extent_clear_unlock_delalloc() |
In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction in order to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.
To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.
As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.
Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
This patch divides btrfs_update_reserved_bytes() into
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
next patch will extend btrfs_add_reserved_bytes()to fix some
false ENOSPC error, please see later patch for detailed info.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.
start bytenr cluster->start
| | extent | extent | ...| extent |
|----------------------------------------------------------------|
| block group reloc_inode |
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When doing log replay at mount time(after power loss), qgroup will leak
numbers of replayed data extents.
The cause is almost the same of balance.
So fix it by manually informing qgroup for owner changed extents.
The bug can be detected by btrfs/119 test case.
Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
This patch fixes a REGRESSION introduced in 4.2, caused by the big quota
rework.
When balancing data extents, qgroup will leak all its numbers for
relocated data extents.
The relocation is done in the following steps for data extents:
1) Create data reloc tree and inode
2) Copy all data extents to data reloc tree
And commit transaction
3) Create tree reloc tree(special snapshot) for any related subvolumes
4) Replace file extent in tree reloc tree with new extents in data reloc
tree
And commit transaction
5) Merge tree reloc tree with original fs, by swapping tree blocks
For 1)~4), since tree reloc tree and data reloc tree doesn't count to
qgroup, everything is OK.
But for 5), the swapping of tree blocks will only info qgroup to track
metadata extents.
If metadata extents contain file extents, qgroup number for file extents
will get lost, leading to corrupted qgroup accounting.
The fix is, before commit transaction of step 5), manually info qgroup to
track all file extents in data reloc tree.
Since at commit transaction time, the tree swapping is done, and qgroup
will account these data extents correctly.
Cc: Mark Fasheh <mfasheh@suse.de>
Reported-by: Mark Fasheh <mfasheh@suse.de>
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Refactor btrfs_qgroup_insert_dirty_extent() function, to two functions:
1. btrfs_qgroup_insert_dirty_extent_nolock()
Almost the same with original code.
For delayed_ref usage, which has delayed refs locked.
Change the return value type to int, since caller never needs the
pointer, but only needs to know if they need to free the allocated
memory.
2. btrfs_qgroup_insert_dirty_extent()
The more encapsulated version.
Will do the delayed_refs lock, memory allocation, quota enabled check
and other things.
The original design is to keep exported functions to minimal, but since
more btrfs hacks exposed, like replacing path in balance, we need to
record dirty extents manually, so we have to add such functions.
Also, add comment for both functions, to info developers how to keep
qgroup correct when doing hacks.
Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
We wait on qgroup rescan completion in three places: file system
shutdown, the quota disable ioctl, and the rescan wait ioctl. If the
user sends a signal while we're waiting, we continue happily along. This
is expected behavior for the rescan wait ioctl. It's racy in the shutdown
path but mostly works due to other unrelated synchronization points.
In the quota disable path, it Oopses the kernel pretty much immediately.
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
The qgroup_flags field is overloaded such that it reflects the on-disk
status of qgroups and the runtime state. The BTRFS_QGROUP_STATUS_FLAG_RESCAN
flag is used to indicate that a rescan operation is in progress, but if
the file system is unmounted while a rescan is running, the rescan
operation is paused. If the file system is then mounted read-only,
the flag will still be present but the rescan operation will not have
been resumed. When we go to umount, btrfs_qgroup_wait_for_completion
will see the flag and interpret it to mean that the rescan worker is
still running and will wait for a completion that will never come.
This patch uses a separate flag to indicate when the worker is
running. The locking and state surrounding the qgroup rescan worker
needs a lot of attention beyond this patch but this is enough to
avoid a hung umount.
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by; Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
do_chunk_alloc returns 1 when it succeeds to allocate a new chunk.
But flush_space will not convert this to 0, and will also return 1.
As a result, reserve_metadata_bytes will think that flush_space failed,
and may potentially return this value "1" to the caller (depends how
reserve_metadata_bytes was called). The caller will also treat this as an error.
For example, btrfs_block_rsv_refill does:
int ret = -ENOSPC;
...
ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
if (!ret) {
block_rsv_add_bytes(block_rsv, num_bytes, 0);
return 0;
}
return ret;
So it will return -ENOSPC.
Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
This adds several ASSERT()' s to report memory leak of block group cache.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When over 1000 file extents refers to one extent, find_parent_nodes()
will be obviously slow, due to the O(n^2)~O(n^3) loops inside
__merge_refs().
The following ftrace shows the cubic growth of execution time:
256 refs
5) + 91.768 us | __add_keyed_refs.isra.12 [btrfs]();
5) 1.447 us | __add_missing_keys.isra.13 [btrfs]();
5) ! 114.544 us | __merge_refs [btrfs]();
5) ! 136.399 us | __merge_refs [btrfs]();
512 refs
6) ! 279.859 us | __add_keyed_refs.isra.12 [btrfs]();
6) 3.164 us | __add_missing_keys.isra.13 [btrfs]();
6) ! 442.498 us | __merge_refs [btrfs]();
6) # 2091.073 us | __merge_refs [btrfs]();
and 1024 refs
7) ! 368.683 us | __add_keyed_refs.isra.12 [btrfs]();
7) 4.810 us | __add_missing_keys.isra.13 [btrfs]();
7) # 2043.428 us | __merge_refs [btrfs]();
7) * 18964.23 us | __merge_refs [btrfs]();
And sort them into the following char:
(Unit: us)
------------------------------------------------------------------------
Trace function | 256 ref | 512 refs | 1024 refs |
------------------------------------------------------------------------
__add_keyed_refs | 91 | 249 | 368 |
__add_missing_keys | 1 | 3 | 4 |
__merge_refs 1st call | 114 | 442 | 2043 |
__merge_refs 2nd call | 136 | 2091 | 18964 |
------------------------------------------------------------------------
We can see the that __add_keyed_refs() grows almost in linear behavior.
And __add_missing_keys() in this case doesn't change much or takes much
time.
While for the 1st __merge_refs() it's square growth
for the 2nd __merge_refs() call it's cubic growth.
It's no doubt that merge_refs() will take a long long time to execute if
the number of refs continues its grows.
So add a cond_resced() into the loop of __merge_refs().
Although this will solve the problem of soft lockup, we need to use the
new rb_tree based structure introduced by Lu Fengqi to really solve the
long execution time.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When some critical errors occur and FS would be flipped into RO,
if we have an on-going balance, we can end up with a memory leak
of root->reloc_root since btrfs_drop_snapshots() bails out
without freeing reloc_root at the very early start.
However, we're not able to free reloc_root in btrfs_drop_snapshots()
because its caller, merge_reloc_roots(), still needs to access it to
cleanup reloc_root's rbtree.
This makes us free reloc_root when we're going to free fs/file roots.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
When CONFIG_RANDOMIZE_BASE is selected, we modify the page tables to remap the
kernel at a newly-chosen VA range. We do this with the MMU disabled, but do not
invalidate TLBs prior to re-enabling the MMU with the new tables. Thus the old
mappings entries may still live in TLBs, and we risk violating
Break-Before-Make requirements, leading to TLB conflicts and/or other issues.
We invalidate TLBs when we uninsall the idmap in early setup code, but prior to
this we are subject to issues relating to the Break-Before-Make violation.
Avoid these issues by invalidating the TLBs before the new mappings can be
used by the hardware.
Fixes: f80fb3a3d5 ("arm64: add support for kernel ASLR")
Cc: <stable@vger.kernel.org> # 4.6+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull thermal fixes from Zhang Rui:
- Fix cpu_cooling to have separate thermal_cooling_device_ops
structures for cpus with and without power model, to avoid NULL
dereference in cpufreq_state2power. From Brendan Jackman.
- Fix a possible NULL dereference in imx_thermal driver. From Corentin
LABBE.
- Another two trivial fixes, one typo fix and one deleting module
owner. From Caesar Wang and Markus Elfring.
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: imx: fix a possible NULL dereference
thermal: trivial: fix the typo
Thermal-INT3406: Delete owner assignment
thermal: cpu_cooling: Fix NULL dereference in cpufreq_state2power
radeon and amdgpu fixes for 4.8. Nothing major:
- fix a performance regression due to the LRU changes in 4.7
- 32 bit fixes
- fix a PLL regression
- misc bug fixes
* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix lru size grouping v2
drm/amdgpu: fix timeout value check in amd_sched_job_recovery
drm/amdgpu: fix sdma_v2_4_ring_test_ib
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/radeon: fix radeon_move_blit on 32bit systems
drm/radeon: only apply the SS fractional workaround to RS[78]80
drm/tegra: Fixes for v4.8-rc4
This contains one fix for DSI runtime power management support that was
introduced in v4.8-rc1. This is slightly more elaborate than I would've
wished, but there are a few corner cases that needed fixing.
* tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: dsi: Enhance runtime power management
Commit e6047149db ("dm: use bio op accessors") switched DM over to
using bio_set_op_attrs() but didn't take care to initialize
lc->io_req.bi_op_flags in dm-log.c:rw_header(). This caused
rw_header()'s call to dm_io() to make bio->bi_op_flags be uninitialized
in dm-io.c:do_region(), which ultimately resulted in a SCSI BUG() in
sd_init_command().
Also, adjust rw_header() and its callers to use REQ_OP_{READ|WRITE}.
Fixes: e6047149db ("dm: use bio op accessors")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.
Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online. If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.
Signed-off-by: Jens Axboe <axboe@fb.com>
We do this in a few places, if the CPU is offline. This isn't allowed,
though, since on multi queue hardware, we can't just move a request
from one software queue to another, if they map to different hardware
queues. The request and tag isn't valid on another hardware queue.
This can happen if plugging races with CPU offlining. But it does
no harm, since it can only happen in the window where we are
currently busy freezing the queue and flushing IO, in preparation
for redoing the software <-> hardware queue mappings.
Signed-off-by: Jens Axboe <axboe@fb.com>
If port_guid is set with the default subnet_prefix, then we get a change
event and run a port refresh, we don't update the port_guid. As a
result, attempts to create a target device that uses the new
subnet_prefix in the wwn will fail to find a match and be rejected by
the ib_srpt driver. This makes it impossible to configure a port if it
was initialized with a default subnet_prefix and later changed to any
non-default subnet-prefix. Updating the port refresh task to always
update the wwn based upon the current subnext_prefix solves this
problem.
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: nab@linux-iscsi.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull UML fix from Richard Weinberger:
"This contains a fix for a build regression introduced during the merge
window"
* 'for-linus-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Don't discard .text.exit section
Pull UBIFS fixes from Richard Weinberger:
"This pull requests contains fixes for two issues in UBI and UBIFS:
- wrong UBIFS assertion.
- a UBIFS xattr regression"
* tag 'upstream-4.8-rc4' of git://git.infradead.org/linux-ubifs:
ubifs: Fix xattr generic handler usage
ubifs: Fix assertion in layout_in_gaps()
Pull xen regression fix from David Vrabel:
"Fix a regression in the xenbus device preventing userspace tools from
working"
* tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: change the type of xen_vcpu_id to uint32_t
xenbus: don't look up transaction IDs for ordinary writes
We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Current driver is reporting wrong values for max_sge and
max_sge_rd in query_device. This breaks the nfs rdma and iser
in some device profiles. Fixing the driver to report
correct values from FW.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iwpbl->iwmr points to the structure that contains iwpbl,
which is iwmr. Setting this to NULL would result in
writing to freed memory. So just free iwmr, and return.
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Memory allocated for iwqp; iwqp->allocated_buffer is freed twice in
the create_qp error path. Correct this by having it freed only once in
i40iw_free_qp_resources().
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In i40iw_free_virt_mem(), do not set mem->va to NULL
after freeing it as mem->va is a self-referencing pointer
to mem.
Fixes: 4e9042e647 ("i40iw: add hw and utils files")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add NULL check for pdata and pdata->addr before the memcpy in
i40iw_form_cm_frame(). This fixes a NULL pointer de-reference
which occurs when the MPA private data pointer is NULL. Also
only copy pdata->size bytes in the memcpy to prevent reading
past the length of the private data buffer provided by upper layer.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
After arbitrary bio size was introduced, the incoming bio may
be very big. We have to split the bio into small bios so that
each holds at most BIO_MAX_PAGES bvecs for safety reason, such
as bio_clone().
This patch fixes the following kernel crash:
> [ 172.660142] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [ 172.660229] IP: [<ffffffff811e53b4>] bio_trim+0xf/0x2a
> [ 172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0
> [ 172.660399] Oops: 0000 [#1] SMP
> [...]
> [ 172.664780] Call Trace:
> [ 172.664813] [<ffffffffa007f3be>] ? raid1_make_request+0x2e8/0xad7 [raid1]
> [ 172.664846] [<ffffffff811f07da>] ? blk_queue_split+0x377/0x3d4
> [ 172.664880] [<ffffffffa005fb5f>] ? md_make_request+0xf6/0x1e9 [md_mod]
> [ 172.664912] [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
> [ 172.664947] [<ffffffffa0445c89>] ? prio_io+0x85/0x95 [bcache]
> [ 172.664981] [<ffffffffa0448252>] ? register_cache_set+0x355/0x8d0 [bcache]
> [ 172.665016] [<ffffffffa04497d3>] ? register_bcache+0x1006/0x1174 [bcache]
The issue can be reproduced by the following steps:
- create one raid1 over two virtio-blk
- build bcache device over the above raid1 and another cache device
and bucket size is set as 2Mbytes
- set cache mode as writeback
- run random write over ext4 on the bcache device
Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios)
Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
Reported-by: Eric Wheeler <bcache@lists.ewheeler.net>
Cc: stable@vger.kernel.org (4.3+)
Cc: Shaohua Li <shli@fb.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
nvme_set_features() callers seem to expect that passing NULL as the
result pointer is acceptable. Teach nvme_set_features() not to try to
write to the NULL address.
For symmetry, make the same change to nvme_get_features(), despite the
fact that all current callers pass a valid result pointer.
I assume that this bug hasn't been reported in practice because
the callers that pass NULL are all in the SCSI translation layer
and no one uses the relevant operations.
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The MIPI DSI output on Tegra SoCs requires some external logic to
calibrate the MIPI pads before a video signal can be transmitted. This
MIPI calibration logic requires to be powered on while the MIPI pads are
being used, which is currently done as part of the DSI driver's probe
implementation.
This is suboptimal because it will leave the MIPI calibration logic
powered up even if the DSI output is never used.
On Tegra114 and earlier this behaviour also causes the driver to hang
while trying to power up the MIPI calibration logic because the power
partition that contains the MIPI calibration logic will be powered on
by the display controller at output pipeline configuration time. Thus
the power up sequence for the MIPI calibration logic happens before
it's power partition is guaranteed to be enabled.
Fix this by splitting up the API into a request/free pair of functions
that manage the runtime dependency between the DSI and the calibration
modules (no registers are accessed) and a set of enable, calibrate and
disable functions that program the MIPI calibration logic at points in
time where the power partition is really enabled.
While at it, make sure that the runtime power management also works in
ganged mode, which is currently also broken.
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Tested-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
When tearing down an AUX buf for an event via perf_mmap_close(),
__perf_event_output_stop() is called on the event's CPU to ensure that
trace generation is halted before the process of unmapping and
freeing the buffer pages begins.
The callback is performed via cpu_function_call(), which ensures that it
runs with interrupts disabled and is therefore not preemptible.
Unfortunately, the current code grabs the per-cpu context pointer using
get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair
the call with put_cpu_ptr(), leading to a preempt_count() imbalance and
a BUG when freeing the AUX buffer later on:
WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120
Modules linked in:
[...]
Call Trace:
[<ffffffff813379dd>] dump_stack+0x4f/0x72
[<ffffffff81059ff6>] __warn+0xc6/0xe0
[<ffffffff8105a0c8>] warn_slowpath_null+0x18/0x20
[<ffffffff8112761c>] __rb_free_aux+0x10c/0x120
[<ffffffff81128163>] rb_free_aux+0x13/0x20
[<ffffffff8112515e>] perf_mmap_close+0x29e/0x2f0
[<ffffffff8111da30>] ? perf_iterate_ctx+0xe0/0xe0
[<ffffffff8115f685>] remove_vma+0x25/0x60
[<ffffffff81161796>] exit_mmap+0x106/0x140
[<ffffffff8105725c>] mmput+0x1c/0xd0
[<ffffffff8105cac3>] do_exit+0x253/0xbf0
[<ffffffff8105e32e>] do_group_exit+0x3e/0xb0
[<ffffffff81068d49>] get_signal+0x249/0x640
[<ffffffff8101c273>] do_signal+0x23/0x640
[<ffffffff81905f42>] ? _raw_write_unlock_irq+0x12/0x30
[<ffffffff81905f69>] ? _raw_spin_unlock_irq+0x9/0x10
[<ffffffff81901896>] ? __schedule+0x2c6/0x710
[<ffffffff810022a4>] exit_to_usermode_loop+0x74/0x90
[<ffffffff81002a56>] prepare_exit_to_usermode+0x26/0x30
[<ffffffff81906d1b>] retint_user+0x8/0x10
This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is
already disabled by the caller.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 95ff4ca26c ("perf/core: Free AUX pages in unmap path")
Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull vhost bugfix from Michael Tsirkin:
"This includes a single bugfix for vhost-scsi"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/scsi: fix reuse of &vq->iov[out] in response
The ARM architected timer driver falls under the drivers/clocksource/
catch-all in MAINTAINERS, and get_maintainers.pl doesn't suggest a
number of people who should be Cc'd.
The ARM architected timer is a core component of ARMv7+VE and ARMv8, and
is critical to the correct operation of both architecture ports (and
their respective KVM code), and patches to it should have review by
knowledgeable interested parties.
This patch adds a MAINTAINERS entry for the driver and its low-level
arch components, such that get_maintainer.pl will always include
relevant interested parties for modifications to the driver. For the
timebeing, this means myself and Marc Zyngier.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1470737036-2082-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
MSI Cubi MS-B120 needs the same fixup as the Gigabyte BXBT-2807 for its
mic to work.
They both use a single 3-way jack for both mic and headset with an
ALC283 codec, with the same pins used.
Cc: Daniel Drake <drake@endlessm.com>
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken.
However there is actually no other process would like to grab this lock on
that problematic platform.
Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation.
Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend,
and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller
than 1970) to the release date of that motherboard during POST stage, thus
after resumed, it may seem that the system had a significant long sleep time
which is a completely meaningless value.
Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the
sleep time happened to be set to 1, fls() returns 32 and we add 1 to
sleep_time_bin[32], which causes an out of bounds array access and therefor
memory being overwritten.
As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.
This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.
[jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the
issue slightly differently, but borrowed his excelent explanation of the
issue here.]
Fixes: 5c83545f24 "power: Add option to log time spent in suspend"
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: stable <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().
Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().
This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.
Fixes: 4ca22c2648 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull f2fs fixes from Jaegeuk Kim:
- fsmark regression
- i_size race condition
- wrong conditions in f2fs_move_file_range
* tag 'for-f2fs-v4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: avoid potential deadlock in f2fs_move_file_range
f2fs: allow copying file range only in between regular files
Revert "f2fs: move i_size_write in f2fs_write_end"
Revert "f2fs: use percpu_rw_semaphore"
We can't initialize the list head on deletion as this causes the node to
point to itself, which causes an infinite loop if vmd_irq() happens to be
servicing that node.
The list initialization was trying to fix a bug from multiple calls to
disable the same IRQ. Fix this instead by having the VMD driver track if
the interrupt is enabled.
[bhelgaas: changelog, add "Fixes"]
Fixes: 97e9230635 ("x86/PCI: VMD: Initialize list item in IRQ disable")
Reported-by: Grzegorz Koczot <grzegorz.koczot@intel.com>
Tested-by: Miroslaw Drost <miroslaw.drost@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Commit e41f501d39 ("vmlinux.lds: account for destructor sections")
added '.text.exit' to EXIT_TEXT which is discarded at link time by default.
This breaks compilation of UML:
`.text.exit' referenced in section `.fini_array' of
/usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libc.a(sdlerror.o):
defined in discarded section `.text.exit' of
/usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libc.a(sdlerror.o)
Apparently UML doesn't want to discard exit text, so let's place all EXIT_TEXT
sections in .exit.text.
Fixes: e41f501d39 ("vmlinux.lds: account for destructor sections")
Reported-by: Stefan Traby <stefan@hello-penguin.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
An assertion in layout_in_gaps() verifies that the gap_lebs pointer is
below the maximum bound. When computing this maximum bound the idx_lebs
count is multiplied by sizeof(int), while C pointers arithmetic does take
into account the size of the pointed elements implicitly already. Remove
the multiplication to fix the assertion.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull hardened usercopy fixes from Kees Cook:
- avoid signed math problems on unexpected compilers
- avoid false positives at very end of kernel text range checks
* tag 'usercopy-v4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: fix overlap check for kernel text
usercopy: avoid potentially undefined behavior in pointer math
Pull crypto fixes from Herbert Xu:
"This fixes a number of memory corruption bugs in the newly added
sha256-mb/sha256-mb code"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512-mb - fix ctx pointer
crypto: sha256-mb - fix ctx pointer and digest copy
Current cxgb4 arm CQ logic ignores IB_CQ_REPORT_MISSED_EVENTS for
request completion notification on a CQ. Due to this ib_poll_handler()
assumes all events polled and avoids further iopoll scheduling.
This patch adds logic to cxgb4 ib_req_notify_cq() handler to check if
CQ is not empty and return accordingly. Based on the return value of
ib_req_notify_cq() handler, ib_poll_handler() will schedule a run of
iopoll handler.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iwdev->mem_resources is incorrectly defined as an unsigned
long instead of u8. As a result, the offset into the dynamic
allocated structures in i40iw_initialize_hw_resources() is
incorrectly calculated and would lead to writing of memory
regions outside of the allocated buffer.
Fixes: 8e06af711b ("i40iw: add main, hdr, status")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* Reuse existing functionality from memdup_user() instead of keeping
duplicate source code.
This issue was detected by using the Coccinelle software.
* The local variable "ret" will be set to an appropriate value a bit later.
Thus omit the explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reuse existing functionality from memdup_user() instead of keeping
duplicate source code.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The address of the iovec &vq->iov[out] is not guaranteed to contain the scsi
command's response iovec throughout the lifetime of the command. Rather, it
is more likely to contain an iovec from an immediately following command
after looping back around to vhost_get_vq_desc(). Pass along the iovec
entirely instead.
Fixes: 79c14141a4 ("vhost/scsi: Convert completion path to use copy_to_iter")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: ddd17531ad ("ASoC: omap-mcpdm: Clean up with devm_* function")
Managed irq request will not doing any good in ASoC probe level as it is
not going to free up the irq when the driver is unbound from the sound
card.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
check_bogus_address() checked for pointer overflow using this expression,
where 'ptr' has type 'const void *':
ptr + n < ptr
Since pointer wraparound is undefined behavior, gcc at -O2 by default
treats it like the following, which would not behave as intended:
(long)n < 0
Fortunately, this doesn't currently happen for kernel code because kernel
code is compiled with -fno-strict-overflow. But the expression should be
fixed anyway to use well-defined integer arithmetic, since it could be
treated differently by different compilers in the future or could be
reported by tools checking for undefined behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull ARC fixes from Vineet Gupta:
- support for Syscall ABI v4 with upstream gcc 6.x
- lockdep fix (Daniel Mentz)
- gdb register clobber (Liav Rehana)
- couple of missing exports for modules
- other fixes here and there
* tag 'arc-4.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: export __udivdi3 for modules
ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
ARC: export kmap
ARC: Support syscall ABI v4
ARC: use correct offset in pt_regs for saving/restoring user mode r25
ARC: Elide redundant setup of DMA callbacks
ARC: Call trace_hardirqs_on() before enabling irqs
Pull GPIO fixes from Linus Walleij:
"Here are a few GPIO fixes for v4.8.
I was expecting some fallout from the new chardev rework but nothing
like that turned up att all. Instead a Kconfig confusion that I think
I have finally nailed, then some ordinary driver noise and trivia.
This fixes a Kconfig issue with UM: when I made GPIOLIB available to
all archs, that included UM, but the OF part of GPIOLIB requires
HAS_IOMEM, so we add HAS_IOMEM as a dependency to OF_GPIO.
This in turn exposed the fact that a few GPIO drivers were implicitly
assuming OF_GPIO as their dependency but instead depended on OF alone
(the typical problem being a pointer inside gpio_chip not existing
unless OF_GPIO is selected) and then UM would fail to compile with
these drivers instead. Then I lost patience and made any GPIO driver
depending on just OF depend on OF_GPIO instead, that is certainly what
they meant and the only thing that makes sense anyway. GPIO with just
OF but !OF_GPIO does not make sense.
Also a fix for the max730x driver data pointer, and a minor comment
fix for the GPIO tools"
* tag 'gpio-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: make any OF dependent driver depend on OF_GPIO
gpio: Fix OF build problem on UM
gpio: max730x: set gpiochip data pointer before using it
tools/gpio: fix gpio-event-mon header comment
ADS7846 regulator is disabled twice in a row in ads7846_remove(). Valid
one is in ads7846_disable().
Removing the ads7846 module causes warning about unbalanced disables.
...
WARNING: CPU: 0 PID: 29269 at drivers/regulator/core.c:2251 _regulator_disable+0xf8/0x130
unbalanced disables for vads7846
CPU: 0 PID: 29269 Comm: rmmod Tainted: G D W 4.7.0+ #3
Hardware name: HTC Magician
...
show_stack+0x10/0x14
__warn+0xd8/0x100
warn_slowpath_fmt+0x38/0x48
_regulator_disable+0xf8/0x130
regulator_disable+0x34/0x60
ads7846_remove+0x58/0xd4 [ads7846]
spi_drv_remove+0x1c/0x34
__device_release_driver+0x84/0x114
driver_detach+0x8c/0x90
bus_remove_driver+0x5c/0xc8
SyS_delete_module+0x1a0/0x238
ret_fast_syscall+0x0/0x38
Signed-off-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The silead code is using devm_foo for everything (and does not free
any resources). Except that it is using gpiod_get instead of
devm_gpiod_get (but is not freeing the gpio_desc), change this
to use devm_gpiod_get so that the gpio will be properly released.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The i40iw initiator sends an MPA-request with ird=16 and ord=16. The cxgb4
responder sends an MPA-reply with ord = 32 causing i40iw to terminate
due to insufficient resources.
The logic to reduce the ORD to <= peer's IRD was wrong.
Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The i40iw initiator sends an MPA-request with ird = 63, ord = 63. The
cxgb4 responder sends a RST. Since the inbound ord=63 and it exceeds
the max_ird/c4iw_max_read_depth (=32 default), chelsio decides to abort.
Instead, cxgb4 should adjust the ord/ird down before presenting it to
the ULP.
Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The unwind logic for creating a user QP has a double vfree
of the non-shared receive queue when handling a "too many qps"
failure.
The code unwinds the mmmap info by decrementing a reference
count which will call rvt_release_mmap_info() which in turn
does the vfree() of the r_rq.wq. The unwind code then does
the same free.
Fix by guarding the vfree() with the same test that is done
in close and only do the vfree() if qp->ip is NULL.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Previously, J_KEY generation was based on the lower 16 bits
of the user's UID. While this works, it was not good enough
as a non-root user could collide with a root user given a
sufficiently large UID.
This patch attempt to improve the J_KEY generation by using
the following algorithm:
The 16 bit J_KEY space is partitioned into 3 separate spaces
reserved for different user classes:
* all users with administtor privileges (including 'root')
will use J_KEYs in the range of 0 to 31,
* all kernel protocols, which use KDETH packets will use
J_KEYs in the range of 32 to 63, and
* all other users will use J_KEYs in the range of 64 to
65535.
The above separation is aimed at preventing different user levels
from sending packets to each other and, additionally, separate
kernel protocols from all other types of users. The later is meant
to prevent the potential corruption of kernel memory by any other
type of user.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver does not check if the CableInfo query is supported for the
port type. Return early if CableInfo is not supported for the port type,
making compliance with the specification explicit and preventing lower
level code from potentially doing the wrong thing if the query is not
supported for the hardware implementation.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Soft RoCE (rxe) is located in drivers/inifiniband/sw
and not in drivers/infiniband/hw/.
This patch fixes it.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If 'pci_register_driver' fails, we return 'err' which is known to be 0.
Return the error instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The monitor values from bytes 22 through 81 of the QSFP memory space
(SFF 8636) are dynamic and serving them out of the QSFP memory cache
maintained by the driver provides stale data to the CableInfo SMA query.
This patch refreshes the dynamic values from the QSFP memory on request
and overwrites the stale data from the cache for the overlap between the
requested range and the monitor range.
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The qp init function does a kzalloc() while holding the RCU
lock that encounters the following warning with a debug kernel
when a cat of the qp_stats is done:
[ 231.723948] rcu_scheduler_active = 1, debug_locks = 0
[ 231.731939] 3 locks held by cat/11355:
[ 231.736492] #0: (debugfs_srcu){......}, at: [<ffffffff813001a5>] debugfs_use_file_start+0x5/0x90
[ 231.746955] #1: (&p->lock){+.+.+.}, at: [<ffffffff81289a6c>] seq_read+0x4c/0x3c0
[ 231.755873] #2: (rcu_read_lock){......}, at: [<ffffffffa0a0c535>] _qp_stats_seq_start+0x5/0xd0 [hfi1]
[ 231.766862]
The init functions do an implicit next which requires the rcu read lock
before the kzalloc().
Fix for both drivers is to change the scope of the init function to only
do the allocation and the initialization of the just allocated iter.
The implict next is moved back into the respective start functions to fix
the issue.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
CC: <stable@vger.kernel.org> # 4.6.x-
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix to return error code -ENOMEM from the alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
'work' and 'route->path_rec' are malloced in cma_resolve_iboe_route()
and should be freed before leaving from the error handling cases,
otherwise it will cause memory leak.
Fixes: 200298326b ('IB/core: Validate route when we init ah')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If CONFIG_FRAME_WARN is small (1K) and CONFIG_NR_CPUS big
then a frame size warning is triggered during build.
Allocate the cpu mask dynamically to silence the warning.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Error code EAGAIN should be used when errors are temporary and next call
might succeeds.
When error code other than EAGAIN is returned, the caller (mlx4_ib_poll)
will assume all CQE in the same bunch are error too and will drop them all.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If an IRQ is setup using __setup_irq(), which is used by the
request_irq() family of functions, and we are using an SMP kernel then
the affinity of the IRQ will be set via setup_affinity() immediately
after the IRQ is enabled. This call to gic_set_affinity() will lead to
the interrupt being mapped to a VPE. However there are other ways to use
IRQs which don't cause affinity to be set, for example if it is used to
chain to another IRQ controller with irq_set_chained_handler_and_data().
The irq_set_chained_handler_and_data() code path will enable the IRQ,
but will not trigger a call to gic_set_affinity() and in this case
nothing will map the interrupt to a VPE, meaning that the interrupt is
never received.
Fix this by implementing the activate operation for the GIC device IRQ
domain, using gic_shared_irq_domain_map() to map the interrupt to the
correct pin of cpu 0.
Fixes: c98c1822ee ("irqchip/mips-gic: Add device hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160819170715.27820-2-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
gic_shared_irq_domain_map() is called from gic_irq_domain_alloc() where
the wrong chip has been set, and is then overwritten. Tidy this up by
setting the correct chip the first time, and setting the
handle_level_irq handler from gic_irq_domain_alloc() too.
gic_shared_irq_domain_map() is also called from gic_irq_domain_map(),
which now calls irq_set_chip_and_handler() to retain its previous
behaviour.
This patch prepares for a follow-on which will call
gic_shared_irq_domain_map() from a callback where the lock on the struct
irq_desc is held, which without this change would cause the call to
irq_set_chip_and_handler() to lead to a deadlock.
Fixes: c98c1822ee ("irqchip/mips-gic: Add device hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160819170715.27820-1-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When we write watermark values to the hardware, those values are stored
in dev_priv->wm.skl_hw. However with recent watermark changes, the
results structure we're copying from only contains valid watermark and
DDB values for the pipes that are actually changing; the values for
other pipes remain 0. Thus a blind copy of the entire skl_wm_values
structure will clobber the values for unchanged pipes...we need to be
more selective and only copy over the values for the changing pipes.
This mistake was hidden until recently due to another bug that caused us
to erroneously re-calculate watermarks for all active pipes rather than
changing pipes. Only when that bug was fixed was the impact of this bug
discovered (e.g., modesets failing with "Requested display configuration
exceeds system watermark limitations" messages and leaving watermarks
non-functional, even ones initiated by intel_fbdev_restore_mode).
Changes since v1:
- Add a function for copying a pipe's wm values
(skl_copy_wm_for_pipe()) so we can reuse this later
Fixes: 734fa01f3a ("drm/i915/gen9: Calculate watermarks during atomic 'check' (v2)")
Fixes: 9b61302274 ("drm/i915/gen9: Re-allocate DDB only for changed pipes")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-4-git-send-email-cpaul@redhat.com
(cherry picked from commit 2722efb90b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Since the watermark calculations for Skylake are still broken, we're apt
to hitting underruns very easily under multi-monitor configurations.
While it would be lovely if this was fixed, it's not. Another problem
that's been coming from this however, is the mysterious issue of
underruns causing full system hangs. An easy way to reproduce this with
a skylake system:
- Get a laptop with a skylake GPU, and hook up two external monitors to
it
- Move the cursor from the built-in LCD to one of the external displays
as quickly as you can
- You'll get a few pipe underruns, and eventually the entire system will
just freeze.
After doing a lot of investigation and reading through the bspec, I
found the existence of the SAGV, which is responsible for adjusting the
system agent voltage and clock frequencies depending on how much power
we need. According to the bspec:
"The display engine access to system memory is blocked during the
adjustment time. SAGV defaults to enabled. Software must use the
GT-driver pcode mailbox to disable SAGV when the display engine is not
able to tolerate the blocking time."
The rest of the bspec goes on to explain that software can simply leave
the SAGV enabled, and disable it when we use interlaced pipes/have more
then one pipe active.
Sure enough, with this patchset the system hangs resulting from pipe
underruns on Skylake have completely vanished on my T460s. Additionally,
the bspec mentions turning off the SAGV with more then one pipe enabled
as a workaround for display underruns. While this patch doesn't entirely
fix that, it looks like it does improve the situation a little bit so
it's likely this is going to be required to make watermarks on Skylake
fully functional.
This will still need additional work in the future: we shouldn't be
enabling the SAGV if any of the currently enabled planes can't enable WM
levels that introduce latencies >= 30 µs.
Changes since v11:
- Add skl_can_enable_sagv()
- Make sure we don't enable SAGV when not all planes can enable
watermarks >= the SAGV engine block time. I was originally going to
save this for later, but I recently managed to run into a machine
that was having problems with a single pipe configuration + SAGV.
- Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
- Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
- Move printks outside of mutexes
- Don't print error messages twice
Changes since v10:
- Apparently sandybridge_pcode_read actually writes values and reads
them back, despite it's misleading function name. This means we've
been doing this mostly wrong and have been writing garbage to the
SAGV control. Because of this, we no longer attempt to read the SAGV
status during initialization (since there are no helpers for this).
- mlankhorst noticed that this patch was breaking on some very early
pre-release Skylake machines, which apparently don't allow you to
disable the SAGV. To prevent machines from failing tests due to SAGV
errors, if the first time we try to control the SAGV results in the
mailbox indicating an invalid command, we just disable future attempts
to control the SAGV state by setting dev_priv->skl_sagv_status to
I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
- Move mutex_unlock() a little higher in skl_enable_sagv(). This
doesn't actually fix anything, but lets us release the lock a little
sooner since we're finished with it.
Changes since v9:
- Only enable/disable sagv on Skylake
Changes since v8:
- Add intel_state->modeset guard to the conditional for
skl_enable_sagv()
Changes since v7:
- Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
all we use it for anyway)
- Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
- Fix a styling error that snuck past me
Changes since v6:
- Protect skl_enable_sagv() with intel_state->modeset conditional in
intel_atomic_commit_tail()
Changes since v5:
- Don't use is_power_of_2. Makes things confusing
- Don't use the old state to figure out whether or not to
enable/disable the sagv, use the new one
- Split the loop in skl_disable_sagv into it's own function
- Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
Changes since v4:
- Use is_power_of_2 against active_crtcs to check whether we have > 1
pipe enabled
- Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
enabled
- Call skl_sagv_enable/disable() from pre/post-plane updates
Changes since v3:
- Use time_before() to compare timeout to jiffies
Changes since v2:
- Really apply minor style nitpicks to patch this time
Changes since v1:
- Added comments about this probably being one of the requirements to
fixing Skylake's watermark issues
- Minor style nitpicks from Matt Roper
- Disable these functions on Broxton, since it doesn't have an SAGV
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com
[mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
(cherry picked from commit 656d1b89e5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 0be81156b3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The commit 02fc76f6a changed base of the sysfs attributes from device to card.
The "show" callbacks dereferenced wrong objects because of this.
Fixes: 02fc76f6a7 ('ALSA: line6: Create sysfs via snd_card_add_dev_attr()')
Cc: <stable@vger.kernel.org> # v4.0+
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Andrej Krutak <dev@andree.sk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If there's an error, pcm is released in line6_pcm_acquire already.
Fixes: 247d95ee6d ('ALSA: line6: Handle error from line6_pcm_acquire()')
Cc: <stable@vger.kernel.org> # v4.0+
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Andrej Krutak <dev@andree.sk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull genirq/irqchip fixes for 4.8-rc4 from Marc Zygnier
- A critical fix for chained irqchip where we failed to configure
the cascade interrupt trigger
- A GIC fix for self-IPI in SMP-on-UP configurations
- A PM fix for GICv3
- A initialization fix the the GICv3 ITS, triggered by kexec
Pull two parisc fixes from Helge Deller:
"The first patch ensures that the high-res cr16 clocksource (which was
added in kernel 4.7) gets choosen as default clocksource for parisc.
The second patch moves the #define of EREFUSED down inside errno.h and
thus unbreaks building the gccgo compiler"
* 'parisc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix order of EREFUSED define in errno.h
parisc: Fix automatic selection of cr16 clocksource
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:
1) Mapping from PCI devices to socket/memory controller is significantly
different. Skylake scatters devices on a socket across a number of
PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
Knights Landing.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.
Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.
Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Commit 54b6680090 (parisc: Add native high-resolution sched_clock()
implementation) added support to use the CPU-internal cr16 counters as reliable
clocksource with the help of HAVE_UNSTABLE_SCHED_CLOCK.
Sadly the commit missed to remove the hack which prevented cr16 to become the
default clocksource even on SMP systems.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.7+
Eric writes:
Please pull this bcache branch based on v4.8-rc2. These fix one
deadlock, one use blkdev_put() use counter, and one dmesg output with a
better pr_err() description.
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c fixes from Wolfram Sang:
"I2C has some pretty standard driver bugfixes and one minor cleanup"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: meson: Use complete() instead of complete_all()
i2c: brcmstb: Use complete() instead of complete_all()
i2c: bcm-kona: Use complete() instead of complete_all()
i2c: bcm-iproc: Use complete() instead of complete_all()
i2c: at91: fix support of the "alternative command" feature
i2c: ocores: add missed clk_disable_unprepare() on failure paths
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
i2c: mux: demux-pinctrl: properly roll back when adding adapter fails
| CC mm/memory.o
| In file included from ../mm/memory.c:53:0:
| ../include/linux/pfn_t.h: In function ‘pfn_t_pte’:
| ../include/linux/pfn_t.h:78:2: error: conversion to non-scalar type requested
| return pfn_pte(pfn_t_to_pfn(pfn), pgprot);
With STRICT_MM_TYPECHECKS pte_t is a struct and the offending code
forces a cast which ends up shifting a struct and hence the gcc warning.
Note that in recent past some of the arches (aarch64, s390) made
STRICT_MM_TYPECHECKS default, but we don't for ARC as this leads to slightly
worse generated code, given ARC ABI definition of returning structs
(which pte_t would become)
Quoting from ARC ABI...
"Results of type struct are returned in a caller-supplied temporary
variable whose address is passed in r0.
For such functions, the arguments are shifted so that they are
passed in r1 and up."
So
- struct to be returned would be allocated on stack requiring extra
code at call sites
- callee updates stack memory to facilitate the return (vs. simple
MOV into return reg r0)
Hence STRICT_MM_TYPECHECKS is not enabled by default for ARC
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The syscall ABI includes the gcc functional calling ABI since a syscall
implies userland caller and kernel callee.
The current gcc ABI (v3) for ARCv2 ISA required 64-bit data be passed in
even-odd register pairs, (potentially punching reg holes when passing such
values as args). This was partly driven by the fact that the double-word
LDD/STD instructions in ARCv2 expect the register alignment and thus gcc
forcing this avoids extra MOV at the cost of a few unused register (which we
have plenty anyways).
This however was rejected as part of upstreaming gcc port to HS. So the new
ABI v4 doesn't enforce the even-odd reg restriction.
Do note that for ARCompact ISA builds v3 and v4 are practically the same in
terms of gcc code generation.
In terms of change management, we infer the new ABI if gcc 6.x onwards
is used for building the kernel.
This also needs a stable backport to enable older kernels to work with
new tools/user-space
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
User mode callee regs are explicitly collected before signal delivery or
breakpoint trap. r25 is special for kernel as it serves as task pointer,
so user mode value is clobbered very early. It is saved in pt_regs where
generally only scratch (aka caller saved) regs are saved.
The code to access the corresponding pt_regs location had a subtle bug as
it was using load/store with scaling of offset, whereas the offset was already
byte wise correct. So fix this by replacing LD.AS with a standard LD
Cc: <stable@vger.kernel.org>
Signed-off-by: Liav Rehana <liavr@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: rewrote title and commit log]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The commit 4097461897 ("Input: i8042 - break load dependency ...")
correctly set up ps2_cmd_mutex pointer for the KBD port but forgot to do
the same for AUX port(s), which results in communication on KBD and AUX
ports to clash with each other.
Fixes: 4097461897 ("Input: i8042 - break load dependency ...")
Reported-by: Bruno Wolff III <bruno@wolff.to>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Konrad writes:
Please git pull the following three fixes in to your 'for-linus' branch.
It is against 'for-linus' instead of 'for-4.8/drivers' because we had
some code in xen-blkfront go through the Xen tree (with your Ack). The
reason was that you 'for-4.8/drivers' was based on 4.7-rc2, and the
fixes needed to be against newer tag.
This branch fixes as we are exercising the multiqueue components more
aggressively.
Pull device mapper fixes from Mike Snitzer:
- a stable fix for DM round robin multipath path selector to disable
preemption before using this_cpu_ptr()
- a slight increase in DM crypt's mempool reserves to make swap ontop
of DM crypt more performant
- a few DM raid fixes to issues found while testing changes that were
merged in v4.8-rc1
* tag 'dm-4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: support raid0 with missing metadata devices
dm raid: enhance attempt_restore_of_faulty_devices() to support more devices
dm raid: fix restoring of failed devices regression
dm raid: fix frozen recovery regression
dm crypt: increase mempool reserve to better support swapping
dm round robin: do not use this_cpu_ptr() without having preemption disabled
Current code forgets to free resources in the failure path of
xlvbd_alloc_gendisk(), this patch fix it.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
blk_mq_update_nr_hw_queues() reset all queue limits to default which it's
not as xen-blkfront expected, introducing blkif_set_queue_limits() to reset
limits with initial correct values.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Two places didn't get updated when 64KB page granularity was introduced,
this patch fix them.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull SCSI fixes from James Bottomley:
"Six fairly small fixes. The ipr, mpt3sas and ses ones all trigger
oopses. The megaraid one fixes an attach failure on io mapped only
cards, the fcoe one is an obvious problem in the error path and the
aacraid one is a theoretical security issue (ability to trick the
kernel into a buffer overrun)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
ses: Fix racy cleanup of /sys in remove_dev()
mpt3sas: Fix resume on WarpDrive flash cards
ipr: Fix sync scsi scan
megaraid_sas: Fix probing cards without io port
aacraid: Check size values after double-fetch from user
fcoe: Use kfree_skb() instead of kfree()
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for reported issues for your tree.
The normal amount of gadget fixes, xhci fixes, new device ids, and a
few other minor things. All of them have been in linux-next for a
while, the full details are in the shortlog below"
* tag 'usb-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (43 commits)
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: really enqueue zero length TRBs.
xhci: always handle "Command Ring Stopped" events
cdc-acm: fix wrong pipe type on rx interrupt xfers
usb: misc: usbtest: add fix for driver hang
usb: dwc3: gadget: stop processing on HWO set
usb: dwc3: don't set last bit for ISOC endpoints
usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG
usb: udc: core: fix error handling
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
usb/gadget: fix gadgetfs aio support.
usb: gadget: composite: Fix return value in case of error
usb: gadget: uvc: Fix return value in case of error
usb: gadget: fix check in sync read from ep in gadgetfs
usb: misc: usbtest: usbtest_do_ioctl may return positive integer
usb: dwc3: fix missing platform_set_drvdata() in dwc3_of_simple_probe()
usb: phy: omap-otg: Fix missing platform_set_drvdata() in omap_otg_probe()
usb: gadget: configfs: add mutex lock before unregister gadget
usb: gadget: u_ether: fix dereference after null check coverify warning
...
Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:
Regression fixes for XFS changes introduce in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"
* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement
Pull hwmon fixes from Guenter Roeck:
"Fix a bug in it87 driver and URLs in ftsteutates driver"
* tag 'hwmon-for-linus-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ftsteutates) Correct ftp urls in driver documentation
hwmon: (it87) Features mask must be 32 bit wide
When mapping a page into the guest we error check using is_error_pfn(),
however this doesn't detect a value of KVM_PFN_NOSLOT, indicating an
error HVA for the page. This can only happen on MIPS right now due to
unusual memslot management (e.g. being moved / removed / resized), or
with an Enhanced Virtual Memory (EVA) configuration where the default
KVM_HVA_ERR_* and kvm_is_error_hva() definitions are unsuitable (fixed
in a later patch). This case will be treated as a pfn of zero, mapping
the first page of physical memory into the guest.
It would appear the MIPS KVM port wasn't updated prior to being merged
(in v3.10) to take commit 81c52c56e2 ("KVM: do not treat noslot pfn as
a error pfn") into account (merged v3.8), which converted a bunch of
is_error_pfn() calls to is_error_noslot_pfn(). Switch to using
is_error_noslot_pfn() instead to catch this case properly.
Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.y-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The dmic-codec was registered within the platform_driver's probe function,
which can cause deferred probe to run in loops as reported and analyzed by
Russell King.
Use module_init/exit in the driver and handle the dmic-codec device
registration and removal at that level instead of the platform_driver
probe/remove.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
of_match_device could return NULL, and so cause a NULL pointer
dereference later at line 472:
data->socdata = of_id->data;
For fixing this problem, we use of_device_get_match_data(), this will
simplify the code a little by using a standard function for
getting the match data.
Reported-by: coverity (CID 1324128)
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The field "owner" is set by core. Thus delete an extra initialisation.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Currently all CPU cooling devices share a
`struct thermal_cooling_device_ops` instance. The thermal core uses the
presence of functions in this struct to determine if a cooling device
has a power model (see cdev_is_power_actor). cpu_cooling.c adds the
power model functions to the shared struct when a device is registered
with a power model.
Therefore, if a CPU cooling device is registered using
[of_]cpufreq_power_cooling_register, _all_ devices will be determined to
have a power model, including any registered with
[of_]cpufreq_cooling_register. This can result in cpufreq_state2power
being called on a device where dyn_power_table is NULL.
With this commit, instead of having a shared thermal_cooling_device_ops
which is mutated, we have two versions: one with the power functions and
one without.
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Javi Merino <javi.merino@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The drivers that depend on OF but not OF_GPIO are wreaking havoc
with the autobuilders for archs that have all requirements for
OF but not for OF_GPIO, particularly the UM (Usermode) arch does
not have iomem (NO_IOMEM) which result in configuring GPIOLIB but
without OF_GPIO which is wrong if the driver is using the .of_node
of the gpiochip, which only appears with OF_GPIO.
After a brief look at the drivers just depending on OF it seems
most if not all of them actually require stuff from gpiolib-of so
the dependency is wrong in the first place.
This simply patches the Kconfig so that all GPIO drivers using OF
depend on OF_GPIO rather than just OF.
Cc: Rabin Vincent <rabin@rab.in>
Cc: Pramod Gurav <pramod.gurav@smartplayin.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Phil Reid <preid@electromag.com.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The UserMode (UM) Linux build was failing in gpiolib-of as it requires
ioremap()/iounmap() to exist, which is absent from UM. The non-existence
of IO memory is negatively defined as CONFIG_NO_IOMEM which means we
need to depend on HAS_IOMEM.
Cc: stable@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Both the card and DAPM cleanups recursively delete their debugfs
directories. Since the DAPM debugfs subdirectory for the card is
located within the card debugfs this means we end up trying to double
free the DAPM subdirectory. Reorder the cleanup to free the card
debugfs after we've cleaned up DAPM and it has deleted its own
subdirectory.
Reported-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Tested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
The disable_bypass cmdline option changes the SMMUv3 driver to put down
faulting stream table entries by default, as opposed to bypassing
transactions from unconfigured devices.
In this mode of operation, it is entirely expected to see aborting
entries in the stream table if and when we come to installing a valid
translation, so don't trigger a BUG() as a result of misdiagnosing these
entries as stream table corruption.
Cc: <stable@vger.kernel.org>
Fixes: 48ec83bcbc ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Reported-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Enabling stalling faults can result in hardware deadlock on poorly
designed systems, particularly those with a PCI root complex upstream of
the SMMU.
Although it's not really Linux's job to save hardware integrators from
their own misfortune, it *is* our job to stop userspace (e.g. VFIO
clients) from hosing the system for everybody else, even if they might
already be required to have elevated privileges.
Given that the fault handling code currently executes entirely in IRQ
context, there is nothing that can sensibly be done to recover from
things like page faults anyway, so let's rip this code out for now and
avoid the potential for deadlock.
Cc: <stable@vger.kernel.org>
Fixes: 48ec83bcbc ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reported-by: Matt Evans <matt.evans@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When starting a kexec/kdump kernel, the GIC ITS will already have been
enabled. According to the ARM Generic Interrupt Controller
Architecture Specification (GIC architecture Version 3.0 and version
4.0), writing to GITS_BASER<n> or GITS_CBASER is "UNPREDICTABLE" when
the ITS is enabled. On Cavium Thunder systems, this prevents the ITS
from being initializing in the kexec/kdump kernel, resulting in
failure to register/enable interrupts for all devices.
The fix is to disable the ITS if it is not already in the disabled
state. This allows the ITS to be properly initialized and then
re-enabled in the kexec/kdump kernel.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In the unlikely event of a global command queue error, the ARM SMMUv3
driver attempts to convert the problematic command into a CMD_SYNC and
resume the command queue. Unfortunately, this code is pretty badly
broken:
1. It uses the index into the error string table as the CMDQ index,
so we probably read the wrong entry out of the queue
2. The arguments to queue_write are the wrong way round, so we end up
writing from the queue onto the stack.
These happily cancel out, so the kernel is likely to stay alive, but
the command queue will probably fault again when we resume.
This patch fixes the error handling code to use the correct queue index
and write back the CMD_SYNC to the faulting entry.
Cc: <stable@vger.kernel.org>
Fixes: 48ec83bcbc ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reported-by: Diwakar Subraveti <Diwakar.Subraveti@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Due to the attribute bits being all over the place in the different
types of short-descriptor PTEs, when remapping an existing entry, e.g.
splitting a section into pages, we take the approach of decomposing
the PTE attributes back to the IOMMU API flags to start from scratch.
On inspection, though, the existing code seems to have got the read-only
bit backwards and ignored the XN bit. How embarrassing...
Fortunately the primary user so far, the Mediatek IOMMU, both never
splits blocks (because it only serves non-overlapping DMA API calls) and
also ignores permissions anyway, but let's put things right before any
future users trip up.
Cc: <stable@vger.kernel.org>
Fixes: e5fc9753b1 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The original error was thought to be corruption, but was actually caused by:
make-bcache --data-offset N
where N was in bytes and should have been in sectors. While userspace
tools should be updated to check --data-offset beyond end of volume,
hopefully this will help others that might not have noticed the units.
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
This patch fixes a cachedev registration-time allocation deadlock.
This can deadlock on boot if your initrd auto-registeres bcache devices:
Allocator thread:
[ 720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
[ 720.732361] [<ffffffff816eeac7>] schedule+0x37/0x90
[ 720.732963] [<ffffffffa05192b8>] bch_bucket_alloc+0x188/0x360 [bcache]
[ 720.733538] [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
[ 720.734137] [<ffffffffa05302bd>] bch_prio_write+0x19d/0x340 [bcache]
[ 720.734715] [<ffffffffa05190bf>] bch_allocator_thread+0x3ff/0x470 [bcache]
[ 720.735311] [<ffffffff816ee41c>] ? __schedule+0x2dc/0x950
[ 720.735884] [<ffffffffa0518cc0>] ? invalidate_buckets+0x980/0x980 [bcache]
Registration thread:
[ 720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
[ 720.715226] [<ffffffff816eeac7>] schedule+0x37/0x90
[ 720.715805] [<ffffffffa05235cd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
[ 720.716409] [<ffffffffa0522d30>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
[ 720.717008] [<ffffffffa05236e4>] bch_btree_insert+0xf4/0x170 [bcache]
[ 720.717586] [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
[ 720.718191] [<ffffffffa0527d9a>] bch_journal_replay+0x14a/0x290 [bcache]
[ 720.718766] [<ffffffff810cc90d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
[ 720.719369] [<ffffffff810cf684>] ? try_to_wake_up+0x1d4/0x350
[ 720.719968] [<ffffffffa05317d0>] run_cache_set+0x580/0x8e0 [bcache]
[ 720.720553] [<ffffffffa053302e>] register_bcache+0xe2e/0x13b0 [bcache]
[ 720.721153] [<ffffffff81354cef>] kobj_attr_store+0xf/0x20
[ 720.721730] [<ffffffff812a2dad>] sysfs_kf_write+0x3d/0x50
[ 720.722327] [<ffffffff812a225a>] kernfs_fop_write+0x12a/0x180
[ 720.722904] [<ffffffff81225177>] __vfs_write+0x37/0x110
[ 720.723503] [<ffffffff81228048>] ? __sb_start_write+0x58/0x110
[ 720.724100] [<ffffffff812cedb3>] ? security_file_permission+0x23/0xa0
[ 720.724675] [<ffffffff812258a9>] vfs_write+0xa9/0x1b0
[ 720.725275] [<ffffffff8102479c>] ? do_audit_syscall_entry+0x6c/0x70
[ 720.725849] [<ffffffff81226755>] SyS_write+0x55/0xd0
[ 720.726451] [<ffffffff8106a390>] ? do_page_fault+0x30/0x80
[ 720.727045] [<ffffffff816f2cae>] system_call_fastpath+0x12/0x71
The fifo code in upstream bcache can't use the last element in the buffer,
which was the cause of the bug: if you asked for a power of two size,
it'd give you a fifo that could hold one less than what you asked for
rather than allocating a buffer twice as big.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
register_cache() is supposed to return an error string on error so that
register_bcache() will will blkdev_put and cleanup other user counters,
but it does not set 'char *err' when cache_alloc() fails (eg, due to
memory pressure) and thus register_bcache() performs no cleanup.
register_bcache() <----------\ <- no jump to err_close, no blkdev_put()
| |
+->register_cache() | <- fails to set char *err
| |
+->cache_alloc() ---/ <- returns error
This patch sets `char *err` for this failure case so that register_cache()
will cause register_bcache() to correctly jump to err_close and do
cleanup. This was tested under OOM conditions that triggered the bug.
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: stable@vger.kernel.org
Pull more drm fixes from Dave Airlie:
"Daniel pointed out I'd missed some i915 fixes, and I also found a
single etnaviv fix I missed.
So here they are"
* tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux:
drm/etnaviv: take GPU lock later in the submit process
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Pull DeviceTree fixes from Rob Herring:
- a couple of DT node ref counting fixes
- fix __unflatten_device_tree for PPC PCI hotplug case
- rework marking irq controllers as OF_POPULATED in cases where real
driver is used.
- disable of_platform_default_populate_init on PPC. The change in
initcall order causes problems which need to be sorted out later.
* tag 'devicetree-fixes-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference counting in of_graph_get_endpoint_by_regs
of/platform: disable the of_platform_default_populate_init() for all the ppc boards
ARM: imx6: mark GPC node as not populated after irq init to probe pm domain driver
of/irq: Mark interrupt controllers as populated before initialisation
drivers/of: Validate device node in __unflatten_device_tree()
of: Delete an unnecessary check before the function call "of_node_put"
Thread A Thread B
- inode_lock fileA
- inode_lock fileB
- inode_lock fileA
- inode_lock fileB
We may encounter above potential deadlock during moving file range in
concurrent scenario. This patch fixes the issue by using inode_trylock
instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Only if two input files are regular files, we allow copying data in
range of them, otherwise, deny it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit a2ee0a3003.
When testing with generic/032 of xfstest suit, failure message will be
reported as below:
generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
--- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800
+++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
@@ -1,5 +1,5 @@
QA output created by 032
-100 iterations
-0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
-*
-0100000
+1: [768..775]: unwritten
+Unwritten extents found!
...
(Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests
In write_end(), we should update i_size of inode before unlock page,
otherwise, we will lose newly updated data in following race condition.
Thread A Thread B
- write_end
- unlock page
- writepages
- lock_page
- writepage
if page is out-of-range of file size,
we will skip writting the page.
- update i_size
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] has also slight regression for DWAL.
[1] https://github.com/sslab-gatech/fxmark
This reverts commit ec795418c4.
Pull documentation fixes from Jonathan Corbet:
"Three small fixes for Sphinx-formatted documentation generation"
* tag '4.8-doc-fixes' of git://git.lwn.net/linux:
doc-rst: customize RTD theme, drop padding of inline literal
docs: kernel-documentation: remove some highlight directives
docs: Set the Sphinx default highlight language to "guess"
Collection of i915 fixes.
* tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Pull x86 fixes from Ingo Molnar:
"An initrd microcode loading fix, and an SMP bootup topology setup fix
to resolve crashes on SGI/UV systems if the BIOS is configured in a
certain way"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Fix __max_logical_packages value setup
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also start/stop filter related fixes, a perf
event read() fix, a fix uncovered by fuzzing, and an uprobes leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Check return value of the perf_event_read() IPI
perf/core: Enable mapping of the stop filters
perf/core: Update filters only on executable mmap
perf/core: Fix file name handling for start/stop filters
perf/core: Fix event_function_local()
uprobes: Fix the memcg accounting
perf intel-pt: Fix occasional decoding errors when tracing system-wide
tools: Sync kvm related header files for arm64 and s390
perf probe: Release resources on error when handling exit paths
perf probe: Check for dup and fdopen failures
perf symbols: Fix annotation of objects with debuginfo files
perf script: Don't disable use_callchain if input is pipe
perf script: Show proper message when failed list scripts
perf jitdump: Add the right header to get the major()/minor() definitions
perf ppc64le: Fix build failure when libelf is not present
perf tools mem: Fix -t store option for record command
perf intel-pt: Fix ip compression
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Do not access outside hw cache name arrays (Arnaldo Carvalho de Melo)
- Use addr_location::addr instead of ip for entries when unwinding using
DWARF CFI, fixing the "srcline" information for userspace application
callchains (Milian Wolff)
- Reinstate strlcpy() header guard with __UCLIBC__, fixing the build
with uclibc, detected when building for the ARC architecture (Vineet Gupta)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have to check if the values are >= *_MAX, not just >, fix it.
From the bugzilla report:
''In file /tools/perf/util/evsel.c function __perf_evsel__hw_cache_name
it appears that there is a bug that reads beyond the end of the buffer.
The statement "if (type > PERF_COUNT_HW_CACHE_MAX)" allows type to be
equal to the maximum value. Later, when statement "if
(!perf_evsel__is_cache_op_valid(type, op))" is executed, the function
can access array perf_evsel__hw_cache_stat[type] beyond the end of the
buffer.
It appears to me that the statement "if (type > PERF_COUNT_HW_CACHE_MAX)"
should be "if (type >= PERF_COUNT_HW_CACHE_MAX)"
Bug found with Coverity and manual code review. No attempts were made to
execute the code with a maximum type value.''
Committer note:
Testing it:
$ perf record -e $(echo $(perf list cache | cut -d \[ -f1) | sed 's/ /,/g') usleep 1
[ perf record: Woken up 16 times to write data ]
[ perf record: Captured and wrote 0.023 MB perf.data (34 samples) ]
$ perf evlist
L1-dcache-load-misses
L1-dcache-loads
L1-dcache-stores
L1-icache-load-misses
LLC-load-misses
LLC-loads
LLC-store-misses
LLC-stores
branch-load-misses
branch-loads
dTLB-load-misses
dTLB-loads
dTLB-store-misses
dTLB-stores
iTLB-load-misses
iTLB-loads
node-load-misses
node-loads
node-store-misses
node-stores
$ perf list cache
List of pre-defined events (to be used in -e):
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
$
Reported-by: Brian Sweeney <bsweeney@lgsinnovations.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=153351
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools build in recent kernels spews splat when cross compiling with uClibc:
| CC util/alias.o
| In file included from tools/perf/util/../ui/../util/cache.h:8:0,
| from tools/perf/util/../ui/helpline.h:7,
| from tools/perf/util/debug.h:8,
| from arch/../util/cpumap.h:9,
| from arch/../util/env.h:5,
| from arch/common.h:4,
| from arch/common.c:3:
| tools/include/linux/string.h:12:15: warning: redundant redeclaration of ‘strlcpy’ [-Wredundant-decls]
| extern size_t strlcpy(char *dest, const char *src, size_t size);
^
This is after commit 61a6445e46 ("tools lib: Guard the strlcpy() header with
__GLIBC__").
The problem is uClibc also defines __GLIBC__ for exported headers for
applications. So add that specific check to not trip for uClibc.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petri Gynther <pgynther@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: linux-snps-arc@lists.infradead.org
Link: http://lkml.kernel.org/r/1471537703-16439-1-git-send-email-vgupta@synopsys.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull arm64 fixes from Catalin Marinas:
- Avoid a literal load with the MMU off on the CPU resume path
(potential inconsistency between cache and RAM)
- Build error with CONFIG_ACPI=n fixed
- Compiler warning in the arch/arm64/mm/dump.c code fixed
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix shift warning in arch/arm64/mm/dump.c
arm64: kernel: avoid literal load of virtual address with MMU off
arm64: Fix NUMA build error when !CONFIG_ACPI
Pull ARM fixes from Russell King:
"Only three fixes this time:
- Emil found an overflow problem with the memory layout sanity check.
- Ard Biesheuvel noticed that late-allocated page tables (for EFI)
weren't being properly constructed.
- Guenter Roeck reported a problem found on qemu caused by the recent
addr_limit changes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix address limit restoration for undefined instructions
ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations
ARM: 8590/1: sanity_check_meminfo(): avoid overflow on vmalloc_limit
Pull power management fixes from Rafael Wysocki:
"More hibernation-related material: one fix for a recent regression in
the core, one small cleanup of the x86-64 resume code and a
documentation update.
Specifics:
- Fix a hibernate core regression resulting from uncovering a latent
bug in its implementation of memory bitmaps by a recent commit
(James Morse).
- Use __pa() to compute a physical address in the x86-64 code
finalizing resume from hibernation (Rafael Wysocki).
- Update power management documentation related to system sleep
states to remove outdated information from it and to add a
description of a recently introduced hibernation debug feature to
it (Rafael Wysocki)"
* tag 'pm-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Pull drm fixes from Dave Airlie:
"Pretty quiet so far:
- a few amdgpu/radeon fixup for pcie pm changes
- a couple of amdgpu fixes
- some build fixes
- printk fix"
* tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: Change GART offset to 64-bit
drm/mediatek: add ARM_SMCCC dependency
drm/mediatek: add CONFIG_OF dependency
drm/mediatek: add COMMON_CLK dependency
drm/amdgpu: Fix memory trashing if UVD ring test fails
drm/amdgpu: fix vm init error path
drm/amdkfd: print doorbell offset as a hex value
Revert "drm/radeon: work around lack of upstream ACPI support for D3cold"
Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold"
This reverts commit 65aca64d05.
The patches for twl6040 MFD and clk missed the merge window and
causing the McPDM driver to never probe since it is put back to
the deferred list because the missing drivers.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
ahci currently insists on an explicit call to pci_intx() before falling
back from MSI or MSI-X to legacy IRQs. As pci_intx() is a no-op if the
command register already contains the right value it seems safe and useful
to add this call to pci_alloc_irq_vectors() so that ahci can just use
pci_alloc_irq_vectors().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When building with 48-bit VAs and 16K page configuration, it's possible
to get the following warning when building the arm64 page table dumping
code:
arch/arm64/mm/dump.c: In function ‘walk_pud’:
arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow]
This is because pud_offset(pgd, 0) performs a shift to the right by 36
while the value 0 has the type 'int' by default, therefore 32-bit.
This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to
use 0UL for the address argument.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM/ARM Fixes for v4.8-rc3
This tag contains the following fixes on top of v4.8-rc1:
- ITS init issues
- ITS error handling issues
- ITS IRQ leakage fix
- Plug a couple of ITS race conditions
- An erratum workaround for timers
- Some removal of misleading use of errors and comments
- A fix for GICv3 on 32-bit guests
When the host supported TSC scaling, L2 would use a TSC multiplier of
0, which causes a VM entry failure. Now L2's TSC uses the same
multiplier as L1.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using msr bitmap that assumes enabled.
Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS. An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.
Fixes: 8d14695f95 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses. In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs. See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.
L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.
Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.
This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.
Fixes: 3af18d9c5f ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Commit:
5743021831 ("sched/cputime: Count actually elapsed irq & softirq time")
... fixed a bug but also triggered a regression:
On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs
on host one by one until there is only one left, then observe CPU utilization
via 'top' in the guest, it shows:
100% st for cpu0(housekeeping)
75% st for other CPUs (nohz full mode)
However, w/o this commit it shows the correct 75% for all four CPUs.
When a guest is interrupted for a longer amount of time, missed clock ticks
are not redelivered later. Because of that, we should not limit the amount
of steal time accounted to the amount of time that the calling functions
think have passed.
However, the interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval in get_vtime_delta() it is
subtracted from is, so the max cputime limit is required to avoid underflow.
This patch fixes the regression by limiting the account_other_time() from
get_vtime_delta() to avoid underflow, and lets the other three call sites
(in account_other_time() and steal_account_process_time()) account however
much steal time the host told us elapsed.
Suggested-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike reports:
Roughly 10% of the time, ltp testcase getrusage04 fails:
getrusage04 0 TINFO : Expected timers granularity is 4000 us
getrusage04 0 TINFO : Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04 0 TINFO : utime: 0us; stime: 179us
getrusage04 0 TINFO : utime: 3751us; stime: 0us
getrusage04 1 TFAIL : getrusage04.c:133: stime increased > 5000us:
And tracked it down to the case where the task simply doesn't get
_any_ [us]time ticks.
Update the code to assume all rtime is utime when we lack information,
thus ensuring a task that elides the tick gets time accounted.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org # 4.3+
Fixes: 9d7fb04276 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function perf_event_mmap() is called by the MM subsystem each time
part of a binary is loaded in memory. There can be several mapping
for a binary, many times unrelated to the code section.
Each time a section of a binary is mapped address filters are
updated, event when the map doesn't pertain to the code section.
The end result is that filters are configured based on the last map
event that was received rather than the last mapping of the code
segment.
For example if we have an executable 'main' that calls library
'libcstest.so.1.0', and that we want to collect traces on code
that is in that library. The perf cmd line for this scenario
would be:
perf record -e cs_etm// --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
Resulting in binaries being mapped this way:
root@linaro-nano:~# cat /proc/1950/maps
00400000-00401000 r-xp 00000000 08:02 33169 /home/linaro/main
00410000-00411000 r--p 00000000 08:02 33169 /home/linaro/main
00411000-00412000 rw-p 00001000 08:02 33169 /home/linaro/main
7fa2464000-7fa2474000 rw-p 00000000 00:00 0
7fa2474000-7fa25a4000 r-xp 00000000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25a4000-7fa25b3000 ---p 00130000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b3000-7fa25b7000 r--p 0012f000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b7000-7fa25b9000 rw-p 00133000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b9000-7fa25bd000 rw-p 00000000 00:00 0
7fa25bd000-7fa25be000 r-xp 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25be000-7fa25cd000 ---p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cd000-7fa25ce000 r--p 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25ce000-7fa25cf000 rw-p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cf000-7fa25eb000 r-xp 00000000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25ef000-7fa25f2000 rw-p 00000000 00:00 0
7fa25f7000-7fa25f9000 rw-p 00000000 00:00 0
7fa25f9000-7fa25fa000 r--p 00000000 00:00 0 [vvar]
7fa25fa000-7fa25fb000 r-xp 00000000 00:00 0 [vdso]
7fa25fb000-7fa25fc000 r--p 0001c000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25fc000-7fa25fe000 rw-p 0001d000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7ff2ea8000-7ff2ec9000 rw-p 00000000 00:00 0 [stack]
root@linaro-nano:~#
Before 'main()' can execute 'libcstest.so.1.0' has to be loaded in
memory. Once that has been done perf_event_mmap() has been called
4 times, with the last map starting at address 0x7fa25ce000 and
the address filter configured to start filtering when the
IP has passed over address 0x0x7fa25ce72c (0x7fa25ce000 + 0x72c).
But that is wrong since the code segment for library 'libcstest.so.1.0'
as been mapped at 0x7fa25bd000, resulting in traces not being
collected.
This patch corrects the situation by requesting that address
filters be updated only if the mapped event is for a code
segment.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frank reported kernel panic when he disabled several cores in BIOS
via following option:
Core Disable Bitmap(Hex) [0]
with number 0xFFE, which leaves 16 CPUs in system (out of 48).
The kernel panic below goes along with following messages:
smpboot: Max logical packages: 2^M
smpboot: APIC(0) Converting physical 0 to logical package 0^M
smpboot: APIC(20) Converting physical 1 to logical package 1^M
smpboot: APIC(40) Package 2 exceeds logical package map^M
smpboot: CPU 8 APICId 40 disabled^M
smpboot: APIC(60) Package 3 exceeds logical package map^M
smpboot: CPU 12 APICId 60 disabled^M
...
general protection fault: 0000 [#1] SMP^M
Modules linked in:^M
CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M
RIP: 0010:[<ffffffff81014d54>] [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
...
[<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
[<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
[<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
[<ffffffff81002190>] do_one_initcall+0x50/0x190^M
[<ffffffff810ab193>] ? parse_args+0x293/0x480^M
[<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
[<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
[<ffffffff816dc19e>] kernel_init+0xe/0x110^M
[<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
[<ffffffff816dc190>] ? rest_init+0x80/0x80^M
The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).
The __max_logical_packages is computed as:
DIV_ROUND_UP(total_cpus, ncpus);
- ncpus being number of cores
With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).
Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.
The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.
After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.
If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.
The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.
Prarit Bhargava tested the patch and confirms that it solves
the problem:
From dmidecode:
Core Count: 24
Core Enabled: 24
Thread Count: 48
Orig kernel boot log:
[ 0.464981] smpboot: Max logical packages: 19
[ 0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
1. nr_cpus=8, should stop enumerating in package 0:
[ 0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.539596] smpboot: Max logical packages: 19
2. max_cpus=8, should still enumerate all packages:
[ 0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.550524] smpboot: Max logical packages: 19
3. nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
package 2:
[ 0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.539368] smpboot: Max logical packages: 19
4. maxcpus=49, should still enumerate all packages:
[ 0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.549624] smpboot: Max logical packages: 19
5. kdump (nr_cpus=1) works as well.
Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* pm-sleep:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Pull networking fixes from David Miller:
1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.
2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.
4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.
5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.
6) Several sctp_diag fixes from Phil Sutter.
7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
8) Checksum fixup fixes in bpf from Daniel Borkmann.
9) Memork leaks in nfnetlink, from Liping Zhang.
10) Use after free in rxrpc, from David Howells.
11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.
12) Calipso resource leak, from Colin Ian King.
13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
15) Fix lockdep splats in macsec, from Sabrina Dubroca.
16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.
17) Various tc-action bug fixes, from CONG Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...
Cong Wang says:
====================
net_sched: tc action fixes and updates
This patchset fixes a few regressions caused by the previous
code refactor and more. Thanks to Jamal for catching them!
Note, patch 3/7 and 4/7 are not strictly necessary for this patchset,
I just want to carry them together.
---
v4: adjust an indention for Jamal
add two more patches
v3: avoid list for fast path, suggested by Jamal
v2: replace flex_array with regular dynamic array
keep tcf_action_stats_update() in act_api.h
fix macro typos found by Amir
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The act_police uses its own code to walk the
action hashtable, which leads to that we could
not flush standalone tc police actions, so just
switch to tcf_generic_walker() like other actions.
(Joint work from Roman and Cong.)
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal reported a crash when we create a police action
with a specific index, this is because the init logic
is not correct, we should always create one for this
case. Just unify the logic with other tc actions.
Fixes: a03e6fe569 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.
The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.
Fixes: a85a970af2 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.
Fixes: ec0595cc44 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.
Fixes: a85a970af2 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-08-16
This series contains fixes to e1000e, igb, ixgbe and i40e.
Kshitiz Gupta provides a fix for igb to resolve the PHY delay compensation
math in several functions.
Jarod Wilson provides a fix for e1000e which had to broken up into 2
patches, first is prepares the driver for expanding the list of NICs
that have occasional ~10 hour clock jumps when being used for PTP.
Second patch actually fixes i218 silicon which has been experiencing
the clock jumps while using PTP.
Alex provides 2 patches for ixgbe now that he is back at Intel. First
fixes setting VLNCTRL.VFE bit, which was left unchanged in earlier patches
which resulted in disabling VLAN filtering for all the VFs. Second
corrects the support for disabling the VLAN tag filtering via the
feature bit.
Lastly, David fixes i40e which was causing a kernel panic when
non-contiguous traffic classes or traffic classes not starting with TC0,
were configured on a link partner switch. To fix this, changed the
logic when determining the total number of TCs enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: IPv4 UC router fixes
Ido says:
Patches 1-3 fix a long standing problem in the driver's init sequence,
which manifests itself quite often when routing daemons try to configure
an IP address on registered netdevs that don't yet have an associated
vPort.
Patches 4-9 add missing packet traps for the router to work properly and
also fix ordering issue following the recent changes to the driver's init
sequence.
The last patch isn't related to the router, but fixes a general problem
in which under certain conditions packets aren't trapped to CPU.
v1->v2:
- Change order of patch 7
- Add patch 6 following Ilan's comment
- Add patchset name and cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When packets enter the device they are classified to a priority group
(PG) buffer based on their PCP value. After their egress port and
traffic class are determined they are moved to the switch's shared
buffer and await transmission, if:
(Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres)
||
(Ingress{Port}.Usage < Min || Ingress{Port,PG} < Min ||
Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)
Packets scheduled to transmission through CPU port (trapped to CPU) use
traffic class 7, which has a zero maximum and minimum quotas. However,
when such packets arrive from PG 0 they are admitted to the shared
buffer as PG 0 has a non-zero minimum quota.
Allow all packets to be trapped to the CPU - regardless of the PG they
were classified to - by assigning a 10KB minimum quota for CPU port and
TC7.
Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before destroying the 802.1Q FID we should first remove the VID-to-FID
mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to
mlxsw_sp_fid_create().
Fixes: 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While going over the code I noticed we are missing two rollbacks in the
port's creation error path. Add them and adjust the place of one of them
in the port's removal sequence so that both are symmetric.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ralue pack function needs to set op, otherwise it is 0 for add always.
Fixes: d5a1c749d2 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the conditions to generate an ICMP Redirect Message is that "the
packet is being forwarded out the same physical interface that it was
received from" (RFC 1812).
Therefore, we need to be able to trap such packets and let the kernel
decide what to do with them.
For each RIF, enable the loop-back filter, which will raise the LBERROR
trap whenever the ingress RIF equals the egress RIF.
Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Reported-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following traps:
1) MTU Error: Trap packets whose size is bigger than the egress RIF's
MTU. If DF bit isn't set, traffic will continue to be routed in slow
path.
2) TTL Error: Trap packets whose TTL expired. This allows traceroute to
work properly.
3) OSPF packets.
Fixes: 7b27ce7bb9 ("mlxsw: spectrum: Add traps needed for router implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of
init sequence") moved ports initialization to the end of the init
sequence, which means ports are the first to be removed during fini.
Since the FDB delayed work is still active when ports are removed it's
possible for it to process FDB notifications of inactive ports,
resulting in a warning message.
Fix that by marking ports as inactive only after unregistering them. The
NETDEV_UNREGISTER event will invoke bridge's driver port removal
sequence that will cause the FDB (and FDB notifications) to be flushed.
Fixes: bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of init sequence")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After registering a netdevice it's possible for user space applications
to configure an IP address on it. From the driver's perspective, this
means a router interface (RIF) should be created for the PVID vPort.
Therefore, we must create the PVID vPort before registering the
netdevice.
Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when device configuration fails we emit errors to the kernel
log despite the fact we already get these from the EMAD transaction
layer, so remove them.
In addition to being unnecessary, removing these error messages will
allow us to reuse mlxsw_sp_port_add_vid() to create the PVID vPort
before registering the netdevice.
Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing a VLAN filter from the device we shouldn't return upon the
first error we encounter, as otherwise we'll have resources that will
never be freed nor used.
Instead, we should keep trying to free as much resources as possible in
a best effort mode.
Remove the error message as well, since we already get these from the
EMAD transaction code.
Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power supply fixes from Sebastian Reichel.
* tag 'for-v4.8-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power_supply: tps65217-charger: fix missing platform_set_drvdata()
power: reset: hisi-reboot: Unmap region obtained by of_iomap
power: reset: reboot-mode: fix build error of missing ioremap/iounmap on UM
power: supply: max17042_battery: fix model download bug.
As per the GICv3 specification, to power down a processor using GICv3
and allow automatic power-on if an interrupt must be sent to a processor,
software must set Enable to zero for all interrupt groups(by writing
to GICC_CTLR or ICC_IGRPEN{0,1}_EL1/3 as appropriate.
When commit 3708d52fc6 ("irqchip: gic-v3: Implement CPU PM notifier")
was introduced there were no firmware implementations(in particular PSCI)
handling this.
Linux kernel may not be aware of the CPU power state details and might
fail to identify the power states that require quiescing the CPU
interface. Even if it can be aware of those details, it can't determine
which CPU power state have been triggered at the platform level and how
the power control is implemented.
This patch make disabling redistributor and group1 non-secure interrupts
in the power down path and re-enabling of redistributor in the power-up
path conditional. It will be handled in the kernel if and only if the
non-secure accesses are permitted to access and modify control registers.
It is left to the platform implementation otherwise.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Tested-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On systems where a single CPU is present, the GIC may not support
having SGIs delivered to a target list. In that case, we use the
self-SGI mechanism to allow the interrupt to be delivered locally.
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Literal loads of virtual addresses are subject to runtime relocation when
CONFIG_RELOCATABLE=y, and given that the relocation routines run with the
MMU and caches enabled, literal loads of relocated values performed with
the MMU off are not guaranteed to return the latest value unless the
memory covering the literal is cleaned to the PoC explicitly.
So defer the literal load until after the MMU has been enabled, just like
we do for primary_switch() and secondary_switch() in head.S.
Fixes: 1e48ef7fcc ("arm64: add support for building vmlinux as a relocatable PIE binary")
Cc: <stable@vger.kernel.org> # 4.6+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since asm/acpi.h is only included by linux/acpi.h when CONFIG_ACPI is
enabled, disabling the latter leads to the following build error on
arm64:
arch/arm64/mm/numa.c: In function ‘arm64_numa_init’:
arch/arm64/mm/numa.c:395:24: error: ‘arm64_acpi_numa_init’ undeclared (first use in this function)
if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))
This patch include the asm/acpi.h explicitly in arch/arm64/mm/numa.c for
the arm64_acpi_numa_init() definition.
Fixes: d8b47fca8c ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT")
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The raid0 MD personality does not start a raid0 array with any of its
data devices missing.
dm-raid was removing data/metadata device pairs unconditionally if it
failed to read a superblock off the respective metadata device of such
pair, resulting in failure to start arrays with the raid0 personality.
Avoid removing any data/metadata device pairs in case of raid0
(e.g. lvm2 segment type 'raid0_meta') thus allowing MD to start the
array.
Also, avoid region size validation for raid0.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
In commit:
d8152bf85d ("clocksource/drivers/mips-gic-timer: Convert init function to return error")
several return values were added to a void function resulting in the following warnings:
clocksource/mips-gic-timer.c: In function 'gic_clocksource_of_init':
clocksource/mips-gic-timer.c:175:3: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c:183:4: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c:190:3: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c:195:3: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c:200:3: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c:211:2: warning: 'return' with a value, in function returning void [enabled by default]
clocksource/mips-gic-timer.c: At top level:
clocksource/mips-gic-timer.c:213:1: warning: comparison of distinct pointer types lacks a cast [enabled by default]
clocksource/mips-gic-timer.c: In function 'gic_clocksource_of_init':
clocksource/mips-gic-timer.c:183:18: warning: ignoring return value of 'PTR_ERR', declared with attribute warn_unused_result [-Wunused-result]
Given that the addition of the return values was intentional, it seems
that the conversion of the containing function from void to int was
simply overlooked.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Fixes: d8152bf85d ("clocksource/drivers/mips-gic-timer: Convert init function to return error")
Link: http://lkml.kernel.org/r/1471429296-9053-3-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I could not figure out why, but GCC cannot prove that the
kona_timer_init() function always initializes its two outputs,
and we get a warning for the use of the 'lsw' variable later,
which is obviously correct.
drivers/clocksource/bcm_kona_timer.c: In function 'kona_timer_init':
drivers/clocksource/bcm_kona_timer.c:119:13: error: 'lsw' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Slightly reordering the loop makes the warning disappear, after
it becomes more obvious to the compiler that the loop is
always entered on the first iteration.
As pointed out by Ray Jui, there is a related problem in the
way we deal with the loop running into the limit, as we just
keep going there with an invalid counter data, so instead we
now propagate a -ETIMEDOUT result to the caller.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bcm-kernel-feedback-list@broadcom.com
Link: http://lkml.kernel.org/r/1471429296-9053-2-git-send-email-daniel.lezcano@linaro.org
Link: https://patchwork.kernel.org/patch/9174261/
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After commit b34f2bc ("arm64: KVM: Make ICC_SRE_EL1 access return the
configured SRE value") we report SRE value to 64-bit guest, but 32-bit
one still handled as RAZ/WI what leads to funny promise we do not keep:
"GICv3: GIC: unable to set SRE (disabled at EL2), panic ahead"
Instead, return the actual value of the ICC_SRE_EL1 register that the
guest should see.
[ Tweaked commit message - Christoffer ]
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Commit 1e2a7d7849 ("irqdomain: Don't set type when mapping an IRQ")
moved the trigger configuration call from the irqdomain mapping to
the interrupt being actually requested.
This patch failed to handle the case where we configure a chained
interrupt, which doesn't get requested through the usual path.
In order to solve this, let's call __irq_set_trigger just before
starting the cascade interrupt. Special care must be taken to
make the flow handler stick, as the .irq_set_type method could
have reset it (it doesn't know we're dealing with a chained
interrupt).
Based on an initial patch by Jon Hunter.
Fixes: 1e2a7d7849 ("irqdomain: Don't set type when mapping an IRQ")
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Similarily to f005bd7e3b ("clocksource/arm_arch_timer: Force
per-CPU interrupt to be level-triggered"), make sure we can
survive an interrupt that has been misconfigured as edge-triggered
by forcing it to be level-triggered (active low is assumed, but
the GIC doesn't really care whether this is high or low).
Hopefully, the amount of shouting in the kernel log will convince
the user to do something about their firmware.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We already have a workaround for Cortex-A57 erratum #852523,
but Cortex-A72 r0p0 to r0p2 do suffer from the same issue
(known as erratum #853709).
Let's document the fact that we already handle this.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When converting a gfn to a pfn, we call gfn_to_pfn_prot, which returns
various kinds of error values. It turns out that is_error_pfn() only
returns true when the gfn was found in a memory slot and could somehow
not be used, but it does not return true if the gfn does not belong to
any memory slot.
Change use to is_error_noslot_pfn() which covers both cases.
Note: Since we already check for kvm_is_error_hva(hva) explicitly in the
caller of this function while holding the kvm->srcu lock protecting the
memory slots, this should never be a problem, but nevertheless this
change is warranted as it shows the intention of the code.
Reported-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
blk_set_queue_dying() can be called while another thread is
submitting I/O or changing queue flags, e.g. through dm_stop_queue().
Hence protect the QUEUE_FLAG_DYING flag change with locking.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller. Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up. This bug was discovered by running generic/299
repeatedly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull virtio/vhost fixes from Michael Tsirkin:
- test fixes
- a vsock fix
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: add dma stubs
vhost/test: fix after swiotlb changes
vhost/vsock: drop space available check for TX vq
ringtest: test build fix
Pull s390 fixes from Martin Schwidefsky:
"A couple of bug fixes, minor cleanup and a change to the default
config"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: fix failing CUIR assignment under LPAR
s390/pageattr: handle numpages parameter correctly
s390/dasd: fix hanging device after clear subchannel
s390/qdio: avoid reschedule of outbound tasklet once killed
s390/qdio: remove checks for ccw device internal state
s390/qdio: fix double return code evaluation
s390/qdio: get rid of spin_lock_irqsave usage
s390/cio: remove subchannel_id from ccw_device_private
s390/qdio: obtain subchannel_id via ccw_device_get_schid()
s390/cio: stop using subchannel_id from ccw_device_private
s390/config: make the vector optimized crc function builtin
s390/lib: fix memcmp and strstr
s390/crc32-vx: Fix checksum calculation for small sizes
s390: clarify compressed image code path
Use a special read-only iomap_ops implementation to support fiemap on
the attr fork.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
We'll never get nimap == 0 for a successful return from xfs_bmapi_read,
so don't try to handle it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
By bassing through an -ENOENT, similar to the old XFS implementation of
FIEMAP_FLAG_XATTR.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The flag is checked as supported, but then we do an unconditional
sync of the file, regardless of whether the flag is set or not. Make
the sync conditional on having the FIEMAP_FLAG_SYNC flag set.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
iov_iter_copy_from_user_atomic disables page faults internally, no need to
do it around the call. This also brings the iomap code in line with
the original filemap version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This catches up with commit 2457ae ("mm: non-atomically mark page
accessed during page cache allocation where possible"), which
moved the initial access marking into the pagecache allocator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Track the number of blocks used for the rmapbt in the AGF. When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When we do DAX IO, we try to invalidate the entire page cache held
on the file. This is incorrect as it will trash the entire mapping
tree that now tracks dirty state in exceptional entries in the radix
tree slots.
What we are trying to do is remove cached pages (e.g from reads
into holes) that sit in the radix tree over the range we are about
to write to. Hence we should just limit the invalidation to the
range we are about to overwrite.
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The space reservations was without an explaination in commit
"Add error reporting calls in error paths that return EFSCORRUPTED"
back in 2003. There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.
With this fix we stop running out of the reserved pool in
generic/229, which has happened for long time with small blocksize
file systems, and has increased in severity with the new buffered
write path.
[ dchinner: we still need to pass the block reservation into
xfs_bmapi_write() to ensure we don't deadlock during AG selection.
See commit dbd5c8c ("xfs: pass total block res. as total
xfs_bmapi_write() parameter") for more details on why this is
necessary. ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The buffer I/O accounting mechanism tracks async buffers under I/O. As
an optimization, the buffer I/O count is incremented only once on the
first async I/O for a given hold cycle of a buffer and decremented once
the buffer is released to the LRU (or freed).
xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but
we have one or two corner cases where a buffer can be submitted for I/O
multiple times via different methods in a single hold cycle. If an async
I/O occurs first, the I/O count is incremented. If a sync I/O occurs
before the hold count drops, XBF_ASYNC is cleared by the time the I/O
count is decremented.
Remove the async assert check from xfs_buf_ioacct_dec() as this is a
perfectly valid scenario. For the purposes of I/O accounting, we really
only care about the buffer async state at I/O submission time.
Discovered-and-analyzed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The i40e driver was causing a kernel panic when
non-contiguous Traffic Classes, or Traffic Classes not
starting with TC0, were configured on a link partner switch.
i40e does not support non-contiguous TCs.
To fix this, the patch changes the logic when determining
the total number of TCs enabled. Before, this would use the
highest TC number enabled and assume that all TCs below it were
also enabled. Now, we create a bitmask of enabled TCs and scan
it to determine not only the number of TCs, but also if the set
of enabled TCs starts at zero and is contiguous. If not, then
DCB is disabled by only returning one TC.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
attempt_restore_of_faulty_devices() is limited to 64 when it should support
the new maximum of 253 when identifying any failed devices. It clears any
revivable devices via an MD personality hot remove and add cylce to allow
for their recovery.
Address by using existing functions to retrieve and update all failed
devices' bitfield members in the dm raid superblocks on all RAID devices
and check for any devices to clear in it.
Whilst on it, don't call attempt_restore_of_faulty_devices() for any MD
personality not providing disk hot add/remove methods (i.e. raid0 now),
because such personalities don't support reviving of failed disks.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Back when I submitted the GSO code I messed up and dropped the support for
disabling the VLAN tag filtering via the feature bit. This patch
re-enables the use of the NETIF_F_HW_VLAN_CTAG_FILTER to enable/disable the
VLAN filtering independent of toggling promiscuous mode.
Fixes: b83e30104b ("ixgbe/ixgbevf: Add support for GSO partial")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
'lvchange --refresh RaidLV' causes a mapped device suspend/resume cycle
aiming at device restore and resync after transient device failures. This
failed because flag RT_FLAG_RS_RESUMED was always cleared in the suspend path,
thus the device restore wasn't performed in the resume path.
Solve by removing RT_FLAG_RS_RESUMED from the suspend path and resume
unconditionally. Also, remove superfluous comment from raid_resume().
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When I was adding the code for enabling VLAN promiscuous mode with SR-IOV
enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has
assumed there was code in another path that was setting it when we enabled
SR-IOV. This wasn't the case and as a result we were just disabling VLAN
filtering for all the VFs apparently.
Also the previous patches were always clearing CFIEN which was always set
to 0 by the hardware anyway so I am dropping the redundant bit clearing.
Fixes: 1636956491 ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On LVM2 conversions via lvconvert(8), the target keeps mapped devices in
frozen state when requesting RAID devices be resynchronized. This
applies to e.g. adding legs to a raid1 device or taking over from raid0
to raid4 when the rebuild flag's set on the new raid1 legs or the added
dedicated parity stripe.
Also, fix frozen recovery for reshaping as well.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Instead of passing negative flags like PCI_IRQ_NOMSI to prevent use of
certain interrupt types, pass positive flags like PCI_IRQ_LEGACY,
PCI_IRQ_MSI, etc., to specify the acceptable interrupt types.
This is based on a number of pending driver conversions that just happend
to be a whole more obvious to read this way, and given that we have no
users in the tree yet it can still easily be done.
I've also added a PCI_IRQ_ALL_TYPES catchall to keep the case of accepting
all interrupt types very simple.
[bhelgaas: changelog, fix PCI_IRQ_AFFINITY doc typo, remove mention of
PCI_IRQ_NOLEGACY]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
Pull pin control fixes from Linus Walleij:
"Here are a few pin control fixes for the v4.8 series, nothing special
about them:
- Add the missing <linux/io.h> header to the Intel Merrifield driver
to get rid of build mess.
- Drop two instances of pinctrl_unregister() called for drivers using
devm_* resource management.
- Remove the default debounce time for the AMD driver"
* tag 'pinctrl-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: merrifield: Add missed header
pinctrl/amd: Remove the default de-bounce time
pinctrl: pistachio: Drop pinctrl_unregister for devm_ registered device
pinctrl: meson: Drop pinctrl_unregister for devm_ registered device
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix occasional decoding errors when tracing system-wide with
Intel PT (Adrian Hunter)
- Fix ip compression in Intel PT for some specific packet types not
present on current hardware (Adrian Hunter)
- Fix annotation of objects with debuginfo files (Anton Blanchard)
- Fix build on Fedora Rawhide (25) wrt using the right header to
get the major() & minor() definitions in the jitdump code, now
it is deprecated getting those using sys/types.h, one has to use
sys/sysmacros.h (Arnaldo Carvalho de Melo)
- Sync arm64/s390 kvm related header files (Arnaldo Carvalho de Melo)
- Check for dup and fdopen failures in 'perf probe' (Colin Ian King,
Arnaldo Carvalho de Melo)
- Fix showing callchains in pipe mode, i.e.
perf record -g -o - workload | perf script
now shows callchains (He Kuang)
- Show proper message when the scripts directory points to some
invalid location in 'perf script --list' (He Kuang)
- Fix 'perf mem -t store' to record 'cpu/mem-stores/P' events
again (Jiri Olsa)
- Fix ppc64le build failure when libelf is not present (Ravi Bangoria)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed, so set the
appropriate flag2 in e1000_pch_lpt_info too.
Reported-by: Rupesh Patel <rupatel@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is prepatory work for an expanding list of adapter families that have
occasional ~10 hour clock jumps when being used for PTP. Factor out the
sanitization function and convert to using a feature (bug) flag, per
suggestion from Jesse Brandeburg.
Littering functional code with device-specific checks is much messier than
simply checking a flag, and having device-specific init set flags as needed.
There are probably a number of other cases in the e1000e code that
could/should be converted similarly.
Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix PHY delay compensation math in igb_ptp_tx_hwtstamp() and
igb_ptp_rx_rgtstamp. Add PHY delay compensation in
igb_ptp_rx_pktstamp().
In the IGB driver, there are two functions that retrieve timestamps
received by the PHY - igb_ptp_rx_rgtstamp() and igb_ptp_rx_pktstamp().
The previous commit only changed igb_ptp_rx_rgtstamp(), and the change
was incorrect.
There are two instances in which PHY delay compensations should be
made:
- Before the packet transmission over the PHY, the latency between
when the packet is timestamped and transmission of the packets,
should be an add operation, but it is currently a subtract.
- After the packets are received from the PHY, the latency between
the receiving and timestamping of the packets should be a subtract
operation, but it is currently an add.
Signed-off-by: Kshitiz Gupta <kshitiz.gupta@ni.com>
Fixes: 3f544d2 (igb: adjust ptp timestamps for tx/rx latency)
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a guest wants to map a device-ID/event-ID combination that is
already mapped, we may end up in a situation where an LPI is never
"put", thus never being freed.
Since the GICv3 spec says that mapping an already mapped LPI is
UNPREDICTABLE, lets just bail out early in this situation to avoid
any potential leaks.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Commit 288dab8a35 ("block: add a separate operation type for secure
erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering
all the places REQ_OP_DISCARD was being used to mean either. Fix those.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 288dab8a35 ("block: add a separate operation type for secure erase")
Signed-off-by: Jens Axboe <axboe@fb.com>
commit cbaadf0f90 ("ASoC: atmel_ssc_dai: refactor the startup and
shutdown") refactored code such that the SSC is reset on every
startup; this breaks duplex audio (e.g. first start audio playback,
then start record, causing the playback to stop/hang)
Fixes: cbaadf0f90 (ASoC: atmel_ssc_dai: refactor the startup and shutdown)
Signed-off-by: Christoph Huber <c.huber@bct-electronic.com>
Signed-off-by: Peter Meerwald-Stadler <p.meerwald@bct-electronic.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
rtree_next_node() walks the linked list of leaf nodes to find the next
block of pages in the struct memory_bitmap. If it walks off the end of
the list of nodes, it walks the list of memory zones to find the next
region of memory. If it walks off the end of the list of zones, it
returns false.
This leaves the struct bm_position's node and zone pointers pointing
at their respective struct list_heads in struct mem_zone_bm_rtree.
memory_bm_find_bit() uses struct bm_position's node and zone pointers
to avoid walking lists and trees if the next bit appears in the same
node/zone. It handles these values being stale.
Swap rtree_next_node()s 'step then test' to 'test-next then step',
this means if we reach the end of memory we return false and leave
the node and zone pointers as they were.
This fixes a panic on resume using AMD Seattle with 64K pages:
[ 6.868732] Freezing user space processes ... (elapsed 0.000 seconds) done.
[ 6.875753] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
[ 6.896453] PM: Using 3 thread(s) for decompression.
[ 6.896453] PM: Loading and decompressing image data (5339 pages)...
[ 7.318890] PM: Image loading progress: 0%
[ 7.323395] Unable to handle kernel paging request at virtual address 00800040
[ 7.330611] pgd = ffff000008df0000
[ 7.334003] [00800040] *pgd=00000083fffe0003, *pud=00000083fffe0003, *pmd=00000083fffd0003, *pte=0000000000000000
[ 7.344266] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 7.349825] Modules linked in:
[ 7.352871] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G W I 4.8.0-rc1 #4737
[ 7.360512] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
[ 7.369109] task: ffff8003c0220000 task.stack: ffff8003c0280000
[ 7.375020] PC is at set_bit+0x18/0x30
[ 7.378758] LR is at memory_bm_set_bit+0x24/0x30
[ 7.383362] pc : [<ffff00000835bbc8>] lr : [<ffff0000080faf18>] pstate: 60000045
[ 7.390743] sp : ffff8003c0283b00
[ 7.473551]
[ 7.475031] Process swapper/0 (pid: 1, stack limit = 0xffff8003c0280020)
[ 7.481718] Stack: (0xffff8003c0283b00 to 0xffff8003c0284000)
[ 7.800075] Call trace:
[ 7.887097] [<ffff00000835bbc8>] set_bit+0x18/0x30
[ 7.891876] [<ffff0000080fb038>] duplicate_memory_bitmap.constprop.38+0x54/0x70
[ 7.899172] [<ffff0000080fcc40>] snapshot_write_next+0x22c/0x47c
[ 7.905166] [<ffff0000080fe1b4>] load_image_lzo+0x754/0xa88
[ 7.910725] [<ffff0000080ff0a8>] swsusp_read+0x144/0x230
[ 7.916025] [<ffff0000080fa338>] load_image_and_restore+0x58/0x90
[ 7.922105] [<ffff0000080fa660>] software_resume+0x2f0/0x338
[ 7.927752] [<ffff000008083350>] do_one_initcall+0x38/0x11c
[ 7.933314] [<ffff000008b40cc0>] kernel_init_freeable+0x14c/0x1ec
[ 7.939395] [<ffff0000087ce564>] kernel_init+0x10/0xfc
[ 7.944520] [<ffff000008082e90>] ret_from_fork+0x10/0x40
[ 7.949820] Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22)
[ 7.955909] ---[ end trace 0024a5986e6ff323 ]---
[ 7.960529] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Here struct mem_zone_bm_rtree's start_pfn has been returned instead of
struct rtree_node's addr as the node/zone pointers are corrupt after
we walked off the end of the lists during mark_unsafe_pages().
This behaviour was exposed by commit 6dbecfd345 ("PM / hibernate:
Simplify mark_unsafe_pages()"), which caused mark_unsafe_pages() to call
duplicate_memory_bitmap(), which uses memory_bm_find_bit() after walking
off the end of the memory bitmap.
Fixes: 3a20cb1779 (PM / Hibernate: Implement position keeping in radix tree)
Signed-off-by: James Morse <james.morse@arm.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After we have called dpcm_path_get we should make sure to call
dpcm_path_put on all error paths. This was not happening causing the
allocated widget list to be leaked, this patch corrects this by adding a
dpcm_path_put to the error path.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
1. fix ctx pointer
Use req_ctx which is the ctx for the next job that have
been completed in the lanes instead of the first
completed job rctx, whose completion could have been
called and released.
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
1. fix ctx pointer
Use req_ctx which is the ctx for the next job that have
been completed in the lanes instead of the first
completed job rctx, whose completion could have been
called and released.
2. fix digest copy
Use XMM register to copy another 16 bytes sha256 digest
instead of a regular register.
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the hcd after checking for the xhci last quirks, not before.
This caused a hang on a Alpine Ridge xhci based maching which remove
the whole xhci controller when unplugging the last usb device
CC: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a device is disconnected, xhci_stop_device() will be invoked
in xhci_bus_suspend().
Also the "disconnect" IRQ will have ISR to invoke
xhci_free_virt_device() in this sequence.
xhci_irq -> xhci_handle_event -> handle_cmd_completion ->
xhci_handle_cmd_disable_slot -> xhci_free_virt_device
If xhci->devs[slot_id] has been assigned to NULL in
xhci_free_virt_device(), then virt_dev->eps[i].ring in
xhci_stop_device() may point to an invlid address to cause kernel
panic.
virt_dev = xhci->devs[slot_id];
:
if (virt_dev->eps[i].ring && virt_dev->eps[i].ring->dequeue)
[] Unable to handle kernel paging request at virtual address 00001a68
[] pgd=ffffffc001430000
[] [00001a68] *pgd=000000013c807003, *pud=000000013c807003,
*pmd=000000013c808003, *pte=0000000000000000
[] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[] CPU: 0 PID: 39 Comm: kworker/0:1 Tainted: G U
[] Workqueue: pm pm_runtime_work
[] task: ffffffc0bc0e0bc0 ti: ffffffc0bc0ec000 task.ti:
ffffffc0bc0ec000
[] PC is at xhci_stop_device.constprop.11+0xb4/0x1a4
This issue is found when running with realtek ethernet device
(0bda:8153).
Signed-off-by: Jim Lin <jilin@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enqueue the first TRB even if full_len is zero.
Without this "adb install <apk>" freezes the system.
Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com>
Fixes: 86065c2719 ("xhci: don't rely on precalculated value of needed trbs in the enqueue loop")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix "Command completion event does not match command" errors by always
handling the command ring stopped events.
The command ring stopped event is generated as a result of aborting
or stopping the command ring with a register write. It is not caused
by a command in the command queue, and thus won't have a matching command
in the comman list.
Solve it by handling the command ring stopped event before checking for a
matching command.
In most command time out cases we abort the command ring, and get
a command ring stopped event. The events command pointer will point at
the current command ring dequeue, which in most cases matches the timed
out command in the command list, and no error messages are seen.
If we instead get a command aborted event before the command ring stopped
event, the abort event will increse the command ring dequeue pointer, and
the following command ring stopped events command pointer will point at the
next, not yet queued command. This case triggered the error message
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Wang says:
====================
mediatek: Fix warning and issue
This patch set fixes the following warning and issues
v1 -> v2: Fix message typos and add coverletter
v2 -> v3: Split from the previous series for submitting bug fixes
as a series targeting 'net'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Runtime warning occurs if DMA-API debug feature is enabled that would be
raised by pointers passed to DMA API as arguments to inconsistent struct
device objects, so that the patch makes them usage aligned between DMA
operations such as dma_map_*() and dma_unmap_*() to eliminate the warning.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 08ef55c6f2
("net-next: mediatek: fix gigabit and flow control advertisement")
had supported proper flow control settings for GMAC1. But for GMAC0,
1.GMAC0 shares the common logic with GMAC1 inside mtk_phy_link_adjust()
to adapt various settings for the target phy.
2.GMAC0 uses fixed-phy to connect to a builtin gigabit switch with
fixed link speed as commit 0c72c50f6f
("net-next: mediatek: add fixed-phy support") describes.
3.However, fixed-phy doesn't enable SUPPORTED_Pause & SUPPORTED_Asym_Pause
supported flag on default that would cause mtk_phy_link_adjust() not to
enable flow control setting on GMAC0 properly and cause packet dropped
when high traffic.
Due to these reasons, the patch adds SUPPORTED_Pause & SUPPORTED_Asym_Pause
supported flags on fixed-phy used by the driver to have proper handling on
the both GMAC with the shared common logic.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch fixes up the incorrect setup of reduced MII (RMII) on GMAC
and adds the supplement for the setup of reverse MII (REVMII) on GMAC
, and rearranges the error handling for invalid PHY argument.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value of temp_level4_pgt is the physical address of the
top-level page directory, so use __pa() to compute it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
In order to successfully decode Intel PT traces, context switch events
are needed from the moment the trace starts. Currently that is ensured
by using the 'immediate' flag which enables the switch event when it is
opened.
However, since commit 86c2786994 ("perf intel-pt: Add support for
PERF_RECORD_SWITCH") that might not always happen. When tracing
system-wide the context switch event is added to the tracking event
which was not set as 'immediate'. Change that so it is.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 86c2786994 ("perf intel-pt: Add support for PERF_RECORD_SWITCH")
Link: http://lkml.kernel.org/r/1471245784-22580-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add missing platform_set_drvdata() in tps65217_charger_probe(), otherwise
calling platform_get_drvdata() in remove returns NULL.
This is detected by Coccinelle semantic patch.
Fixes: 3636859b28 ("power_supply: Add support for tps65217-charger")
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
When userspace provides the doorbell address for an MSI to be
injected into the guest, we find a KVM device which feels responsible.
Lets check that this device is really an emulated ITS before we make
real use of the container_of-ed pointer.
[ Moved NULL-pointer check to caller of static function
- Christoffer ]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we register an ITS device upon userland issuing the CTLR_INIT
ioctl to mark initialization of the ITS as done.
This deviates from the initialization sequence of the existing GIC
devices and does not play well with the way QEMU handles things.
To be more in line with what we are used to, register the ITS(es) just
before the first VCPU is about to run, so in the map_resources() call.
This involves iterating through the list of KVM devices and map each
ITS that we find.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
There are two problems with the current implementation of the MMIO
handlers for the propbaser and pendbaser:
First, the write to the value itself is not guaranteed to be an atomic
64-bit write so two concurrent writes to the structure field could be
intermixed.
Second, because we do a read-modify-update operation without any
synchronization, if we have two 32-bit accesses to separate parts of the
register, we can loose one of them.
By using the atomic cmpxchg64 we should cover both issues above.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Vitaly Kuznetsov says:
====================
hv_netvsc: fixes for VF removal path
Kernel crash is reported after VF is removed and detached from netvsc
device. Turns out we have multiple different (but related) issues on the
VF removal path which I'm trying to address with PATCHes 2-5 of this
series. PATCH1 is required to support the change.
Changes since v1:
- Re-arrange patches in the series to not introduce new issues [David Miller]
- Add PATCH5 which fixes a new issue I discovered while testing.
- Add Haiyang' A-b tags to PATCH1-4
With regards to Stephen's suggestion: I believe that switching to using RCU
and eliminating vf_use_cnt/vf_inject is the right thing to do long-term, we
can either put this on top of this series or do it later in net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
is being executed from netdev notifier chain we hold rtnl lock when we
get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
will hang on rtnl_lock().
- deadlock!
Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.
This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE which was not previously the
case.
I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).
Fixes: 3872035241 ("gre: Use inner_proto to obtain inner header protocol")
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 73cdf0c6ea ("perf symbols: Record text offset in dso
to calculate objdump address") started storing the offset of
the text section for all DSOs:
if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
Unfortunately this breaks debuginfo files, because we need to calculate
the offset of the text section in the associated executable file. As a
result perf annotate returns junk for all debuginfo files.
Fix this by using runtime_ss->elf which should point at the executable
when parsing a debuginfo file.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: 73cdf0c6ea ("perf symbols: Record text offset in dso to calculate objdump address")
Link: http://lkml.kernel.org/r/20160813115533.6de17912@kryten
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull IOMMU fixes from Joerg Roedel:
- Some functions defined in a header file for the mediatek driver were
not marked inline. Fix that oversight.
- Fix a potential crash in the ARM64 dma-mapping code when freeing a
partially initialized domain.
- Another fix for ARM64 dma-mapping to respect IOMMU mapping
constraints when allocating IOVA addresses.
* tag 'iommu-fixes-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dma: Respect IOMMU aperture when allocating
iommu/dma: Don't put uninitialised IOVA domains
iommu/mediatek: Mark static functions in headers inline
Pull EDAC fix from Borislav Petkov:
"A fix to sb_edac correcting channel reporting on Knights Landing"
* tag 'edac_fixes_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Fix channel reporting on Knights Landing
ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.
Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.
Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because perf data from pipe do not have a header with evsel attr, we
should not check that and disable symbol_conf.use_callchain. Otherwise,
perf script won't show callchains even if the data stream contains
callchain.
Before:
$ perf record -g -o - uname |perf script
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
uname 1828 182630.186578: 250000 cpu-clock: ..b9499 setup_arg_pages
uname 1828 182630.186850: 250000 cpu-clock: ..83b20 ___might_sleep
uname 1828 182630.187153: 250000 cpu-clock: ..4b6be file_map_prot_ch
...
After:
$ perf record -g -o - uname |perf script
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
uname 1833 182675.927099: 250000 cpu-clock:
ba5520 _raw_spin_lock+0xfe200040 ([kernel.kallsyms])
389dd4 expand_downwards+0xfe200154 ([kernel.kallsyms])
389f34 expand_stack+0xfe200024 ([kernel.kallsyms])
3b957e setup_arg_pages+0xfe20019e ([kernel.kallsyms])
40c80f load_elf_binary+0xfe20042f ([kernel.kallsyms])
...
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470309943-153909-2-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf shows the usage message when perf scripts folder failed to open,
which misleads users to let them think the command is being mistyped.
This patch shows a proper message and guides users to check the
PERF_EXEC_PATH environment variable in that case.
Before:
$ perf script --list
Usage: perf script [<options>]
or: perf script [<options>] record <script> [<record-options>] <command>
or: perf script [<options>] report <script> [script-args]
or: perf script [<options>] <script> [<record-options>] <command>
or: perf script [<options>] <top-script> [script-args]
-l, --list list available scripts
After:
$ perf script --list
open(/home/user/perf-core/scripts) failed.
Check for "PERF_EXEC_PATH" env to set scripts dir.
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470309943-153909-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The called of_graph_get_next_endpoint() already decrements the refcount
of the prev node, so it is wrong to do it again in the calling function.
Use the for_each_endpoint_of_node() helper to interate through the
endpoint OF nodes, which already does the right thing and simplifies
the code a bit.
Fixes: 8ccd0d0ca0
(of: add helper for getting endpoint node of specific identifiers)
Cc: stable@vger.kernel.org
Reported-by: David Jander <david@protonic.nl>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Noticed on Fedora Rawhide:
$ gcc --version
gcc (GCC) 6.1.1 20160721 (Red Hat 6.1.1-4)
$ rpm -q glibc
glibc-2.24.90-1.fc26.x86_64
$
CC /tmp/build/perf/util/jitdump.o
util/jitdump.c: In function 'jit_repipe_code_load':
util/jitdump.c:428:2: error: '__major_from_sys_types' is deprecated:
In the GNU C Library, `major' is defined by <sys/sysmacros.h>.
For historical compatibility, it is currently defined by
<sys/types.h> as well, but we plan to remove this soon.
To use `major', include <sys/sysmacros.h> directly.
If you did not intend to use a system-defined macro `major',
you should #undef it after including <sys/types.h>.
[-Werror=deprecated-declarations]
event->mmap2.maj = major(st.st_dev);
^~~~~
In file included from /usr/include/features.h:397:0,
from /usr/include/sys/types.h:25,
from util/jitdump.c:1:
/usr/include/sys/sysmacros.h:87:1: note: declared here
__SYSMACROS_DEFINE_MAJOR (__SYSMACROS_FST_IMPL_TEMPL)
Fix it following that recomendation.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3majvd0adhfr25rvx4v5e9te@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acquiring the nvme_ctrl lock before reading ctrl->state in
nvme_change_ctrl_state() should prevent a theoretical invalid state
transition, in the event of two threads racing inside that function.
I haven't been able to observe this happening with the current code, and
the current state machine seems to be simple enough to not be
affected by these invalid transitions, but future modifications could
make it more likely to happen.
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sag@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Increase mempool size from 16 to 64 entries. This increase improves
swap on dm-crypt performance.
When swapping to dm-crypt, all available memory is temporarily exhausted
and dm-crypt can only use the mempool reserve.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Both the fence and event alloc are safe to be done without holding the GPU
lock, as they either don't need any locking (fences) or are protected by
their own lock (events).
This solves a bad locking interaction between the submit path and the
recover worker. If userspace manages to exhaust all available events while
the GPU is hung, the submit will wait for events to become available
holding the GPU lock. The recover worker waits for this lock to become
available before trying to recover the GPU which frees up the allocated
events. Essentially both paths are deadlocked until the submit path
times out waiting for available events, failing the submit that could
otherwise be handled just fine if the recover worker had the chance to
bring the GPU back in a working state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.
Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sander reports following splat after netfilter nat bysrc table got
converted to rhashtable:
swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
[<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
[<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
[<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
[<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
[<ffffffff811aec19>] __kmalloc+0x129/0x140
[<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
[<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
[<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400
The failure happens when allocating the spinlock array.
Even with GFP_KERNEL its unlikely for such a large allocation
to succeed.
Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.
In commit 095dc8e0c3 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider size needed by a single spinlock when determining
number of locks per cpu. So with 64 byte per cacheline and 4 byte per
spinlock this gives 32 locks per cpu.
Resulting size of the lock-array (sizeof(spinlock) == 4):
cpus: 1 2 4 8 16 32 64
old: 1k 1k 4k 8k 16k 16k 16k
new: 128 256 512 1k 2k 4k 8k
8k allocation should have decent chance of success even
with GFP_ATOMIC, and should not fail with GFP_KERNEL.
With 72-byte spinlock (LOCKDEP):
cpus : 1 2
old: 9k 18k
new: ~2k ~4k
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary use of enable/disable callback notifications
and the incorrect more space available check.
The virtio_transport_tx_work handles when the TX virtqueue
has more buffers available.
Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Recent changes to ptr_ring broke the ringtest
which lacks a likely() stub. Fix it up.
Fixes: 982fb490c2
("ptr_ring: support zero length ring")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().
The usage pattern of the completion is:
meson_i2c_xfer_msg()
reinit_completion()
...
/* Start the transfer */
...
wait_for_completion_timeout()
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().
The usage pattern of the completion is:
brcmstb_send_i2c_cmd()
reinit_completion()
...
/* initiate transfer by setting iic_enable */
...
brcmstb_i2c_wait_for_completion()
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reviewed-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().
The usage pattern of the completion is:
bcm_kona_send_i2c_cmd()
reinit_completion()
...
bcm_kona_i2c_send_cmd_to_ctrl()
...
wait_for_completion_timeout()
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Tim Kryger <tim.kryger@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().
The usage pattern of the completion is:
bcm_iproc_i2c_xfer_single_msg()
reinit_completion()
...
(activate the transfer)
...
wait_for_completion_timeout()
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The "alternative command" feature was introduced with sama5d2 SoCs.
Its purpose is to let the hardware i2c controller automatically send the
STOP condition on the i2c bus at the end of a data transfer.
Without this feature, the i2c driver has to write the 'STOP' bit into the
Control Register so the hardware i2c controller is triggered to send the
STOP condition on the bus.
Using the "alternative command" feature requires to set the transfer data
length into the 8bit DATAL field of the Alternative Command Register.
Hence only data transfers up to 255 bytes can take advantage of the
"alternative command" feature. For greater data transfer sizes, the driver
should use the previous implementation, when the "alternative command"
support was not implemented yet.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
clk_disable_unprepare() is missed on failure paths in ocores_i2c_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
cros_ec_cmd_xfer returns success status if the command transport
completes successfully, but the execution result is incorrectly ignored.
In many cases, the execution result is assumed to be successful, leading
to ignored errors and operating on uninitialized data.
We've recently introduced the cros_ec_cmd_xfer_status() helper to avoid these
problems. Let's use it.
[Regarding the 'Fixes' tag; there is significant refactoring since the driver's
introduction, but the underlying logical error exists throughout I believe]
Fixes: 9d230c9e4f ("i2c: ChromeOS EC tunnel driver")
Cc: <stable@vger.kernel.org> # 9798ac6d32 mfd: cros_ec: Add cros_ec_cmd_xfer_status() helper
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
We also need to revert the dynamic OF change, so we get a consistent
state again. Otherwise, we might have two devices enabled e.g. after
pinctrl setup fails.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:
eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
However, this doesn't prevent a warning on a configuration such as:
eth0 <--- macvlan0 <--- vlan0
eth1 <--- vlan1 <--- macvlan1
In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:
- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
take (macvlan_netdev_addr_lock_key, 1)
By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, trying to setup a vlan over a macsec device, or other
combinations of devices, triggers a lockdep warning.
Use netdev_lockdep_set_classes and ndo_get_lock_subclass, similar to
what macvlan does.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If IPv6 is disabled when the option is set to keep IPv6
addresses on link down, userspace is unaware of this as
there is no such indication via netlink. The solution is to
remove the IPv6 addresses in this case, which results in
netlink messages indicating removal of addresses in the
usual manner. This fix also makes the behavior consistent
with the case of having IPv6 disabled first, which stops
IPv6 addresses from being added.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning <mmanning@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_transport_seq_start() does not currently clear iter->start_fail on
success, but relies on it being zero when it is allocated (by
seq_open_net()).
This can be a problem in the following sequence:
open() // allocates iter (and implicitly sets iter->start_fail = 0)
read()
- iter->start() // fails and sets iter->start_fail = 1
- iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
read() again
- iter->start() // succeeds, but doesn't change iter->start_fail
- iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)
We should initialize sctp_ht_iter::start_fail to zero if ->start()
succeeds, otherwise it's possible that we leave an old value of 1 there,
which will cause ->stop() to not call sctp_transport_walk_stop(), which
causes all sorts of problems like not calling rcu_read_unlock() (and
preempt_enable()), eventually leading to more warnings like this:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
[<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
[<ffffffff83295892>] _raw_spin_lock+0x12/0x40
[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
[<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60
[<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150
[<ffffffff81439e50>] traverse+0x170/0x850
[<ffffffff8143aeec>] seq_read+0x7cc/0x1180
[<ffffffff814f996c>] proc_reg_read+0xbc/0x180
[<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210
[<ffffffff813d2a95>] do_readv_writev+0x565/0x660
[<ffffffff813d6857>] vfs_readv+0x67/0xa0
[<ffffffff813d6c16>] do_preadv+0x126/0x170
[<ffffffff813d710c>] SyS_preadv+0xc/0x10
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
Notice that this is a subtly different stacktrace from the one in commit
5fc382d875 ("net/sctp: terminate rhashtable walk correctly").
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:
iriap_register_lsap: Unable to allocated LSAP!
================================================================================
UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
member access within null pointer of type 'struct lsap_cb'
CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
04/01/2014
0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
Call Trace:
[<ffffffff82344f40>] dump_stack+0xac/0xfc
[<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
[<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
[<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
[<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
[<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
[<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
[<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
[<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
[<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
[<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
[<ffffffff833c7e09>] SyS_connect+0x9/0x10
[<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
[<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
================================================================================
The bug seems to have been around since forever.
There's more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch will fix the most serious
bug (as it's easily reached from unprivileged userspace).
I have tested my patch with a reproducer.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the correct type __wsum to csum_sub() and csum_add(). This doesn't
really change anything since __wsum really *is* __be32, but removes the
address space warnings from sparse.
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 34ae6a1aa0 ("ipv6: update skb->csum when CE mark is propagated")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
it's currently defect with regards to skbs when the write_len spans into
non-linear parts, no matter if cloned or not.
There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.
This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
bpf_l4_csum_replace(). Really, the only thing reasonable and working here
is to call skb_ensure_writable() before any write operation. Meaning, via
pskb_may_pull() it makes sure that parts we want to access are pulled in and
if not does so plus unclones the skb implicitly. If our write_len still fits
the headlen and we're cloned and our header of the clone is not writable,
then we need to make a private copy via pskb_expand_head(). skb_store_bits()
is a bit misleading and only safe to store into non-linear data in different
contexts such as 357b40a18b ("[IPV6]: IPV6_CHECKSUM socket option can
corrupt kernel memory").
For above BPF helper functions, it means after fixed bpf_try_make_writable(),
we've pulled in enough, so that we operate always based on skb->data. Thus,
the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
In bpf_skb_store_bytes(), the len check is unnecessary too since it can
only pass in maximum of BPF stack size, so adding offset is guaranteed to
never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
offset alignment since they use __sum16 pointer for writing resulting csum.
The remaining helpers that change skb data not discussed here yet are
bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
vlan helpers internally call either skb_ensure_writable() (pop case) and
skb_cow_head() (push case, for head expansion), respectively. Similarly,
bpf_skb_proto_xlat() takes care to not mangle page frags.
Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3 ("tc: bpf: add checksum helpers")
Fixes: 3697649ff2 ("bpf: try harder on clones when writing into skb")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the missing of_node_put() after finishing the usage
of of_parse_phandle() or of_node_get() used by fixed_phy.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To fix runtime warning with lockdep is enabled due that u64_stats_sync
is not initialized well, so add it.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if calipso_genopt fails then the error exit path
does not free the ipv6_opt_hdr new causing a memory leak. Fix
this by kfree'ing new on the error exit path.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.
Tejun says:
So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.
[...]
It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.
Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Update some documentation related to system sleep to document new
features and remove outdated information from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
functions whose only callers are all inside of an #ifdef:
drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but not used [-Werror=unused-function]
drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but not used [-Werror=unused-function]
This adds another ifdef around the function definitions. The warnings
appeared after the functions were marked 'static', but the problem
was already there before that.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 57d3231057 ("net: dsa: mv88e6xxx: fix style issues")
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we free the resources backing the enclosure device before we
call device_unregister(). This is racy: during rmmod of low-level SCSI
drivers that hook into enclosure, we end up with a small window of time
during which writing to /sys can OOPS. Example trace with mpt3sas:
general protection fault: 0000 [#1] SMP KASAN
Modules linked in: mpt3sas(-) <...>
RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
Call Trace:
[<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
[<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
[<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
[<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
[<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
[<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
[<ffffffff8152281f>] vfs_write+0x13f/0x4a0
[<ffffffff81526731>] SyS_write+0x111/0x230
[<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94
Fortunately the solution is extremely simple: call device_unregister()
before we free the resources, and the race no longer exists. The driver
core holds a reference over ->remove_dev(), so AFAICT this is safe.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
commit 4fcd504edb ("power: reset: add reboot mode driver") uses api from
syscon, and syscon uses ioremap/iounmap which depends on HAS_IOMEM, so
let's depend on MFD_SYSCON instead of selecting it directly to avoid the
um-allyesconfig like build error on archs that without iomem:
drivers/mfd/syscon.c: In function 'of_syscon_register':
drivers/mfd/syscon.c:67:9: error: implicit declaration of function 'ioremap' [-Werror=implicit-function-declaration]
base = ioremap(res.start, resource_size(&res));
^
drivers/mfd/syscon.c:67:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
base = ioremap(res.start, resource_size(&res));
^
drivers/mfd/syscon.c:109:2: error: implicit declaration of function 'iounmap' [-Werror=implicit-function-declaration]
iounmap(base);
^
Reported-by: Kbuild test robot <fengguang.wu@intel.com>
Fixes: 4fcd504edbf7("power: reset: add reboot mode driver")
Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
The device's model download function returns the model data as
an array of u32s, which is later compared to the reference
model data. However, since the latter is an array of u16s,
the comparison does not happen correctly, and model verification
fails. This in turn breaks the POR initialization sequence.
Fixes: 39e7213edc ("max17042_battery: Support regmap to access device's registers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sven Van Asbroeck <TheSven73@googlemail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
This should eliminate a whole class of markup warnings, at the cost of
occasionally amusing markup choices; we'll have to see if it works out.
Suggested-by: Markus Heiser <markus.heiser@darmarIT.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
With the commit 44a7185c2a ("of/platform: Add common method to
populate default bus"), a default function is introduced to populate
the default bus and this function is invoked at the arch_initcall_sync
level. But a lot of ppc boards use machine_device_initcall() to
populate the default bus. This means that the default populate function
has higher priority and would override the arch specific population of
the bus. The side effect is that some arch specific bus are not probed,
then cause various malfunction due to the miss of some devices. Since
it is very possible to introduce bugs if we simply change the initcall
level for all these boards(about 30+). This just disable this default
function for all the ppc boards.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Michael reported 'perf mem -t store record' being broken. The reason is
latest rework of this area:
commit acbe613e0c ("perf tools: Add monitored events array")
We don't mark perf_mem_events store record when -t store option is
specified.
Committer notes:
Before:
# perf mem -t store record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ]
# perf evlist
cycles:ppp
#
After:
# perf mem -t store record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ]
# perf evlist
cpu/mem-stores/P
#
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: acbe613e0c ("perf tools: Add monitored events array")
Link: http://lkml.kernel.org/r/1470905457-18311-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The June 2015 Intel SDM introduced IP Compression types 4 and 6. Refer
to section 36.4.2.2 Target IP (TIP) Packet - IP Compression.
Existing Intel PT packet decoder did not support type 4, and got type 6
wrong. Because type 3 and type 4 have the same number of bytes, the
packet 'count' has been changed from being the number of ip bytes to
being the type code. That allows the Intel PT decoder to correctly
decide whether to sign-extend or use the last ip. However that also
meant the code had to be adjusted in a number of places.
Currently hardware is not using the new compression types, so this fix
has no effect on existing hardware.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1469005206-3049-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Johan writes:
USB-serial fixes for v4.8-rc2
Here is a fix of a memory leak in a driver-registration error path, and
some new device ids.
Signed-off-by: Johan Hovold <johan@kernel.org>
Fix to return error code -ENODEV from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 87b2bdf022 ("ASoC: Intel: Skylake: Initialize NHLT table")
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-By: Vinod Koul <vinod.kou@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Some AMD fixes and remove workaround now we have pcieport pm.
* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Fix memory trashing if UVD ring test fails
drm/amdgpu: fix vm init error path
Revert "drm/radeon: work around lack of upstream ACPI support for D3cold"
Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold"
Simple amdkfd fix.
* tag 'drm-amdkfd-fixes-2016-08-09' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: print doorbell offset as a hex value
Coverity reports:
result_independent_of_operands: data->features & (65536UL /* 1UL << 16 */)
is always 0 regardless of the values of its operands. This occurs as the
logical operand of if.
data->features needs to be 32 bit wide since there are more than 16 features.
Fixes: cc18da79d9 ("hwmon: (it87) Support up to 6 temperature sensors ... ");
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
macsec_notify() loops over the list of macsec devices configured on the
underlying device when this device is being removed. This list is part
of the rx_handler data.
However, macsec_dellink unregisters the rx_handler and frees the
rx_handler data when the last macsec device is removed from the
underlying device.
Add macsec_common_dellink() to delete macsec devices without
unregistering the rx_handler and freeing the associated data.
Fixes: 960d5848db ("macsec: fix memory leaks around rx_handler (un)registration")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've clean skb_array in macvtap_put_queue() but still try to pop from
it during macvtap_sock_destruct(). Fix this use after free by moving
the skb array cleanup to macvtap_sock_destruct() instead.
Fixes: 362899b872 ("macvtap: switch to use skb array")
Reported-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sg_timeout(), req->status is set to "-ETIMEDOUT" before calling
into usb_sg_cancel(). usb_sg_cancel() will do nothing and return
directly if req->status has been set to a non-zero value. This will
cause driver hang whenever transfer time out is triggered.
This patch fixes this issue. It could be backported to stable kernel
with version later than v3.15.
Cc: stable@vger.kernel.org # 3.15+
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felipe writes:
usb: fixes for v4.8-rc1
First set of fixes for v4.8-rc cycle. Again, dwc3 is
the most active driver with over 60% of this pull
request touching it.
The most important fixes are related to scatterlist
usage with dwc3. Before this pull request, we were
increment request->actual multiple times and this
would result in request->actual being larger than
request->length.
Also, if a we received a short packet midway through
processing a scatterlist, we were not clearning HWO
bit as we should.
Other than the large dwc3 scatterlist fixes, we have
a new Device ID for Intel's Kabylake silicon.
Other drivers, such as fsl_qe_udc and renesas udc,
also got a few minor fixes. Details are in shortlog.
Anything that sets ret in wm2000_anc_transition will have immediately
returned anyway as such we will always return an uninitialised ret at the
bottom of the function. Simply replace the return with a return 0;
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
If we fail to find a platform we simply return EPROBE_DEFER,
but we have allocated the rtd pointer. All error paths before
soc_add_pcm_runtime need to call soc_free_pcm_runtime first to
avoid leaking the rtd pointer. A suitable error path already
exists and is used else where in the function so simply use that
here as well.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
stop consuming TRBs when we reach one with HWO bit
already set. This will prevent us from prematurely
retiring a TRB.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
According to Synopsys Databook 2.60a, section 8.3.4,
it's stated that:
The LST bit should be set to 0 (isochronous
transfers normally continue until the
endpoint is removed entirely, at which time
an End Transfer command is used to stop the
transfer).
This patch makes sure that detail is observed and
fixes a regression with Android Audio playback
caused by recent changes to DWC3 gadget.
Signed-off-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
When rndis data transfer is in progress, some Windows7 Host PC is not
sending the GET_ENCAPSULATED_RESPONSE command for receiving the response
for the previous SEND_ENCAPSULATED_COMMAND processed.
The rndis function driver appends each response for the
SEND_ENCAPSULATED_COMMAND in a queue. As the above process got corrupted,
the Host sends a REMOTE_NDIS_RESET_MSG command to do a soft-reset.
As the rndis response queue is not freed, the previous response is sent
as a part of this REMOTE_NDIS_RESET_MSG's reset response and the Host
block any more Rndis transfers.
Hence free the rndis response queue as a part of this soft-reset so that
the correct response for REMOTE_NDIS_RESET_MSG is sent properly during the
response command.
Signed-off-by: Rajkumar Raghupathy <raghup@codeaurora.org>
Signed-off-by: Xerox Lin <xerox_lin@htc.com>
[AmitP: Cherry-picked this patch and folded other relevant
fixes from Android common kernel android-4.4]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The udc device needs to be deleted if error occurs
Fixes: 855ed04a37 ("usb: gadget: udc-core: independent registration of
gadgets and gadget drivers")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
In 'composite_os_desc_req_prepare', if one of the memory allocations fail,
0 will be returned, which means success.
We should return -ENOMEM instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
When reading synchronously from a non-zero endpoint, gadgetfs will
return -EFAULT even if the read succeeds, due to a bad check of the
copy_to_iter() return value.
This fix compares the return value of copy_to_iter to the amount of
bytes that was passed, and only fails if they are not the same.
Signed-off-by: Binyamin Sharet <s.binyamin@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
For case 14 and case 21, their correct return value is the number
of bytes transferred, so it is a positive integer. But in usbtest_ioctl,
it takes non-zero as false return value for usbtest_do_ioctl, so
it will treat the correct test as wrong test, then the time on
tests will be the minus value.
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Cc: stable <stable@vger.kernel.org>
Fixes: 18fc4ebdc7 ("usb: misc: usbtest: Remove timeval usage")
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Add missing platform_set_drvdata() in dwc3_of_simple_probe(), otherwise
calling platform_get_drvdata() in remove returns NULL.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Add missing platform_set_drvdata() in omap_otg_probe(), otherwise
calling platform_get_drvdata() in remove returns NULL.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
There may be a race condition if f_fs calls unregister_gadget_item in
ffs_closed() when unregister_gadget is called by UDC store at the same time.
this leads to a kernel NULL pointer dereference:
[ 310.644928] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 310.645053] init: Service 'adbd' is being killed...
[ 310.658938] pgd = c9528000
[ 310.662515] [00000004] *pgd=19451831, *pte=00000000, *ppte=00000000
[ 310.669702] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 310.675211] Modules linked in:
[ 310.678294] CPU: 0 PID: 1537 Comm: ->transport Not tainted 4.1.15-03725-g793404c #2
[ 310.685958] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 310.692493] task: c8e24200 ti: c945e000 task.ti: c945e000
[ 310.697911] PC is at usb_gadget_unregister_driver+0xb4/0xd0
[ 310.703502] LR is at __mutex_lock_slowpath+0x10c/0x16c
[ 310.708648] pc : [<c075efc0>] lr : [<c0bfb0bc>] psr: 600f0113
<snip..>
[ 311.565585] [<c075efc0>] (usb_gadget_unregister_driver) from [<c075e2b8>] (unregister_gadget_item+0x1c/0x34)
[ 311.575426] [<c075e2b8>] (unregister_gadget_item) from [<c076fcc8>] (ffs_closed+0x8c/0x9c)
[ 311.583702] [<c076fcc8>] (ffs_closed) from [<c07736b8>] (ffs_data_reset+0xc/0xa0)
[ 311.591194] [<c07736b8>] (ffs_data_reset) from [<c07738ac>] (ffs_data_closed+0x90/0xd0)
[ 311.599208] [<c07738ac>] (ffs_data_closed) from [<c07738f8>] (ffs_ep0_release+0xc/0x14)
[ 311.607224] [<c07738f8>] (ffs_ep0_release) from [<c023e030>] (__fput+0x80/0x1d0)
[ 311.614635] [<c023e030>] (__fput) from [<c014e688>] (task_work_run+0xb0/0xe8)
[ 311.621788] [<c014e688>] (task_work_run) from [<c010afdc>] (do_work_pending+0x7c/0xa4)
[ 311.629718] [<c010afdc>] (do_work_pending) from [<c010770c>] (work_pending+0xc/0x20)
for functions using functionFS, i.e. android adbd will close /dev/usb-ffs/adb/ep0
when usb IO thread fails, but switch adb from on to off also triggers write
"none" > UDC. These 2 operations both call unregister_gadget, which will lead
to the panic above.
add a mutex before calling unregister_gadget for api used in f_fs.
Signed-off-by: Winter Wang <wente.wang@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
dev->port_usb is checked for null pointer at above code, so dev->port_usb
might be null, fix it by adding null pointer check.
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
cdev->config is checked for null pointer at above code, so cdev->config
might be null, fix it by adding null pointer check.
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
This patch fixes an issue that isochronous transfer's data is possible to
be lost as a workaround. Since this driver uses a workqueue to start
the dmac, the transfer is possible to be delayed when system load is high.
Fixes: 6e4b74e469 ("usb: renesas: fix scheduling in atomic context bug")
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
This patch fixes an issue that unexpected BRDY interruption happens
when the usb_ep_{enable,disable}() are called with different direction.
In this case, the driver will cause the following message:
renesas_usbhs e6590000.usb: irq_ready run_error 1 : -16
This issue causes the followings:
1) A pipe is enabled as transmission
2) The pipe sent a data
3) The pipe is disabled and re-enabled as reception.
4) The pipe got a queue
Since the driver doesn't clear the BRDYSTS flags after 2) above, the issue
happens. If we add such clearing the flags into the driver, the code will
become complicate. So, this patch clears the BRDYSTS flag of reception in
usbhsg_ep_enable() to avoid complicate.
Cc: <stable@vger.kernel.org> # v4.1+ (usbhs_xxxsts_clear() is needed)
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Since R-Car Gen3 SoC has the USB-DMAC, this driver should set
dparam->has_usb_dmac to 1. Otherwise, behavior of this driver and
the usb-dmac driver will be mismatch, then sometimes receiving data will
be corrupt.
Fixes: de18757e27 ("usb: renesas_usbhs: add R-Car Gen3 power control")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
On LPAR the read message buffer command should be executed on the path
it was received on otherwise there is a chance that the CUIR assignment
might be faulty and the wrong channel path is set online/offline.
Fix by setting the path mask accordingly.
On z/VM we might not be able to do I/O on this path but there it does
not matter on which path the read message buffer command is executed.
Therefor implement a retry with an open path mask.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
ARM SMCCC is only set for ARMv7 and ARMv8 CPUs, but we currently
allow the driver to be build for older architecture levels as
well, which results in a link failure:
drivers/gpu/built-in.o: In function `mtk_hdmi_hw_make_reg_writable':
:(.text+0x1e737c): undefined reference to `arm_smccc_smc'
This adds a Kconfig dependency. The patch applies on my two
previous fixes that are not yet applied, so please apply all
three to get randconfig builds to work correctly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8f83f26891 ("drm/mediatek: Add HDMI support")
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The mediatek DRM driver can be configured for compile testing with
CONFIG_OF disabled, but then fails to link:
drivers/gpu/built-in.o: In function `mtk_drm_bind':
analogix_dp_reg.c:(.text+0x52888): undefined reference to `of_find_device_by_node'
analogix_dp_reg.c:(.text+0x52930): undefined reference to `of_find_device_by_node'
This adds an explicit Kconfig dependency.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.kernel.org/patch/9120871/
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
On kernel builds without COMMON_CLK, the newly added mediatek drm
driver fails to build:
drivers/gpu/drm/mediatek/mtk_mipi_tx.c:130:16: error: field 'pll_hw' has incomplete type
struct clk_hw pll_hw;
^~~~~~
In file included from ../include/linux/clk.h:16:0,
from ../drivers/gpu/drm/mediatek/mtk_mipi_tx.c:14:
drivers/gpu/drm/mediatek/mtk_mipi_tx.c: In function 'mtk_mipi_tx_from_clk_hw':
include/linux/kernel.h:831:48: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
^
/drivers/gpu/drm/mediatek/mtk_mipi_tx.c:136:9: note: in expansion of macro 'container_of'
return container_of(hw, struct mtk_mipi_tx, pll_hw);
^~~~~~~~~~~~
drivers/gpu/drm/mediatek/mtk_mipi_tx.c: At top level:
drivers/gpu/drm/mediatek/mtk_mipi_tx.c:302:21: error: variable 'mtk_mipi_tx_pll_ops' has initializer but incomplete type
static const struct clk_ops mtk_mipi_tx_pll_ops = {
This adds the required Kconfig dependency.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.kernel.org/patch/9069061/
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.
For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored giving the
appearance that the tunnel vport creation completed successfully.
Signed-off-by: Martynas Pumputis <martynas@weave.works>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b195d5e2bf ("ipr: Wait to do async scan until scsi host is
initialized") fixed async scan for ipr, but broke sync scan. This fixes
sync scan back up.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Found one megaraid_sas HBA probe fails,
[ 187.235190] scsi host2: Avago SAS based MegaRAID driver
[ 191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io 0x0000-0x00ff]
[ 191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!
and the card has resource like,
[ 125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[ 125.104446] pci 0000:89:00.0: reg 0x10: [io 0x0000-0x00ff]
[ 125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[ 125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[ 125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]
that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.
The driver is only looking for MEM, and should not fail.
It turns out megasas_init_fw() etc are using bar index as mask. index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.
Fix all related reference.
Fixes: b6d5d8808b ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In commit cf6f7e1d51 ("tipc: dump monitor attributes"),
I dereferenced a pointer before checking if its valid.
This is reported by static check Smatch as:
net/tipc/monitor.c:733 tipc_nl_add_monitor_peer()
warn: variable dereferenced before check 'mon' (see line 731)
In this commit, we check for a valid monitor before proceeding
with any other operation.
Fixes: cf6f7e1d51 ("tipc: dump monitor attributes")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function would call drm_modeset_lock_all, while the suspend/resume
functions already have their own locking. Fix this by factoring out
__intel_display_resume, and calling the atomic helpers for duplicating
atomic state and disabling all crtc's during suspend.
Changes since v1:
- Deal with -EDEADLK right after lock_all and clean up calls
to hw readout.
- Always take all modeset locks so updates during gpu reset are blocked.
Changes since v2:
- Fix deadlock in intel_update_primary_planes.
- Move WARN_ON(EDEADLK) to __intel_display_resume.
- pctx -> ctx
- only call __intel_display_resume on success in intel_display_resume.
Changes since v3:
- Rebase on top of dev_priv -> dev change.
- Use drm_modeset_lock_all_ctx instead of drm_modeset_lock_all.
Changes since v4 [by vsyrjala]:
- Deal with skip_intermediate_wm
- Update comment w.r.t. mode_config.mutex vs. ->detect()
- Rebase due to INTEL_GEN() etc.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e ("drm/i915: Use atomic helpers for suspend, v2.")
Cc: stable@vger.kernel.org
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470428910-12125-2-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit 7397489399)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Erratum SKL075: Display Flicker May Occur When Both VT-d And FBC Are Enabled
"Display flickering may occur when both FBC (Frame Buffer Compression)
and VT - d (Intel® Virtualization Technology for Directed I/O) are enabled
and in use by the display controller."
Ville found the w/a name in the database:
WaFbcTurnOffFbcWhenHyperVisorIsUsed:skl,bxt and also dug out that it
affects Broxton.
v2: Log when the quirk is applied.
v3: Ensure i915.enable_fbc is false when !HAS_FBC()
v4: Fix function name after rebase
v5: Add Broxton to the workaround
Note for backporting to stable, we need to add
#define mkwrite_device_info(ptr) \
((struct intel_device_info *)INTEL_INFO(ptr))
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470296633-20388-1-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit 36dbc4d769)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Use mod_timer_pending() to avoid reactivating a dead expectation in
the h323 conntrack helper, from Liping Zhang.
2) Oneliner to fix a type in the register name defined in the nf_tables
header.
3) Don't try to look further when we find an inactive elements with no
descendants in the rbtree set implementation, otherwise we crash.
4) Handle valid zero CSeq in the SIP conntrack helper, from
Christophe Leroy.
5) Don't display a trailing slash in conntrack helper with no classes
via /proc/net/nf_conntrack_expect, from Liping Zhang.
6) Fix an expectation leak during creation from the nfqueue path, again
from Liping Zhang.
7) Validate netlink port ID in verdict message from nfqueue, otherwise
an injection can be possible. Again from Zhang.
8) Reject conntrack tuples with different transport protocol on
original and reply tuples, also from Zhang.
9) Validate offset and length in nft_exthdr, make sure they are under
sizeof(u8), from Laura Garcia Liebana.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For resources shared by all cores such as SLC and IOC, only the master
core needs to do any setups / enabling / disabling etc.
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
trace_hardirqs_on_caller() in lockdep.c expects to be called before, not
after interrupts are actually enabled.
The following comment in kernel/locking/lockdep.c substantiates this
claim:
"
/*
* We're enabling irqs and according to our state above irqs weren't
* already enabled, yet we find the hardware thinks they are in fact
* enabled.. someone messed up their IRQ state tracing.
*/
"
An example can be found in include/linux/irqflags.h:
do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
Without this change, we hit the following DEBUG_LOCKS_WARN_ON.
[ 7.760000] ------------[ cut here ]------------
[ 7.760000] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2711 resume_user_mode_begin+0x48/0xf0
[ 7.770000] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[ 7.780000] Modules linked in:
[ 7.780000] CPU: 0 PID: 1 Comm: init Not tainted 4.7.0-00003-gc668bb9-dirty #366
[ 7.790000]
[ 7.790000] Stack Trace:
[ 7.790000] arc_unwind_core.constprop.1+0xa4/0x118
[ 7.800000] warn_slowpath_fmt+0x72/0x158
[ 7.800000] resume_user_mode_begin+0x48/0xf0
[ 7.810000] ---[ end trace 6f6a7a8fae20d2f0 ]---
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
On x86 builds the absense of <linux/io.h> makes static analyzer and compiler
unhappy which fails to build the driver.
CHECK drivers/pinctrl/intel/pinctrl-merrifield.c
drivers/pinctrl/intel/pinctrl-merrifield.c:518:17:
error: undefined identifier 'readl'
drivers/pinctrl/intel/pinctrl-merrifield.c:570:17:
error: undefined identifier 'readl'
drivers/pinctrl/intel/pinctrl-merrifield.c:575:9:
error: undefined identifier 'writel'
drivers/pinctrl/intel/pinctrl-merrifield.c:645:17:
error: undefined identifier 'readl'
CC drivers/pinctrl/intel/pinctrl-merrifield.o
drivers/pinctrl/intel/pinctrl-merrifield.c: In function ‘mrfld_pin_dbg_show’:
drivers/pinctrl/intel/pinctrl-merrifield.c:518:10:
error: implicit declaration of function ‘readl’
[-Werror=implicit-function-declaration]
value = readl(bufcfg);
^
drivers/pinctrl/intel/pinctrl-merrifield.c: In function ‘mrfld_update_bufcfg’:
drivers/pinctrl/intel/pinctrl-merrifield.c:575:2:
error: implicit declaration of function ‘writel’
[-Werror=implicit-function-declaration]
writel(value, bufcfg);
^
cc1: some warnings being treated as errors
Add header to the top of the module.
Fixes: 4e80c8f505 ("pinctrl: intel: Add Intel Merrifield pin controller support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
In the function amd_gpio_irq_enable() and
amd_gpio_direction_input(), remove the code which is setting
the default de-bounce time to 2.75ms.
The driver code shall use the same settings as specified in
BIOS. Any default assignment impacts TouchPad behaviour when
the LevelTrig is set to EDGE FALLING.
Cc: stable@vger.kernel.org
Reviewed-by: Ken Xue <Ken.Xue@amd.com>
Signed-off-by: Nitesh Kumar Agrawal <Nitesh-kumar.Agrawal@amd.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
It's not necessary to unregister pin controller device registered
with devm_pinctrl_register() and using pinctrl_unregister() leads
to a double free.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
It's not necessary to unregister pin controller device registered
with devm_pinctrl_register() and using pinctrl_unregister() leads
to a double free.
This is detected by Coccinelle semantic patch.
Fixes: e649f7ec8c ("pinctrl: meson: Use devm_pinctrl_register() for pinctrl registration")
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
DWC3 has one interesting peculiarity with chained
transfers. If we setup N chained transfers and we
get a short packet before processing all N TRBs,
DWC3 will (conditionally) issue a XferComplete or
XferInProgress event and retire all TRBs from the
one which got a short packet to the last without
clearing their HWO bits.
This means SW must clear HWO bit manually, which
this patch is doing.
Cc: <stable@vger.kernel.org>
Cc: Brian E Rogers <brian.e.rogers@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
When using SG lists, we would end up setting
request->actual to:
num_mapped_sgs * (request->length - count)
Let's fix that up by incrementing request->actual
only once.
Cc: <stable@vger.kernel.org>
Reported-by: Brian E Rogers <brian.e.rogers@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix the direct assignment of offset and length attributes included in
nft_exthdr structure from u32 data to u8.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Where a device driver has set a 64-bit DMA mask to indicate the absence
of addressing limitations, we still need to ensure that we don't
allocate IOVAs beyond the actual input size of the IOMMU. The reported
aperture is the most reliable way we have of inferring that input
address size, so use that to enforce a hard upper limit where available.
Fixes: 0db2e5d18f ("iommu: Implement common IOMMU ops for DMA mapping")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Right now the following sequence of events can happen:
1. Thread X calls vgic_put_irq
2. Thread Y calls vgic_add_lpi
3. Thread Y gets lpi_list_lock
4. Thread X drops the ref count to 0 and blocks on lpi_list_lock
5. Thread Y finds the irq via the lpi_list_lock, raises the ref
count to 1, and release the lpi_list_lock.
6. Thread X proceeds and frees the irq.
Avoid this by holding the spinlock around the kref_put.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
During low memory conditions, we could be dereferencing a NULL pointer
when vgic_add_lpi fails to allocate memory.
Consider for example this call sequence:
vgic_its_cmd_handle_mapi
itte->irq = vgic_add_lpi(kvm, lpi_nr);
update_lpi_config(kvm, itte->irq, NULL);
ret = kvm_read_guest(kvm, propbase + irq->intid
^^^^
kaboom?
Instead, return an error pointer from vgic_add_lpi and check the return
value from its single caller.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Both set_memory_ro() and set_memory_rw() will modify the page
attributes of at least one page, even if the numpages parameter is
zero.
The author expected that calling these functions with numpages == zero
would never happen. However with the new 444d13ff10 ("modules: add
ro_after_init support") feature this happens frequently.
Therefore do the right thing and make these two functions return
gracefully if nothing should be done.
Fixes crashes on module load like this one:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 000003ff80008000 TEID: 000003ff80008407
Fault in home space mode while using kernel ASCE.
AS:0000000000d18007 R3:00000001e6aa4007 S:00000001e6a10800 P:00000001e34ee21d
Oops: 0004 ilc:3 [#1] SMP
Modules linked in: x_tables
CPU: 10 PID: 1 Comm: systemd Not tainted 4.7.0-11895-g3fa9045 #4
Hardware name: IBM 2964 N96 703 (LPAR)
task: 00000001e9118000 task.stack: 00000001e9120000
Krnl PSW : 0704e00180000000 00000000005677f8 (rb_erase+0xf0/0x4d0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff80008b20 000003ff80008b20 000003ff80008b70 0000000000b9d608
000003ff80008b20 0000000000000000 00000001e9123e88 000003ff80008950
00000001e485ab40 000003ff00000000 000003ff80008b00 00000001e4858480
0000000100000000 000003ff80008b68 00000000001d5998 00000001e9123c28
Krnl Code: 00000000005677e8: ec1801c3007c cgij %r1,0,8,567b6e
00000000005677ee: e32010100020 cg %r2,16(%r1)
#00000000005677f4: a78401c2 brc 8,567b78
>00000000005677f8: e35010080024 stg %r5,8(%r1)
00000000005677fe: ec5801af007c cgij %r5,0,8,567b5c
0000000000567804: e30050000024 stg %r0,0(%r5)
000000000056780a: ebacf0680004 lmg %r10,%r12,104(%r15)
0000000000567810: 07fe bcr 15,%r14
Call Trace:
([<000003ff80008900>] __this_module+0x0/0xffffffffffffd700 [x_tables])
([<0000000000264fd4>] do_init_module+0x12c/0x220)
([<00000000001da14a>] load_module+0x24e2/0x2b10)
([<00000000001da976>] SyS_finit_module+0xbe/0xd8)
([<0000000000803b26>] system_call+0xd6/0x264)
Last Breaking-Event-Address:
[<000000000056771a>] rb_erase+0x12/0x4d0
Kernel panic - not syncing: Fatal exception: panic_on_oops
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: e8a97e42dc ("s390/pageattr: allow kernel page table splitting")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When a device is in a status where CIO has killed all I/O by itself the
interrupt for a clear request may not contain an irb to determine the
clear function. Instead it contains an error pointer -EIO.
This was ignored by the DASD int_handler leading to a hanging device
waiting for a clear interrupt.
Handle -EIO error pointer correctly for requests that are clear pending and
treat the clear as successful.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adding fdb entries pointing to the bridge device uses fdb_insert(),
which lacks various checks and does not respect added_by_user flag.
As a result, some inconsistent behavior can happen:
* Adding temporary entries succeeds but results in permanent entries.
* Same goes for "dynamic" and "use".
* Changing mac address of the bridge device causes deletion of
user-added entries.
* Replacing existing entries looks successful from userspace but actually
not, regardless of NLM_F_EXCL flag.
Use the same logic as other entries and fix them.
Fixes: 3741873b4f ("bridge: allow adding of fdb entries pointing to the bridge device")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable all interrupts when suspend, they will be enabled
when resume. Otherwise, the suspend/resume process will be
blocked occasionally.
Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b5a099c67a "net: ethernet: davicom: fix devicetree irq
resource" causes an interrupt storm after the ethernet interface
is activated on S3C24XX platform (ARM non-dt), due to the interrupt
trigger type not being set properly.
It seems, after adding parsing of IRQ flags in commit 7085a7401b
"drivers: platform: parse IRQ flags from resources", there is no path
for non-dt platforms where irq_set_type callback could be invoked when
we don't pass the trigger type flags to the request_irq() call.
In case of a board where the regression is seen the interrupt trigger
type flags are passed through a platform device's resource and it is
not currently handled properly without passing the irq trigger type
flags to the request_irq() call. In case of OF an of_irq_get() call
within platform_get_irq() function seems to be ensuring required irq_chip
setup, but there is no equivalent code for non OF/ACPI platforms.
This patch mostly restores irq trigger type setting code which has been
removed in commit ("net: ethernet: davicom: fix devicetree irq resource").
Fixes: b5a099c67a ("net: ethernet: davicom: fix devicetree irq resource")
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
During boot, sometimes the kernel will test to see if an instruction
causes an undefined instruction exception. Unfortunately, the exit
path for these exceptions did not restore the address limit, which
causes the rootfs mount code to fail. Fix the missing address limit
restoration.
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The late_alloc() PTE allocation function used by create_mapping_late()
does not call pgtable_page_ctor() on PTE pages it allocates, leaving
the per-page spinlock uninitialized.
Since generic page table manipulation code may assume that translation
table pages that are not owned by init_mm are covered by fully
constructed struct pages, the following crash may occur with the new
UEFI memory attributes table code.
efi: memattr: Processing EFI Memory Attributes table:
efi: memattr: 0x0000ffa16000-0x0000ffa82fff [Runtime Code |RUN| | |XP| | | | | | | | ]
Unable to handle kernel NULL pointer dereference at virtual address 00000010
pgd = c0204000
[00000010] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc4-00063-g3882aa7b340b #361
Hardware name: Generic DT based system
task: ed858000 ti: ed842000 task.ti: ed842000
PC is at __lock_acquire+0xa0/0x19a8
...
[<c038c830>] (__lock_acquire) from [<c038e4f8>] (lock_acquire+0x6c/0x88)
[<c038e4f8>] (lock_acquire) from [<c0c06134>] (_raw_spin_lock+0x2c/0x3c)
[<c0c06134>] (_raw_spin_lock) from [<c0410384>] (apply_to_page_range+0xe8/0x238)
[<c0410384>] (apply_to_page_range) from [<c1205f34>] (efi_set_mapping_permissions+0x54/0x5c)
[<c1205f34>] (efi_set_mapping_permissions) from [<c1247474>] (efi_memattr_apply_permissions+0x2b8/0x378)
[<c1247474>] (efi_memattr_apply_permissions) from [<c1248258>] (arm_enable_runtime_services+0x1f0/0x22c)
[<c1248258>] (arm_enable_runtime_services) from [<c0301f0c>] (do_one_initcall+0x44/0x174)
[<c0301f0c>] (do_one_initcall) from [<c1200d10>] (kernel_init_freeable+0x90/0x1e8)
[<c1200d10>] (kernel_init_freeable) from [<c0bff690>] (kernel_init+0x8/0x114)
[<c0bff690>] (kernel_init) from [<c0307ed0>] (ret_from_fork+0x14/0x24)
The crash is due to the fact that the UEFI page tables are not owned by
init_mm, but are not covered by fully constructed struct pages.
Given that the UEFI subsystem is currently the only user of
create_mapping_late(), add an unconditional call to pgtable_page_ctor() to
late_alloc().
Fixes: 9fc68b717c ("ARM/efi: Apply strict permissions for UEFI Runtime Services regions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
To limit the amount of mapped low memory, we determine a physical address
boundary based on the start of the vmalloc area using __pa().
Strictly speaking, the vmalloc area location is arbitrary and does not
necessarily corresponds to a valid physical address. For example, if
PAGE_OFFSET = 0x80000000
PHYS_OFFSET = 0x90000000
vmalloc_min = 0xf0000000
then __pa(vmalloc_min) overflows and returns a wrapped 0 when phys_addr_t
is a 32-bit type. Then the code that follows determines that the entire
physical memory is above that boundary and no low memory gets mapped at
all:
|[...]
|Machine model: Freescale i.MX51 NA04 Board
|Ignoring RAM at 0x90000000-0xb0000000 (!CONFIG_HIGHMEM)
|Consider using a HIGHMEM enabled kernel.
To avoid this problem let's make vmalloc_limit a 64-bit value all the
time and determine that boundary explicitly without using __pa().
Reported-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.
Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.
A more detailed description of the problem scenario (referencing commands
in the script below):
(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
The vti_test interface is created in the init namespace. vti_tunnel_init()
attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
setting the tunnel net to &init_net.
(2) ip link set vti_test netns secure
The vti_test interface is moved to the "secure" netns. Note that
the associated struct ip_tunnel still has tunnel->net set to &init_net.
(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
The first packet sent using the vti device causes xfrm_lookup() to be
called as follows:
dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);
Note that tunnel->net is the init namespace, while skb_dst(skb) references
the vti_test interface in the "secure" namespace. The returned dst
references an interface in the init namespace.
Also note that the first parameter to xfrm_lookup() determines which flow
cache is used to store the computed xfrm bundle, so after xfrm_lookup()
returns there will be a cached bundle in the init namespace flow cache
with a dst referencing a device in the "secure" namespace.
(4) ip netns del secure
Kernel begins to delete the "secure" namespace. At some point the
vti_test interface is deleted, at which point dst_ifdown() changes
the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
in the "secure" namespace however).
Since nothing has happened to cause the init namespace's flow cache
to be garbage collected, this dst remains attached to the flow cache,
so the kernel loops waiting for the last reference to lo to go away.
<Begin script>
ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8
ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
ip netns del secure
<End script>
Reported-by: Hangbin Liu <haliu@redhat.com>
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here are a bunch of miscellaneous fixes to AF_RXRPC:
(*) Fix an uninitialised pointer.
(*) Fix error handling when we fail to connect a call.
(*) Fix a NULL pointer dereference.
(*) Fix two occasions where a packet is accessed again after being queued
for someone else to deal with.
(*) Fix a missing skb free.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since IRQCHIP_DECLARE now flags the GPC node as already populated, the
GPC power domain driver is never probed unless we clear the flag again.
Fixes: 15cc2ed6dc ("of/irq: Mark initialised interrupt controllers as populated")
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
That way the init callback may clear the flag again, in case of drivers
split between early irq chip and a normal platform driver.
Fixes: 15cc2ed6dc ("of/irq: Mark initialised interrupt controllers as populated")
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
@mynodes is set to NULL when __unflatten_device_tree() is called
to unflatten device sub-tree in PCI hot add scenario on PowerPC
PowerNV platform. Marking @mynodes detached unconditionally causes
kernel crash as below backtrace shows:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000b26f64
cpu 0x0: Vector: 300 (Data Access) at [c000003fcc7cf740]
pc: c000000000b26f64: __unflatten_device_tree+0xf4/0x190
lr: c000000000b26f40: __unflatten_device_tree+0xd0/0x190
sp: c000003fcc7cf9c0
msr: 900000000280b033
dar: 0
dsisr: 40000000
current = 0xc000003fcc281680
paca = 0xc00000000ff00000 softe: 0 irq_happened: 0x01
pid = 2724, comm = sh
Linux version 4.7.0-gavin-07754-g92a6836 (gwshan@gwshan) (gcc version \
4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #539 SMP Mon Aug 1 \
12:40:29 AEST 2016
enter ? for help
[c000003fcc7cfa50] c000000000b27060 of_fdt_unflatten_tree+0x60/0x90
[c000003fcc7cfaa0] c0000000004c6288 pnv_php_set_slot_power_state+0x118/0x440
[c000003fcc7cfb80] c0000000004c6a10 pnv_php_enable+0xc0/0x170
[c000003fcc7cfbd0] c0000000004c4d80 power_write_file+0xa0/0x190
[c000003fcc7cfc50] c0000000004be93c pci_slot_attr_store+0x3c/0x60
[c000003fcc7cfc70] c0000000002d3fd4 sysfs_kf_write+0x94/0xc0
[c000003fcc7cfcb0] c0000000002d2c30 kernfs_fop_write+0x180/0x260
[c000003fcc7cfd00] c000000000230fe0 __vfs_write+0x40/0x190
[c000003fcc7cfd90] c000000000232278 vfs_write+0xc8/0x240
[c000003fcc7cfde0] c000000000233d90 SyS_write+0x60/0x110
[c000003fcc7cfe30] c000000000009524 system_call+0x38/0x108
This avoids the kernel crash by marking @mynodes detached only when
@mynodes is dereferencing valid device node in __unflatten_device_tree().
Fixes: 1d1bde550e ("of: fdt: mark unflattened tree as detached")
Reported-by: Meng Li <shlimeng@cn.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
The of_node_put() function tests whether its argument is NULL
and then returns immediately.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Rob Herring <robh@kernel.org>
Some uio based PCI drivers, e.g., uio_cif, do not work if the assigned PCI
memory resources are not page aligned. By using the kernel option
"pci=resource_alignment=<align>@<bus>:<slot>.<func>" it is possible to
request page alignment for memory resources of devices.
However, this is cumbersome when using several devices, and the
bus/slot/func addresses may change if devices are added to or removed from
the system.
Extend the "pci=resource_alignment" option so we can specify the relevant
devices via PCI vendor, device, subvendor, and subdevice IDs. The
specification of the devices via IDs is indicated by a leading string
"pci:" as argument to "pci=resource_alignment".
The format of the specification is
pci:<vendor>:<device>[:<subvendor>:<subdevice>]
Examples:
pci=resource_alignment=4096@pci:8086:9c22:103c:198f
pci=resource_alignment=pci:8086:9c22 # defaults to PAGE_SIZE align
[bhelgaas: changelog, use actual vendor/device IDs in examples]
Signed-off-by: Mathias Koehrer <mathias.koehrer@etas.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Once a packet has been posted to a connection in the data_ready handler, we
mustn't try reposting if we then find that the connection is dying as the
refcount has been given over to the dying connection and the packet might
no longer exist.
Losing the packet isn't a problem as the peer will retransmit.
Signed-off-by: David Howells <dhowells@redhat.com>
The call state machine processor sets up the message parameters for a UDP
message that it might need to transmit in advance on the basis that there's
a very good chance it's going to have to transmit either an ACK or an
ABORT. This requires it to look in the connection struct to retrieve some
of the parameters.
However, if the call is complete, the call connection pointer may be NULL
to dissuade the processor from transmitting a message. However, there are
some situations where the processor is still going to be called - and it's
still going to set up message parameters whether it needs them or not.
This results in a NULL pointer dereference at:
net/rxrpc/call_event.c:837
To fix this, skip the message pre-initialisation if there's no connection
attached.
Signed-off-by: David Howells <dhowells@redhat.com>
If rxrpc_new_client_call() fails to make a connection, the call record that
it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
to rxrpc_put_call() to indicate that it no longer has any attachment to the
AF_RXRPC socket.
Without this, an assertion failure may occur at:
net/rxrpc/call_object:635
Signed-off-by: David Howells <dhowells@redhat.com>
Due to the limitations of having to wait until we see a device's DMA
restrictions before we know how we want an IOVA domain initialised,
there is a window for error if a DMA ops domain is allocated but later
freed without ever being used. In that case, init_iova_domain() was
never called, so calling put_iova_domain() from iommu_put_dma_cookie()
ends up trying to take an uninitialised lock and crashing.
Make things robust by skipping the call unless the IOVA domain actually
has been initialised, as we probably should have done from the start.
Fixes: 0db2e5d18f ("iommu: Implement common IOMMU ops for DMA mapping")
Cc: stable@vger.kernel.org
Reported-by: Nate Watterson <nwatters@codeaurora.org>
Reviewed-by: Nate Watterson <nwatters@codeaurora.org>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
According to the KVM API documentation a successful MSI injection
should return a value > 0 on success.
Return possible errors in vgic_its_trigger_msi() and report a
successful injection back to userland, while also reporting the
case where the MSI could not be delivered due to the guest not
having the LPI mapped, for instance.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
GPIO control register is divided into IOPINS1 and IOPINS2.
And low 4-bit of register is controls output.
So, this patch fixes wrong mask of GPIO output.
Signed-off-by: Jaewon Kim <jaewon02.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede has reported a difficulty in the Linux port of libusb.
When a device is removed, the poll() system call in usbfs starts
returning POLLERR as soon as udev->state is set to
USB_STATE_NOTATTACHED, but the outstanding URBs are not available for
reaping until some time later (after usbdev_remove() has been called).
This is awkward for libusb or other usbfs clients, although not an
insuperable problem.
At any rate, it's easy to change usbfs so that it returns POLLHUP as
soon as the state becomes USB_STATE_NOTATTACHED but it doesn't return
POLLERR until after the outstanding URBs have completed. That's what
this patch does; it uses the fact that ps->list is always on the
dev->filelist list until usbdev_remove() takes it off, which happens
after all the outstanding URBs have been cancelled.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usbdev_mmap allocates a buffer. The size of the buffer is determined
by a user. So with this code (no need to be root):
int fd = open("/dev/bus/usb/001/001", O_RDONLY);
mmap(NULL, 0x800000, PROT_READ, MAP_SHARED, fd, 0);
we can see a warning:
WARNING: CPU: 0 PID: 21771 at ../mm/page_alloc.c:3563 __alloc_pages_slowpath+0x1036/0x16e0()
...
Call Trace:
[<ffffffff8117a3ae>] ? warn_slowpath_null+0x2e/0x40
[<ffffffff815178b6>] ? __alloc_pages_slowpath+0x1036/0x16e0
[<ffffffff81516880>] ? warn_alloc_failed+0x250/0x250
[<ffffffff8151226b>] ? get_page_from_freelist+0x75b/0x28b0
[<ffffffff815184e3>] ? __alloc_pages_nodemask+0x583/0x6b0
[<ffffffff81517f60>] ? __alloc_pages_slowpath+0x16e0/0x16e0
[<ffffffff810565d4>] ? dma_generic_alloc_coherent+0x104/0x220
[<ffffffffa0269e56>] ? hcd_buffer_alloc+0x1d6/0x3e0 [usbcore]
[<ffffffffa0269c80>] ? hcd_buffer_destroy+0xa0/0xa0 [usbcore]
[<ffffffffa0228f05>] ? usb_alloc_coherent+0x65/0x90 [usbcore]
[<ffffffffa0275c05>] ? usbdev_mmap+0x1a5/0x770 [usbcore]
...
Allocations like this one should be marked as __GFP_NOWARN. So do so.
The size could be also clipped by something like:
if (size >= (1 << (MAX_ORDER + PAGE_SHIFT - 1)))
return -ENOMEM;
But I think the overall limit of 16M (by usbfs_increase_memory_usage)
is enough, so that we only silence the warning here.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Markus Rechberger <mrechberger@gmail.com>
Fixes: f7d34b445a (USB: Add support for usbfs zerocopy.)
Cc: 4.6+ <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ehci_turn_off_all_ports() all EHCI port registers are cleared to zero.
On some hardware, this can lead to an system hang,
when ehci_port_power() accesses the already cleared registers.
This patch changes the order of cleanup.
First call ehci_port_power() which respects the current bits in
port status registers
and afterwards cleanup the hard way by setting everything to zero.
Signed-off-by: Marc Ohlf <ohlf@mkt-sys.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Erroneous or malicious endpoint descriptors may have non-zero bits in
reserved positions, or out-of-bounds values. This patch helps prevent
these from causing problems by bounds-checking the wMaxPacketValue
entries in endpoint descriptors and capping the values at the maximum
allowed.
This issue was first discovered and tests were conducted by Jake Lamberson
<jake.lamberson1@gmail.com>, an intern working for Rosie Hall.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: roswest <roswest@cisco.com>
Tested-by: roswest <roswest@cisco.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was an oversight while merging these functions. Fix it.
Cc: Honghui Zhang <honghui.zhang@mediatek.com>
Fixes: 9ca340c98c ('iommu/mediatek: move the common struct into header file')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch fixes fives off-by-one bugs in the ftdi-elan driver code. The
bug can be triggered by plugging a USB adapter for CardBus 3G cards (model
U132 manufactured by Elan Digital Systems, Ltd), causing a kernel panic.
The fix was tested on Ubuntu 14.04.4 with 4.7.0-rc14.2.0-27-generic+ and
4.4.0-22-generic+ kernel. In the ftdi_elan_synchronize function, an
off-by-one memory corruption occurs when packet_bytes is equal or bigger
than m. After having read m bytes, that is bytes_read is equal to m, "
..\x00" is still copied to the stack variable causing an out bounds write
of 4 bytes, which overwrites the stack canary and results in a kernel
panic.
This off-by-one requires physical access to the machine. It is not
exploitable since we have no control on the overwritten data. Similar
off-by-one bugs have been observed in 4 other functions:
ftdi_elan_stuck_waiting, ftdi_elan_read, ftdi_elan_edset_output and
ftdi_elan_flush_input_fifo.
Reported-by: Alex Palesandro <palexster@gmail.com>
Signed-off-by: Xiao Han <xiao.han@orange.fr>
Tested-by: Paul Chaignon <pchaigno@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For case 14 and case 21, their correct return value is the number
of bytes transferred, so it is a positive integer. But in usbtest_ioctl,
it takes non-zero as false return value for usbtest_do_ioctl, so
it will treat the correct test as wrong test, then the time on
tests will be the minus value.
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Cc: stable <stable@vger.kernel.org>
Fixes: 18fc4ebdc7 ("usb: misc: usbtest: Remove timeval usage")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The locking in hub_activate() is not adequate to provide full mutual
exclusion with hub_quiesce(). The subroutine locks the hub's
usb_interface, but the callers of hub_quiesce() (such as
hub_pre_reset() and hub_event()) hold the lock to the hub's
usb_device.
This patch changes hub_activate() to make it acquire the same lock as
those other routines.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org> #4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The early-exit pathway in hub_activate, added by commit e50293ef97
("USB: fix invalid memory access in hub_activate()") needs
improvement. It duplicates code that is already present at the end of
the subroutine, and it neglects to undo the effect of a
usb_autopm_get_interface_no_resume() call.
This patch fixes both problems by making the early-exit pathway jump
directly to the end of the subroutine. It simplifies the code at the
end by merging two conditionals that actually test the same condition
although they appear different: If type < HUB_INIT3 then type must be
either HUB_INIT2 or HUB_INIT, and it can't be HUB_INIT because in that
case the subroutine would have exited earlier.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org> #4.4+
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory leak and unbalanced reference count:
If the hub gets disconnected while the core is still activating it, this
can result in leaking memory of few USB structures.
This will happen if we have done a kref_get() from hub_activate() and
scheduled a delayed work item for HUB_INIT2/3. Now if hub_disconnect()
gets called before the delayed work expires, then we will cancel the
work from hub_quiesce(), but wouldn't do a kref_put(). And so the
unbalance.
kmemleak reports this as (with the commit e50293ef97 backported to
3.10 kernel with other changes, though the same is true for mainline as
well):
unreferenced object 0xffffffc08af5b800 (size 1024):
comm "khubd", pid 73, jiffies 4295051211 (age 6482.350s)
hex dump (first 32 bytes):
30 68 f3 8c c0 ff ff ff 00 a0 b2 2e c0 ff ff ff 0h..............
01 00 00 00 00 00 00 00 00 94 7d 40 c0 ff ff ff ..........}@....
backtrace:
[<ffffffc0003079ec>] create_object+0x148/0x2a0
[<ffffffc000cc150c>] kmemleak_alloc+0x80/0xbc
[<ffffffc000303a7c>] kmem_cache_alloc_trace+0x120/0x1ac
[<ffffffc0006fa610>] hub_probe+0x120/0xb84
[<ffffffc000702b20>] usb_probe_interface+0x1ec/0x298
[<ffffffc0005d50cc>] driver_probe_device+0x160/0x374
[<ffffffc0005d5308>] __device_attach+0x28/0x4c
[<ffffffc0005d3164>] bus_for_each_drv+0x78/0xac
[<ffffffc0005d4ee0>] device_attach+0x6c/0x9c
[<ffffffc0005d42b8>] bus_probe_device+0x28/0xa0
[<ffffffc0005d23a4>] device_add+0x324/0x604
[<ffffffc000700fcc>] usb_set_configuration+0x660/0x6cc
[<ffffffc00070a350>] generic_probe+0x44/0x84
[<ffffffc000702914>] usb_probe_device+0x54/0x74
[<ffffffc0005d50cc>] driver_probe_device+0x160/0x374
[<ffffffc0005d5308>] __device_attach+0x28/0x4c
Deadlocks:
If the hub gets disconnected early enough (i.e. before INIT2/INIT3 are
finished and the init_work is still queued), the core may call
hub_quiesce() after acquiring interface device locks and it will wait
for the work to be cancelled synchronously. But if the work handler is
already running in parallel, it may try to acquire the same interface
device lock and this may result in deadlock.
Fix both the issues by removing the call to cancel_delayed_work_sync().
CC: <stable@vger.kernel.org> #4.4+
Fixes: e50293ef97 ("USB: fix invalid memory access in hub_activate()")
Reported-by: Manu Gautam <mgautam@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since 6de62f15b5 ("crypto: algif_hash - Require setkey before
accept(2)"), the AF_ALG interface requires userspace to provide a key
to any algorithm that has a setkey method. However, the non-HMAC
algorithms are not keyed, so setting a key is unnecessary.
Fix this by removing the setkey method from the non-keyed hash
algorithms.
Fixes: 6de62f15b5 ("crypto: algif_hash - Require setkey before accept(2)")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The optimised crc32c implementation depends on VMX (aka. Altivec)
instructions, so the kernel must be built with Altivec support in order
for the crc32c code to build.
Fixes: 6dd7a82cc5 ("crypto: powerpc - Add POWER8 optimised crc32c")
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A newly added bugfix caused an uninitialized variable to be
used for printing debug output. This is harmless as long
as the debug setting is disabled, but otherwise leads to an
immediate crash.
gcc warns about this when -Wmaybe-uninitialized is enabled:
net/rxrpc/call_object.c: In function 'rxrpc_release_call':
net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The initialization was removed but one of the users remains.
This adds back the initialization.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 372ee16386 ("rxrpc: Fix races between skb free, ACK generation and replying")
Signed-off-by: David Howells <dhowells@redhat.com>
Currently, user can add a conntrack with different l4proto via nfnetlink.
For example, original tuple is TCP while reply tuple is SCTP. This is
invalid combination, we should report EINVAL to userspace.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Like NFQNL_MSG_VERDICT_BATCH do, we should also reject the verdict
request when the portid is not same with the initial portid(maybe
from another process).
Fixes: 97d32cf944 ("netfilter: nfnetlink_queue: batch verdict support")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
User can use NFQA_EXP to attach expectations to conntracks, but we
forget to put back nf_conntrack_expect when it is inserted successfully,
i.e. in this normal case, expect's use refcnt will be 3. So even we
unlink it and put it back later, the use refcnt is still 1, then the
memory will be leaked forever.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The 'name' filed in struct nf_conntrack_expect_policy{} is not a
pointer, so check it is NULL or not will always return true. Even if the
name is empty, slash will always be displayed like follows:
# cat /proc/net/nf_conntrack_expect
297 l3proto = 2 proto=6 src=1.1.1.1 dst=2.2.2.2 sport=1 dport=1025 ftp/
^
Fixes: 3a8fc53a45 ("netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The doorbell offset is formatted with a 0x prefix to suggest it is
a hexadecimal value, when in fact %d is being used and this is confusing.
Use %X instead to match the proceeding 0x prefix.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Sudarsana Reddy Kalluru says:
====================
qed: dcbx fix series.
The patch series contains the minor bug fixes for qed dcbx module.
Please consider applying this to 'net' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
MFW now supports the Selection field for IEEE mode. Add driver changes to
use the newer MFW masks to read/write the port-id value.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Endian-ness conversion is not needed for priority-to-TC field as the
field is already being read/written by the driver in big-endian way.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)'
Cc: stable@vger.kernel.org
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 52253db924 ("sctp: also point GSO head_skb to the sk when
it's available") used event->chunk->head_skb to get the head_skb in
sctp_ulpevent_set_owner().
But at that moment, the event->chunk was NULL, as it cloned the skb
in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
work.
This patch is to move the event->chunk initialization before calling
sctp_ulpevent_receive_data() so that it uses event->chunk when it's
valid.
Fixes: 52253db924 ("sctp: also point GSO head_skb to the sk when it's available")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore following patch disable encap bypass for LWT
vxlan traffic.
Fixes: ee122c79d4 ("vxlan: Flow based tunneling").
Reported-by: Jakub Libosvar <jlibosva@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LWT user can specify destination as well as source ip address
for given tunnel endpoint. But vxlan is ignoring given source
ip address. Following patch uses both ip address to route the
tunnel packet. This consistent with other LWT implementations,
like GENEVE and GRE.
Fixes: ee122c79d4 ("vxlan: Flow based tunneling").
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
Few BPF helper related checksum fixes
The set contains three fixes with regards to CHECKSUM_COMPLETE
and BPF helper functions. For details please see individual
patches.
Thanks!
v1 -> v2:
- Fixed make htmldocs issue reported by kbuild bot.
- Rest as is.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
push rcsum of mac header back in and after BPF run back pull out again as
opposed to some other subsystems (ovs, for example).
For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.
However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.
Fixes: 4e10df9a60 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.
Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().
Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().
Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up to commit f8ffad69c9 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.
For the same reasons as described in f8ffad69c9 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.
Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.
Also here we have to address bpf_redirect() and bpf_clone_redirect().
Fixes: 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f6305 ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call site for this function appears as:
#ifdef DEBUG
data->msg_enable = DEBUG;
dump_eth_one(dev);
#endif
...leading to the following warning for !DEBUG builds:
drivers/net/ethernet/tundra/tsi108_eth.c:169:13: warning: 'dump_eth_one' defined but not used [-Wunused-function]
static void dump_eth_one(struct net_device *dev)
^
...when using the arch/powerpc/configs/mpc7448_hpc2_defconfig
Put the function definition under the same #ifdef as the call site
to avoid the warning.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: DCB fixes
Patches 1 and 2 fix a problem in which PAUSE frames settings are wrongly
overridden when ieee_setpfc() gets called.
Patch 3 adds a missing rollback in port's creation error path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
I missed its rollback in the error path of port creation, so add it.
Fixes: f00817df2b ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).
Fix this by packing the PFCC register with the current PAUSE settings.
Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ieee_setpfc() gets called, PAUSE frames are not necessarily
disabled on the port.
Check if PAUSE frames are disabled or enabled and configure the port's
headroom buffer accordingly.
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like a simple copy'n'paste error.
Fixes: 1aa661f5c3 ("rhashtable-test: Measure time to insert, remove & traverse entries")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter says:
====================
sctp_diag: A bunch of fixes for upcoming 'ss' support
The following series contains a number of fixes necessary to make my yet
unpublished 'ss' support patch functional.
Changes since v1:
- Fixed patch 2/3
- Rebased whole series onto current net-next/master
Changes since v2:
- Improved description of patch 2/3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The asoc's timer value is not kept in asoc->timeouts array but in it's
primary transport instead.
Furthermore, we must export the timer only if it is pending, otherwise
the value will underrun when stored in an unsigned variable and
user space will only see a very large timeout value.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to correctly interpret INET_DIAG_INFO messages exported
by sctp_diag module.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be able to generate shared descriptors for AEAD, the authentication size
needs to be known. However, there is no imposed order of calling .setkey,
.setauthsize callbacks.
Thus, in case authentication size is not known at .setkey time, defer it
until .setauthsize is called.
The authsize != 0 check was incorrectly removed when converting the driver
to the new AEAD interface.
Cc: <stable@vger.kernel.org> # 4.3+
Fixes: 479bcc7c5b ("crypto: caam - Convert authenc to new AEAD interface")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are a few things missed by the conversion to the
new AEAD interface:
1 - echainiv(authenc) encrypt shared descriptor
The shared descriptor is incorrect: due to the order of operations,
at some point in time MATH3 register is being overwritten.
2 - buffer used for echainiv(authenc) encrypt shared descriptor
Encrypt and givencrypt shared descriptors (for AEAD ops) are mutually
exclusive and thus use the same buffer in context state: sh_desc_enc.
However, there's one place missed by s/sh_desc_givenc/sh_desc_enc,
leading to errors when echainiv(authenc(...)) algorithms are used:
DECO: desc idx 14: Header Error. Invalid length or parity, or
certain other problems.
While here, also fix a typo: dma_mapping_error() is checking
for validity of sh_desc_givenc_dma instead of sh_desc_enc_dma.
Cc: <stable@vger.kernel.org> # 4.3+
Fixes: 479bcc7c5b ("crypto: caam - Convert authenc to new AEAD interface")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
crypto/sha3_generic.c:27: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:28: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:29: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:29: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:31: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:31: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:32: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:32: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:32: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:33: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:33: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:34: warning: integer constant is too large for ‘long’ type
crypto/sha3_generic.c:34: warning: integer constant is too large for ‘long’ type
Fixes: 53964b9ee6 ("crypto: sha3 - Add SHA-3 hash algorithm")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
During qdio_shutdown the queue tasklets are killed for all inbound and
outbound queues. The queue structures might be freed after
qdio_shutdown.
Thus it must be guaranteed that these queue tasklets are not rescheduled
after that. In addition the outbound queue timers are deleted and it
must
be guaranteed that these timers are not restarted after qdio_shutdown
processing. Timer deletion should make use of del_timer_sync() to make
sure qdio_outbound_timer() is finished on other CPUs as well. Queue
tasklets should be scheduled in state QDIO_IRQ_STATE_ACTIVE only.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Prior to starting IO qdio checks for the internal state of the ccw
device. These checks happen without locking, so consistency between
state evaluation and starting of the IO is not guaranteed.
Since the internal state is checked during ccw_device_start it is
safe to get rid of these additional checks.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
qdio sometimes checks return codes twice. First with the ccw device's
lock held and then a 2nd time after the lock is released. Simplify
the code by releasing the lock earlier and unify the return code
evaluation.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All qdio functions that use spin_lock_irqsave are never used
from irq context. Thus it is safe to convert all of them to
use spin_lock_irq.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A copy of struct subchannel_id is maintained in ccw_device_private.
The subchannel id is a property of the subchannel. The additional
copy is not needed.
Internal users can obtain it from subchannel.schid - device drivers
can use ccw_device_get_schid().
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We want to get rid of the copy of struct subchannel_id maintained in
ccw_device_private, so obtain it from the subchannel directly.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For all configs with CONFIG_BTRFS_FS = y we should also make the
optimized crc module builtin. Otherwise early mounts will fall
back to the software variant.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
if two string compare equal the clcle instruction will update the
string addresses to point _after_ the string. This might already
be on a different page, so we should not use these pointer to
calculate the difference as in that case the calculation of the
difference can cause oopses.
The return value of memcmp does not need the difference, we
can just reuse the condition code and return for CC=1 (All bytes
compared, first operand low) -1 and for CC=2 (All bytes compared,
first operand high) +1
strstr also does not need the diff.
While fixing this, make the common function clcle "correct on its
own" by using l1 instead of l2 for the first length. strstr will
call this with l2 for both strings.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: db7f5eef3d ("s390/lib: use basic blocks for inline assemblies")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current prealign logic will fail for sizes < alignment,
as the new datalen passed to the vector function is smaller
than zero. Being a size_t this gets wrapped to a huge
number causing memory overruns and wrong data.
Let's add an early exit if the size is smaller than the minimal
size with alignment. This will also avoid calling the software
fallback twice for all sizes smaller than the minimum size
(prealign + remaining)
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: f848dbd3bc ("s390/crc32-vx: add crypto API module for optimized CRC-32 algorithms")
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The way the decompressor is hooked into the start-up code is rather
subtle, with a mix of multiply-defined symbols and hardcoded address
literals. Add some comments at the junction points to clarify how it
works.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
udriver struct allocated by kzalloc() will not be freed
if usb_register() and next calls fail. This patch fixes this
by adding one more step with kfree(udriver) in error path.
Signed-off-by: Alexey Klimov <klimov.linux@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
This patch adds a set of compositions for Telit LE920A4.
Compositions in short are:
0x1207: tty + tty
0x1208: tty + adb + tty + tty
0x1211: tty + adb + ecm
0x1212: tty + adb
0x1213: ecm + tty
0x1214: tty + adb + ecm + tty
telit_le922_blacklist_usbcfg3 is reused for compositions 0x1211
and 0x1214 due to the same interfaces positions.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0
UART/FIFO IC.
To support BCM920706V2_EVAL dev board for WICED development on Linux.
Add the VID(0a5c) and PID(6422) to ftdi_sio driver to allow loading
ftdi_sio for this board.
Signed-off-by: Sheng-Hui J. Chu <s.jeffrey.chu@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Ivium Technologies uses the FTDI VID with custom PIDs for their line of
electrochemical interfaces and the PalmSens they developed for PalmSens
BV.
Signed-off-by: Robert Delien <robert@delien.nl>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Do not drop packet when CSeq is 0 as 0 is also a valid value for CSeq.
simple_strtoul() will return 0 either when all digits are 0
or if there are no digits at all. Therefore when simple_strtoul()
returns 0 we check if first character is digit 0 or not.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The device has four interfaces; the three serial ports ought to be
handled by this driver:
00 Diagnostic interface serial port
01 NMEA device serial port
02 Mass storage (sd card)
03 Modem serial port
The other product ids listed in the Windows driver are present already.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
If we find a matching element that is inactive with no descendants, we
jump to the found label, then crash because of nul-dereference on the
left branch.
Fix this by checking that the element is active and not an interval end
and skipping the logic that only applies to the tree iteration.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Anders K. Pedersen <akp@akp.dk>
Commit 96d1327ac2 ("netfilter: h323: Use mod_timer instead of
set_expect_timeout") just simplify the source codes
if (!del_timer(&exp->timeout))
return 0;
add_timer(&exp->timeout);
to mod_timer(&exp->timeout, jiffies + info->timeout * HZ);
This is not correct, and introduce a race codition:
CPU0 CPU1
- timer expire
process_rcf expectation_timed_out
lock(exp_lock) -
find_exp waiting exp_lock...
re-activate timer!! waiting exp_lock...
unlock(exp_lock) lock(exp_lock)
- unlink expect
- free(expect)
- unlock(exp_lock)
So when the timer expires again, we will access the memory that
was already freed.
Replace mod_timer with mod_timer_pending here to fix this problem.
Fixes: 96d1327ac2 ("netfilter: h323: Use mod_timer instead of set_expect_timeout")
Cc: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.
We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.
Without this patch:
errors in DIMM_A and DIMM_D were reported in DIMM_D
errors in DIMM_B and DIMM_E were reported in DIMM_E
errors in DIMM_C and DIMM_F were reported in DIMM_F
Correct this.
Hubert Chrzaniuk:
- rebased to 4.8
- comments and code cleanup
Fixes: d0cdf90031 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Cc: lukasz.odzioba@intel.com
Cc: mchehab@kernel.org
Cc: <stable@vger.kernel.org> # v4.5..
Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
[ Boris: Simplify a bit by removing char mc. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Johannes Berg says:
====================
First set of fixes for the current cycle:
* fix 80+80 bandwidth warning
* fix powersave with mac80211 TXQ implementation
* use correct way to free SKBs from multicast buffering
* mesh: fix operation ordering to work with all drivers
* mesh: end service period even when peer goes away
* mesh: correct HT opmode validity checks
* pass hw pointer from mac80211 to driver in TPT method,
fixing a bug (in a bit the wrong way, but that's what
we have right now)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
increase test coverage to check previously missing 'update when full'
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The introduction of pre-allocated hash elements inadvertently broke
the behavior of bpf hash maps where users expected to call
bpf_map_update_elem() without considering that the map can be full.
Some programs do:
old_value = bpf_map_lookup_elem(map, key);
if (old_value) {
... prepare new_value on stack ...
bpf_map_update_elem(map, key, new_value);
}
Before pre-alloc the update() for existing element would work even
in 'map full' condition. Restore this behavior.
The above program could have updated old_value in place instead of
update() which would be faster and most programs use that approach,
but sometimes the values are large and the programs use update()
helper to do atomic replacement of the element.
Note we cannot simply update element's value in-place like percpu
hash map does and have to allocate extra num_possible_cpu elements
and use this extra reserve when the map is full.
Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
drivers/net/dsa/b53/b53_common.c: In function ‘b53_arl_read’:
drivers/net/dsa/b53/b53_common.c:1072: warning: integer constant is too large for ‘long’ type
Fixes: 1da6df85c6 ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.
Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.
To this end, the following changes are made:
(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().
(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.
The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.
(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.
(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.
(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.
(6) There are many bits of unmarshalling code where:
ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}
is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:
ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;
(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:
if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;
becomes:
ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;
(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.
Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:
general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
0000000000000006 000000000be04930 0000000000000000 ffff880400000000
ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
[<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
[<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
[<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
[<ffffffff810915f4>] lock_acquire+0x122/0x1b6
[<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff814c928f>] skb_dequeue+0x18/0x61
[<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
[<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
[<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
[<ffffffff81063a3a>] process_one_work+0x29d/0x57c
[<ffffffff81064ac2>] worker_thread+0x24a/0x385
[<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
[<ffffffff810696f5>] kthread+0xf3/0xfb
[<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
[<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit a94efbd7cc ("ethernet: arc: emac_main: add missing of_node_put
after calling of_parse_phandle") added missing of_node_put after calling
of_parse_phandle, but missing the devm_ioremap_resource() error handling
case.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_device->ndo_set_rx_headroom (introduced in
871b642ade) says
"Setting a negtaive value reset the rx headroom
to the default value".
It seems that the OVS implementation in
3a927bc7cf overlooked this and sets
dev->needed_headroom unconditionally.
This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.
Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.
Thanks to Ben for some pointers from the crash dumps!
Cc: Benjamin Poirier <bpoirier@suse.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable is added to allow the driver an easy access to
it's own hw->priv when the op is invoked.
This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.
Signed-off-by: Maxim Altshul <maxim.altshul@ti.com>
[rewrite commit message, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct
flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED |
IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT.
Doing just a range-check allows setting flags that don't exist (0x8)
and invalid flag combinations.
Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT
Operation element".
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[reword commit message, simplify a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously code defaulted to 32 BCLKS per WCLK which meant 24 and
32 bit DAI formats would not work properly. This patch fixes the
issue by defaulting to 64 BCLKs per WCLK.
Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The patch is to fix the static check error as the following.
The patch commit b50455fab4 ("ASoC: nau8825: cross talk suppression
measurement function") from Jun 7, 2016, leads to the following
static checker warning:
sound/soc/codecs/nau8825.c:265 nau8825_sema_acquire()
warn: 'sem:&nau8825->xtalk_sem' is sometimes locked here and
sometimes unlocked.
The semaphone acquire function has return value, and some callers
can do error handling when lock fails.
Signed-off-by: John Hsu <KCHSU0@nuvoton.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In chromium, the following steps will make codec function fail.
\1. plug in headphones, Play music
\2. run "powerd_dbus_suspend"
\3. resume from S3
After resume, the jack detection will restart and make configuration
for the headset. Meanwhile, the playback prepares and starts to work.
The two sequences will conflict and make wrong register configuration.
Originally, the driver adds protection for the case when it finds
the playback is active. But the "powerd_dbus_suspend" command will
close the pcm stream before suspend. Therefore, the driver can't
detect the playback after resume, and the protection not works.
For the issue, the driver raises protection every time after resume.
The protection will release after jack detection and configuration
completes, and then the playback just will goes on.
Signed-off-by: John Hsu <KCHSU0@nuvoton.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
There is no "pclk" alias in the s3c2440 clk driver for "soc-audio"
device so related clk_get() fails, which prevents any operation
of the S3C24XX_UDA134X sound card.
Instead we get the clock on behalf of the I2S device, i.e. we use
the I2S block gate clock which has PCLK is its parent clock.
Without this patch there is an error like:
s3c24xx_uda134x_startup cannot get pclk
ASoC: UDA134X startup failed: -2
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Manish Chopra says:
====================
qlcnic: bug fixes
This series fixes a data structure corruption bug in
VF's async mailbox commands handling and an issue realted
to napi poll budget in the driver.
Please consider applying this series to "net"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver modifies the supplied NAPI budget in qlcnic_83xx_msix_tx_poll()
function. Instead, it should use the budget as it is.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a data structure corruption bug in the SRIOV VF mailbox
handler code. While handling mailbox commands from the atomic context,
driver is accessing and updating qlcnic_async_work_list_struct entry fields
in the async work list. These fields could be concurrently accessed by the
work function resulting in data corruption.
This patch restructures async mbx command handling by using a separate
async command list instead of using a list of work_struct structures.
A single work_struct is used to schedule and handle the async commands
with proper locking mechanism.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Siva Reddy Kallam says:
====================
tg3: Disallow 0 rx coalesce time and correctly report RSS queues in tg3_get_rxnfc
First patch:
Diasllow rx coalescing time to be 0
Second patch:
Report the correct number of RSS queues through tg3_get_rxnfc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch remove the wrong substraction from info->data in
tg3_get_rxnfc function. Without this patch, the number of RSS
queues reported is less by one.
Reported-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the rx coalescing time is 0, interrupts
are not generated from the controller and rx path hangs.
To avoid this rx hang, updating the driver to not allow
rx coalescing time to be 0.
Signed-off-by: Satish Baddipadige <satish.baddipadige@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using per-register incrementing ID can lead to
find_good_pkt_pointers() confusing registers which
have completely different values. Consider example:
0: (bf) r6 = r1
1: (61) r8 = *(u32 *)(r6 +76)
2: (61) r0 = *(u32 *)(r6 +80)
3: (bf) r7 = r8
4: (07) r8 += 32
5: (2d) if r8 > r0 goto pc+9
R0=pkt_end R1=ctx R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=0,off=32,r=32) R10=fp
6: (bf) r8 = r7
7: (bf) r9 = r7
8: (71) r1 = *(u8 *)(r7 +0)
9: (0f) r8 += r1
10: (71) r1 = *(u8 *)(r7 +1)
11: (0f) r9 += r1
12: (07) r8 += 32
13: (2d) if r8 > r0 goto pc+1
R0=pkt_end R1=inv56 R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=1,off=32,r=32) R9=pkt(id=1,off=0,r=32) R10=fp
14: (71) r1 = *(u8 *)(r9 +16)
15: (b7) r7 = 0
16: (bf) r0 = r7
17: (95) exit
We need to get a UNKNOWN_VALUE with imm to force id
generation so lines 0-5 make r7 a valid packet pointer.
We then read two different bytes from the packet and
add them to copies of the constructed packet pointer.
r8 (line 9) and r9 (line 11) will get the same id of 1,
independently. When either of them is validated (line
13) - find_good_pkt_pointers() will also mark the other
as safe. This leads to access on line 14 being mistakenly
considered safe.
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building with -Wmaybe-uninitialized shows a potential use of
an uninitialized variable:
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 'xgene_enet_phy_connect':
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:802:23: warning: 'phy_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
Although the compiler correctly identified this based on the function,
the current code is still safe as long dev->of_node is non-NULL
for the case of CONFIG_ACPI=n, which is currently the case.
The warning is now disabled by default, but still appears when
building with W=1, and other build test tools should be able to
detect it as well. Adding an #else clause here makes the code
more robust and makes it clear to the compiler that this cannot
happen.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8089a96f60 ("drivers: net: xgene: Add backward compatibility")
Signed-off-by: David S. Miller <davem@davemloft.net>
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted into somewhat confusing kernel log messages. This patch
removes this check and warning.
Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
simple-card-utils might be used as module, but MODULE_xxx()
information was missed. This patch adds it.
Otherwise, we will have below error, and can't use it.
Specil thanks to Kevin.
> insmod simple-card-utils.ko
simple_card_utils: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint
simple_card_utils: Unknown symbol snd_soc_of_parse_daifmt (err 0)
simple_card_utils: Unknown symbol snd_soc_of_parse_card_name (err 0)
insmod: can't insert 'simple-card-utils.ko': \
unknown symbol in module, or unknown parameter
Reported-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.
Tested also on Ralink RT3572 chipset.
Signed-off-by: Maital Hahn <maitalm@ti.com>
Signed-off-by: Yaniv Machani <yanivma@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The logic was inverted here, set the bit if frames are pending.
Fixes: ba8c3d6f16 ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The switch on chandef->width is missing a break on the
NL8211_CHAN_WIDTH_80P80 case; currently we get a WARN_ON when
center_freq2 is non-zero because of the missing break.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The SND_SOC_DAPM_PRE_PMU case would call startup()/hw_params() that
might access substream->runtime through other functions.
For example:
Unable to handle kernel NULL pointer dereference at virtual address
[....]
PC is at snd_pcm_hw_rule_add+0x24/0x1b0
LR is at snd_pcm_hw_constraint_list+0x20/0x28
[....]
Process arecord (pid: 424, stack limit = 0xffffffc1ecaf0020)
Call trace:
[<ffffffc00086be68>] snd_pcm_hw_rule_add+0x24/0x1b0
[<ffffffc00086c014>] snd_pcm_hw_constraint_list+0x20/0x28
[<ffffffc0008b47a4>] cs53l30_pcm_startup+0x24/0x30
[<ffffffc0008a6260>] snd_soc_dai_link_event+0x290/0x354
[<ffffffc0008a7528>] dapm_seq_check_event.isra.31+0x134/0x2c8
[<ffffffc0008a7768>] dapm_seq_run_coalesced+0x94/0x1c8
[<ffffffc0008a7940>] dapm_seq_run+0xa4/0x404
[<ffffffc0008a8bac>] dapm_power_widgets+0x524/0x984
[<ffffffc0008ab1c4>] snd_soc_dapm_stream_event+0x8c/0xa8
[<ffffffc0008ac7f4>] soc_pcm_prepare+0x10c/0x1ec
[<ffffffc000865b9c>] snd_pcm_do_prepare+0x1c/0x38
[<ffffffc000865600>] snd_pcm_action_single+0x40/0x88
[<ffffffc0008656b8>] snd_pcm_action_nonatomic+0x70/0x90
[<ffffffc000868d28>] snd_pcm_common_ioctl1+0xb6c/0xdd8
[<ffffffc000869508>] snd_pcm_capture_ioctl1+0x200/0x334
[<ffffffc00086a084>] snd_pcm_ioctl_compat+0x648/0x95c
[<ffffffc0001ff4b4>] compat_SyS_ioctl+0xac/0xfc4
[<ffffffc000084cf0>] el0_svc_naked+0x24/0x28
---[ end trace 0dc4f99c2759c35c ]---
So this patch adds a dummy runtime for the original dummy substream
to merely avoid the NULL pointer access.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2016-07-27 18:59:45 +01:00
481 changed files with 6070 additions and 2868 deletions
@@ -1333,8 +1333,6 @@ int etnaviv_gpu_submit(struct etnaviv_gpu *gpu,
if(ret<0)
returnret;
mutex_lock(&gpu->lock);
/*
* TODO
*
@@ -1348,16 +1346,18 @@ int etnaviv_gpu_submit(struct etnaviv_gpu *gpu,
if(unlikely(event==~0U)){
DRM_ERROR("no free event\n");
ret=-EBUSY;
gotoout_unlock;
gotoout_pm_put;
}
fence=etnaviv_gpu_fence_alloc(gpu);
if(!fence){
event_free(gpu,event);
ret=-ENOMEM;
gotoout_unlock;
gotoout_pm_put;
}
mutex_lock(&gpu->lock);
gpu->event[event].fence=fence;
submit->fence=fence->seqno;
gpu->active_fence=submit->fence;
@@ -1395,9 +1395,9 @@ int etnaviv_gpu_submit(struct etnaviv_gpu *gpu,
hangcheck_timer_reset(gpu);
ret=0;
out_unlock:
mutex_unlock(&gpu->lock);
out_pm_put:
etnaviv_gpu_pm_put(gpu);
returnret;
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.