Pull perf fixes from Ingo Molnar:
"Two fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/hwbp: Simplify the perf-hwbp code, fix documentation
perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
Pull x86 PTI fixes from Ingo Molnar:
"Two fixes: a relatively simple objtool fix that makes Clang built
kernels work with ORC debug info, plus an alternatives macro fix"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Fixup alternative_call_2
objtool: Add Clang support
Pull Kbuild fixes from Masahiro Yamada:
- fix missed rebuild of TRIM_UNUSED_KSYMS
- fix rpm-pkg for GNU tar >= 1.29
- include scripts/dtc/include-prefixes/* to kernel header deb-pkg
- add -no-integrated-as option ealier to fix building with Clang
- fix netfilter Makefile for parallel building
* tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
netfilter: nf_nat_snmp_basic: add correct dependency to Makefile
kbuild: rpm-pkg: Support GNU tar >= 1.29
builddeb: Fix header package regarding dtc source links
kbuild: set no-integrated-as before incl. arch Makefile
kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
Pull networking fixes from David Miller:
1) Fix RCU locking in xfrm_local_error(), from Taehee Yoo.
2) Fix return value assignments and thus error checking in
iwl_mvm_start_ap_ibss(), from Johannes Berg.
3) Don't count header length twice in vti4, from Stefano Brivio.
4) Fix deadlock in rt6_age_examine_exception, from Eric Dumazet.
5) Fix out-of-bounds access in nf_sk_lookup_slow{v4,v6}() from Subash
Abhinov.
6) Check nladdr size in netlink_connect(), from Alexander Potapenko.
7) VF representor SQ numbers are 32 not 16 bits, in mlx5 driver, from
Or Gerlitz.
8) Out of bounds read in skb_network_protocol(), from Eric Dumazet.
9) r8169 driver sets driver data pointer after register_netdev() which
is too late. Fix from Heiner Kallweit.
10) Fix memory leak in mlx4 driver, from Moshe Shemesh.
11) The multi-VLAN decap fix added a regression when dealing with device
that lack a MAC header, such as tun. Fix from Toshiaki Makita.
12) Fix integer overflow in dynamic interrupt coalescing code. From Tal
Gilboa.
13) Use after free in vrf code, from David Ahern.
14) IPV6 route leak between VRFs fix, also from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
net: mvneta: fix enable of all initialized RXQs
net/ipv6: Fix route leaking between VRFs
vrf: Fix use after free and double free in vrf_finish_output
ipv6: sr: fix seg6 encap performances with TSO enabled
net/dim: Fix int overflow
vlan: Fix vlan insertion for packets without ethernet header
net: Fix untag for vlan packets without ethernet header
atm: iphase: fix spelling mistake: "Receiverd" -> "Received"
vhost: validate log when IOTLB is enabled
qede: Do not drop rx-checksum invalidated packets.
hv_netvsc: enable multicast if necessary
ip_tunnel: Resolve ipsec merge conflict properly.
lan78xx: Crash in lan78xx_writ_reg (Workqueue: events lan78xx_deferred_multicast_write)
qede: Fix barrier usage after tx doorbell write.
vhost: correctly remove wait queue during poll failure
net/mlx4_core: Fix memory leak while delete slave's resources
net/mlx4_en: Fix mixed PFC and Global pause user control requests
net/smc: use announced length in sock_recvmsg()
llc: properly handle dev_queue_xmit() return value
strparser: Fix sign of err codes
...
In mvneta_port_up() we enable relevant RX and TX port queues by write
queues bit map to an appropriate register.
q_map must be ZERO in the beginning of this process.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald reported that IPv6 route leaking between VRFs is not working.
The root cause is the strict argument in the call to rt6_lookup when
validating the nexthop spec.
ip6_route_check_nh validates the gateway and device (if given) of a
route spec. It in turn could call rt6_lookup (e.g., lookup in a given
table did not succeed so it falls back to a full lookup) and if so
sets the strict argument to 1. That means if the egress device is given,
the route lookup needs to return a result with the same device. This
strict requirement does not work with VRFs (IPv4 or IPv6) because the
oif in the flow struct is overridden with the index of the VRF device
to trigger a match on the l3mdev rule and force the lookup to its table.
The right long term solution is to add an l3mdev index to the flow
struct such that the oif is not overridden. That solution will not
backport well, so this patch aims for a simpler solution to relax the
strict argument if the route spec device is an l3mdev slave. As done
in other places, use the FLOWI_FLAG_SKIP_NH_OIF to know that the
RT6_LOOKUP_F_IFACE flag needs to be removed.
Fixes: ca254490c8 ("net: Add VRF support to IPv6 stack")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miguel reported an skb use after free / double free in vrf_finish_output
when neigh_output returns an error. The vrf driver should return after
the call to neigh_output as it takes over the skb on error path as well.
Patch is a simplified version of Miguel's patch which was written for 4.9,
and updated to top of tree.
Fixes: 8f58336d3f ("net: Add ethernet header for pass through VRF device")
Signed-off-by: Miguel Fadon Perlines <mfadon@teldat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling TSO can lead to abysmal performances when using seg6 in
encap mode, such as with the ixgbe driver. This patch adds a call to
iptunnel_handle_offloads() to remove the encapsulation bit if needed.
Before:
root@comp4-seg6bpf:~# iperf3 -c fc00::55
Connecting to host fc00::55, port 5201
[ 4] local fc45::4 port 36592 connected to fc00::55 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 196 KBytes 1.60 Mbits/sec 47 6.66 KBytes
[ 4] 1.00-2.00 sec 304 KBytes 2.49 Mbits/sec 100 5.33 KBytes
[ 4] 2.00-3.00 sec 284 KBytes 2.32 Mbits/sec 92 5.33 KBytes
After:
root@comp4-seg6bpf:~# iperf3 -c fc00::55
Connecting to host fc00::55, port 5201
[ 4] local fc45::4 port 43062 connected to fc00::55 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.03 GBytes 8.89 Gbits/sec 0 743 KBytes
[ 4] 1.00-2.00 sec 1.03 GBytes 8.87 Gbits/sec 0 743 KBytes
[ 4] 2.00-3.00 sec 1.03 GBytes 8.87 Gbits/sec 0 743 KBytes
Reported-by: Tom Herbert <tom@quantonium.net>
Fixes: 6c8702c60b ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ceph fix from Ilya Dryomov:
"A fix for a dio-enabled loop on ceph deadlock from Zheng, marked for
stable"
* tag 'ceph-for-4.16-rc8' of git://github.com/ceph/ceph-client:
ceph: only dirty ITER_IOVEC pages for direct read
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix a bug causing occasional machine check exceptions on POWER8
hosts (introduced in 4.16-rc1)
x86:
- Fix a guest crashing regression with nested VMX and restricted
guest (introduced in 4.16-rc1)
- Fix dependency check for pv tlb flush (the wrong dependency that
effectively disabled the feature was added in 4.16-rc4, the
original feature in 4.16-rc1, so it got decent testing)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix pv tlb flush dependencies
KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0
KVM: PPC: Book3S HV: Fix duplication of host SLB entries
Pull i2c fix from Wolfram Sang:
"A simple but worthwhile I2C driver fix for 4.16"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-stm32f7: fix no check on returned setup
Pull sound fixes from Takashi Iwai:
"Very small fixes (all one-liners) at this time.
One fix is for a PCM core stuff to correct the mmap behavior on
non-x86. It doesn't show on most machines but mostly only for exotic
non-interleaved formats"
* tag 'sound-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: potential uninitialized return values
ALSA: pcm: Use dma_bytes as size parameter in dma_mmap_coherent()
ALSA: usb-audio: Add native DSD support for TEAC UD-301
When calculating difference between samples, the values
are multiplied by 100. Large values may cause int overflow
when multiplied (usually on first iteration).
Fixed by forcing 100 to be of type unsigned long.
Fixes: 4c4dbb4a73 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita says:
====================
Fix vlan tag handling for vlan packets without ethernet headers
Eric Dumazet reported syzbot found a new bug which leads to underflow of
size argument of memmove(), causing crash[1]. This can be triggered by tun
devices.
The underflow happened because skb_vlan_untag() did not expect vlan packets
without ethernet headers, and tun can produce such packets.
I also checked vlan_insert_inner_tag() and found a similar bug.
This series fixes these problems.
[1] https://marc.info/?l=linux-netdev&m=152221753920510&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In some situation vlan packets do not have ethernet headers. One example
is packets from tun devices. Users can specify vlan protocol in tun_pi
field instead of IP protocol. When we have a vlan device with reorder_hdr
disabled on top of the tun device, such packets from tun devices are
untagged in skb_vlan_untag() and vlan headers will be inserted back in
vlan_insert_inner_tag().
vlan_insert_inner_tag() however did not expect packets without ethernet
headers, so in such a case size argument for memmove() underflowed.
We don't need to copy headers for packets which do not have preceding
headers of vlan headers, so skip memmove() in that case.
Also don't write vlan protocol in skb->data when it does not have enough
room for it.
Fixes: cbe7128c4b ("vlan: Fix out of order vlan headers with reorder header off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a page is already locked, attempting to dirty it leads to a deadlock
in lock_page(). This is what currently happens to ITER_BVEC pages when
a dio-enabled loop device is backed by ceph:
$ losetup --direct-io /dev/loop0 /mnt/cephfs/img
$ xfs_io -c 'pread 0 4k' /dev/loop0
Follow other file systems and only dirty ITER_IOVEC pages.
Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull device mapper fixes from Mike Snitzer:
- Fix a DM multipath regression introduced in a v4.16-rc6 commit:
restore support for loading, and attaching, scsi_dh modules during
multipath table load. Otherwise some users may find themselves unable
to boot, as was reported today:
https://marc.info/?l=linux-scsi&m=152231276114962&w=2
- Fix a DM core ioctl permission check regression introduced in a
v4.16-rc5 commit.
* tag 'for-4.16/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix dropped return code from dm_get_bdev_for_ioctl
dm mpath: fix support for loading scsi_dh modules during table load
Pull rdma fixes from Jason Gunthorpe:
"It has been fairly silent lately on our -rc front. Big queue of
patches on the mailing list going to for-next though.
Bug fixes:
- qedr driver bugfixes causing application hangs, wrong uapi errnos,
and a race condition
- three syzkaller found bugfixes in the ucma uapi
Regression fixes for things introduced in 4.16:
- Crash on error introduced in mlx5 UMR flow
- Crash on module unload/etc introduced by bad interaction of
restrack and mlx5 patches this cycle
- Typo in a two line syzkaller bugfix causing a bad regression
- Coverity report of nonsense code in hns driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Introduce safer rdma_addr_size() variants
RDMA/hns: ensure for-loop actually iterates and free's buffers
RDMA/ucma: Check that device exists prior to accessing it
RDMA/ucma: Check that device is connected prior to access it
RDMA/rdma_cm: Fix use after free race with process_one_req
RDMA/qedr: Fix QP state initialization race
RDMA/qedr: Fix rc initialization on CNQ allocation failure
RDMA/qedr: fix QP's ack timeout configuration
RDMA/ucma: Correct option size check using optlen
RDMA/restrack: Move restrack_clean to be symmetrical to restrack_init
IB/mlx5: Don't clean uninitialized UMR resources
Pull MTD fixes from Boris Brezillon:
"Two fixes, one in the atmel NAND driver and another one in the
CFI/JEDEC code.
Summary:
- Fix a bug in Atmel ECC engine driver
- Fix a bug in the CFI/JEDEC driver"
* tag 'mtd/fixes-for-4.16' of git://git.infradead.org/linux-mtd:
mtd: jedec_probe: Fix crash in jedec_read_mfr()
mtd: nand: atmel: Fix get_sectorsize() function
dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
wasn't). Unfortunately commit 519049afea ("dm: use blkdev_get rather
than bdgrab when issuing pass-through ioctl") reused the variable 'r'
to store the return from blkdev_get() that follows prepare_ioctl()
-- whereby dropping prepare_ioctl()'s result on the floor.
This can lead to an ioctl or persistent reservation being issued to a
partition going unnoticed, which implies the extra permission check for
CAP_SYS_RAWIO is skipped.
Fix this by using a different variable to store blkdev_get()'s return.
Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Daniel Borkman says:
====================
pull-request: bpf 2018-03-29
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix nfp to properly check max insn count while emitting
instructions in the JIT which was wrongly comparing bytes
against number of instructions before, from Jakub.
2) Fix for bpftool to avoid usage of hex numbers in JSON
output since JSON doesn't accept hex numbers with 0x
prefix, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ability to have multipath dynamically attach a scsi_dh, that the user
specified in the multipath table, was broken by commit e8f74a0f00 ("dm
mpath: eliminate need to use scsi_device_from_queue").
Restore the ability to load, and attach, a particular scsi_dh module if
one is specified (as noticed by checking m->hw_handler_name).
Fixes: e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue")
Reported-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Vq log_base is the userspace address of bitmap which has nothing to do
with IOTLB. So it needs to be validated unconditionally otherwise we
may try use 0 as log_base which may lead to pin pages that will lead
unexpected result (e.g trigger BUG_ON() in set_bit_to_user()).
Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today, driver drops received packets which are indicated as
invalid checksum by the device. Instead of dropping such packets,
pass them to the stack with CHECKSUM_NONE indication in skb.
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that the loop where we read manufacturer
jedec_read_mfd() can under some circumstances get a
CFI_MFR_CONTINUATION repeatedly, making the loop go
over all banks and eventually hit the end of the
map and crash because of an access violation:
Unable to handle kernel paging request at virtual address c4980000
pgd = (ptrval)
[c4980000] *pgd=03808811, *pte=00000000, *ppte=00000000
Internal error: Oops: 7 [#1] PREEMPT ARM
CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc1+ #150
Hardware name: Gemini (Device Tree)
PC is at jedec_probe_chip+0x6ec/0xcd0
LR is at 0x4
pc : [<c03a2bf4>] lr : [<00000004>] psr: 60000013
sp : c382dd18 ip : 0000ffff fp : 00000000
r10: c0626388 r9 : 00020000 r8 : c0626340
r7 : 00000000 r6 : 00000001 r5 : c3a71afc r4 : c382dd70
r3 : 00000001 r2 : c4900000 r1 : 00000002 r0 : 00080000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 0000397f Table: 00004000 DAC: 00000053
Process swapper (pid: 1, stack limit = 0x(ptrval))
Fix this by breaking the loop with a return 0 if
the offset exceeds the map size.
Fixes: 5c9c11e1c4 ("[MTD] [NOR] Add support for flash chips with ID in bank other than 0")
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
My recent change to netvsc drive in how receive flags are handled
broke multicast. The Hyper-v/Azure virtual interface there is not a
multicast filter list, filtering is only all or none. The driver must
enable all multicast if any multicast address is present.
Fixes: 009f766ca2 ("hv_netvsc: filter multicast/broadcast")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Description:
Crash was reported with syzkaller pointing to lan78xx_write_reg routine.
Root-cause:
Proper cleanup of workqueues and init/setup routines was not happening
in failure conditions.
Fix:
Handled the error conditions by cleaning up the queues and init/setup
routines.
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2018-03-29
1) Fix a rcu_read_lock/rcu_read_unlock imbalance
in the error path of xfrm_local_error().
From Taehee Yoo.
2) Some VTI MTU fixes. From Stefano Brivio.
3) Fix a too early overwritten skb control buffer
on xfrm transport mode.
Please note that this pull request has a merge conflict
in net/ipv4/ip_tunnel.c.
The conflict is between
commit f6cc9c054e ("ip_tunnel: Emit events for post-register MTU changes")
from the net tree and
commit 24fc79798b ("ip_tunnel: Clamp MTU to bounds on new link")
from the ipsec tree.
It can be solved as it is currently done in linux-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Dave Airlie:
"Nothing serious, two amdkfd and two tegra fixes"
* tag 'drm-fixes-for-v4.16-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/tegra: dc: Using NULL instead of plain integer
drm/amdkfd: Deallocate SDMA queues correctly
drm/amdkfd: Fix scratch memory with HWS enabled
drm/tegra: dc: Use correct format array for Tegra124
nf_nat_snmp_basic_main.c includes a generated header, but the
necessary dependency is missing in Makefile. This could cause
build error in parallel building.
Remove a weird line, and add a correct one.
Fixes: cc2d58634e ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Merge misc fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MAINTAINERS: demote ARM port to "odd fixes"
MAINTAINERS: correct rmk's email address
mm/kmemleak.c: wait for scan completion before disabling free
mm/memcontrol.c: fix parameter description mismatch
mm/vmstat.c: fix vmstat_update() preemption BUG
mm/page_owner: fix recursion bug after changing skip entries
ipc/shm.c: add split function to shm_vm_ops
mm, slab: memcg_link the SLAB's kmem_cache
drm/tegra: Fixes for v4.16
This contains two small fixes, one which fixes a typo that causes a
crash with the new framebuffer modifier query support and another that
fixes a build warning.
* tag 'drm/tegra/for-4.16-fixes' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: dc: Using NULL instead of plain integer
drm/tegra: dc: Use correct format array for Tegra124
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.16. Apologies if this is a bit big at
rc7, but they're all reasonably important fixes. None are actually for
new code, so they aren't indicative of 4.16 being in bad shape from
our point of view.
- Fix missing AT_BASE_PLATFORM (in auxv) when we're using a new
firmware interface for describing CPU features.
- Fix lost pending interrupts due to a race in our interrupt
soft-masking code.
- A workaround for a nest MMU bug with TLB invalidations on Power9.
- A workaround for broadcast TLB invalidations on Power9.
- Fix a bug in our instruction SLB miss handler, when handling bad
addresses (eg. >= TASK_SIZE), which could corrupt non-volatile user
GPRs.
Thanks to: Aneesh Kumar K.V, Balbir Singh, Benjamin Herrenschmidt,
Nicholas Piggin"
* tag 'powerpc-4.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs
powerpc/mm: Fixup tlbie vs store ordering issue on POWER9
powerpc/mm/radix: Move the functions that does the actual tlbie closer
powerpc/mm/radix: Remove unused code
powerpc/mm: Workaround Nest MMU bug with TLB invalidations
powerpc/mm: Add tracking of the number of coprocessors using a context
powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened
powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features
Pull ARM SoC fixes from Arnd Bergmann:
"Here are are a couple of last-minute fixes for 4.16, mostly for
regressions. As usual, the majory are device tree changes:
- USB 3 support on rk3399 didn't work and is being reverted for now
- One fix for an old suspend/resume bug on rk3399
- A few regulator related fixes on Banana Pi M2, and on imx7d-sdb
- A boot regression fix for all Aspeed SoCs failing to find their
memory
- One more dtc warning fix
The other changes are:
- A few updates to the MAINTAINERS file
- A revert for an incorrect orion5x cleanup
- Two power management fixes for OMAP"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: OMAP: Fix SRAM W+X mapping
ARM: dts: aspeed: Add default memory node
mailmap: Update email address for Gregory CLEMENT
ARM: davinci: fix the GPIO lookup for omapl138-hawk
MAINTAINERS: Update Tegra IOMMU maintainer
ARM: dts: imx7d-sdb: Fix regulator-usb-otg2-vbus node name
ARM: ux500: Fix PMU IRQ regression
ARM: dts: rockchip: Add missing #sound-dai-cells on rk3288
Revert "arm64: dts: rockchip: add usb3-phy otg-port support for rk3399"
arm64: dts: rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset)
ARM: OMAP: Fix dmtimer init for omap1
MAINTAINERS: update email address for Maxime Ripard
ARM: dts: sun6i: a31s: bpi-m2: add missing regulators
ARM: dts: sun6i: a31s: bpi-m2: improve pmic properties
As of the start of 2018, I am no longer paid to support the core 32-bit
ARM architecture code. This means that this code is no longer
commercially supported, and is now only supported through voluntary
effort.
I will continue to merge patches as and when able, but this will be at a
lower priority than before (which means a longer latency.) I have also
be scaled back the amount of time spent reading email, so email that is
intended for my attention needs to make itself plainly obvious, or I
will miss it.
In an attempt to reduce the amount of email Cc'd to me, exclude
arch/arm/boot/dts from the maintainers patterns, but add entries for the
SolidRun platforms I look after.
Link: http://lkml.kernel.org/r/E1ezkgn-0002fO-52@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A crash is observed when kmemleak_scan accesses the object->pointer,
likely due to the following race.
TASK A TASK B TASK C
kmemleak_write
(with "scan" and
NOT "scan=on")
kmemleak_scan()
create_object
kmem_cache_alloc fails
kmemleak_disable
kmemleak_do_cleanup
kmemleak_free_enabled = 0
kfree
kmemleak_free bails out
(kmemleak_free_enabled is 0)
slub frees object->pointer
update_checksum
crash - object->pointer
freed (DEBUG_PAGEALLOC)
kmemleak_do_cleanup waits for the scan thread to complete, but not for
direct call to kmemleak_scan via kmemleak_write. So add a wait for
kmemleak_scan completion before disabling kmemleak_free, and while at it
fix the comment on stop_scan_thread.
[vinmenon@codeaurora.org: fix stop_scan_thread comment]
Link: http://lkml.kernel.org/r/1522219972-22809-1-git-send-email-vinmenon@codeaurora.org
Link: http://lkml.kernel.org/r/1522063429-18992-1-git-send-email-vinmenon@codeaurora.org
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can
cause vmstat_update() to report a BUG due to preemption not being
disabled around smp_processor_id().
Discovered on Ubiquiti EdgeRouter Pro with Cavium Octeon II processor.
BUG: using smp_processor_id() in preemptible [00000000] code:
kworker/1:1/269
caller is vmstat_update+0x50/0xa0
CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
4.16.0-rc4-Cavium-Octeon-00009-gf83bbd5-dirty #1
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
show_stack+0x94/0x128
dump_stack+0xa4/0xe0
check_preemption_disabled+0x118/0x120
vmstat_update+0x50/0xa0
process_one_work+0x144/0x348
worker_thread+0x150/0x4b8
kthread+0x110/0x140
ret_from_kernel_thread+0x14/0x1c
Link: http://lkml.kernel.org/r/1520881552-25659-1-git-send-email-steven.hill@cavium.com
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes commit 5f48f0bd4e ("mm, page_owner: skip unnecessary
stack_trace entries").
Because if we skip first two entries then logic of checking count value
as 2 for recursion is broken and code will go in one depth recursion.
so we need to check only one call of _RET_IP(__set_page_owner) while
checking for recursion.
Current Backtrace while checking for recursion:-
(save_stack) from (__set_page_owner) // (But recursion returns true here)
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack) // recursion should return true here
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask+)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack)
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
Correct Backtrace with fix:
(save_stack) from (__set_page_owner) // recursion returned true here
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask+)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack)
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
Link: http://lkml.kernel.org/r/1521607043-34670-1-git-send-email-maninder1.s@samsung.com
Fixes: 5f48f0bd4e ("mm, page_owner: skip unnecessary stack_trace entries")
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@techadventures.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ayush Mittal <ayush.m@samsung.com>
Cc: Prakash Gupta <guptap@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: <pankaj.m@samsung.com>
Cc: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If System V shmget/shmat operations are used to create a hugetlbfs
backed mapping, it is possible to munmap part of the mapping and split
the underlying vma such that it is not huge page aligned. This will
untimately result in the following BUG:
kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu
NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic)
MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 20040000
CFAR: c00000000036ee44 SOFTE: 1
NIP __unmap_hugepage_range+0xa4/0x760
LR __unmap_hugepage_range_final+0x28/0x50
Call Trace:
0x7115e4e00000 (unreliable)
__unmap_hugepage_range_final+0x28/0x50
unmap_single_vma+0x11c/0x190
unmap_vmas+0x94/0x140
exit_mmap+0x9c/0x1d0
mmput+0xa8/0x1d0
do_exit+0x360/0xc80
do_group_exit+0x60/0x100
SyS_exit_group+0x24/0x30
system_call+0x58/0x6c
---[ end trace ee88f958a1c62605 ]---
This bug was introduced by commit 31383c6865 ("mm, hugetlbfs:
introduce ->split() to vm_operations_struct"). A split function was
added to vm_operations_struct to determine if a mapping can be split.
This was mostly for device-dax and hugetlbfs mappings which have
specific alignment constraints.
Mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops. shm_vm_ops functions will call back to the
original vm_ops if needed. Add such a split function to shm_vm_ops.
Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
Fixes: 31383c6865 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.
Fix this by introducing new variants
int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.
Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
PV TLB FLUSH can only be turned on when steal time is enabled.
The condition got reversed during conflict resolution.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 4f2f61fc50 ("KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled")
[Rebased on top of kvm/master and reworded the commit message. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull ARM fixes from Russell King:
"A small number of small fixes for ARM, mostly for some build issues.
One fix for a regression caused by the cpu hotplug conversion from a
few kernel versions ago"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8750/1: deflate_xip_data.sh: minor fixes
ARM: 8748/1: mm: Define vdso_start, vdso_end as array
ARM: 8747/1: make CONFIG_DEBUG_WX depend on MMU
ARM: 8746/1: vfp: Go back to clearing vfp_current_hw_state[]
Pull SCSI fixes from James Bottomley:
"Two driver fixes (ibmvfc, iscsi_tcp) and a USB fix for devices that
give the wrong return to Read Capacity and cause a huge log spew.
The remaining five patches all try to fix commit 84676c1f21
("genirq/affinity: assign vectors to all possible CPUs") which broke
the non-mq I/O path"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi_tcp: set BDI_CAP_STABLE_WRITES when data digest enabled
scsi: sd: Remember that READ CAPACITY(16) succeeded
scsi: ibmvfc: Avoid unnecessary port relogin
scsi: virtio_scsi: unify scsi_host_template
scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
scsi: core: introduce force_blk_mq
scsi: megaraid_sas: fix selection of reply queue
scsi: hpsa: fix selection of reply queue
The current for-loop zeros variable i and only loops once, hence
not all the buffers are free'd. Fix this by setting i correctly.
Detected by CoverityScan, CID#1463415 ("Operands don't affect result")
Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since commit c5ad119fb6
("net: sched: pfifo_fast use skb_array") driver is exposed
to an issue where it is hitting NULL skbs while handling TX
completions. Driver uses mmiowb() to flush the writes to the
doorbell bar which is a write-combined bar, however on x86
mmiowb() does not flush the write combined buffer.
This patch fixes this problem by replacing mmiowb() with wmb()
after the write combined doorbell write so that writes are
flushed and synchronized from more than one processor.
V1->V2:
-------
This patch was marked as "superseded" in patchwork.
(Not really sure for what reason).Resending it as v2.
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We tried to remove vq poll from wait queue, but do not check whether
or not it was in a list before. This will lead double free. Fixing
this by switching to use vhost_poll_stop() which zeros poll->wqh after
removing poll from waitqueue to make sure it won't be freed twice.
Cc: Darren Kenny <darren.kenny@oracle.com>
Reported-by: syzbot+c0272972b01b872e604a@syzkaller.appspotmail.com
Fixes: 2b8b328b61 ("vhost_net: handle polling errors when setting backend")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a change in how command line parsing is done in this version.
Excludes and includes are now ordered with the file list. Since
the spec file puts the file list before the exclude list it means newer
tar ignores the excludes and packs all the build output into the
kernel-devel RPM resulting in a huge package.
Simple argument re-ordering fixes the problem.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tariq Toukan says:
====================
mlx4 misc fixes for 4.16
This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.
Patch 1 by Eran fixes a control mix of PFC and Global pauses, please queue it
to -stable for >= v4.8.
Patch 2 by Moshe fixes a resource leak in slave's delete flow, please queue it
to -stable for >= v4.5.
Series generated against net commit:
3c82b372a9 net: dsa: mt7530: fix module autoloading for OF platform drivers
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_delete_all_resources_for_slave in resource tracker should free all
memory allocated for a slave.
While releasing memory of fs_rule, it misses releasing memory of
fs_rule->mirr_mbox.
Fixes: 78efed2751 ('net/mlx4_core: Support mirroring VF DMFS rules on both ports')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Global pause and PFC configuration should be mutually exclusive (i.e. only
one of them at most can be set). However, once PFC was turned off,
driver automatically turned Global pause on. This is a bug.
Fix the driver behaviour to turn off PFC/Global once the user turned the
other on.
This also fixed a weird behaviour that at a current time, the profile
had both PFC and global pause configuration turned on, which is
Hardware-wise impossible and caused returning false positive indication
to query tools.
In addition, fix error code when setting global pause or PFC to change
metadata only upon successful change.
Also, removed useless debug print.
Fixes: af7d518526 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not every CLC proposal message needs the maximum buffer length.
Due to the MSG_WAITALL flag, it is important to use the peeked
real length when receiving the message.
Fixes: d63d271ce2 ("smc: switch to sock_recvmsg()")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
llc_conn_send_pdu() pushes the skb into write queue and
calls llc_conn_send_pdus() to flush them out. However, the
status of dev_queue_xmit() is not returned to caller,
in this case, llc_conn_state_process().
llc_conn_state_process() needs hold the skb no matter
success or failure, because it still uses it after that,
therefore we should hold skb before dev_queue_xmit() when
that skb is the one being processed by llc_conn_state_process().
For other callers, they can just pass NULL and ignore
the return value as they are.
Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-03-23
The following series includes fixes for mlx5 netdev and eswitch.
v1->v2:
- Fixed commit message quotation marks in patch #7
For -stable v4.12
('net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path')
('net/mlx5e: Fix traffic being dropped on VF representor')
For -stable v4.13
('net/mlx5e: Fix memory usage issues in offloading TC flows')
('net/mlx5e: Verify coalescing parameters in range')
For -stable v4.14
('net/mlx5e: Don't override vport admin link state in switchdev mode')
For -stable v4.15
('108b2b6d5c02 net/mlx5e: Sync netdev vxlan ports at open')
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
strp_parser_err is called with a negative code everywhere, which then
calls abort_parser with a negative code. strp_msg_timeout calls
abort_parser directly with a positive code. Negate ETIMEDOUT
to match signed-ness of other calls.
The default abort_parser callback, strp_abort_strp, sets
sk->sk_err to err. Also negate the error here so sk_err always
holds a positive value, as the rest of the net code expects. Currently
a negative sk_err can result in endless loops, or user code that
thinks it actually sent/received err bytes.
Found while testing net/tls_sw recv path.
Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes a bug in the tcf_dump_walker function that can cause some actions
to not be reported when dumping a large number of actions. This issue
became more aggrevated when cookies feature was added. In particular
this issue is manifest when large cookie values are assigned to the
actions and when enough actions are created that the resulting table
must be dumped in multiple batches.
The number of actions returned in each batch is limited by the total
number of actions and the memory buffer size. With small cookies
the numeric limit is reached before the buffer size limit, which avoids
the code path triggering this bug. When large cookies are used buffer
fills before the numeric limit, and the erroneous code path is hit.
For example after creating 32 csum actions with the cookie
aaaabbbbccccdddd
$ tc actions ls action csum
total acts 26
action order 0: csum (tcp) action continue
index 1 ref 1 bind 0
cookie aaaabbbbccccdddd
.....
action order 25: csum (tcp) action continue
index 26 ref 1 bind 0
cookie aaaabbbbccccdddd
total acts 6
action order 0: csum (tcp) action continue
index 28 ref 1 bind 0
cookie aaaabbbbccccdddd
......
action order 5: csum (tcp) action continue
index 32 ref 1 bind 0
cookie aaaabbbbccccdddd
Note that the action with index 27 is omitted from the report.
Fixes: 4b3550ef53 ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")"
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_set_drvdata() is called only after registering the net_device,
therefore we could run into a NPE if one of the functions using
driver_data is called before it's set.
Fix this by calling pci_set_drvdata() before registering the
net_device.
This fix is a candidate for stable. As far as I can see the
bug has been there in kernel version 3.2 already, therefore
I can't provide a reference which commit is fixed by it.
The fix may need small adjustments per kernel version because
due to other changes the label which is jumped to if
register_netdev() fails has changed over time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix multicast-via-unicast transmissions for AP isolation and gateway
extension, by Linus Luessing (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that "tmp" can be uninitialized if we do a zero size
write.
Fixes: 02a5d6925c ("ALSA: pcm: Avoid potential races between OSS ioctls and read/write")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull "Allwinner Fixes for 4.16" from Maxime Ripard:
The first and second patches fix the regulator support for the Bananapi M2
board.
The last one updates my email address in MAINTAINERS.
* tag 'sunxi-fixes-for-4.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
MAINTAINERS: update email address for Maxime Ripard
ARM: dts: sun6i: a31s: bpi-m2: add missing regulators
ARM: dts: sun6i: a31s: bpi-m2: improve pmic properties
Pull "Two fixes for omap variants for v4.16-rc cycle" from Tony Lindgren:
Fix insecure W+X mapping warning for SRAM for omaps that
don't yet use drivers/misc/*sram*.c code. An earlier attempt
at fixing this turned out to cause problems with PM on omap3,
this version works with PM on omap3.
Also fix dmtimer probe for omap16xx devices that was noticed
with the pending dmtimer move to drivers. It seems this has
been broken for a while and is a non-critical for booting.
It is needed for PM on omap16xx though.
* tag 'omap-for-v4.16/sram-fix-signed' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP: Fix SRAM W+X mapping
ARM: OMAP: Fix dmtimer init for omap1
Pull "ARM: tegra: Miscellaneous changes for v4.17-rc1" from Thierry Reding:
This contains a single patch to update the MAINTAINERS entry for the
Tegra SMMU driver.
* tag 'tegra-for-4.17-misc' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
MAINTAINERS: Update Tegra IOMMU maintainer
The following pattern fails to compile while the same pattern
with alternative_call() does:
if (...)
alternative_call_2(...);
else
alternative_call_2(...);
as it expands into
if (...)
{
}; <===
else
{
};
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2
- Programming VMID correctly for scratch memory with HWS
- deallocating SDMA queues correctly in various situations
* tag 'drm-amdkfd-fixes-2018-03-25' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: Deallocate SDMA queues correctly
drm/amdkfd: Fix scratch memory with HWS enabled
this patch fix a bug in how the pebs->real_ip is handled in the PEBS
handler. real_ip only exists in Haswell and later processor. It is
actually the eventing IP, i.e., where the event occurred. As opposed
to the pebs->ip which is the PEBS interrupt IP which is always off
by one.
The problem is that the real_ip just like the IP needs to be fixed up
because PEBS does not record all the machine state registers, and
in particular the code segement (cs). This is why we have the set_linear_ip()
function. The problem was that set_linear_ip() was only used on the pebs->ip
and not the pebs->real_ip.
We have profiles which ran into invalid callstacks because of this.
Here is an example:
..... 0: ffffffffffffff80 recent entry, marker kernel v
..... 1: 000000000040044d <= user address in kernel space!
..... 2: fffffffffffffe00 marker enter user v
..... 3: 000000000040044d
..... 4: 00000000004004b6 oldest entry
Debugging output in get_perf_callchain():
[ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0
The problem is that the kernel entry in 1: points to a user level
address. How can that be?
The reason is that with PEBS sampling the instruction that caused the event
to occur and the instruction where the CPU was when the interrupt was posted
may be far apart. And sometime during that time window, the privilege level may
change. This happens, for instance, when the PEBS sample is taken close to a
kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level
instruction. But by the time the PMU interrupt fired, the processor had already
entered kernel space. This is why the debug output shows a user address with
user_mode() false.
The problem comes from PEBS not recording the code segment (cs) register.
The register is used in x86_64 to determine if executing in kernel vs user
space. This is okay because the kernel has a software workaround called
set_linear_ip(). But the issue in setup_pebs_sample_data() is that
set_linear_ip() is never called on the real_ip value when it is available
(Haswell and later) and precise_ip > 1.
This patch fixes this problem and eliminates the callchain discrepancy.
The patch restructures the code around set_linear_ip() to minimize the number
of times the IP has to be set.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the ORC unwinder was made the default on x86_64, Clang-built
defconfig kernels have triggered some new objtool warnings:
drivers/gpu/drm/i915/i915_gpu_error.o: warning: objtool: i915_error_printf()+0x6c: return with modified stack frame
drivers/gpu/drm/i915/intel_display.o: warning: objtool: pipe_config_err()+0xa6: return with modified stack frame
The problem is that objtool has never seen clang-built binaries before.
Shockingly enough, objtool is apparently able to follow the code flow
mostly fine, except for one instruction sequence. Instead of a LEAVE
instruction, clang restores RSP and RBP the long way:
67c: 48 89 ec mov %rbp,%rsp
67f: 5d pop %rbp
Teach objtool about this new code sequence.
Reported-and-test-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fce88ce81c356eedcae7f00ed349cfaddb3363cc.1521741586.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When mlx5_core is loaded it is expected to sync ports
with all vxlan devices so it can support vxlan encap/decap.
This is done via udp_tunnel_get_rx_info(). Currently this
call is set in mlx5e_nic_enable() and if the netdev is not in
NETREG_REGISTERED state it will not be called.
Normally on load the netdev state is not NETREG_REGISTERED
so udp_tunnel_get_rx_info() will not be called.
Moving udp_tunnel_get_rx_info() to mlx5e_open() so
it will be called on netdev UP event and allow encap/decap.
Fixes: 610e89e05c ("net/mlx5e: Don't sync netdev state when not registered")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently we use the global ipv6_stub var to access the ipv6 global
nd table. This practice gets us to troubles when the stub is only partially
set e.g when ipv6 is loaded under the disabled policy. In this case, as of commit
343d60aada ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument")
the stub is not null, but stub->nd_tbl is and we crash.
As we can access the ipv6 nd_tbl directly, the fix is just to avoid the
reference through the stub. There is one place in the code where we
issue ipv6 route lookup and keep doing it through the stub, but that
mentioned commit makes sure we get -EAFNOSUPPORT from the stack.
Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For NIC flows, the parsed attributes are not freed when we exit
successfully from mlx5e_configure_flower().
There is possible double free for eswitch flows. If error is returned
from rhashtable_insert_fast(), the parse attrs will be freed in
mlx5e_tc_del_flow(), but they will be freed again before exiting
mlx5e_configure_flower().
To fix both issues we do the following:
(1) change the condition that determines if to issue the free call to
check if this flow is NIC flow, or it does not have encap action.
(2) reorder the code such that that the check and free calls are done
before we attempt to add into the hash table.
Fixes: 232c001398 ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Increase representor netdev RQ size to avoid dropped packets.
The current size (two) is just too small to keep up with
conventional slow path traffic patterns.
Also match the SQ size to the RQ size.
Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add check of coalescing parameters received through ethtool are within
range of values supported by the HW.
Driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the
users through ethtool. The ethtool support up to 32 bit value for each.
However, mlx5 modify cq limits the coalescing time parameter to 12 bit
and coalescing frames parameters to 16 bits.
Return out of range error if user tries to set these parameters to
higher values.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add dependancy for switchdev to be congfigured as any user-space control
plane SW is expected to use the HW switchdev ID to locate the representors
related to VFs of a certain PF and apply SW/offloaded switching on them.
Fixes: e80541ecab ('net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
SQs are 32 and not 16 bits, hence it's wrong to use only 16 bits to
store the sq number for which are going to set steering rule, fix that.
Fixes: cb67b83292 ('net/mlx5e: Introduce SRIOV VF representors')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The vport admin original link state will be re-applied after returning
back to legacy mode, it is not right to change the admin link state value
when in switchdev mode.
Use direct vport commands to alter logical vport state in netdev
representor open/close flows rather than the administrative eswitch API.
Fixes: 20a1ea6747 ('net/mlx5e: Support VF vport link state control for SRIOV switchdev mode')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
It's required to create a modules.alias via MODULE_DEVICE_TABLE helper
for the OF platform driver. Otherwise, module autoloading cannot work.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
MODULE_ALIAS exports information to allow the module to be auto-loaded at
boot for the drivers registered using legacy platform registration.
However, currently the driver is always used by DT-only platform,
MODULE_ALIAS is redundant and should be removed properly.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to hold TX virtqueue mutex in vhost_net_rx_peek_head_len()
after RX virtqueue mutex is held in handle_rx(). This requires an
appropriate lock nesting notation to calm down deadlock detector.
Fixes: 0308813724 ("vhost_net: basic polling support")
Reported-by: syzbot+7f073540b1384a614e09@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same fix as in 'bonding: move dev_mc_sync after master_upper_dev_link
in bond_enslave' is needed for team driver.
The panic can be reproduced easily:
ip link add team1 type team
ip link set team1 up
ip link add link team1 vlan1 type vlan id 80
ip link set vlan1 master team1
Fixes: cb41c997d4 ("team: team should sync the port's uc/mc addrs when add a port")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
bonding: a bunch of fixes for dev hwaddr sync in bond_enslave
This patchset is mainly to fix a crash when adding vlan as slave of
bond which is also the parent link in patch 2/3, and also fix some
err process problems in bond_enslave in patch 1/3 and 3/3.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
dev_set_promiscuity(-1) should be done before going to the err path.
Otherwise, dev->promiscuity will leak.
Fixes: 7e1a1ac1fb ("bonding: Check return of dev_set_promiscuity/allmulti")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino found a crash when adding vlan as slave of bond which is also
the parent link:
ip link add bond1 type bond
ip link set bond1 up
ip link add link bond1 vlan1 type vlan id 80
ip link set vlan1 master bond1
The call trace is as below:
[<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
[<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
[<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
[<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
[<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
[<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
[<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
[<ffffffffa8401909>] do_setlink+0x9c9/0xe50
[<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
[<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
[<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
[<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
[<ffffffffa8424850>] netlink_unicast+0x170/0x210
[<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
[<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0
This is actually a dead lock caused by sync slave hwaddr from master when
the master is the slave's 'slave'. This dead loop check is actually done
by netdev_master_upper_dev_link. However, Commit 1f718f0f4f ("bonding:
populate neighbour's private on enslave") moved it after dev_mc_sync.
This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
so that this loop check would be earlier than dev_mc_sync. It also moves
if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
improvement.
Note team driver also has this issue, I will fix it in another patch.
Fixes: 1f718f0f4f ("bonding: populate neighbour's private on enslave")
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.
Fixes: 1ff412ad77 ("bonding: change the bond's vlan syncing functions with the standard ones")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the qdisc lock was dropped in pfifo_fast we allow multiple
enqueue threads and dequeue threads to run in parallel. On the
enqueue side the skb bit ooo_okay is used to ensure all related
skbs are enqueued in-order. On the dequeue side though there is
no similar logic. What we observe is with fewer queues than CPUs
it is possible to re-order packets when two instances of
__qdisc_run() are running in parallel. Each thread will dequeue
a skb and then whichever thread calls the ndo op first will
be sent on the wire. This doesn't typically happen because
qdisc_run() is usually triggered by the same core that did the
enqueue. However, drivers will trigger __netif_schedule()
when queues are transitioning from stopped to awake using the
netif_tx_wake_* APIs. When this happens netif_schedule() calls
qdisc_run() on the same CPU that did the netif_tx_wake_* which
is usually done in the interrupt completion context. This CPU
is selected with the irq affinity which is unrelated to the
enqueue operations.
To resolve this we add a RUNNING bit to the qdisc to ensure
only a single dequeue per qdisc is running. Enqueue and dequeue
operations can still run in parallel and also on multi queue
NICs we can still have a dequeue in-flight per qdisc, which
is typically per CPU.
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BroadMobi BM806U is an Qualcomm MDM9225 based 3G/4G modem.
Tested hardware BM806U is mounted on D-Link DWR-921-C3 router.
The USB id is added to qmi_wwan.c to allow QMI communication with
the BM806U.
Tested on 4.14 kernel and OpenWRT.
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
and switch to https where possible.
All links have been eyeballed to verify that the domains have
not changed, etc.
Signed-off-by: Sanjeev Gupta <ghane0@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to use the driver (e.g. aplay *.wav), the 4MiB DMA buffer
will get mmapp'ed in 16KiB chunks. But this fails with the 2nd 16KiB
area, as the page offset is outside of the VMA range (size), which is
currently used as size parameter in snd_pcm_lib_default_mmap(). By
using the DMA buffer size (dma_bytes) instead, the complete DMA buffer
can be mmapp'ed and the issue is fixed.
This issue was detected on an ARM platform (TI AM57xx) using the RME
HDSP MADI PCIe soundcard.
Fixes: 657b1989da ("ALSA: pcm - Use dma_mmap_coherent() if available")
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Kalle Valo says:
====================
wireless-drivers fixes for 4.16
Some fixes for 4.16, only for iwlwifi and brcmfmac this time. All
pretty small.
iwlwifi
* fix an issue with the multicast queue
* fix IGTK handling
* fix some missing return value checks
* add support for a HW workaround for issues on some platforms
* a couple of fixes for channel-switch
* a few fixes for the aggregation handling code
brcmfmac
* drop Inter-Access Point Protocol packets by default
* fix check for ISO3166 regulatory code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.
Signed-off-by: Alexander Potapenko <glider@google.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Description:
EEE does not work with lan7800 when AutoSpeed is not set.
(This can happen when EEPROM is not populated or configured incorrectly)
Root-Cause:
When EEE is enabled, the mac config register ASD is not set
i.e. in default state, causing EEE fail.
Fix:
Set the register when eeprom is not present.
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the SMC experimental TCP option in a SYN packet is lost on
the server side when SYN Cookies are active. However, the corresponding
SYNACK sent back to the client contains the SMC option. This causes an
inconsistent view of the SMC capabilities on the client and server.
This patch disables the SMC option in the SYNACK when SYN Cookies are
active to avoid this issue.
Fixes: 60e2a77807 ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SLB bad address handler's trap number fixup does not preserve the
low bit that indicates nonvolatile GPRs have not been saved. This
leads save_nvgprs to skip saving them, and subsequent functions and
return from interrupt will think they are saved.
This causes kernel branch-to-garbage debugging to not have correct
registers, can also cause userspace to have its registers clobbered
after a segfault.
Fixes: f0f558b131 ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull dmaengine fix from Vinod Koul:
"One small fix for stm32-dmamux fixing buffer overflow"
* tag 'dmaengine-fix-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma:
dmaengine: stm32-dmamux: fix a potential buffer overflow
Pull x86 and PTI fixes from Ingo Molnar:
"Misc fixes:
- fix EFI pagetables freeing
- fix vsyscall pagetable setting on Xen PV guests
- remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again
- fix two binutils (ld) development version related incompatibilities
- clean up breakpoint handling
- fix an x86 self-test"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Don't use IST entry for #BP stack
x86/efi: Free efi_pgd with free_pages()
x86/vsyscall/64: Use proper accessor to update P4D entry
x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
x86/boot/64: Verify alignment of the LOAD segment
x86/build/64: Force the linker to use 2MB page size
selftests/x86/ptrace_syscall: Fix for yet more glibc interference
Pull scheduler fixes from Ingo Molnar:
"Two sched debug output related fixes: a console output fix and
formatting fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Adjust newlines for better alignment
sched/debug: Fix per-task line continuation for console output
Pull locking fixes from Ingo Molnar:
"Two fixes: tighten up a jump-labels warning to not trigger on certain
modules and fix confusing (and non-existent) mutex API documentation"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Disable jump labels in __exit code
locking/mutex: Improve documentation
Pull mqueuefs revert from Eric Biederman:
"This fixes a regression that came in the merge window for v4.16.
The problem is that the permissions for mounting and using the
mqueuefs filesystem are broken. The necessary permission check is
missing letting people who should not be able to mount mqueuefs mount
mqueuefs. The field sb->s_user_ns is set incorrectly not allowing the
mounter of mqueuefs to remount and otherwise have proper control over
the filesystem.
Al Viro and I see the path to the necessary fixes differently and I am
not even certain at this point he actually sees all of the necessary
fixes. Given a couple weeks we can probably work something out but I
don't see the review being resolved in time for the final v4.16. I
don't want v4.16 shipping with a nasty regression. So unfortunately I
am sending a revert"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Revert "mqueue: switch to on-demand creation of internal mount"
This reverts commit 36735a6a2b.
Aleksa Sarai <asarai@suse.de> writes:
> [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
>
> Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
> which was introduced by 36735a6a2b ("mqueue: switch to on-demand
> creation of internal mount").
>
> Basically, the reproducer boils down to being able to mount mqueue if
> you create a new user namespace, even if you don't unshare the IPC
> namespace.
>
> Previously this was not possible, and you would get an -EPERM. The mount
> is the *host* mqueue mount, which is being cached and just returned from
> mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
> it was intentional -- since I'm not familiar with mqueue).
>
> To me it looks like there is a missing permission check. I've included a
> patch below that I've compile-tested, and should block the above case.
> Can someone please tell me if I'm missing something? Is this actually
> safe?
>
> [1]: https://github.com/docker/docker/issues/36674
The issue is a lot deeper than a missing permission check. sb->s_user_ns
was is improperly set as well. So in addition to the filesystem being
mounted when it should not be mounted, so things are not allow that should
be.
We are practically to the release of 4.16 and there is no agreement between
Al Viro and myself on what the code should looks like to fix things properly.
So revert the code to what it was before so that we can take our time
and discuss this properly.
Fixes: 36735a6a2b ("mqueue: switch to on-demand creation of internal mount")
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Reported-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise
userspace hits EOPNOTSUPP with valid rules using the meter statement,
from Florian Westphal.
2) If you send a batch that flushes the existing ruleset (that contains
a NAT chain) and the new ruleset definition comes with a new NAT
chain, don't bogusly hit EBUSY. Also from Florian.
3) Missing netlink policy attribute validation, from Florian.
4) Detach conntrack template from skbuff if IP_NODEFRAG is set on,
from Paolo Abeni.
5) Cache device names in flowtable object, otherwise we may end up
walking over devices going aways given no rtnl_lock is held.
6) Fix incorrect net_device ingress with ingress hooks.
7) Fix crash when trying to read more data than available in UDP
packets from the nf_socket infrastructure, from Subash.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_header_pointer will copy data into a buffer if data is non linear,
otherwise it will return a pointer in the linear section of the data.
nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
accesses memory within the size of tcphdr (th->doff) in case of TCP
packets. This causes a crash when running with KASAN with the following
call stack -
BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
net/netfilter/xt_socket.c:178
Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
CPU: 2 PID: 28971 Comm: syz-executor Tainted: G B W O 4.9.65+ #1
Call trace:
[<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
[<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
[<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
[<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
[<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
[<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
[<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
[<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
[<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
[<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
[<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178
Fix this by copying data into appropriate size headers based on protocol.
Fixes: a583636a83 ("inet: refactor inet[6]_lookup functions to take skb")
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NFP program allocation length is in bytes and NFP program length
is in instructions, fix the comparison of the two.
Fixes: 9314c442d7 ("nfp: bpf: move translation prepare to offload.c")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull pin control fixes from Linus Walleij:
"Two fixes for pin control for v4.16:
- Renesas SH-PFC: remove a duplicate clkout pin which was causing
crashes
- fix Samsung out of bounds exceptions"
* tag 'pinctrl-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: samsung: Validate alias coming from DT
pinctrl: sh-pfc: r8a7795: remove duplicate of CLKOUT pin in pinmux_pins[]
Send nm complaints about broken pipe (when sed exits early) to /dev/null.
All errors should be printed to stderr.
Don't trap on normal exit so the trap can return an error code.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Define vdso_start, vdso_end as array to avoid compile-time analysis error
for the case of built with CONFIG_FORTIFY_SOURCE.
and, since vdso_start, vdso_end are used in vdso.c only,
move extern-declaration from vdso.h to vdso.c.
If kernel is built with CONFIG_FORTIFY_SOURCE,
compile-time error happens at this code.
- if (memcmp(&vdso_start, "177ELF", 4))
The size of "&vdso_start" is recognized as 1 byte, but n is 4,
So that compile-time error is reported.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jinbum Park <jinb.park7@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Without CONFIG_MMU, this results in a build failure:
./arch/arm/include/asm/memory.h:92:23: error: initializer element is not constant
#define VECTORS_BASE vectors_base
arch/arm/mm/dump.c:32:4: note: in expansion of macro 'VECTORS_BASE'
{ VECTORS_BASE, "Vectors" },
arch/arm/mm/dump.c:71:11: error: 'L_PTE_USER' undeclared here (not in a function); did you mean 'VTIME_USER'?
.mask = L_PTE_USER,
^~~~~~~~~~
Obviously the feature only makes sense with an MMU, so let's add the
dependency here.
Fixes: a8e53c151f ("ARM: 8737/1: mm: dump: add checking for writable and executable")
Acked-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Commit 384b38b669 ("ARM: 7873/1: vfp: clear vfp_current_hw_state
for dying cpu") fixed the cpu dying notifier by clearing
vfp_current_hw_state[]. However commit e5b61bafe7 ("arm: Convert VFP
hotplug notifiers to state machine") incorrectly used the original
vfp_force_reload() function in the cpu dying notifier.
Fix it by going back to clearing vfp_current_hw_state[].
Fixes: e5b61bafe7 ("arm: Convert VFP hotplug notifiers to state machine")
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
DHCP connectivity issues can currently occur if the following conditions
are met:
1) A DHCP packet from a client to a server
2) This packet has a multicast destination
3) This destination has a matching entry in the translation table
(FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
for IPv6)
4) The orig-node determined by TT for the multicast destination
does not match the orig-node determined by best-gateway-selection
In this case the DHCP packet will be dropped.
The "gateway-out-of-range" check is supposed to only be applied to
unicasted DHCP packets to a specific DHCP server.
In that case dropping the the unicasted frame forces the client to
retry via a broadcasted one, but now directed to the new best
gateway.
A DHCP packet with broadcast/multicast destination is already ensured to
always be delivered to the best gateway. Dropping a multicasted
DHCP packet here will only prevent completing DHCP as there is no
other fallback.
So far, it seems the unicast check was implicitly performed by
expecting the batadv_transtable_search() to return NULL for multicast
destinations. However, a multicast address could have always ended up in
the translation table and in fact is now common.
To fix this potential loss of a DHCP client-to-server packet to a
multicast address this patch adds an explicit multicast destination
check to reliably bail out of the gateway-out-of-range check for such
destinations.
The issue and fix were tested in the following three node setup:
- Line topology, A-B-C
- A: gateway client, DHCP client
- B: gateway server, hop-penalty increased: 30->60, DHCP server
- C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF
Without this patch, A would never transmit its DHCP Discover packet
due to an always "out-of-range" condition. With this patch,
a full DHCP handshake between A and B was possible again.
Fixes: be7af5cf9c ("batman-adv: refactoring gateway handling code")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
For multicast frames AP isolation is only supposed to be checked on
the receiving nodes and never on the originating one.
Furthermore, the isolation or wifi flag bits should only be intepreted
as such for unicast and never multicast TT entries.
By injecting flags to the multicast TT entry claimed by a single
target node it was verified in tests that this multicast address
becomes unreachable, leading to packet loss.
Omitting the "src" parameter to the batadv_transtable_search() call
successfully skipped the AP isolation check and made the target
reachable again.
Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Pull kprobe fixes from Steven Rostedt:
"The documentation for kprobe events says that symbol offets can take
both a + and - sign to get to befor and after the symbol address.
But in actuality, the code does not support the minus. This fixes that
issue, and adds a few more selftests to kprobe events"
* tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests: ftrace: Add a testcase for probepoint
selftests: ftrace: Add a testcase for string type with kprobe_event
selftests: ftrace: Add probe event argument syntax testcase
tracing: probeevent: Fix to support minus offset from symbol
There's nothing IST-worthy about #BP/int3. We don't allow kprobes
in the small handful of places in the kernel that run at CPL0 with
an invalid stack, and 32-bit kernels have used normal interrupt
gates for #BP forever.
Furthermore, we don't allow kprobes in places that have usergs while
in kernel mode, so "paranoid" is also unnecessary.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Deallocate SDMA queues during abnormal process termination and when
queue creation fails after the SDMA allocation.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Pull MIPS fixes from James Hogan:
"Another miscellaneous pile of MIPS fixes for 4.16:
- lantiq: fixes for clocks and Amazon SE (4.14)
- ralink: fix booting on MT7621 (4.5)
- ralink: fix halt (3.9)"
* tag 'mips_fixes_4.16_5' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: ralink: Fix booting on MT7621
MIPS: ralink: Remove ralink_halt()
MIPS: lantiq: ase: Enable MFD_SYSCON
MIPS: lantiq: Enable AHB Bus for USB
MIPS: lantiq: Fix Danube USB clock
Pull VFIO fix from Alex Williamson:
"Revert masking INTx where it cannot be enabled - it plays poorly with
SR-IOV VFs and presumes DisINTx support"
* tag 'vfio-v4.16-rc7' of git://github.com/awilliam/linux-vfio:
Revert: "vfio-pci: Mask INTx if a device is not capabable of enabling it"
Pull MTD fixes from Boris Brezillon:
- Fix several problems in the fsl_ifc NAND controller driver
- Fix misuse of mtd_ooblayout_ecc() in mtdchar.c
* tag 'mtd/fixes-for-4.16-rc7' of git://git.infradead.org/linux-mtd:
mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
mtd: nand: fsl_ifc: Fix nand waitfunc return value
mtdchar: fix usage of mtd_ooblayout_ecc()
Pull staging/IIO fixes from Greg KH:
"Here are a few small staging and IIO fixes for various reported
issues.
All of them are tiny, the majority being iio driver fixes for small
issues, and one staging driver fix for a memory corruption issue.
All have been in linux-next with no reported issues"
* tag 'staging-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ncpfs: memory corruption in ncp_read_kernel()
iio: st_pressure: st_accel: pass correct platform data to init
Revert "iio: accel: st_accel: remove redundant pointer pdata"
iio: adc: meson-saradc: unlock on error in meson_sar_adc_lock()
dt-bindings: iio: adc: sd-modulator: fix io-channel-cells
iio: adc: stm32-dfsdm: fix multiple channel initialization
iio: adc: stm32-dfsdm: fix clock source selection
iio: adc: stm32-dfsdm: fix call to stop channel
iio: adc: stm32-dfsdm: fix compatible data use
iio: chemical: ccs811: Corrected firmware boot/application mode transition
Pull hyperv fix from Greg KH:
"This is a single hyperv bugfix for 4.16-rc7.
It resolves an issue with the ring-buffer signaling to resolve
reported problems.
It's been in linux-next for a while now with no reported issues"
* tag 'char-misc-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: Fix ring buffer signaling
Pull media fixes from Mauro Carvalho Chehab:
"Three fixes:
- dvb: fix a Kconfig typo on a help text
- tegra-cec: reset rx_buf_cnt when start bit detected
- rc: lirc does not use LIRC_CAN_SEND_SCANCODE feature"
* tag 'media/v4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: dvb: fix a Kconfig typo
media: tegra-cec: reset rx_buf_cnt when start bit detected
media: rc: lirc does not use LIRC_CAN_SEND_SCANCODE feature
Segment registers must be synchronized prior to any code that may
trigger a call to emulation_required()/guest_state_valid(), e.g.
vmx_set_cr0(). Because preparing vmcs02 writes segmentation fields
directly, i.e. doesn't use vmx_set_segment(), emulation_required
will not be re-evaluated when synchronizing the segment registers,
which can result in L0 incorrectly starting emulation of L2.
Fixes: 8665c3f973 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Move all of prepare_vmcs02_full earlier, not just segment registers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull sound fixes from Takashi Iwai:
"Things look calming down, but people were still busy to plaster over
small holes:
- Two fixes to harden against races in aloop driver
- A correction of a long-standing bug in USB-audio UAC2 processing
unit parser
- As usual suspects, HD-audio: a workaround for Coffee Lake
controller and a few other device-specific fixes
All small and for stable"
* tag 'sound-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: aloop: Fix access to not-yet-ready substream via cable
ALSA: aloop: Sync stale timer before release
ALSA: hda/realtek - Fix speaker no sound after system resume
ALSA: hda/realtek - Fix Dell headset Mic can't record
ALSA: hda - Force polling mode on CFL for fixing codec communication
ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
ALSA: hda/realtek - Always immediately update mute LED with pin VREF
Ido Schimmel says:
====================
mlxsw: Handle changes to MTU in GRE tunnels
Petr says:
When offloading GRE tunnels, the MTU setting is kept fixed after the
initial offload even as the slow-path configuration changed. Worse: the
offloaded MTU setting is actually just a transient value set at the time
of NETDEV_REGISTER of the tunnel. As of commit ffc2b6ee41 ("ip_gre:
fix IFLA_MTU ignored on NEWLINK"), that transient value is zero, and
unless there's e.g. a VRF migration that prompts re-offload, it stays at
zero, and all GRE packets end up trapping.
Thus, in patch #1, change the way the MTU is changed post-registration,
so that the full event protocol is observed. That way the drivers get to
see the change and have a chance to react.
In the remaining two patches, implement support for MTU change in mlxsw
driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update MTU of overlay loopback in accordance with the setting on the
tunnel netdevice.
Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the function so that it can be called without forward declaration
from a function that will be added in a follow-up patch.
Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For tunnels created with IFLA_MTU, MTU of the netdevice is set by
rtnl_create_link() (called from rtnl_newlink()) before the device is
registered. However without IFLA_MTU that's not done.
rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
via ip_tunnel_newlink() calls register_netdevice(), and that emits
NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
MTU of 0.
After ip_tunnel_newlink() corrects the MTU after registering the
netdevice, but since there's no event, the listeners don't get to know
about the MTU until something else happens--such as a NETDEV_UP event.
That's not ideal.
So instead of setting the MTU directly, go through dev_set_mtu(), which
takes care of distributing the necessary NETDEV_PRECHANGEMTU and
NETDEV_CHANGEMTU events.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On POWER9, under some circumstances, a broadcast TLB invalidation
might complete before all previous stores have drained, potentially
allowing stale stores from becoming visible after the invalidation.
This works around it by doubling up those TLB invalidations which was
verified by HW to be sufficient to close the risk window.
This will be documented in a yet-to-be-published errata.
Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Enable the feature in the DT CPU features code for all Power9,
rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A recent commit introduced a new struct xfrm_trans_cb
that is used with the sk_buff control buffer. Unfortunately
it placed the structure in front of the control buffer and
overlooked that the IPv4/IPv6 control buffer is still needed
for some layer 4 protocols. As a result the IPv4/IPv6 control
buffer is overwritten with this structure. Fix this by setting
a apropriate header in front of the structure.
Fixes acf568ee85 ("xfrm: Reinject transport-mode packets ...")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
On POWER9 the Nest MMU may fail to invalidate some translations when
doing a tlbie "by PID" or "by LPID" that is targeted at the TLB only
and not the page walk cache.
This works around it by forcing such invalidations to escalate to
RIC=2 (full invalidation of TLB *and* PWC) when a coprocessor is in
use for the context.
Fixes: 03b8abedf4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[balbirs: fixed spelling and coding style to quiesce checkpatch.pl]
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, when using coprocessors (which use the Nest MMU), we
simply increment the active_cpu count to force all TLB invalidations
to be come broadcast.
Unfortunately, due to an errata in POWER9, we will need to know
more specifically that coprocessors are in use.
This maintains a separate copros counter in the MMU context for
that purpose.
NB. The commit mentioned in the fixes tag below is not at fault for
the bug we're fixing in this commit and the next, but this fix applies
on top the infrastructure it introduced.
Fixes: 03b8abedf4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since commit 6964e6a4e4 ("KVM: PPC: Book3S HV: Do SLB load/unload
with guest LPCR value loaded", 2018-01-11), we have been seeing
occasional machine check interrupts on POWER8 systems when running
KVM guests, due to SLB multihit errors.
This turns out to be due to the guest exit code reloading the host
SLB entries from the SLB shadow buffer when the SLB was not previously
cleared in the guest entry path. This can happen because the path
which skips from the guest entry code to the guest exit code without
entering the guest now does the skip before the SLB is cleared and
loaded with guest values, but the host values are loaded after the
point in the guest exit path that we skip to.
To fix this, we move the code that reloads the host SLB values up
so that it occurs just before the point in the guest exit code (the
label guest_bypass:) where we skip to from the guest entry path.
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: 6964e6a4e4 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded")
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, thp: do not cause memcg oom for thp
mm/vmscan: wake up flushers for legacy cgroups too
Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
mm/thp: do not wait for lock_page() in deferred_split_scan()
mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
x86/mm: implement free pmd/pte page interfaces
mm/vmalloc: add interfaces to free unmapped page table
h8300: remove extraneous __BIG_ENDIAN definition
hugetlbfs: check for pgoff value overflow
lockdep: fix fs_reclaim warning
MAINTAINERS: update Mark Fasheh's e-mail
mm/mempolicy.c: avoid use uninitialized preferred_node
Pull libnvdimm fixes from Dan Williams:
"Two regression fixes, two bug fixes for older issues, two fixes for
new functionality added this cycle that have userspace ABI concerns,
and a small cleanup. These have appeared in a linux-next release and
have a build success report from the 0day robot.
* The 4.16 rework of altmap handling led to some configurations
leaking page table allocations due to freeing from the altmap
reservation rather than the page allocator.
The impact without the fix is leaked memory and a WARN() message
when tearing down libnvdimm namespaces. The rework also missed a
place where error handling code needed to be removed that can lead
to a crash if devm_memremap_pages() fails.
* acpi_map_pxm_to_node() had a latent bug whereby it could
misidentify the closest online node to a given proximity domain.
* Block integrity handling was reworked several kernels back to allow
calling add_disk() after setting up the integrity profile.
The nd_btt and nd_blk drivers are just now catching up to fix
automatic partition detection at driver load time.
* The new peristence_domain attribute, a platform indicator of
whether cpu caches are powerfail protected for example, is meant to
be a single value enum and not a set of flags.
This oversight was caught while reviewing new userspace code in
libndctl to communicate the attribute.
Fix this new enabling up so that we are not stuck with an unwanted
userspace ABI"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, nfit: fix persistence domain reporting
libnvdimm, region: hide persistence_domain when unknown
acpi, numa: fix pxm to online numa node associations
x86, memremap: fix altmap accounting at free
libnvdimm: remove redundant assignment to pointer 'dev'
libnvdimm, {btt, blk}: do integrity setup before add_disk()
kernel/memremap: Remove stale devres_free() call
Pull drm fixes from Dave Airlie:
"A bunch of fixes all over the place (core, i915, amdgpu, imx, sun4i,
ast, tegra, vmwgfx), nothing too serious or worrying at this stage.
- one uapi fix to stop multi-planar images with getfb
- Sun4i error path and clock fixes
- udl driver mmap offset fix
- i915 DP MST and GPU reset fixes
- vmwgfx mutex and black screen fixes
- imx array underflow fix and vblank fix
- amdgpu: display fixes
- exynos devicetree fix
- ast mode fix"
* tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux: (29 commits)
drm/ast: Fixed 1280x800 Display Issue
drm: udl: Properly check framebuffer mmap offsets
drm/i915: Specify which engines to reset following semaphore/event lockups
drm/vmwgfx: Fix a destoy-while-held mutex problem.
drm/vmwgfx: Fix black screen and device errors when running without fbdev
drm: Reject getfb for multi-plane framebuffers
drm/amd/display: Add one to EDID's audio channel count when passing to DC
drm/amd/display: We shouldn't set format_default on plane as atomic driver
drm/amd/display: Fix FMT truncation programming
drm/amd/display: Allow truncation to 10 bits
drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
drm/amd/display: fix dereferencing possible ERR_PTR()
drm/amd/display: Refine disable VGA
drm/tegra: Shutdown on driver unbind
drm/tegra: dsi: Don't disable regulator on ->exit()
drm/tegra: dc: Detach IOMMU group from domain only once
dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
drm/imx: move arming of the vblank event to atomic_flush
...
Commit 2516035499 ("mm, thp: remove __GFP_NORETRY from khugepaged and
madvised allocations") changed the page allocator to no longer detect
thp allocations based on __GFP_NORETRY.
It did not, however, modify the mem cgroup try_charge() path to avoid
oom kill for either khugepaged collapsing or thp faulting. It is never
expected to oom kill a process to allocate a hugepage for thp; reclaim
is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
(and charging) should fallback instead of oom killing processes.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
Fixes: 2516035499 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 726d061fbd ("mm: vmscan: kick flushers when we encounter dirty
pages on the LRU") added flusher invocation to shrink_inactive_list()
when many dirty pages on the LRU are encountered.
However, shrink_inactive_list() doesn't wake up flushers for legacy
cgroup reclaim, so the next commit bbef938429 ("mm: vmscan: remove old
flusher wakeup from direct reclaim path") removed the only source of
flusher's wake up in legacy mem cgroup reclaim path.
This leads to premature OOM if there is too many dirty pages in cgroup:
# mkdir /sys/fs/cgroup/memory/test
# echo $$ > /sys/fs/cgroup/memory/test/tasks
# echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
# dd if=/dev/zero of=tmp_file bs=1M count=100
Killed
dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Call Trace:
dump_stack+0x46/0x65
dump_header+0x6b/0x2ac
oom_kill_process+0x21c/0x4a0
out_of_memory+0x2a5/0x4b0
mem_cgroup_out_of_memory+0x3b/0x60
mem_cgroup_oom_synchronize+0x2ed/0x330
pagefault_out_of_memory+0x24/0x54
__do_page_fault+0x521/0x540
page_fault+0x45/0x50
Task in /test killed as a result of limit of /test
memory: usage 51200kB, limit 51200kB, failcnt 73
memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Wake up flushers in legacy cgroup reclaim too.
Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
Fixes: bbef938429 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
mapped. We do not collapse such pages. See check
khugepaged_scan_pmd().
But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
somebody managed to instantiate THP in the range and then split the PMD
back to PTEs we would have a problem --
VM_BUG_ON_PAGE(PageCompound(page)) will get triggered.
It's possible since we drop mmap_sem during collapse to re-take for
write.
Replace the VM_BUG_ON() with graceful collapse fail.
Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com
Fixes: b1caa957ae ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
create pud/pmd mappings. A kernel panic was observed on arm64 systems
with Cortex-A75 in the following steps as described by Hanjun Guo.
1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.
This panic is not reproducible on x86. INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86. x86
still has memory leak.
The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:
- The iounmap() path is shared with vunmap(). Since vmap() only
supports pte mappings, making vunmap() to free a pte page is an
overhead for regular vmap users as they do not need a pte page freed
up.
- Checking if all entries in a pte page are cleared in the unmap path
is racy, and serializing this check is expensive.
- The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
purge.
Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
clear a given pud/pmd entry and free up a page for the lower level
entries.
This patch implements their stub functions on x86 and arm64, which work
as workaround.
[akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
Fixes: e61ce6ade4 ("mm: change ioremap to set up huge I/O mappings")
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A bugfix I did earlier caused a build regression on h8300, which defines
the __BIG_ENDIAN macro in a slightly different way than the generic
code:
arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined
We don't need to define it here, as the same macro is already provided
by the linux/byteorder/big_endian.h, and that version does not conflict.
While this is a v4.16 regression, my earlier patch also got backported
to the 4.14 and 4.15 stable kernels, so we need the fixup there as well.
Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
Fixes: 101110f627 ("Kbuild: always define endianess in kconfig.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages system
call. The hugetlbfs mmap routine uses the byte offset to calculate
reservations and file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which
causes the BUG.
The previous overflow fix to this code was incomplete and did not take
the remap_file_pages system call into account.
[mike.kravetz@oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones reported fs_reclaim lockdep warnings.
============================================
WARNING: possible recursive locking detected
4.15.0-rc9-backup-debug+ #1 Not tainted
--------------------------------------------
sshd/24800 is trying to acquire lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
but task is already holding lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(fs_reclaim);
lock(fs_reclaim);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by sshd/24800:
#0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
#1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
stack backtrace:
CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
Call Trace:
dump_stack+0xbc/0x13f
__lock_acquire+0xa09/0x2040
lock_acquire+0x12e/0x350
fs_reclaim_acquire.part.102+0x29/0x30
kmem_cache_alloc+0x3d/0x2c0
alloc_extent_state+0xa7/0x410
__clear_extent_bit+0x3ea/0x570
try_release_extent_mapping+0x21a/0x260
__btrfs_releasepage+0xb0/0x1c0
btrfs_releasepage+0x161/0x170
try_to_release_page+0x162/0x1c0
shrink_page_list+0x1d5a/0x2fb0
shrink_inactive_list+0x451/0x940
shrink_node_memcg.constprop.88+0x4c9/0x5e0
shrink_node+0x12d/0x260
try_to_free_pages+0x418/0xaf0
__alloc_pages_slowpath+0x976/0x1790
__alloc_pages_nodemask+0x52c/0x5c0
new_slab+0x374/0x3f0
___slab_alloc.constprop.81+0x47e/0x5a0
__slab_alloc.constprop.80+0x32/0x60
__kmalloc_track_caller+0x267/0x310
__kmalloc_reserve.isra.40+0x29/0x80
__alloc_skb+0xee/0x390
sk_stream_alloc_skb+0xb8/0x340
tcp_sendmsg_locked+0x8e6/0x1d30
tcp_sendmsg+0x27/0x40
inet_sendmsg+0xd0/0x310
sock_write_iter+0x17a/0x240
__vfs_write+0x2ab/0x380
vfs_write+0xfb/0x260
SyS_write+0xb6/0x140
do_syscall_64+0x1e5/0xc05
entry_SYSCALL64_slow_path+0x25/0x25
This warning is caused by commit d92a8cfcb3 ("locking/lockdep:
Rework FS_RECLAIM annotation") which replaced the use of
lockdep_{set,clear}_current_reclaim_state() in __perform_reclaim()
and lockdep_trace_alloc() in slab_pre_alloc_hook() with
fs_reclaim_acquire()/ fs_reclaim_release().
Since __kmalloc_reserve() from __alloc_skb() adds __GFP_NOMEMALLOC |
__GFP_NOWARN to gfp_mask, and all reclaim path simply propagates
__GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() is
trying to grab the 'fake' lock again when __perform_reclaim() already
grabbed the 'fake' lock.
The
/* this guy won't enter reclaim */
if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
return false;
test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock
was added by commit cf40bd16fd ("lockdep: annotate reclaim context
(__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread
won't enter reclaim regardless of __GFP_NOMEMALLOC after commit
341ce06f69 ("page allocator: calculate the alloc_flags for allocation
only once") added the PF_MEMALLOC safeguard (
/* Avoid recursion of direct reclaim */
if (p->flags & PF_MEMALLOC)
goto nopage;
in __alloc_pages_slowpath()).
Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and
allow __need_fs_reclaim() to return false.
Link: http://lkml.kernel.org/r/201802280650.FJC73911.FOSOMLJVFFQtHO@I-love.SAKURA.ne.jp
Fixes: d92a8cfcb3 ("locking/lockdep: Rework FS_RECLAIM annotation")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original ast driver cannot display properly if the resolution is 1280x800 and the pixel clock is 83.5MHz.
Here is the update to fix it.
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Pull ACPI fixes from Rafael Wysocki:
"These revert one recent commit that added incorrect battery quirks for
some Asus systems and fix an off-by-one error in the watchdog driver
based on the ACPI WDAT table.
Specifics:
- Revert the recent change adding battery quirks for Asus GL502VSK
and UX305LA as these quirks turn out to be inadequate and possibly
premature (Daniel Drake).
- Fix an off-by-one error in the resource allocation part of the
watchdog driver based on the ACPI WDAT table (Takashi Iwai)"
* tag 'acpi-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / watchdog: Fix off-by-one error at resource assignment
Revert "ACPI / battery: Add quirk for Asus GL502VSK and UX305LA"
Pull modules fix from Jessica Yu:
"Propagate error in modules_open() to avoid possible later NULL
dereference if seq_open() had failed"
* tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: propagate error in modules_open()
force_external_irq_replay() can be called in the do_IRQ path with
interrupts hard enabled and soft disabled if may_hard_irq_enable() set
MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
store sequence. If a maskable interrupt hits during this sequence, it
will go to the masked handler to be marked pending in irq_happened.
This update will be lost when the interrupt returns and the store
instruction executes. This can result in unpredictable latencies,
timeouts, lockups, etc.
Fix this by ensuring hard interrupts are disabled before modifying
irq_happened.
This could cause any maskable asynchronous interrupt to get lost, but
it was noticed on P9 SMP system doing RDMA NVMe target over 100GbE,
so very high external interrupt rate and high IPI rate. The hang was
bisected down to enabling doorbell interrupts for IPIs. These provided
an interrupt type that could run at high rates in the do_IRQ path,
stressing the race.
Fixes: 1d607bb3bd ("powerpc/irq: Add mechanism to force a replay of interrupts")
Cc: stable@vger.kernel.org # v4.8+
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull networking fixes from David Miller:
1) Always validate XFRM esn replay attribute, from Florian Westphal.
2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long.
3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul
Triebitz.
4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel
Axtens.
5) Fix some interrupt handling issues in e1000e driver, from Benjamin
Poitier.
6) Use strlcpy() in several ethtool get_strings methods, from Florian
Fainelli.
7) Fix rhlist dup insertion, from Paul Blakey.
8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev.
9) Fix driver unload crash when link is up in smsc911x, from Jeremy
Linton.
10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric
Dumazet.
11) Need to purge the write queue when TCP connections are aborted,
otherwise userspace using MSG_ZEROCOPY can't close the fd. From
Soheil Hassas Yeganeh.
12) Fix double free in error path of team driver, from Arkadi
Sharshevsky.
13) Filter fixes for hv_netvsc driver, from Stephen Hemminger.
14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo
Bianconi.
15) Properly filter out unsupported feature flags in macvlan driver,
from Shannon Nelson.
16) Don't request loading the diag module for a protocol if the protocol
itself is not even registered. From Xin Long.
17) If datagram connect fails in ipv6, make sure the socket state is
consistent afterwards. From Paolo Abeni.
18) Use after free in qed driver, from Dan Carpenter.
19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the
entry. From Sabrina Dubroca.
20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins.
21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita.
22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel.
23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti.
24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli.
25) Memory leak in gemini driver, from Igor Pylypiv.
26) IDR leaks in error paths of tcf_*_init() functions, from Davide
Caratti.
27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun.
28) Missing dev_put() in error path of macsec_newlink(), from Dan
Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits)
macsec: missing dev_put() on error in macsec_newlink()
net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
hv_netvsc: common detach logic
hv_netvsc: change GPAD teardown order on older versions
hv_netvsc: use RCU to fix concurrent rx and queue changes
hv_netvsc: disable NAPI before channel close
net/ipv6: Handle onlink flag with multipath routes
ppp: avoid loop in xmit recursion detection code
ipv6: sr: fix NULL pointer dereference when setting encap source address
ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
net: aquantia: driver version bump
net: aquantia: Implement pci shutdown callback
net: aquantia: Allow live mac address changes
net: aquantia: Add tx clean budget and valid budget handling logic
net: aquantia: Change inefficient wait loop on fw data reads
net: aquantia: Fix a regression with reset on old firmware
net: aquantia: Fix hardware reset when SPI may rarely hangup
s390/qeth: on channel error, reject further cmd requests
s390/qeth: lock read device while queueing next buffer
s390/qeth: when thread completes, wake up all waiters
...
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes intended for v4.16-rc7:
MMC host:
- dw_mmc: Fix the suspend/resume issue for Exynos5433
- dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit
systems
- dw_mmc: Make PIO mode work when failing with idmac when
dw_mci_reset occurs
- sdhci-acpi: Re-allow IRQ 0 to fix broken probe
MMC core:
- Update EXT_CSD caches to correctly switch partition for ioctl calls
- Fix tracepoint print of blk_addr and blksz
- Disable HPI on broken Micron (Numonyx) eMMC cards"
* tag 'mmc-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-acpi: Fix IRQ 0
mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
mmc: core: Fix tracepoint print of blk_addr and blksz
mmc: core: Disable HPI for certain Micron (Numonyx) eMMC cards
mmc: dw_mmc: exynos: fix the suspend/resume issue for exynos5433
mmc: block: fix updating ext_csd caches on ioctl call
mmc: dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit systems
Main change is a patch to reject getfb call for multiplanar framebuffers,
then we have a couple of error path fixes on the sun4i driver. Still on that
driver there is a clk fix and finally a mmap offset fix on the udl driver.
* tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc:
drm: udl: Properly check framebuffer mmap offsets
drm: Reject getfb for multi-plane framebuffers
drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
drm/sun4i: Fix an error handling path in 'sun4i_drv_bind()'
drm/sun4i: Fix exclusivity of the TCON clocks
One fix for DP MST and one fix for GPU reset on hang check.
* tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Specify which engines to reset following semaphore/event lockups
drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
Two vmwgfx fixes for 4.16. Both cc'd stable.
* 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Fix a destoy-while-held mutex problem.
drm/vmwgfx: Fix black screen and device errors when running without fbdev
drm/imx: fixes for early vblank event issue, array underflow error
- fix an array underflow error by reordering the range check before the array
subscript in ipu-prg.
- make some local functions static in ipuv3-plane.
- add a missng header for ipu_planes_assign_pre in ipuv3-plane.
- move arming of the vblank event from atomic_begin to atomic_flush, to avoid
signalling atomic commit completion to userspace before plane atomic_update
has finished, due to a race condition that is likely to be hit on i.MX6QP on
PRE enabled channels.
* tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux:
drm/imx: move arming of the vblank event to atomic_flush
drm/imx: ipuv3-plane: Include "imx-drm.h" header file
drm/imx: ipuv3-plane: Make functions static when possible
gpu: ipu-v3: prg: avoid possible array underflow
We moved the dev_hold(real_dev); call earlier in the function but forgot
to update the error paths.
Fixes: 0759e552bc ("macsec: fix negative refcnt on parent link")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Two more fixes (in three patches):
* ath9k_htc doesn't like QoS NDP frames, use regular ones
* hwsim: set up wmediumd for radios created later
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a functional dependency on the FIXED_PHY MDIO bus because we register
fixed PHY devices "the old way" which only works if the code that does this has
had a chance to run before the fixed MDIO bus is probed. Make sure we account
for that and have dsa_loop_bdinfo.o be either built-in or modular depending on
whether CONFIG_FIXED_PHY reflects that too.
Fixes: 98cd1552ea ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
hv_netvsc: fix races during shutdown and changes
This set of patches fixes issues identified by Vitaly Kuznetsov and
Mohammed Gamal related to state changes in Hyper-v network driver.
A lot of the issues are because setting up the netvsc device requires
a second step (in work queue) to get all the sub-channels running.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are transmitted and all packets have been received before
doing device teardown.
Change the wait logic to be common and use usleep_range().
Changes transmit enabling logic so that transmit queues are disabled
during the period when lower device is being changed. And enabled
only after sub channels are setup. This avoids issue where it could
be that a packet was being sent while subchannel was not initialized.
Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On older versions of Windows, the host ignores messages after
vmbus channel is closed.
Workaround this by doing what Windows does and send the teardown
before close on older versions of NVSP protocol.
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Fixes: 0cf737808a ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The receive processing may continue to happen while the
internal network device state is in RCU grace period.
The internal RNDIS structure is associated with the
internal netvsc_device structure; both have the same
RCU lifetime.
Defer freeing all associated parts until after grace
period.
Fixes: 0cf737808a ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sure that no CPU is still process packets when
the channel is closed.
Fixes: 76bb5db5c7 ("netvsc: fix use after free on module removal")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For multipath routes the ONLINK flag can be specified per nexthop in
rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
to consider the ONLINK setting coming from rtnh_flags. Each loop over
nexthops the config for the sibling route is initialized to the global
config and then per nexthop settings overlayed. The flag is 'or'ed into
fib6_config to handle the ONLINK flag coming from either rtm_flags or
rtnh_flags.
Fixes: fc1e64e109 ("net/ipv6: Add support for onlink flag")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already detect situations where a PPP channel sends packets back to
its upper PPP device. While this is enough to avoid deadlocking on xmit
locks, this doesn't prevent packets from looping between the channel
and the unit.
The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq
before checking for xmit recursion. Therefore, __ppp_xmit_process()
might dequeue a packet from ppp->file.xq and send it on the channel
which, in turn, loops it back on the unit. Then ppp_start_xmit()
queues the packet back to ppp->file.xq and __ppp_xmit_process() picks
it up and sends it again through the channel. Therefore, the packet
will loop between __ppp_xmit_process() and ppp_start_xmit() until some
other part of the xmit path drops it.
For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops
the packet after a few iterations. But PPTP reallocates the headroom
if necessary, letting the loop run and exhaust the machine resources
(as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109).
Fix this by letting __ppp_xmit_process() enqueue the skb to
ppp->file.xq, so that we can check for recursion before adding it to
the queue. Now ppp_xmit_process() can drop the packet when recursion is
detected.
__ppp_channel_push() is a bit special. It calls __ppp_xmit_process()
without having any actual packet to send. This is used by
ppp_output_wakeup() to re-enable transmission on the parent unit (for
implementations like ppp_async.c, where the .start_xmit() function
might not consume the skb, leaving it in ppp->xmit_pending and
disabling transmission).
Therefore, __ppp_xmit_process() needs to handle the case where skb is
NULL, dequeuing as many packets as possible from ppp->file.xq.
Reported-by: xu heng <xuheng333@zoho.com>
Fixes: 55454a5658 ("ppp: avoid dealock on recursive xmit")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh says:
====================
Aquantia atlantic hot fixes 03-2018
This is a set of atlantic driver hot fixes for various areas:
Some issues with hardware reset covered,
Fixed napi_poll flood happening on some traffic conditions,
Allow system to change MAC address on live device,
Add pci shutdown handler.
patch v2:
- reverse christmas tree
- remove driver private parameter, replacing it with define.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We should close link and all NIC operations during shutdown.
On some systems graceful reboot never closes NIC interface on its own,
but only indicates pci device shutdown. Without explicit handler, NIC
rx rings continued to transfer DMA data into prepared buffers while CPU
rebooted already. That caused memory corruptions on soft reboot.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is nothing prevents us from changing MAC on the running interface.
Allow this with ndev priv flag.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should report to napi full budget only when we have more job to do.
Before this fix, on any tx queue cleanup we forced napi to do poll again.
Thats a waste of cpu resources and caused storming with napi polls when
there was at least one tx on each interrupt.
With this fix we report full budget only when there is more job on TX
to do. Or, as before, when rx budget was fully consumed.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
B1 hardware changes behavior of mailbox interface, it has busy bit
always raised. Data ready condition should be detected by increment
of address register.
Old code has empty `for` loop, and that caused cpu overloads on B1
hardware. aq_nic_service_timer_cb consumed ~100ms because of that.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW 1.5.58 and below needs a fixed delay even after 0x18 register
is filled. Otherwise, setting MPI_INIT state too fast causes
traffic hang.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under some circumstances (notably using thunderbolt interface) SPI
on chip reset may be in active transaction.
Here we forcibly cleanup SPI to prevent possible hangups.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: fixes 2018-03-20
Please apply one final set of qeth patches for 4.16.
All of these fix long-standing bugs, so please queue them up for -stable
as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the IRQ handler determines that one of the cmd IO channels has
failed and schedules recovery, block any further cmd requests from
being submitted. The request would inevitably stall, and prevent the
recovery from making progress until the request times out.
This sort of error was observed after Live Guest Relocation, where
the pending IO on the READ channel intentionally gets terminated to
kick-start recovery. Simultaneously the guest executed SIOCETHTOOL,
triggering qeth to issue a QUERY CARD INFO command. The command
then stalled in the inoperabel WRITE channel.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For calling ccw_device_start(), issue_next_read() needs to hold the
device's ccwlock.
This is satisfied for the IRQ handler path (where qeth_irq() gets called
under the ccwlock), but we need explicit locking for the initial call by
the MPC initialization.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_wait_for_threads() is potentially called by multiple users, make
sure to notify all of them after qeth_clear_thread_running_bit()
adjusted the thread_running_mask. With no timeout, callers would
otherwise stall.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On removal, a qeth card's netdevice is currently not properly freed
because the call chain looks as follows:
qeth_core_remove_device(card)
lx_remove_device(card)
unregister_netdev(card->dev)
card->dev = NULL !!!
qeth_core_free_card(card)
if (card->dev) !!!
free_netdev(card->dev)
Fix it by free'ing the netdev straight after unregistering. This also
fixes the sysfs-driven layer switch case (qeth_dev_layer2_store()),
where the need to free the current netdevice was not considered at all.
Note that free_netdev() takes care of the netif_napi_del() for us too.
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Hao says:
====================
net: phy: Add general dummy stubs for MMD register access
v2:
As suggested by Andrew:
- Add general dummy stubs
- Also use that for the micrel phy
This patch series fix the Ethernet broken on the mpc8315erdb board introduced
by commit b6b5e8a691 ("gianfar: Disable EEE autoneg by default").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The new general dummy stubs for MMD register access were introduced.
Use that for the codes reuse.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet on mpc8315erdb is broken since commit b6b5e8a691
("gianfar: Disable EEE autoneg by default"). The reason is that
even though the rtl8211b doesn't support the MMD extended registers
access, it does return some random values if we trying to access
the MMD register via indirect method. This makes it seem that the
EEE is supported by this phy device. And the subsequent writing to
the MMD registers does cause the phy malfunction. So use the dummy
stubs for the MMD register access to fix this issue.
Fixes: b6b5e8a691 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some phy devices, even though they don't support the MMD extended
register access, it does have some side effect if we are trying to
read/write the MMD registers via indirect method. So introduce general
dummy stubs for MMD register access which these devices can use to avoid
such side effect.
Fixes: b6b5e8a691 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing
- fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches)
- fix include for eventpoll, by Sven Eckelmann
- fix skb checksum for ttvn reroutes, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use tegra124_(primary|overlay)_formats for Tegra124, otherwise the count
specified in the Tegra124 SoC info structure will be different from the
array size and cause a crash.
Fixes: 511c7023cf ("drm/tegra: dc: Support more formats")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The netfilter netdevice event handler hold the nfnl_lock mutex, this
avoids races with a device going away while such device is being
attached to hooks from the netlink control plane. Therefore, either
control plane bails out with ENOENT or netdevice event path waits until
the hook that is attached to net_device is registered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Devices going away have to grab the nfnl_lock from the netdev event path
to avoid races with control plane updates.
However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache
the device name into the objects to avoid an use-after-free situation
for a device that is going away.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In loopback_open() and loopback_close(), we assign and release the
substream object to the corresponding cable in a racy way. It's
neither locked nor done in the right position. The open callback
assigns the substream before its preparation finishes, hence the other
side of the cable may pick it up, which may lead to the invalid memory
access.
This patch addresses these: move the assignment to the end of the open
callback, and wrap with cable->lock for avoiding concurrent accesses.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The aloop driver tries to stop the pending timer via timer_del() in
the trigger callback and in the close callback. The former is
correct, as it's an atomic operation, while the latter expects that
the timer gets really removed and proceeds the resource releases after
that. But timer_del() doesn't synchronize, hence the running timer
may still access the released resources.
A similar situation can be also seen in the prepare callback after
trigger(STOP) where the prepare tries to re-initialize the things
while a timer is still running.
The problems like the above are seen indirectly in some syzkaller
reports (although it's not 100% clear whether this is the only cause,
as the race condition is quite narrow and not always easy to
trigger).
For addressing these issues, this patch adds the explicit alls of
timer_del_sync() in some places, so that the pending timer is properly
killed / synced.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
It will have a chance speaker no sound after system resume.
To toggle NID 0x53 index 0x2 bit 15 will solve this issue.
This usage will also suitable with ALC256.
Fixes: 4a219ef8f3 ("ALSA: hda/realtek - Add ALC256 HP depop function")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This platform was hardware fixed type for CTIA type for headset port.
Assigned 0x19 verb will fix can't record issue.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The bitfield dma_inuse is allocated of size dma_requests bits, thus a
valid bit address is from 0 to (dma_requests - 1).
When find_first_zero_bit() fails, it returns dma_requests as invalid
address.
Using such address for the following set_bit() is incorrect and, if
dma_requests is a multiple of BITS_PER_LONG, it will cause a buffer
overflow.
Currently this driver is only used in DT stm32h743.dtsi where a safe value
dma_requests=16 is not triggering the buffer overflow.
Fixed by checking the return value of find_first_zero_bit() _before_
using it.
Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
This reverts commit 2170dd0431
The intent of commit 2170dd0431 ("vfio-pci: Mask INTx if a device is
not capabable of enabling it") was to disallow the user from seeing
that the device supports INTx if the platform is incapable of enabling
it. The detection of this case however incorrectly includes devices
which natively do not support INTx, such as SR-IOV VFs, and further
discussions reveal gaps even for the target use case.
Reported-by: Arjun Vynipadath <arjun@chelsio.com>
Fixes: 2170dd0431 ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Since commit 3af5a67c86 ("MIPS: Fix early CM probing") the MT7621 has
not been able to boot.
This commit caused mips_cm_probe() to be called before
mt7621.c::proc_soc_init().
prom_soc_init() has a comment explaining that mips_cm_probe() "wipes out
the bootloader config" and means that configuration registers are no
longer available. It has some code to re-enable this config.
Before this re-enable code is run, the sysc register cannot be read, so
when SYSC_REG_CHIP_NAME0 is read, a garbage value is returned and
panic() is called.
If we move the config-repair code to the top of prom_soc_init(), the
registers can be read and boot can proceed.
Very occasionally, the first register read after the reconfiguration
returns garbage, so add a call to __sync().
Fixes: 3af5a67c86 ("MIPS: Fix early CM probing")
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: John Crispin <john@phrozen.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.5+
Patchwork: https://patchwork.linux-mips.org/patch/18859/
Signed-off-by: James Hogan <jhogan@kernel.org>
ralink_halt() does nothing that machine_halt() doesn't already do, so it
adds no value.
It actually causes incorrect behaviour due to the "unreachable()" at the
end. This tells the compiler that the end of the function will never be
reached, which isn't true. The compiler responds by not adding a
'return' instruction, so control simply moves on to whatever bytes come
afterwards in memory. In my tested, that was the ralink_restart()
function. This means that an attempt to 'halt' the machine would
actually cause a reboot.
So remove ralink_halt() so that a 'halt' really does halt.
Fixes: c06e836ada ("MIPS: ralink: adds reset code")
Signed-off-by: NeilBrown <neil@brown.name>
Cc: John Crispin <john@phrozen.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.9+
Patchwork: https://patchwork.linux-mips.org/patch/18851/
Signed-off-by: James Hogan <jhogan@kernel.org>
A few more fixes for 4.16. Mostly for displays:
- A fix for DP handling on radeon
- Fix banding on eDP panels
- Fix HBR audio
- Fix for disabling VGA mode on Raven that leads to a corrupt or
blank display on some platforms
* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/display: Add one to EDID's audio channel count when passing to DC
drm/amd/display: We shouldn't set format_default on plane as atomic driver
drm/amd/display: Fix FMT truncation programming
drm/amd/display: Allow truncation to 10 bits
drm/amd/display: fix dereferencing possible ERR_PTR()
drm/amd/display: Refine disable VGA
drm/amdgpu: Use atomic function to disable crtcs with dc enabled
drm/radeon: Don't turn off DP sink when disconnected
Davide Caratti says:
====================
fix idr leak in actions
This series fixes situations where a temporary failure to install a TC
action results in the permanent impossibility to reuse the configured
value of 'index'.
Thanks to Cong Wang for the initial review.
v2: fix build error in act_ipt.c, reported by kbuild test robot
====================
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_skbmod_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure skbmod rules
using the same idr value will systematically fail with -ENOSPC, unless
the first attempt was done using the 'replace' keyword:
# tc action add action skbmod swap mac index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action skbmod swap mac index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action skbmod swap mac index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
on the error path when the idr has been reserved, but not yet inserted.
Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
implicitly become a 'delete' that leaks refcount in act_skbmod module:
# rmmod act_skbmod; modprobe act_skbmod
# tc action add action skbmod swap mac index 100
# tc action add action skbmod swap mac continue index 100
RTNETLINK answers: File exists
We have an error talking to the kernel
# tc action replace action skbmod swap mac continue index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action list action skbmod
#
# rmmod act_skbmod
rmmod: ERROR: Module act_skbmod is in use
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_vlan_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure vlan rules using
the same idr value will systematically fail with -ENOSPC, unless the first
attempt was done using the 'replace' keyword.
# tc action add action vlan pop index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action vlan pop index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action vlan pop index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
the error path when the idr has been reserved, but not yet inserted. Also,
don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
become a 'delete' that leaks refcount in act_vlan module:
# rmmod act_vlan; modprobe act_vlan
# tc action add action vlan push id 5 index 100
# tc action replace action vlan push id 7 index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action list action vlan
#
# rmmod act_vlan
rmmod: ERROR: Module act_vlan is in use
Fixes: 4c5b9d9642 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__tcf_ipt_init() can fail after the idr has been successfully reserved.
When this happens, subsequent attempts to configure xt/ipt rules using
the same idr value systematically fail with -ENOSPC:
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
Command "(null)" is unknown, try "tc actions help".
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
Command "(null)" is unknown, try "tc actions help".
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
to avoid NULL pointer dereference.
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_pedit_init() can fail to allocate 'keys' after the idr has been
successfully reserved. When this happens, subsequent attempts to configure
a pedit rule using the same idr value systematically fail with -ENOSPC:
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of tcf_act_pedit_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The persistence domain is a point in the platform where once writes
reach that destination the platform claims it will make them persistent
relative to power loss. In the ACPI NFIT this is currently communicated
as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
Durability on Power Loss Capable".
Commit 96c3a23905 "libnvdimm: expose platform persistence attr..."
shows the persistence domain as flags, but it's really an enumerated
hierarchy.
Fix this newly introduced user ABI to show the closest available
persistence domain before userspace develops dependencies on seeing, or
needing to develop code to tolerate, the raw NFIT flags communicated
through the libnvdimm-generic region attribute.
Fixes: 96c3a23905 ("libnvdimm: expose platform persistence attr...")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematiclly fail with -ENOSPC:
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: No space left on device
...
Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' will no more be usable by act_simple, until rmmod /
insmod of act_simple.ko is done:
# tc actions add action simple sdata hello index 100
# tc actions list action simple
action order 0: Simple <hello>
index 100 ref 1 bind 0
# tc actions flush action simple
# tc actions add action simple sdata hello index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc actions flush action simple
# tc actions add action simple sdata hello index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc actions add action simple sdata hello index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
when the following command sequence is entered
# tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
# tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it refuses to install the second TC rule,
even if the BPF code is correct. Furthermore, it's no more possible to
install any other rule having the same value of 'index' until act_bpf
module is unloaded/inserted again. After the idr has been reserved, call
tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistakes in DP_ERR error message text and
comments
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once the FW is transitioned to error, FLUSH cqes can be received.
We want the driver to be aware of the fact that QP is already in error.
Without this fix, a user may see false error messages in the dmesg log,
mentioning that a FLUSH cqe was received while QP is not in error state.
Fixes: cecbcddf ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Return code wasn't set properly when CNQ allocation failed.
This only affect error message logging, currently user will
receive an error message that says the qedr driver load failed
with rc '0', instead of ENOMEM
Fixes: ec72fce4 ("qedr: Add support for RoCE HW init")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
QPs that were configured with ack timeout value lower than 1
msec will not implement re-transmission timeout.
This means that if a packet / ACK were dropped, the QP
will not retransmit this packet.
This can lead to an application hang.
Fixes: cecbcddf6 ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The option size check is using optval instead of optlen
causing the set option call to fail. Use the correct
field, optlen, for size check.
Fixes: 6a21dfc0d0 ("RDMA/ucma: Limit possible option size")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case we failed to create UMR resources, mark them as invalid so we
won't try to destroy them on the unwind path.
Add the relevant checks to destroy_umrc_res(), this is done for the
unlikely event ib_register_device() or create_umr_res() err out and we
try to destroy invalid objects.
Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf 2018-03-21
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Follow-up fix to the fault injection framework to prevent jump
optimization on the kprobe by installing a dummy post-handler,
from Masami.
2) Drop bpf_perf_prog_read_value helper from tracepoint type programs
which was mistakenly added there and would otherwise crash due to
wrong input context, from Yonghong.
3) Fix a crash in BPF fs when compiled with clang. Code appears to
be fine just that clang tries to overly aggressive optimize in
non C conform ways, therefore fix the kernel's Makefile to
generally prevent such issues, from Daniel.
4) Skip unnecessary capability checks in bpf syscall, which is otherwise
triggering unnecessary security hooks on capability checking and
causing false alarms on unprivileged processes trying to access
CAP_SYS_ADMIN restricted infra, from Chenbo.
5) Fix the test_bpf.ko module when CONFIG_BPF_JIT_ALWAYS_ON is set
with regards to a test case that is really just supposed to fail
on x8_64 JIT but not others, from Thadeu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We are still using custom SRAM code for some SoCs and are not marking
the PM code mapped to SRAM as read-only and executable after we're
done. With CONFIG_DEBUG_WX=y, we will get "Found insecure W+X mapping
at address" warning.
Let's fix this issue the same way as commit 728bbe75c8 ("misc: sram:
Introduce support code for protect-exec sram type") is doing for
drivers/misc/sram-exec.c.
On omap3, we need to restore SRAM when returning from off mode after
idle, so init time configuration is not enough.
And as we no longer have users for omap_sram_push_address() we can
make it static while at it.
Note that eventually we should be using sram-exec.c for all SoCs.
Cc: stable@vger.kernel.org # v4.12+
Cc: Dave Gerlach <d-gerlach@ti.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Set the wmediumd to the net's wmediumd when the radio gets created.
Radios created after HWSIM_CMD_REGISTER don't currently get their
data->wmediumd set and the userspace would need to reconnect to
netlink to be able to call HWSIM_CMD_REGISTER again.
Alternatively I think data->netgroup and data->wmedium could be
replaced with a pointer to hwsim_net.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 7b6ddeaf27 ("mac80211: use QoS NDP for AP probing") added an
argument qos_ok to ieee80211_nullfunc_get to support QoS NDP. Despite
the claim in the commit log "Change all the drivers to *not* allow
QoS NDP for now, even though it looks like most of them should be OK
with that", this commit enables QoS NDP in response to beacons (see
change to mlme.c:ieee80211_send_nullfunc), causing ath9k_htc to lose
IP connectivity. See:
https://patchwork.kernel.org/patch/10241109/https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891060
Introduce a hardware flag to allow such buggy drivers to override the
correct default behaviour of mac80211 of sending QoS NDP packets.
Signed-off-by: Ben Caradoc-Davies <ben@transient.nz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When validating legacy surfaces, the backup bo might be destroyed at
surface validate time. However, the kms resource validation code may have
the bo reserved, so we will destroy a locked mutex. While there shouldn't
be any other users of that mutex when it is destroyed, it causes a lock
leak and thus throws a lockdep error.
Fix this by having the kms resource validation code hold a reference to
the bo while we have it reserved. We do this by introducing a validation
context which might come in handy when the kms code is extended to validate
multiple resources or buffers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
When we are running without fbdev, transitioning from the login screen to
X or gnome-shell/wayland will cause a vt switch and the driver will disable
svga mode, losing all modesetting resources. However, the kms atomic state
does not reflect that and may think that a crtc is still turned on, which
will cause device errors when we try to bind an fb to the crtc, and the
screen will remain black.
Fix this by turning off all kms resources before disabling svga mode.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
We've observed too long probe time with Coffee Lake (CFL) machines,
and the likely cause is some communication problem between the
HD-audio controller and the codec chips. While the controller expects
an IRQ wakeup for each codec response, it seems sometimes missing, and
it takes one second for the controller driver to time out and read the
response in the polling mode.
Although we aren't sure about the real culprit yet, in this patch, we
put a workaround by forcing the polling mode as default for CFL
machines; the polling mode itself isn't too heavy, and much better
than other workarounds initially suggested (e.g. disabling
power-save), at least.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199007
Fixes: e79b0006c4 ("ALSA: hda - Add Coffelake PCI ID")
Reported-and-tested-by: Hui Wang <hui.wang@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Due to missing information in Hardware manual, current
implementation doesn't read ECCSTAT0 and ECCSTAT1 registers
for IFC 2.0.
Add support to read ECCSTAT0 and ECCSTAT1 registers during
ecccheck for IFC 2.0.
Fixes: 656441478e ("mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
In order to make sure compiler flag detection for ARM works
correctly the no-integrated-as flags need to be set before
including the arch specific Makefile.
Fixes: cfe17c9bbe ("kbuild: move cc-option and cc-disable-warning after incl. arch Makefile")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Number of ECC status registers i.e. (ECCSTATx) has been increased in IFC
version 2.0.0 due to increase in SRAM size. This is causing eccstat
array to over flow.
So, replace eccstat array with u32 variable to make it fail-safe and
independent of number of ECC status registers or SRAM size.
Fixes: bccb06c353 ("mtd: nand: ifc: update bufnum mask for ver >= 2.0.0")
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Some filesystems have timestamps with coarse precision that may allow
for a recently built object file to have the same timestamp as the
updated time on one of its dependency files. When that happens, the
object file doesn't get rebuilt as it should.
This is especially the case on filesystems that don't have sub-second
time precision, such as ext3 or Ext4 with 128B inodes.
Let's prevent that by making sure updated dependency files have a newer
timestamp than the first file we created (i.e. autoksyms.h.tmpnew).
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
As per the IFC hardware manual, Most significant 2 bytes in
nand_fsr register are the outcome of NAND READ STATUS command.
So status value need to be shifted and aligned as per the nand
framework requirement.
Fixes: 82771882d9 ("NAND Machine support for Integrated Flash Controller")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
The truncation isn't being programmed if the truncation
depth is set to 2, it causes an issue with dce11.2 asic
using 6bit eDP panel. It required to truncate 12:10 in order to
perform spatial dither 10:6.
This change will allow 12:10 truncation to be enabled.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a device tree property description for hdmi device node
. '#sound-dai-cells' property is required to describe link between
the HDMI IP block and the SoC's audio subsystem and Exynos SoC device
tree files already have this property but we missed its description.
* tag 'exynos-drm-fixes-for-v4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
drm/tegra: Fixes for v4.16-rc7
This contains two small fixes for the alpha blending support that was
merged into v4.16-rc1 and a fix for connector reference leaks caused by
the fact that display pipelines are no longer automatically disabled if
the framebuffer is removed.
Furthermore this contains a fix for a crash on IOMMU detach at driver
unbind time and a regulator enable/disable unbalance fix.
* tag 'drm/tegra/for-4.16-rc7-fixes' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: Shutdown on driver unbind
drm/tegra: dsi: Don't disable regulator on ->exit()
drm/tegra: dc: Detach IOMMU group from domain only once
drm/tegra: plane: Correct legacy blending
drm/tegra: plane: Fix RGB565 format on older Tegra
Pull clk fixes from Stephen Boyd:
"A late collection of fixes for regressions seen this release cycle.
Normally I send this earlier than now but real life got in the way.
Things are back to normal now.
There's the normal set of SoC driver fixes: i.MX boot warning, TI
display clks, allwinner clk ops being wrong (fun), driver probe
badness on error paths, correctness fix for the new aspeed driver, and
even a fix for a race condition in the bcm2835 clk driver.
At the core framework level we also got some fixes for the clk phase
API caching at the wrong time, better handling of the enabled state of
orphan clks, and a fix for a newly introduced bug in how we handle
rate calculations for pass-through clks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: bcm2835: Protect sections updating shared registers
clk: bcm2835: Fix ana->maskX definitions
clk: aspeed: Prevent reset if clock is enabled
clk: aspeed: Fix is_enabled for certain clocks
clk: qcom: msm8916: Fix return value check in qcom_apcs_msm8916_clk_probe()
clk: hisilicon: hi3660:Fix potential NULL dereference in hi3660_stub_clk_probe()
clk: fix determine rate error with pass-through clock
clk: migrate the count of orphaned clocks at init
clk: update cached phase to respect the fact when setting phase
clk: ti: am43xx: add set-rate-parent support for display clkctrl clock
clk: ti: am33xx: add set-rate-parent support for display clkctrl clock
clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
clk: imx51-imx53: Fix UART4/5 registration on i.MX50 and i.MX53
clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
Prasad reported that he has seen crashes in BPF subsystem with netd
on Android with arm64 in the form of (note, the taint is unrelated):
[ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
[ 4134.820925] Mem abort info:
[ 4134.901283] Exception class = DABT (current EL), IL = 32 bits
[ 4135.016736] SET = 0, FnV = 0
[ 4135.119820] EA = 0, S1PTW = 0
[ 4135.201431] Data abort info:
[ 4135.301388] ISV = 0, ISS = 0x00000021
[ 4135.359599] CM = 0, WnR = 0
[ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
[ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
[ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[ 4135.674610] Modules linked in:
[ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1
[ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
[ 4135.731599] PC is at bpf_prog_add+0x20/0x68
[ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
[ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
[ 4135.769062] sp : ffffff801d4e3ce0
[...]
[ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
[ 4136.273746] Call trace:
[...]
[ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
[ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
[ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
[ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
[ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
[ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88
Android's netd was basically pinning the uid cookie BPF map in BPF
fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
again resulting in above panic. Issue is that the map was wrongly
identified as a prog! Above kernel was compiled with clang 4.0,
and it turns out that clang decided to merge the bpf_prog_iops and
bpf_map_iops into a single memory location, such that the two i_ops
could then not be distinguished anymore.
Reason for this miscompilation is that clang has the more aggressive
-fmerge-all-constants enabled by default. In fact, clang source code
has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
to do so:
Pointers with different bases cannot represent the same object.
(Note that clang defaults to -fmerge-all-constants, which can
lead to inconsistent results for comparisons involving the address
of a constant; this generally doesn't matter in practice.)
The issue never appeared with gcc however, since gcc does not enable
-fmerge-all-constants by default and even *explicitly* states in
it's option description that using this flag results in non-conforming
behavior, quote from man gcc:
Languages like C or C++ require each variable, including multiple
instances of the same variable in recursive calls, to have distinct
locations, so using this option results in non-conforming behavior.
There are also various clang bug reports open on that matter [1],
where clang developers acknowledge the non-conforming behavior,
and refer to disabling it with -fno-merge-all-constants. But even
if this gets fixed in clang today, there are already users out there
that triggered this. Thus, fix this issue by explicitly adding
-fno-merge-all-constants to the kernel's Makefile to generically
disable this optimization, since potentially other places in the
kernel could subtly break as well.
Note, there is also a flag called -fmerge-constants (not supported
by clang), which is more conservative and only applies to strings
and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
gcc's code, the two flags -fmerge-{all-,}constants share the same
variable internally, so when disabling it via -fno-merge-all-constants,
then we really don't merge any const data (e.g. strings), and text
size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).
$ gcc -fverbose-asm -O2 foo.c -S -o foo.S
-> foo.S lists -fmerge-constants under options enabled
$ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
-> foo.S doesn't list -fmerge-constants under options enabled
$ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
-> foo.S lists -fmerge-constants under options enabled
Thus, as a workaround we need to set both -fno-merge-all-constants
*and* -fmerge-constants in the Makefile in order for text size to
stay as is.
[1] https://bugs.llvm.org/show_bug.cgi?id=18538
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Richard Smith <richard-llvm@metafoo.co.uk>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: linux-kernel@vger.kernel.org
Tested-by: Prasad Sodagudi <psodagud@codeaurora.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull rdma fixes from Jason Gunthorpe:
"Not much exciting here, almost entirely syzkaller fixes.
This is going to be on ongoing theme for some time, I think. Both
Google and Mellanox are now running syzkaller on different parts of
the user API.
Summary:
- Many bug fixes related to syzkaller from Leon Romanovsky. These are
still for the mlx driver and ucma interface.
- Fix a situation with port reuse for iWarp, discovered during
scale-up testing
- Bug fixes for the profile and restrack patches accepted during this
merge window
- Compile warning cleanups from Arnd, this is apparently the last
warning to make 32 bit builds quiet"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Ensure that CM_ID exists prior to access it
RDMA/verbs: Remove restrack entry from XRCD structure
RDMA/ucma: Fix use-after-free access in ucma_close
RDMA/ucma: Check AF family prior resolving address
infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
infiniband: qplib_fp: fix pointer cast
IB/mlx5: Fix cleanup order on unload
RDMA/ucma: Don't allow join attempts for unsupported AF family
RDMA/ucma: Fix access to non-initialized CM_ID object
RDMA/core: Do not use invalid destination in determining port reuse
RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
Pull SCSI fixes from James Bottomley:
- one driver patch (qla2xxx) which fixes a problem caused by an
existing regression fix (FCP discovery is failing)
- one generic fix to a longstanding bug in libsas that causes I/O
eventually to hang to the device in the face of ATA error recovery.
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Remove FC_NO_LOOP_ID for FCP and FC-NVMe Discovery
scsi: libsas: defer ata device eh commands to libata
Pull nfsd fix from Bruce Fields:
"Just one fix for an occasional panic from Jeff Layton"
* tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux:
nfsd: remove blocked locks on client teardown
The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit 4bebdc7a85 ("bpf: add helper bpf_perf_prog_read_value")
added helper bpf_perf_prog_read_value so that perf_event type program
can read event counter and enabled/running time.
This commit, however, introduced a bug which allows this helper
for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value
needs to access perf_event through its bpf_perf_event_data_kern type context,
which is not available for tracepoint type program.
This patch fixed the issue by separating bpf_func_proto between tracepoint
and perf_event type programs and removed bpf_perf_prog_read_value
from tracepoint func prototype.
Fixes: 4bebdc7a85 ("bpf: add helper bpf_perf_prog_read_value")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
case.
However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.
After this fix, test_bpf loads fine on both x86_64 and ppc64el.
Fixes: 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The undocumented 'icebp' instruction (aka 'int1') works pretty much like
'int3' in the absense of in-circuit probing equipment (except,
obviously, that it raises #DB instead of raising #BP), and is used by
some validation test-suites as such.
But Andy Lutomirski noticed that his test suite acted differently in kvm
than on bare hardware.
The reason is that kvm used an inexact test for the icebp instruction:
it just assumed that an all-zero VM exit qualification value meant that
the VM exit was due to icebp.
That is not unlike the guess that do_debug() does for the actual
exception handling case, but it's purely a heuristic, not an absolute
rule. do_debug() does it because it wants to ascribe _some_ reasons to
the #DB that happened, and an empty %dr6 value means that 'icebp' is the
most likely casue and we have no better information.
But kvm can just do it right, because unlike the do_debug() case, kvm
actually sees the real reason for the #DB in the VM-exit interruption
information field.
So instead of relying on an inexact heuristic, just use the actual VM
exit information that says "it was 'icebp'".
Right now the 'icebp' instruction isn't technically documented by Intel,
but that will hopefully change. The special "privileged software
exception" information _is_ actually mentioned in the Intel SDM, even
though the cause of it isn't enumerated.
Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Kleine-Budde says:
====================
pull-request: can 2018-03-19
this is a pull reqeust of one patch for net/master.
The patch is by Andri Yngvason and fixes a potential use-after-free bug
in the cc770 driver introduced in the previous pull-request.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the optional regulator is deferred, we must release some resources.
They will be re-allocated when the probe function will be called again.
Fixes: 6eacf31139 ("ethernet: arc: Add support for Rockchip SoC layer device tree bindings")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code performs unneeded free. Remove the redundant skb freeing
during the error path.
Fixes: 1555d204e7 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
Here are a few more important Bluetooth driver fixes for the 4.16
kernel.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Trofimovich reported that restoring an nft ruleset doesn't work
anymore unless old rule content is flushed first.
The problem stems from a recent change designed to prevent multiple nat
hooks at the same hook point locations and nftables transaction model.
A 'flush ruleset' won't take effect until the entire transaction has
completed.
So, if one has a nft.rules file that contains a 'flush ruleset',
followed by a nat hook register request, then 'nft -f file' will work,
but running 'nft -f file' again will fail with -EBUSY.
Reason is that nftables will place the flush/removal requests in the
transaction list, but it will not act on the removal until after all new
rules are in place.
The netfilter core will therefore get request to register a new nat
hook before the old one is removed -- this now fails as the netfilter
core can't know the existing hook is staged for removal.
To fix this, we can search the transaction log when a hook collision
is detected. The collision is okay if
1. there is a delete request pending for the nat hook that is already
registered.
2. there is no second add request for a matching nat hook.
This is required to only apply the exception once.
Fixes: f92b40a8b2 ("netfilter: core: only allow one nat hook per hook point")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
in nftables, 'meter' can be used to instantiate a hash-table at run
time:
rule add filter forward iif "internal" meter hostacct { ip saddr counter}
nft list meter ip filter hostacct
table ip filter {
meter hostacct {
type ipv4_addr
elements = { 192.168.0.1 : counter packets 8 bytes 2672, ..
because elemets get added on the fly, the kernel must chose a set
backend type that implements the ->update() function, otherwise
rule insertion fails with EOPNOTSUPP.
Therefore, skip set types that lack ->update, and also
make sure we do not discard a (bad) candidate when we did yet
find any candidate at all. This could happen when userspace prefers
low memory footprint -- the set implementation currently checked might
not be a fit at all. Make sure we pick it anyway (!bops). In
case next candidate is a better fix, it will be chosen instead.
But in case nothing else is found we at least have a non-ideal
match rather than no match at all.
Fixes: 6c03ae210c ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The commit "regulatory: add NUL to request alpha2" increases the length of
alpha2 to 3. This causes a regression on brcmfmac, because
brcmf_cfg80211_reg_notifier() expect valid ISO3166 codes in the complete
array. So fix this accordingly.
Fixes: 657308f73e ("regulatory: add NUL to request alpha2")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Scheduler debug stats include newlines that display out of alignment
when prefixed by timestamps. For example, the dmesg utility:
% echo t > /proc/sysrq-trigger
% dmesg
...
[ 83.124251]
runnable tasks:
S task PID tree-key switches prio wait-time
sum-exec sum-sleep
-----------------------------------------------------------------------------------------------------------
At the same time, some syslog utilities (like rsyslog by default) don't
like the additional newlines control characters, saving lines like this
to /var/log/messages:
Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S task PID tree-key ...
^^^^ ^^^^
Clean these up by moving newline characters to their own SEQ_printf
invocation. This leaves the /proc/sched_debug unchanged, but brings the
entire output into alignment when prefixed:
% echo t > /proc/sysrq-trigger
% dmesg
...
[ 62.410368] runnable tasks:
[ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[ 62.410369] -----------------------------------------------------------------------------------------------------------
[ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 /
and no escaped control characters from rsyslog in /var/log/messages:
Mar 16 16:15:06 localhost kernel: runnable tasks:
Mar 16 16:15:06 localhost kernel: S task PID tree-key ...
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the SEQ_printf() macro prints to the console, it runs a simple
printk() without KERN_CONT "continued" line printing. The result of
this is oddly wrapped task info, for example:
% echo t > /proc/sysrq-trigger
% dmesg
...
runnable tasks:
...
[ 29.608611] I
[ 29.608613] rcu_sched 8 3252.013846 4087 120
[ 29.608614] 0.000000 29.090111 0.000000
[ 29.608615] 0 0
[ 29.608616] /
Modify SEQ_printf to use pr_cont() for expected one-line results:
% echo t > /proc/sysrq-trigger
% dmesg
...
runnable tasks:
...
[ 106.716329] S cpuhp/5 37 2006.315026 14 120 0.000000 0.496893 0.000000 0 0 /
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a perf_event is attached to parent cgroup, it should count events
for all children cgroups:
parent_group <---- perf_event
\
- child_group <---- process(es)
However, in our tests, we found this perf_event cannot report reliable
results. Here is an example case:
# create cgroups
mkdir -p /sys/fs/cgroup/p/c
# start perf for parent group
perf stat -e instructions -G "p"
# on another console, run test process in child cgroup:
stressapptest -s 2 -M 1000 & echo $! > /sys/fs/cgroup/p/c/cgroup.procs
# after the test process is done, stop perf in the first console shows
<not counted> instructions p
The instruction should not be "not counted" as the process runs in the
child cgroup.
We found this is because perf_event->cgrp and cpuctx->cgrp are not
identical, thus perf_event->cgrp are not updated properly.
This patch fixes this by updating perf_cgroup properly for ancestor
cgroup(s).
Reported-by: Ephraim Park <ephiepark@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <jolsa@redhat.com>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20180312165943.1057894-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the following commit:
3335224470 ("jump_label: Explicitly disable jump labels in __init code")
... we explicitly disabled jump labels in __init code, so they could be
detected and not warned about in the following commit:
dc1dd184c2 ("jump_label: Warn on failed jump_label patching attempt")
In-kernel __exit code has the same issue. It's never used, so it's
freed along with the rest of initmem. But jump label entries in __exit
code aren't explicitly disabled, so we get the following warning when
enabling pr_debug() in __exit code:
can't patch jump_label at dmi_sysfs_exit+0x0/0x2d
WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0
Fix the warning by disabling all jump labels in initmem (which includes
both __init and __exit code).
Reported-and-tested-by: Li Wang <liwang@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: dc1dd184c2 ("jump_label: Warn on failed jump_label patching attempt")
Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
iscsi tcp will first send out data, then calculate and send data
digest. If we don't have BDI_CAP_STABLE_WRITES, the page cache will be
written in spite of the on going writeback. Consequently, wrong digest
will be got and sent to target.
To fix this, set BDI_CAP_STABLE_WRITES when data digest is enabled
in iscsi_tcp .slave_configure callback.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Acked-by: Chris Leech <cleech@redhat.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The USB storage glue sets the try_rc_10_first flag in an attempt to
avoid wedging poorly implemented legacy USB devices.
If the device capacity is too large to be expressed in the provided
response buffer field of READ CAPACITY(10), a well-behaved device will
set the reported capacity to 0xFFFFFFFF. We will then attempt to issue a
READ CAPACITY(16) to obtain the real capacity.
Since this part of the discovery logic is not covered by the first_scan
flag, a warning will be printed a couple of times times per revalidate
attempt if we upgrade from READ CAPACITY(10) to READ CAPACITY(16).
Remember that we have successfully issued READ CAPACITY(16) so we can
take the fast path on subsequent revalidate attempts.
Reported-by: Menion <menion@gmail.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Grygorii Strashko says:
====================
net: phy: relax error checking when creating sysfs link netdev->phydev
Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
one netdevice, as result such drivers will produce warning during system
boot and fail to connect second phy to netdevice when PHYLIB framework
will try to create sysfs link netdev->phydev for second PHY
in phy_attach_direct(), because sysfs link with the same name has been
created already for the first PHY.
As result, second CPSW external port will became unusable.
This regression was introduced by commits:
5568363f0c ("net: phy: Create sysfs reciprocal links for attached_dev/phydev"
a399546049 ("net: phy: Relax error checking on sysfs_create_link()"
Patch 1: exports sysfs_create_link_nowarn() function as preparation for Patch 2.
Patch 2: relaxes error checking when PHYLIB framework is creating sysfs
link netdev->phydev in phy_attach_direct(), suppresses warning by using
sysfs_create_link_nowarn() and adds error message instead, so links creation
failure is not fatal any more and system can continue working,
which fixes TI CPSW issue and makes boot logs accessible
in case of NFS boot, for example.
This can be stable material 4.13+.
Changes in v2:
- commit messages updated.
v1:
https://patchwork.ozlabs.org/cover/886058/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
one netdevice, as result such drivers will produce warning during system
boot and fail to connect second phy to netdevice when PHYLIB framework
will try to create sysfs link netdev->phydev for second PHY
in phy_attach_direct(), because sysfs link with the same name has been
created already for the first PHY. As result, second CPSW external
port will became unusable.
Fix it by relaxing error checking when PHYLIB framework is creating sysfs
link netdev->phydev in phy_attach_direct(), suppressing warning by using
sysfs_create_link_nowarn() and adding error message instead.
After this change links (phy->netdev and netdev->phy) creation failure is not
fatal any more and system can continue working, which fixes TI CPSW issue.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sysfs_create_link_nowarn() is going to be used in phylib framework in
subsequent patch which can be built as module. Hence, export
sysfs_create_link_nowarn() to avoid build errors.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fixes from Tejun Heo:
"Two commits to fix the following subtle cgroup2 behavior bugs:
- cpu.max was rejecting config when it shouldn't
- thread mode enable was allowed when it shouldn't"
* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix rule checking for threaded mode switching
sched, cgroup: Don't reject lower cpu.max on ancestors
The resource allocation in WDAT watchdog has off-one-by error, it sets
one byte more than the actual end address. This may eventually lead
to unexpected resource conflicts.
Fixes: 058dfc7670 (ACPI / watchdog: Add support for WDAT hardware watchdog)
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull workqueue fixes from Tejun Heo:
"Two low-impact workqueue commits.
One fixes workqueue creation error path and the other removes the
unused cancel_work()"
* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove unused cancel_work()
workqueue: use put_device() instead of kfree()
Pull percpu fixes from Tejun Heo:
"Late percpu pull request for v4.16-rc6.
- percpu allocator pool replenishing no longer triggers OOM or
warning messages.
Also, the alloc interface now understands __GFP_NORETRY and
__GFP_NOWARN. This is to allow avoiding OOMs from userland
triggered actions like bpf map creation.
Also added cond_resched() in alloc loop.
- perpcu allocation now can be interrupted by kill sigs to avoid
deadlocking OOM killer.
- Added Dennis Zhou as a co-maintainer.
He has rewritten the area map allocator, understands most of the
code base and has been responsive for all bug reports"
* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods
mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
percpu: include linux/sched.h for cond_resched()
percpu: add a schedule point in pcpu_balance_workfn()
percpu: allow select gfp to be passed to underlying allocators
percpu: add __GFP_NORETRY semantics to the percpu balancing path
percpu: match chunk allocator declarations with definitions
percpu: add Dennis Zhou as a percpu co-maintainer
Pull libata fixes from Tejun Heo:
"I sat on them too long and it's quite a few this late, but nothing has
a wide blast area. The changes are...
- Fix corner cases in SG command handling.
- Recent introduction of default powersaving mode config option
exposed several devices with broken powersaving behaviors. A number
of patches to update the blacklist accordingly.
- Fix a kernel panic on SAS hotplug.
- Other misc and device specific updates"
* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
libata: Enable queued TRIM for Samsung SSD 860
PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
ahci: Add PCI-id for the Highpoint Rocketraid 644L card
ata: do not schedule hot plug if it is a sas host
libata: disable LPM for Crucial BX100 SSD 500GB drive
libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
libata: update documentation for sysfs interfaces
ata: sata_rcar: Remove unused variable in sata_rcar_init_controller()
libata: transport: cleanup documentation of sysfs interface
sata_rcar: Reset SATA PHY when Salvator-X board resumes
libata: don't try to pass through NCQ commands to non-NCQ devices
libata: remove WARN() for DMA or PIO command without data
libata: fix length validation of ATAPI-relayed SCSI commands
ata: libahci: fix comment indentation
ahci: Add check for device presence (PCIe hot unplug) in ahci_stop_engine()
libata: Fix compile warning with ATA_DEBUG enabled
We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.
Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.
With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.
Fixes: 76d348fadf (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
XRCD object is not implemented in the restrack, so lets remove it.
Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
percpu_ref internally uses sched-RCU to implement the percpu -> atomic
mode switching and the documentation suggested that this could be
depended upon. This doesn't seem like a good idea.
* percpu_ref uses sched-RCU which has different grace periods regular
RCU. Users may combine percpu_ref with regular RCU usage and
incorrectly believe that regular RCU grace periods are performed by
percpu_ref. This can lead to, for example, use-after-free due to
premature freeing.
* percpu_ref has a grace period when switching from percpu to atomic
mode. It doesn't have one between the last put and release. This
distinction is subtle and can lead to surprising bugs.
* percpu_ref allows starting in and switching to atomic mode manually
for debugging and other purposes. This means that there may not be
any grace periods from kill to release.
This patch makes it clear that the grace periods are percpu_ref's
internal implementation detail and can't be depended upon by the
users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
In case of memory deficit and low percpu memory pages,
pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
time (as it makes memory allocations itself and waits
for memory reclaim). If tasks doing pcpu_alloc() are
choosen by OOM killer, they can't exit, because they
are waiting for the mutex.
The patch makes pcpu_alloc() to care about killing signal
and use mutex_lock_killable(), when it's allowed by GFP
flags. This guarantees, a task does not miss SIGKILL
from OOM killer.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
microblaze build broke due to missing declaration of the
cond_resched() invocation added recently. Let's include linux/sched.h
explicitly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
CM_PLLx and A2W_XOSC_CTRL registers are accessed by different clock
handlers and must be accessed with ->regs_lock held.
Update the sections where this protection is missing.
Fixes: 41691b8862 ("clk: bcm2835: Add support for programming the audio domain clocks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
ana->maskX values are already '~'-ed in bcm2835_pll_set_rate(). Remove
the '~' in the definition to fix ANA setup.
Note that this commit fixes a long standing bug preventing one from
using an HDMI display if it's plugged after the FW has booted Linux.
This is because PLLH is used by the HDMI encoder to generate the pixel
clock.
Fixes: 41691b8862 ("clk: bcm2835: Add support for programming the audio domain clocks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Currently, the offsets in the UAC2 processing unit descriptor are
calculated incorrectly. It causes an issue when connecting the device which
provides such a feature:
~~~~
[84126.724420] usb 1-1.3.1: invalid Processing Unit descriptor (id 18)
~~~~
After this patch is applied, the UAC2 processing unit inits w/o this error.
Fixes: 23caaf19b1 ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Signed-off-by: Kirill Marinushkin <k.marinushkin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When commit 9c7be59fc5 ("libata: Apply NOLPM quirk to Crucial MX100
512GB SSDs") was added it inherited the ATA_HORKAGE_NO_NCQ_TRIM quirk
from the existing "Crucial_CT*MX100*" entry, but that entry sets model_rev
to "MU01", where as the entry adding the NOLPM quirk sets it to NULL.
This means that after this commit we no apply the NO_NCQ_TRIM quirk to
all "Crucial_CT512MX100*" SSDs even if they have the fixed "MU02"
firmware. This commit splits the "Crucial_CT512MX100*" quirk into 2
quirks, one for the "MU01" firmware and one for all other firmware
versions, so that we once again only apply the NO_NCQ_TRIM quirk to the
"MU01" firmware version.
Fixes: 9c7be59fc5 ("libata: Apply NOLPM quirk to ... MX100 512GB SSDs")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit b17e5729a6 ("libata: disable LPM for Crucial BX100 SSD 500GB
drive"), introduced a ATA_HORKAGE_NOLPM quirk for Crucial BX100 500GB SSDs
but limited this to the MU02 firmware version, according to:
http://www.crucial.com/usa/en/support-ssd-firmware
MU02 is the last version, so there are no newer possibly fixed versions
and if the MU02 version has broken LPM then the MU01 almost certainly
also has broken LPM, so this commit changes the quirk to apply to all
firmware versions.
Fixes: b17e5729a6 ("libata: disable LPM for Crucial BX100 SSD 500GB...")
Cc: stable@vger.kernel.org
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
There have been reports of the Crucial M500 480GB model not working
with LPM set to min_power / med_power_with_dipm level.
It has not been tested with medium_power, but that typically has no
measurable power-savings.
Note the reporters Crucial_CT480M500SSD3 has a firmware version of MU03
and there is a MU05 update available, but that update does not mention any
LPM fixes in its changelog, so the quirk matches all firmware versions.
In my experience the LPM problems with (older) Crucial SSDs seem to be
limited to higher capacity versions of the SSDs (different firmware?),
so this commit adds a NOLPM quirk for the 480 and 960GB versions of the
M500, to avoid LPM causing issues with these SSDs.
Cc: stable@vger.kernel.org
Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If the server is malicious then *bytes_read could be larger than the
size of the "target" buffer. It would lead to memory corruption when we
do the memcpy().
Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
Second set of IIO fixes for the 4.16 cycle.
A slightly fiddly revert then fix pair in here as the bug lead to
an unused local variable that was then removed without us noticing
the bug. The revert should only be needed on 4.16 - the fix
goes back futher.
* ccs811
- Fix the transition from 'boot' to 'application' mode. Fixes the case
where the power is not cut between boot cycles.
* meson-saradc
- Fix missing mutex_unlock in an error path.
* sd-modulator
- Fix bindings doc to have the right value of io-channel-cells to reflect
that this device type only ever outputs one channel.
* st-accel
- Revert drop of redundant pointer patch.
- Use the now available pointer to avoid overwriting the platform data
pointer and causing trouble on reprobing the driver.
* st-pressure
- Use local copy of the platform data pointer to avoid overwriting the
one associated with the device, which would cause issues on reprobing
the driver.
* stm32-dfsdm
- Use the right regmap_cfg for the type of device.
- Correct the ID passed to stop channel to be the channel one.
- Correct which clock is used to allow for the 'audio' clock.
- Fix allocation of channels when more than one is enabled.
Section was not properly computed. The value of OOB region definition is
always ECC section 0 information in the OOB area, but we want to get all
the ECC bytes information, so we should call
mtd_ooblayout_ecc(mtd, section++, &oobregion) until it returns -ERANGE.
Fixes: c2b78452a9 ("mtd: use mtd_ooblayout_xxx() helpers where appropriate")
Cc: <stable@vger.kernel.org>
Signed-off-by: OuYang ZhiZhong <ouyzz@yealink.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Revert commit c68f0676ef ("ACPI / battery: Add quirk for Asus
GL502VSK and UX305LA") and commit 4446823e25 ("ACPI / battery: Add
quirk for Asus UX360UA and UX410UAK").
On many many Asus products, the battery is sometimes reported as
charging or discharging even when it is full and you are on AC power.
This change quirked the kernel to avoid advertising the discharging
state when this happens on 4 laptop models, under the belief that
this was incorrect information. I presume it originates from user
reports who are confused that their battery status icon says that it
is discharging.
However, the reported information is indeed correct, and the quirk
approach taken is inadequate and more thought is needed first.
Specifically:
1. It only quirks discharging state, not charging
2. There are so many different Asus products and DMI naming variants
within those product families that behave this way; Linux could
grow to quirk hundreds of products and still not even be close at
"winning" this battle.
3. Asus previously clarified that this behaviour is intentional. The
platform will periodically do a partial discharge/charge cycle
when the battery is full, because this is one way to extend the
lifetime of the battery (leaving a battery at 100% charge and
unused will decrease its usable capacity over time).
My understanding is that any decent consumer product will have
this behaviour, but it appears that Asus is different in that
they expose this info through ACPI.
However, the behaviour seems correct. The ACPI spec does not
suggest in that the platform should hide the truth. It lets you
report that the battery is full of charge, and discharging, and
with external power connected; and Asus does this.
4. In terms of not confusing the user, this seems like something that
could/should be handled by userspace, which can also detect these
same (accurate) conditions in the general case.
Revert this quirk before it gets included in a release, while we look
for better approaches.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 846c7dfc11 ("drm/atomic: Try to preserve the crtc enabled
state in drm_atomic_remove_fb, v2."), removing the last framebuffer will
no longer disable the corresponding pipeline, which causes the KMS core
to complain about leaked connectors on driver unbind.
Fix this by calling drm_atomic_helper_shutdown() on driver unbind, which
will cause all display pipelines to be shut down and therefore drop the
extra references on the connectors.
Signed-off-by: Thierry Reding <treding@nvidia.com>
The regulator is controlled as part of runtime PM, so it should not be
additionally disabled from the ->exit() callback.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Detaching from an IOMMU group multiple times can lead to a crash. This
could potentially be fixed in the IOMMU driver, but it's easy to avoid
the subsequent detach operations in this driver, so do that as well.
Signed-off-by: Thierry Reding <treding@nvidia.com>
When immediate quiet bit is set in CSA, the entire channel is blocked
by the firmware. It is expected that all the MACs will evacuate the
channel and the phy will be eventually either moved or removed.
Currently, the phy context is just unreferenced and thus, the quiet
bit is kept set and it will be impossible to TX on this phy, if we
will need to reuse it in the future. This can be seen when doing a
channel switch with mode=1 (quiet) twice from channel X to Y and then
back to channel X.
Fix that, by moving the phy context to a default channel when not
referenced anymore.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
When starting aggregation, the code checks the status of the queue
allocated to the aggregation tid, which might not yet be allocated
and thus the queue index may be invalid.
Fix this by reserving a new queue in case the queue id is invalid.
While at it, clean up some unreachable code (a condition that is
already handled earlier) and remove all the non-DQA comments since
non-DQA mode is no longer supported.
Fixes: cf961e1662 ("iwlwifi: mvm: support dqa-mode agg on non-shared queue")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If the driver failed to resume from D3, it is possible that it has
no valid aux station. In such case, fw restart will end up in sending
station related commands with an invalid station id, which will
result in an assert.
Fix this by allocating a new station id for the aux station if it
does not have a valid id even in the case of fw restart.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
When a queue is reserved for aggregation, the queue id is assigned
to the tid_data. This is fine since iwl_mvm_sta_tx_agg_oper()
takes care of allocating the queue before actual tx starts.
When the reservation is cancelled (e.g. when the AP declined the
aggregation request) the tid_data is not cleared. As a result,
following tx for this tid was trying to use an unallocated queue.
Fix this by setting the txq_id for the tid to invalid when unreserving
the queue.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
After switching to a new channel, driver schedules session protection
time event in order to hear the beacon on the new channel.
The duration of the protection is two beacon intervals.
However, since we start to switch slightly before beacon with count 1, in
case we don't hear (or AP doesn't transmit) the very first beacon on the
new channel the protection ends without hearing any beacon at all.
At this stage the switch is not complete, the queues are closed and the
interface doesn't have quota yet or TBTT events. As the result, we are
stuck forever waiting for iwl_mvm_post_channel_switch() to be called.
Fix this by increasing the protection time to be 3 beacon intervals and
in addition drop the connection if the time event ends before we got any
beacon.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We shouldn't allow a tunnel to have IP_MAX_MTU as MTU, because
another IPv6 header is going on top of our packets. Without this
patch, we might end up building packets bigger than IP_MAX_MTU.
Fixes: b96f9afee4 ("ipv4/6: use core net MTU range checking")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In vti6_link_config(), if MTU is already given on link creation
or change, validate and use it instead of recomputing it. To do
that, we need to propagate the knowledge that MTU was set by
userspace all the way down to vti6_link_config().
To keep this simple, vti6_dev_init() sets the new 'keep_mtu'
argument of vti6_link_config() to true: on initialization, we
don't have convenient access to netlink attributes there, but we
will anyway check whether dev->mtu is set in vti6_link_config().
If it's non-zero, it was set to the value of the IFLA_MTU
attribute during creation. Otherwise, determine a reasonable
value.
Fixes: ed1efb2aef ("ipv6: Add support for IPsec virtual tunnel interfaces")
Fixes: 53c81e95df ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If a lower device is found, we don't need to subtract
LL_MAX_HEADER to calculate our MTU: just use its MTU, the link
layer headers are already taken into account by it.
If the lower device is not found, start from ETH_DATA_LEN
instead, and only in this case subtract a worst-case
LL_MAX_HEADER.
We then need to subtract our additional IPv6 header from the
calculation.
While at it, note that vti6 doesn't have a hardware header, so
it doesn't need to set dev->hard_header_len. And as
vti6_link_config() now always sets the MTU, there's no need to
set a default value in vti6_dev_setup().
This makes the behaviour consistent with IPv4 vti, after
commit a32452366b ("vti4: Don't count header length twice."),
which was accidentally reverted by merge commit f895f0cfbb
("Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec").
While commit 53c81e95df ("ip6_vti: adjust vti mtu according to
mtu of lower device") improved on the original situation, this
was still not ideal. As reported in that commit message itself,
if we start from an underlying veth MTU of 9000, we end up with
an MTU of 8832, that is, 9000 - LL_MAX_HEADER - sizeof(ipv6hdr).
This should simply be 8880, or 9000 - sizeof(ipv6hdr) instead:
we found the lower device (veth) and we know we don't have any
additional link layer header, so there's no need to subtract an
hypothetical worst-case number.
Fixes: 53c81e95df ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Don't hardcode a MTU value on vti tunnel initialization,
ip_tunnel_newlink() is able to deal with this already. See also
commit ffc2b6ee41 ("ip_gre: fix IFLA_MTU ignored on NEWLINK").
Fixes: 1181412c1a ("net/ipv4: VTI support new module for ip_vti.")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Otherwise, it's possible to specify invalid MTU values directly
on creation of a link (via 'ip link add'). This is already
prevented on subsequent MTU changes by commit b96f9afee4
("ipv4/6: use core net MTU range checking").
Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This re-introduces the effect of commit a32452366b ("vti4:
Don't count header length twice.") which was accidentally
reverted by merge commit f895f0cfbb ("Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec").
The commit message from Steffen Klassert said:
We currently count the size of LL_MAX_HEADER and struct iphdr
twice for vti4 devices, this leads to a wrong device mtu.
The size of LL_MAX_HEADER and struct iphdr is already counted in
ip_tunnel_bind_dev(), so don't do it again in vti_tunnel_init().
And this is still the case now: ip_tunnel_bind_dev() already
accounts for the header length of the link layer (not
necessarily LL_MAX_HEADER, if the output device is found), plus
one IP header.
For example, with a vti device on top of veth, with MTU of 1500,
the existing implementation would set the initial vti MTU to
1332, accounting once for LL_MAX_HEADER (128, included in
hard_header_len by vti) and twice for the same IP header (once
from hard_header_len, once from ip_tunnel_bind_dev()).
It should instead be 1480, because ip_tunnel_bind_dev() is able
to figure out that the output device is veth, so no additional
link layer header is attached, and will properly count one
single IP header.
The existing issue had the side effect of avoiding PMTUD for
most xfrm policies, by arbitrarily lowering the initial MTU.
However, the only way to get a consistent PMTU value is to let
the xfrm PMTU discovery do its course, and commit d6af1a31cc
("vti: Add pmtu handling to vti_xmit.") now takes care of local
delivery cases where the application ignores local socket
notifications.
Fixes: b9959fd3b0 ("vti: switch to new ip tunnel code")
Fixes: f895f0cfbb ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The #sound-dai-cells DT property is required to describe link between
the HDMI IP block and the SoC's audio subsystem.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
When unbinding/removing the driver, we will run into the following warnings:
[ 259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
[ 259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
[ 259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
[ 259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
[ 259.696239] libphy: fec_enet_mii_bus: probed
Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum updates:
- Iron out the last late microcode loading issues by actually
checking whether new microcode is present and preventing the CPU
synchronization to run into a timeout induced hang.
- Remove Skylake C2 from the microcode blacklist according to the
latest Intel documentation
- Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
not. Enhance the selftests to catch that kind of issue
- Annotate indirect calls/jumps for objtool on 32bit. This is not a
functional issue, but for consistency sake its the right thing to
do.
- Fix a jump label build warning observed on SPARC64 which uses 32bit
storage for the code location which is casted to 64 bit pointer w/o
extending it to 64bit first.
- Add two new cpufeature bits. Not really an urgent issue, but
provides them for both x86 and x86/kvm work. No impact on the
current kernel"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Fix CPU synchronization routine
x86/microcode: Attempt late loading only when new microcode is present
x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
jump_label: Fix sparc64 warning
x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
x86/vm86/32: Fix POPF emulation
selftests/x86/entry_from_vm86: Add test cases for POPF
selftests/x86/entry_from_vm86: Exit with 1 if we fail
x86/cpufeatures: Add Intel PCONFIG cpufeature
x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
Pull x86 fix from Thomas Gleixner:
"A single fix for vmalloc_fault() which uses p*d_huge() unconditionally
whether CONFIG_HUGETLBFS is set or not. In case of CONFIG_HUGETLBFS=n
this results in a crash as p*d_huge() returns 0 in that case"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix vmalloc_fault to use pXd_large
Pull irq fixes from Thomas Gleixner:
"Three fixes for irq chip drivers:
- Make sure the allocations in the GIC-V3 ITS driver are large enough
to accomodate the interrupt space
- Fix a misplaced __iomem annotation which causes a splat of 26
sparse warnings
- Remove an unused function in the IMX GPCV2 driver which causes
build warnings"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-imx-gpcv2: Remove unused function
irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
irqchip/gic-v3-its: Fix misplaced __iomem annotations
Pull EFI fix from Thomas Gleixner:
"A single fix to prevent partially initialized pointers in mixed mode
(64bit kernel on 32bit UEFI)"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/tpm: Initialize pointer variables to zero for mixed mode
Pull KVM fixes from Paolo Bonzini:
"PPC:
- fix bug leading to lost IPIs and smp_call_function_many() lockups
on POWER9
ARM:
- locking fix
- reset fix
- GICv2 multi-source SGI injection fix
- GICv2-on-v3 MMIO synchronization fix
- make the console less verbose.
x86:
- fix device passthrough on AMD SME"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix device passthrough when SME is active
kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3
KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid
KVM: arm/arm64: Reduce verbosity of KVM init log
KVM: arm/arm64: Reset mapped IRQs on VM reset
KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending
KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
batadv_check_unicast_ttvn may redirect a packet to itself or another
originator. This involves rewriting the ttvn and the destination address in
the batadv unicast header. These field were not yet pulled (with skb rcsum
update) and thus any change to them also requires a change in the receive
checksum.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Fixes: a73105b8d4 ("batman-adv: improved client announcement mechanism")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
'Commit 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.
So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.
This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.
Fixes: 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")
Reported-by: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field txNumDeferred is used by the driver to keep track of the number
of packets it has pushed to the emulation. The driver increments it on
pushing the packet to the emulation and the emulation resets it to 0 at
the end of the transmit.
There is a possibility of a race either when (a) ESX is under heavy load or
(b) workload inside VM is of low packet rate.
This race results in xmit hangs when network coalescing is disabled. This
change creates a local copy of txNumDeferred and uses it to perform ring
arithmetic.
Reported-by: Noriho Tanaka <ntanaka@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti says:
====================
net/sched: fix NULL dereference in the error path of .init()
with several TC actions it's possible to see NULL pointer dereference,
when the .init() function calls tcf_idr_alloc(), fails at some point and
then calls tcf_idr_release(): this series fixes all them introducing
non-NULL tests in the .cleanup() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver implementation returns support for private flags, while
no private flags are present. When asked for the number of private
flags it returns the number of statistic flag names.
Fix this by returning EOPNOTSUPP for not implemented ethtool flags.
Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some HP laptops have a mute mute LED controlled by a pin VREF. The
Realtek codec driver updates the VREF via vmaster hook by calling
snd_hda_set_pin_ctl_cache().
This works fine as long as the driver is running in a normal mode.
However, when the VREF change happens during the codec being in
runtime PM suspend, the regmap access will skip and postpone the
actual register change. This ends up with the unchanged LED status
until the next runtime PM resume even if you change the Master mute
switch. (Interestingly, the machine keeps the LED status even after
the codec goes into D3 -- but it's another story.)
For improving this usability, let the driver temporarily powering up /
down only during the pin VREF change. This can be achieved easily by
wrapping the call with snd_hda_power_up_pm() / *_down_pm().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199073
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In commit 9ffcc3725f ("mlxsw: spectrum: Allow packets to be trapped
from any PG") I fixed a problem where packets could not be trapped to
the CPU due to exceeded shared buffer quotas. The mentioned commit
explains the problem in detail.
The problem was fixed by assigning a minimum quota for the CPU port and
the traffic class used for scheduling traffic to the CPU.
However, commit 117b0dad2d ("mlxsw: Create a different trap group list
for each device") assigned different traffic classes to different
packet types and rendered the fix useless.
Fix the problem by assigning a minimum quota for the CPU port and all
the traffic classes that are currently in use.
Fixes: 117b0dad2d ("mlxsw: Create a different trap group list for each device")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Eddie Shklaer <eddies@mellanox.com>
Tested-by: Eddie Shklaer <eddies@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
Issue here is that we can not reuse skb after a successful skb_array_produce()
since another cpu might have consumed it already.
I believe a similar problem exists in try_bulk_dequeue_skb_slow()
in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
[1]
BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543
CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
qdisc_pkt_len include/net/sch_generic.h:610 [inline]
qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
__dev_xmit_skb net/core/dev.c:3216 [inline]
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just when I had decided that flush_cache_range() was always called with
a valid context, Helge reported two cases where the
"BUG_ON(!vma->vm_mm->context);" was hit on the phantom buildd:
kernel BUG at /mnt/sdb6/linux/linux-4.15.4/arch/parisc/kernel/cache.c:587!
CPU: 1 PID: 3254 Comm: kworker/1:2 Tainted: G D 4.15.0-1-parisc64-smp #1 Debian 4.15.4-1+b1
Workqueue: events free_ioctx
IAOQ[0]: flush_cache_range+0x164/0x168
IAOQ[1]: flush_cache_page+0x0/0x1c8
RP(r2): unmap_page_range+0xae8/0xb88
Backtrace:
[<00000000404a6980>] unmap_page_range+0xae8/0xb88
[<00000000404a6ae0>] unmap_single_vma+0xc0/0x188
[<00000000404a6cdc>] zap_page_range_single+0x134/0x1f8
[<00000000404a702c>] unmap_mapping_range+0x1cc/0x208
[<0000000040461518>] truncate_pagecache+0x98/0x108
[<0000000040461624>] truncate_setsize+0x9c/0xb8
[<00000000405d7f30>] put_aio_ring_file+0x80/0x100
[<00000000405d803c>] aio_free_ring+0x8c/0x290
[<00000000405d82c0>] free_ioctx+0x80/0x180
[<0000000040284e6c>] process_one_work+0x21c/0x668
[<00000000402854c4>] worker_thread+0x20c/0x778
[<0000000040291d44>] kthread+0x2d4/0x2e0
[<0000000040204020>] end_fault_vector+0x20/0xc0
This indicates that we need to handle the no context case in
flush_cache_range() as we do in flush_cache_mm().
In thinking about this, I realized that we don't need to flush the TLB
when there is no context. So, I added context checks to the large flush
cases in flush_cache_mm() and flush_cache_range(). The large flush case
occurs frequently in flush_cache_mm() and the change should improve fork
performance.
The v2 version of this change removes the BUG_ON from flush_cache_page()
by skipping the TLB flush when there is no context. I also added code
to flush the TLB in flush_cache_mm() and flush_cache_range() when we
have a context that's not current. Now all three routines handle TLB
flushes in a similar manner.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Pull btrfs fixes from David Sterba:
"There's an important revert in this pull request that needs to go to
stable as it causes a corruption on big endian machines.
The other fix is for FIEMAP incorrectly reporting shared extents
before a sync and one fix for a crash in raid56.
So far we got only one report about the BE corruption, the stable
kernels were out for like a week, so hopefully the scope of the damage
is low"
* tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: use proper endianness accessors for super_copy"
btrfs: add missing initialization in btrfs_check_shared
btrfs: Fix NULL pointer exception in find_bio_stripe
Pull microblaze fixes from Michal Simek:
- Use NO_BOOTMEM to fix boot issue
- Fix opt lib endian dependencies
* tag 'microblaze-4.16-rc6' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: switch to NO_BOOTMEM
microblaze: remove unused alloc_maybe_bootmem
microblaze: Setup dependencies for ASM optimized lib functions
Emanuel reported an issue with a hang during microcode update because my
dumb idea to use one atomic synchronization variable for both rendezvous
- before and after update - was simply bollocks:
microcode: microcode_reload_late: late_cpus: 4
microcode: __reload_late: cpu 2 entered
microcode: __reload_late: cpu 1 entered
microcode: __reload_late: cpu 3 entered
microcode: __reload_late: cpu 0 entered
microcode: __reload_late: cpu 1 left
microcode: Timeout while waiting for CPUs rendezvous, remaining: 1
CPU1 above would finish, leave and the others will still spin waiting for
it to join.
So do two synchronization atomics instead, which makes the code a lot more
straightforward.
Also, since the update is serialized and it also takes quite some time per
microcode engine, increase the exit timeout by the number of CPUs on the
system.
That's ok because the moment all CPUs are done, that timeout will be cut
short.
Furthermore, panic when some of the CPUs timeout when returning from a
microcode update: we can't allow a system with not all cores updated.
Also, as an optimization, do not do the exit sync if microcode wasn't
updated.
Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.de
Pull drm fixes from Dave Airlie:
"i915, amd and nouveau fixes.
i915:
- backlight fix for some panels
- pm fix
- fencing fix
- some GVT fixes
amdgpu:
- backlight fix across suspend/resume
- object destruction ordering issue fix
- displayport fix
nouveau:
- two backlight fixes
- fix for some lockups
Pretty quiet week, seems like everyone was fixing backlights"
* tag 'drm-fixes-for-v4.16-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/bl: fix backlight regression
drm/nouveau/bl: Fix oops on driver unbind
drm/nouveau/mmu: ALIGN_DOWN correct variable
drm/i915/gvt: fix user copy warning by whitelist workload rb_tail field
drm/i915/gvt: Correct the privilege shadow batch buffer address
drm/amdgpu/dce: Don't turn off DP sink when disconnected
drm/amdgpu: save/restore backlight level in legacy dce code
drm/radeon: fix prime teardown order
drm/amdgpu: fix prime teardown order
drm/i915: Kick the rps worker when changing the boost frequency
drm/i915: Only prune fences after wait-for-all
drm/i915: Enable VBT based BL control for DP
drm/i915/gvt: keep oa config in shadow ctx
drm/i915/gvt: Add runtime_pm_get/put into gvt_switch_mmio
Checking for 0 is insufficient: when an SKB without a batadv header, but
with a VLAN header is received, hdr_size will be 4, making the following
code interpret the Ethernet header as a batadv header.
Fixes: be1db4f661 ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
batadv_check_unicast_ttvn() calls skb_cow(), so pointers into the SKB data
must be (re)set after calling it. The ethhdr variable is dropped
altogether.
Fixes: 7cdcf6dddc ("batman-adv: add UNICAST_4ADDR packet type")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The MDIO busses are switch properties and so should be inside the
switch node. Fix the examples in the binding document.
Reported-by: 尤晓杰 <yxj790222@163.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: David S. Miller <davem@davemloft.net>
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.
Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.witt@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.
Fixes: cb9f7a9a5c ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Link updates were not reported to qedr correctly.
Leading to cases where a link could be down, but qedr
would see it as up.
In addition, once qede was loaded, link state would be up,
regardless of the actual link state.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon says:
====================
qed: iWARP related fixes
This series contains two fixes related to iWARP flow.
====================
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
FW workaround. The iWARP LL2 connection did not expect TCP packets
to arrive on it's connection. The fix drops any non-tcp packets
Fixes b5c29ca ("qed: iWARP CM - setup a ll2 connection for handling
SYN packets")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a corner case in the MPA unalign flow where a FPDU header is
split over two tcp segments. The length of the first fragment in this
case was not initialized properly and should be '1'
Fixes: c7d1d839 ("qed: Add support for MPA header being split over two tcp packets")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for complex checking between the last consumed index
and current consumed index, a simple subtraction will do.
This also eliminates the possibility of a permanent transmit queue stall
under the following conditions:
- one CPU bursts ring->size worth of traffic (up to 256 buffers), to the
point where we run out of free descriptors, so we stop the transmit
queue at the end of bcm_sysport_xmit()
- because of our locking, we have the transmit process disable
interrupts which means we can be blocking the TX reclamation process
- when TX reclamation finally runs, we will be computing the difference
between ring->c_index (last consumed index by SW) and what the HW
reports through its register
- this register is masked with (ring->size - 1) = 0xff, which will lead
to stripping the upper bits of the index (register is 16-bits wide)
- we will be computing last_tx_cn as 0, which means there is no work to
be done, and we never wake-up the transmit queue, leaving it
permanently disabled
A practical example is e.g: ring->c_index aka last_c_index = 12, we
pushed 256 entries, HW consumer index = 268, we mask it with 0xff = 12,
so last_tx_cn == 0, nothing happens.
Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita says:
====================
Fix vlan untag and insertion for bridge and vlan with reorder_hdr off
As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
bridge devices ends up with corrupted packets. He narrowed down this problem
and found that the root cause is in skb_reorder_vlan_header().
While I was working on fixing this problem, I found that the function does
not work properly for double tagged packets with reorder_hdr off as well.
Patch 1 fixes these 2 problems in skb_reorder_vlan_header().
And it turned out that fixing skb_reorder_vlan_header() is not sufficient
to receive double tagged packets with reorder_hdr off while I was testing the
fix. Vlan tags got out of order when vlan devices with reorder_hdr disabled
were stacked. Patch 2 fixes this problem.
[1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With reorder header off, received packets are untagged in skb_vlan_untag()
called from within __netif_receive_skb_core(), and later the tag will be
inserted back in vlan_do_receive().
This caused out of order vlan headers when we create a vlan device on top
of another vlan device, because vlan_do_receive() inserts a tag as the
outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
and inserted back in vlan_do_receive(), then the inner tag is next removed
and inserted back as the outermost tag.
This patch fixes the behaviour by inserting the inner tag at the right
position.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have a bridge with vlan_filtering on and a vlan device on top of
it, packets would be corrupted in skb_vlan_untag() called from
br_dev_xmit().
The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
which makes use of skb->mac_len. In this function mac_len is meant for
handling rx path with vlan devices with reorder_header disabled, but in
tx path mac_len is typically 0 and cannot be used, which is the problem
in this case.
The current code even does not properly handle rx path (skb_vlan_untag()
called from __netif_receive_skb_core()) with reorder_header off actually.
In rx path single tag case, it works as follows:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+------+----
| ETH | VLAN | ETH |
| ADDRS | TPID | TCI | TYPE |
+-------------------+-------------+------+----
<-------- mac_len --------->
<------------->
to be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<-------- mac_len --------->
This is ok, but in rx double tag case, it corrupts packets:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+-------------+------+----
| ETH | VLAN | VLAN | ETH |
| ADDRS | TPID | TCI | TPID | TCI | TYPE |
+-------------------+-------------+-------------+------+----
<--------------- mac_len ---------------->
<------------->
should be removed
<--------------------------->
actually will be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<--------------- mac_len ---------------->
So, two of vlan tags are both removed while only inner one should be
removed and mac_header (and mac_len) is broken.
skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
so use skb->data and skb->mac_header to calculate the right offset.
Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Fixes: a6e18ff111 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 3c181c12c4.
The offending patch was merged in 4.16-rc4 and was promptly applied to
stable kernels 4.14.25 and 4.15.8.
The patch causes a corruption in several superblock items on big-endian
machines because of messed up endianity conversions. The damage is
manually repairable. A filesystem cannot be mounted again after it has
been unmounted once.
We do a full revert and not a fixup so stable can pick that patch ASAP.
Fixes: 3c181c12c4 ("btrfs: use proper endianness accessors for super_copy")
Link: https://lkml.kernel.org/r/1521139304@msgid.manchmal.in-ulm.de
CC: stable@vger.kernel.org # 4.14+
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: David Sterba <dsterba@suse.com>
When using device passthrough with SME active, the MMIO range that is
mapped for the device should not be mapped encrypted. Add a check in
set_spte() to insure that a page is not mapped encrypted if that page
is a device MMIO page as indicated by kvm_is_mmio_pfn().
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Testing brcmfmac with more recent firmwares resulted in AP interfaces
not working in some specific setups. Debugging resulted in discovering
support for IAPP in Broadcom's firmwares.
Older firmwares were only generating 802.11f frames. Newer ones like:
1) 10.10 (TOB) (r663589)
2) 10.10.122.20 (r683106)
for 4366b1 and 4366c0 respectively seem to also /respect/ 802.11f frames
in the Tx path by performing a STA disassociation.
This obsoleted standard and its implementation is something that:
1) Most people don't need / want to use
2) Can allow local DoS attacks
3) Breaks AP interfaces in some specific bridge setups
To solve issues it can cause this commit modifies brcmfmac to drop IAPP
packets. If affects:
1) Rx path: driver won't be sending these unwanted packets up.
2) Tx path: driver will reject packets that would trigger STA
disassociation perfromed by a firmware (possible local DoS attack).
It appears there are some Broadcom's clients/users who care about this
feature despite the drawbacks. They can switch it on using a new module
param.
This change results in only two more comparisons (check for module param
and check for Ethernet packet length) for 99.9% of packets. Its overhead
should be very minimal.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Third batch of iwlwifi fixes intended for 4.16:
* Fix an issue with the multicast queue;
* Fix IGTK handling;
* Fix some missing return value checks;
* Add support for a HW workaround for issues on some platforms;
Microblaze doesn't set CONFIG_NO_BOOTMEM and so memblock_virt_alloc()
doesn't work for CONFIG_HAVE_MEMBLOCK && !CONFIG_NO_BOOTMEM.
Similar change was already done by others architectures
"ARM: mm: Remove bootmem code and switch to NO_BOOTMEM"
(sha1: 84f452b1e8)
or
"openrisc: Consolidate setup to use memblock instead of bootmem"
(sha1: 266c7fad15)
or
"parisc: Drop bootmem and switch to memblock"
(sha1: 4fe9e1d957)
or
"powerpc: Remove bootmem allocator"
(sha1: 10239733ee)
or
"s390/mm: Convert bootmem to memblock"
(sha1: 50be634507)
or
"sparc64: Convert over to NO_BOOTMEM."
(sha1: 625d693e97)
or
"xtensa: drop sysmem and switch to memblock"
(sha1: 0e46c1115f)
Issue was introduced by:
"of/fdt: use memblock_virt_alloc for early alloc"
(sha1: 0fa1c57934)
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
The patch:
"microblaze: Setup proper dependency for optimized lib functions"
(sha1: 7b6ce52be3)
didn't setup all dependencies properly.
Optimized lib functions in C are also present for little endian
and optimized library functions in assembler are implemented only for
big endian version.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Some devices use a shared clock which is very sensitive to variations
and cause trouble in some situations. We need to set a bit in the phy
configuration to indicate that to the FW. To make this generic, add a
extra_phy_config_flags element to the device configuration and OR it
into the phy_cfg before sending it to the firmware. And also create a
set of configurations for devices that use shared clocks and need this
extra bit to be set.
Fixes: c62446d2b0 ("iwlwifi: add new 9460 series PCI IDs")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The earlier patch called the station add functions but didn't
assign their return value to the ret variable, so that the
checks for it were meaningless. Fix that.
Found by smatch:
.../mac80211.c:2560 iwl_mvm_start_ap_ibss() warn: we tested 'ret' before and it was 'false'
.../mac80211.c:2563 iwl_mvm_start_ap_ibss() warn: we tested 'ret' before and it was 'false'
Fixes: 3a89411cd31c ("iwlwifi: mvm: fix assert 0x2B00 on older FWs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Currently when an IGTK is set for an AP, it is set as a regular key.
Since the cipher is set to CMAC, the STA_KEY_FLG_EXT flag is added to
the host command, which causes assert 0x253D on NICs that do not support
this.
Fixes: 85aeb58cec ("iwlwifi: mvm: Enable security on new TX API")
Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The tid being used for the queue (cab_queue) for the MCAST
station has been changed recently to be 0 (for BE).
The flush path still flushed only the special tid (15)
which means that the firmware wasn't flushing the right
queue and we could get a firmware crash upon remove
station if we had an MCAST packet on the ring.
The current code that flushes queues for a station only
differentiates between internal stations (stations that
aren't instantiated in mac80211, like the MCAST station)
and the non-internal ones.
Internal stations can be either: BCAST (beacons), MCAST
(for cab_queue), GENERAL_PURPOSE (p2p dev, and sniffer
injection). The internal stations can use different tids.
To make the code simpler, just flush all the tids always
and add the special internal tid (15) for internal
stations. The firmware will know how to handle this even
if we hadn't any queue mapped that that tid.
Fixes: e340c1a6ef4b ("iwlwifi: mvm: Correctly set the tid for mcast queue")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In the xfrm_local_error, rcu_read_unlock should be called when afinfo
is not NULL. because xfrm_state_get_afinfo calls rcu_read_unlock
if afinfo is NULL.
Fixes: af5d27c4e1 ("xfrm: remove xfrm_state_put_afinfo")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Only GVT fixes:
- Two warnings fix for runtime pm and usr copy (Xiong, Zhenyu)
- OA context fix for vGPU profiling (Min)
- privilege batch buffer reloc fix (Fred)
* tag 'drm-intel-fixes-2018-03-15' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/gvt: fix user copy warning by whitelist workload rb_tail field
drm/i915/gvt: Correct the privilege shadow batch buffer address
drm/i915/gvt: keep oa config in shadow ctx
drm/i915/gvt: Add runtime_pm_get/put into gvt_switch_mmio
Commit 99759869fa "acpi: Add acpi_map_pxm_to_online_node()" added
support for mapping a given proximity to its nearest, by SLIT distance,
online node. However, it sometimes returns unexpected results due to the
fact that it switches from comparing the PXM node to the last node that
was closer than the current max.
for_each_online_node(n) {
dist = node_distance(node, n);
if (dist < min_dist) {
min_dist = dist;
node = n; <---- from this point we're using the
wrong node for node_distance()
Fixes: 99759869fa ("acpi: Add acpi_map_pxm_to_online_node()")
Cc: <stable@vger.kernel.org>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull vfs fixes from Al Viro:
- backport-friendly part of lock_parent() race fix
- a fix for an assumption in the heurisic used by path_connected() that
is not true on NFS
- livelock fixes for d_alloc_parallel()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Teach path_connected to handle nfs filesystems with multiple roots.
fs: dcache: Use READ_ONCE when accessing i_dir_seq
fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
Unbinding nouveau on a dual GPU MacBook Pro oopses because we iterate
over the bl_connectors list in nouveau_backlight_exit() but skipped
initializing it in nouveau_backlight_init(). Stacktrace for posterity:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: nouveau_backlight_exit+0x2b/0x70 [nouveau]
nouveau_display_destroy+0x29/0x80 [nouveau]
nouveau_drm_unload+0x65/0xe0 [nouveau]
drm_dev_unregister+0x3c/0xe0 [drm]
drm_put_dev+0x2e/0x60 [drm]
nouveau_drm_device_remove+0x47/0x70 [nouveau]
pci_device_remove+0x36/0xb0
device_release_driver_internal+0x157/0x220
driver_detach+0x39/0x70
bus_remove_driver+0x51/0xd0
pci_unregister_driver+0x2a/0xa0
nouveau_drm_exit+0x15/0xfb0 [nouveau]
SyS_delete_module+0x18c/0x290
system_call_fast_compare_end+0xc/0x6f
Fixes: b53ac1ee12 ("drm/nouveau/bl: Do not register interface if Apple GMUX detected")
Cc: stable@vger.kernel.org # v4.10+
Cc: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 7110c89bb8852ff8b0f88ce05b332b3fe22bd11e ("mmu: swap out round
for ALIGN") replaced two calls to round/rounddown with ALIGN/ALIGN_DOWN,
but erroneously applied ALIGN_DOWN to a different variable (addr) and left
intended variable (tail) not rounded/ALIGNed.
As a result screen corruption, X lockups are observable. An example of kernel
log of affected system with NV98 card where it was bisected:
nouveau 0000:01:00.0: gr: TRAP_M2MF 00000002 [IN]
nouveau 0000:01:00.0: gr: TRAP_M2MF 00320951 400007c0 00000000 04000000
nouveau 0000:01:00.0: gr: 00200000 [] ch 1 [000fbbe000 DRM] subc 4 class 5039
mthd 0100 data 00000000
nouveau 0000:01:00.0: fb: trapped read at 0040000000 on channel 1
[0fbbe000 DRM]
engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000006
[NULL_DMAOBJ]
Fixes bug 105173 ("[MCP79][Regression] Unhandled NULL pointer dereference in
nvkm_object_unmap since kernel 4.15")
https://bugs.freedesktop.org/show_bug.cgi?id=105173
Fixes: 7110c89bb885 ("mmu: swap out round for ALIGN ")
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Maris Nartiss <maris.nartiss@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org # v4.15+
On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem. The subsets can be from
disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no
way to find the common root of all directory trees exported form the
server with the same filesystem identifier.
The practical result is that in struct super s_root for nfs s_root is
not necessarily the root of the filesystem. The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.
This effects the dcache invalidation code in generic_shutdown_super
currently called shrunk_dcache_for_umount and that code for years
has gone through an additional list of dentries that might be dentry
trees that need to be freed to accomodate nfs.
When I wrote path_connected I did not realize nfs was so special, and
it's hueristic for avoiding calling is_subdir can fail.
The practical case where this fails is when there is a move of a
directory from the subtree exposed by one nfs mount to the subtree
exposed by another nfs mount. This move can happen either locally or
remotely. With the remote case requiring that the move directory be cached
before the move and that after the move someone walks the path
to where the move directory now exists and in so doing causes the
already cached directory to be moved in the dcache through the magic
of d_splice_alias.
If someone whose working directory is in the move directory or a
subdirectory and now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check. As s_root really is not the root
of the nfs filesystem this heuristic is wrong, and the path may
actually not be connected and path_connected can fail.
The is_subdir function might be cheap enough that we can call it
unconditionally. Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to. So I am avoiding that for now.
Filesystems with snapshots such as nilfs and btrfs do something
similar. But as the directory tree of the snapshots are disjoint
from one another and from the main directory tree rename won't move
things between them and this problem will not occur.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 397d425dc2 ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
pmdp_invalidate() was changed to update the pmd atomically
(to not lose dirty/access bits) and return the original pmd
value.
However, in doing so, we lost a lot of the essential work that
set_pmd_at() does, namely to update hugepage mapping counts and
queuing up the batched TLB flush entry.
Thus we were not flushing entries out of the TLB when making
such PMD changes.
Fix this by abstracting the accounting work of set_pmd_at() out into a
separate function, and call it from pmdp_establish().
Fixes: a8e654f01c ("sparc64: update pmdp_invalidate() to return old pmd value")
Signed-off-by: David S. Miller <davem@davemloft.net>
kvm/arm fixes for 4.16, take 2
- Peace of mind locking fix in vgic_mmio_read_pending
- Allow hw-mapped interrupts to be reset when the VM resets
- Fix GICv2 multi-source SGI injection
- Fix MMIO synchronization for GICv2 on v3 emulation
- Remove excess verbosity on the console
The IRQ output of the bcm bt-device is really a level IRQ signal, which
signals a logical high as long as the device's buffer contains data. Since
the draining in the buffer is done in the tty driver, we cannot (easily)
wait in a threaded interrupt handler for the draining, after which the
IRQ should go low again.
So instead we treat the IRQ as an edge interrupt. This opens the window
for a theoretical race where we wakeup, read some data and then autosuspend
*before* the IRQ has gone (logical) low, followed by the device just at
that moment receiving more data, causing the IRQ to stay high and we never
see an edge.
Since we call pm_runtime_mark_last_busy() on every received byte, there
should be plenty time for the IRQ to go (logical) low before we ever
suspend, so this should never happen, but after commit 43fff76834
("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
since, this was actually happening causing the device to get stuck in
runtime suspend.
The bcm bt-device actually has a workaround for this, if we set the
pulsed_host_wake flag in the sleep parameters, then the device monitors
if the host is draining the buffer and if not then after a timeout the
device will pulse the IRQ line, causing us to see an edge, fixing the
stuck in suspend condition.
This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
race caused by us treating the IRQ as an edge IRQ.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This reverts commit 43fff76834 ("Bluetooth: hci_bcm: Streamline runtime
PM code"). The commit msg for this commit states "No functional change
intended.", but replacing:
pm_runtime_get();
pm_runtime_mark_last_busy();
pm_runtime_put_autosuspend();
with:
pm_request_resume();
Does result in a functional change, pm_request_resume() only calls
pm_runtime_mark_last_busy() if the device was suspended before the call.
This results in the following happening:
1) Device is runtime suspended
2) Device drives host_wake IRQ logically high as it starts receiving data
3) bcm_host_wake() gets called, causes the device to runtime-resume,
current time gets marked as last_busy time
4) After 5 seconds the autosuspend timer expires and the dev autosuspends
as no one has been calling pm_runtime_mark_last_busy(), the device was
resumed during those 5 seconds, so all the pm_request_resume() calls
while receiving data and/or bcm_host_wake() calls were nops
5) If 4) happens while the device has (just received) data in its buffer to
be read by the host the IRQ line is *already* / still logically high
when we autosuspend and since we use an edge triggered IRQ, the IRQ
will never trigger, causing the device to get stuck in suspend
Therefor this commit has to be reverted, so that we avoid the device
getting stuck in suspend.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
According to the Aspeed specification, the reset and enable sequence
should be done when the clock is stopped. The specification doesn't
define behavior if the reset is done while the clock is enabled.
From testing on the AST2500, the LPC Controller has problems if the
clock is reset while enabled.
Therefore, check whether the clock is enabled or not before performing
the reset and enable sequence in the Aspeed clock driver.
Reported-by: Lei Yu <mine260309@gmail.com>
Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
Fixes: 15ed8ce5f8 ("clk: aspeed: Register gated clocks")
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Some of the Aspeed clocks are disabled by setting the relevant bit in
the "clock stop control" register to one, while others are disabled by
setting their bit to zero. The driver already uses a flag per gate to
identify this behavior, but doesn't apply it in the clock is_enabled
function.
Use the existing gate flag to correctly return whether or not a clock
is enabled in the aspeed_clk_is_enabled function.
Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
Fixes: 6671507f0f ("clk: aspeed: Handle inverse polarity of USB port 1 clock gate")
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull sound fixes from Takashi Iwai:
"A series of small fixes in ASoC, HD-audio and core stuff:
- a UAF fix in ALSA PCM core
- yet more hardening for ALSA sequencer
- a regression fix for the previous HD-audio power_save option change
- various ASoC codec fixes (sgtl5000, rt5651, hdmi-codec, wm_adsp)
- minor ASoC platform fixes (AMD ACP, sun4i)"
* tag 'sound-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Revert power_save option default value
ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats()
ALSA: seq: Clear client entry before deleting else at closing
ALSA: seq: Fix possible UAF in snd_seq_check_queue()
ASoC: amd: 16bit resolution support for i2s sp instance
ASoC: wm_adsp: For TLV controls only register TLV get/set
ASoC: sun4i-i2s: Fix RX slot number of SUN8I
ASoC: hdmi-codec: Fix module unloading caused kernel crash
ASoC: rt5651: Fix regcache sync errors on resume
ASoC: sgtl5000: Fix suspend/resume
MAINTAINERS: Add myself as sgtl5000 maintainer
ASoC: samsung: Add the DT binding files entry to MAINTAINERS
sgtl5000: change digital_mute policy
Pull device mapper fixes from Mike Snitzer:
- a stable DM multipath fix to restore ability to pass integrity data
- two DM multipath fixes for a fix that was merged into 4.16-rc5
* tag 'for-4.16/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm mpath: fix passing integrity data
dm mpath: eliminate need to use scsi_device_from_queue
dm mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer
Right now the vblank event completion is racing with the atomic update,
which is especially bad when the PRE is in use, as one of the hardware
issue workaround might extend the atomic commit for quite some time.
If the vblank IRQ happens to trigger during that time, we will prematurely
signal the atomic commit completion to userspace, which causes tearing
when userspace re-uses a framebuffer we haven't managed to flip away from
yet.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
ipu_planes_assign_pre() prototype is in "imx-drm.h" header file, so
include it to fix the following sparse warning:
drivers/gpu/drm/imx/ipuv3-plane.c:729:5: warning: symbol 'ipu_planes_assign_pre' was not declared. Should it be static?
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
ipu_plane_state_reset(), ipu_plane_duplicate_state() and
ipu_plane_destroy_state() are only used in this file, so make them static.
This fixes the following sparse warnings:
drivers/gpu/drm/imx/ipuv3-plane.c:275:6: warning: symbol 'ipu_plane_state_reset' was not declared. Should it be static?
drivers/gpu/drm/imx/ipuv3-plane.c:295:24: warning: symbol 'ipu_plane_duplicate_state' was not declared. Should it be static?
drivers/gpu/drm/imx/ipuv3-plane.c:309:6: warning: symbol 'ipu_plane_destroy_state' was not declared. Should it be static?
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
gcc-8 reports that we access an array with a negative index
in an error case:
drivers/gpu/ipu-v3/ipu-prg.c: In function 'ipu_prg_channel_disable':
drivers/gpu/ipu-v3/ipu-prg.c:252:43: error: array subscript -22 is below array bounds of 'struct ipu_prg_channel[3]' [-Werror=array-bounds]
This moves the range check in front of the first time that
variable gets used.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Keep old 'dependent' state of unaffected planes, this way new state takes
into account current state of unaffected planes.
Fixes: ebae8d0743 ("drm/tegra: dc: Implement legacy blending")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Closing of a listen socket wakes up kernel_accept() of
smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker()
gives up the internal clcsock. The wait logic introduced with
commit 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
might wait longer than necessary. This patch implements the idea to
implement the wait just with flush_work(), and gets rid of the extra
smc_close_wait_listen_clcsock() function.
Fixes: 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The opaque/alpha format conversion code is currently only looking at
XRGB formats because they have an equivalent ARGB format. The opaque
format for RGB565 is RGB565 itself, much like the YUV formats map to
themselves.
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: ebae8d0743 ("drm/tegra: dc: Implement legacy blending")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Swap the positions of blk_addr and blksz in the tracepoint print arguments
so that they match the print format.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: d2f82254e4 ("mmc: core: Add members to mmc_request and mmc_data for CQE's")
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Certain Micron eMMC v4.5 cards might get broken when HPI feature is used
and hence this patch disables the HPI feature for such buggy cards.
In U-Boot, these cards are reported as
Manufacturer: Micron (ID: 0xFE)
OEM: 0x4E
Name: MMC32G
Revision: 19 (0x13)
Serial: 959241022 Manufact. date: 8/2015 (0x82) CRC: 0x00
Tran Speed: 52000000
Rd Block Len: 512
MMC version 4.5
High Capacity: Yes
Capacity: 29.1 GiB
Boot Partition Size: 16 MiB
Bus Width: 8-bit
According to JEDEC JEP106 manufacturer 0xFE is Numonyx, which was bought by
Micron.
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
PARTITION_CONFIG is cached in mmc_card->ext_csd.part_config and the
currently active partition in mmc_blk_data->part_curr. These caches do
not always reflect changes if the ioctl call modifies the
PARTITION_CONFIG registers, e.g. by changing BOOT_PARTITION_ENABLE.
Write the PARTITION_CONFIG value extracted from the ioctl call to the
cache and update the currently active partition accordingly. This
ensures that the user space cannot change the values behind the
kernel's back. The next call to mmc_blk_part_switch() will operate on
the data set by the ioctl and reflect the changes appropriately.
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Once the ring buffer is copied to ring_scan_buffer and scanned,
the shadow batch buffer start address is only updated into
ring_scan_buffer, not the real ring address allocated through
intel_ring_begin in later copy_workload_to_ring_buffer.
This patch is only to set the right shadow batch buffer address
from Ring buffer, not include the shadow_wa_ctx.
v2:
- refine some comments. (Zhenyu)
v3:
- fix typo in title. (Zhenyu)
v4:
- remove the unnecessary comments. (Zhenyu)
- add comments in bb_start_cmd_va update. (Zhenyu)
Fixes: 0a53bc07f0 ("drm/i915/gvt: Separate cmd scan from request allocation")
Cc: stable@vger.kernel.org # v4.15
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Following an RSCN, ibmvfc will issue an ADISC to determine if the
underlying target has changed, comparing the SCSI ID, WWPN, and WWNN to
determine how to handle the rport in discovery. However, the comparison
of the WWPN and WWNN was performing a memcmp between a big endian field
against a CPU endian field, which resulted in the wrong answer on LE
systems. This was observed as unexpected errors getting logged at boot
time as targets were getting relogins when not needed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fixes from James Bottomley:
"This is four patches, consisting of one regression from the merge
window (qla2xxx), one long-standing memory leak (sd_zbc), one event
queue mislabelling which we want to eliminate to discourage the
pattern (mpt3sas), and one behaviour change because re-reading the
partition table shouldn't clear the ro flag"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Keep disk read-only when re-reading partition
scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure
scsi: sd_zbc: Fix potential memory leak
scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
geo->keylen cannot be larger than 4. So we might as well make
fixed-size allocations.
Given the one remaining user, geo->keylen cannot even be larger than 1.
Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit
keys. But let's not break the code if we don't have to.
Signed-off-by: Joern Engel <joern@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull percpu_ref rcu fixes from Tejun Heo:
"Jann Horn found that aio was depending on the internal RCU grace
periods of percpu-ref and that it's broken because aio uses regular
RCU while percpu_ref uses sched-RCU.
Depending on percpu_ref's internal grace periods isn't a good idea
because
- The RCU type might not match.
- percpu_ref's grace periods are used to switch to atomic mode. They
aren't between the last put and the invocation of the last release.
This is easy to get confused about and can lead to subtle bugs.
- percpu_ref might not have grace periods at all depending on its
current operation mode.
This patchset audits and fixes percpu_ref users for their RCU usages"
[ There's a continuation of this series that clarifies percpu_ref
documentation that the internal grace periods must not be depended
upon, and introduces rcu_work to simplify bouncing to a workqueue
after an RCU grace period.
That will go in for 4.17 - this is just the minimal set with the fixes
that are tagged for -stable ]
* 'percpu_ref-rcu-audit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
RDMAVT: Fix synchronization around percpu_ref
fs/aio: Use RCU accessors for kioctx_table->table[]
fs/aio: Add explicit RCU grace period when freeing kioctx
This reverts commit 864b75f9d6.
Commit 864b75f9d6 ("mm/page_alloc: fix memmap_init_zone pageblock
alignment") modified the logic in memmap_init_zone() to initialize
struct pages associated with invalid PFNs, to appease a VM_BUG_ON()
in move_freepages(), which is redundant by its own admission, and
dereferences struct page fields to obtain the zone without checking
whether the struct pages in question are valid to begin with.
Commit 864b75f9d6 only makes it worse, since the rounding it does
may cause pfn assume the same value it had in a prior iteration of
the loop, resulting in an infinite loop and a hang very early in the
boot. Also, since it doesn't perform the same rounding on start_pfn
itself but only on intermediate values following an invalid PFN, we
may still hit the same VM_BUG_ON() as before.
So instead, let's fix this at the core, and ensure that the BUG
check doesn't dereference struct page fields of invalid pages.
Fixes: 864b75f9d6 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
Tested-by: Jan Glauber <jglauber@cavium.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 1 display fix for bxt
- 1 gem fix for fences
- 1 gem/pm fix for rps freq
* tag 'drm-intel-fixes-2018-03-14' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Kick the rps worker when changing the boost frequency
drm/i915: Only prune fences after wait-for-all
drm/i915: Enable VBT based BL control for DP
A few fixes for 4.16:
- Fix a backlight S/R regression on amdgpu
- Fix prime teardown on radeon and amdgpu
- DP fix for amdgpu
* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu/dce: Don't turn off DP sink when disconnected
drm/amdgpu: save/restore backlight level in legacy dce code
drm/radeon: fix prime teardown order
drm/amdgpu: fix prime teardown order
On 32-bit targets, we otherwise get a warning about an impossible constant
integer expression:
In file included from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow]
#define BIT(nr) (1UL << (nr))
^~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
^~~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH
^~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE'
ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
^~~~~~~~~~~~~~~~~~~
Fixes: 872f357824 ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Building for a 32-bit target results in a couple of warnings from casting
between a 32-bit pointer and a 64-bit integer:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq':
drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle,
^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
(struct bnxt_qplib_srq *)q_handle,
^
In file included from include/linux/byteorder/little_endian.h:5,
from arch/arm/include/uapi/asm/byteorder.h:22,
from include/asm-generic/bitops/le.h:6,
from arch/arm/include/asm/bitops.h:342,
from include/linux/bitops.h:38,
from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq':
include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
#define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
^
include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64'
#define cpu_to_le64 __cpu_to_le64
^~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64'
req.srq_handle = cpu_to_le64(srq);
Using a uintptr_t as an intermediate works on all architectures.
Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit 24b6d41643 "mm: pass the vmem_altmap to vmemmap_free" converted
the vmemmap_free() path to pass the altmap argument all the way through
the call chain rather than looking it up based on the page.
Unfortunately that ends up over freeing altmap allocated pages in some
cases since free_pagetable() is used to free both memmap space and pte
space, where only the memmap stored in huge pages uses altmap
allocations.
Given that altmap allocations for memmap space are special cased in
vmemmap_populate_hugepages() add a symmetric / special case
free_hugepage_table() to handle altmap freeing, and cleanup the unneeded
passing of altmap to leaf functions that do not require it.
Without this change the sanity check accounting in
devm_memremap_pages_release() will throw a warning with the following
signature.
nd_pmem pfn10.1: devm_memremap_pages_release: failed to free all reserved pages
WARNING: CPU: 44 PID: 3539 at kernel/memremap.c:310 devm_memremap_pages_release+0x1c7/0x220
CPU: 44 PID: 3539 Comm: ndctl Tainted: G L 4.16.0-rc1-linux-stable #7
RIP: 0010:devm_memremap_pages_release+0x1c7/0x220
[..]
Call Trace:
release_nodes+0x225/0x270
device_release_driver_internal+0x15d/0x210
bus_remove_device+0xe2/0x160
device_del+0x130/0x310
? klist_release+0x56/0x100
? nd_region_notify+0xc0/0xc0 [libnvdimm]
device_unregister+0x16/0x60
This was missed in testing since not all configurations will trigger
this warning.
Fixes: 24b6d41643 ("mm: pass the vmem_altmap to vmemmap_free")
Reported-by: Jane Chu <jane.chu@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch addresses an issue that causes fiemap to falsely
report a shared extent. The test case is as follows:
xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5
sync
xfs_io -c "fiemap -v" /media/scratch/file5
which gives the resulting output:
wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec)
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x2001
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
This is because btrfs_check_shared calls find_parent_nodes
repeatedly in a loop, passing a share_check struct to report
the count of shared extent. But btrfs_check_shared does not
re-initialize the count value to zero for subsequent calls
from the loop, resulting in a false share count value. This
is a regressive behavior from 4.13.
With proper re-initialization the test result is as follows:
wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec)
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
which corrects the regression.
Fixes: 3ec4d3238a ("btrfs: allow backref search checks for shared extents")
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
[ add text from cover letter to changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
On load we create private CQ/QP/PD in order to be used by UMR, we create
those resources after we register ourself as an IB device, and we destroy
them after we unregister as an IB device. This was changed by commit
16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove
stages") which moved the destruction before we unregistration. This
allowed to trigger an invalid memory access when unloading mlx5_ib while
there are open resources:
BUG: unable to handle kernel paging request at 00000001002c012c
...
Call Trace:
mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
__slab_free+0x9a/0x2d0
delay_time_func+0x10/0x10 [mlx5_ib]
unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
clean_mr+0xc9/0x190 [mlx5_ib]
dereg_mr+0xba/0xf0 [mlx5_ib]
ib_dereg_mr+0x13/0x20 [ib_core]
remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
ib_unregister_device+0xd4/0x190 [ib_core]
__mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
mlx5_remove_device+0xf5/0x120 [mlx5_core]
mlx5_unregister_interface+0x37/0x90 [mlx5_core]
mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
SyS_delete_module+0x153/0x230
do_syscall_64+0x62/0x110
entry_SYSCALL_64_after_hwframe+0x21/0x86
...
We restore the original behavior by breaking the UMR stage into two parts,
pre and post IB registration stages, this way we can restore the original
functionality and maintain clean separation of logic between stages.
Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull x86 platform drives fixes from Darren Hart:
- DELL_SMBIOS conditionally depends on ACPI_WMI in the same way it
depends on DCDBAS, update the Kconfig accordingly.
- fix the dell driver init order to ensure that the driver dependencies
are met, avoiding race conditions resulting in boot failure on
certain systems when the drivers are built-in.
* tag 'platform-drivers-x86-v4.16-7' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: Fix dell driver init order
platform/x86: dell-smbios: Resolve dependency error on ACPI_WMI
cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.
Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.
Fixes: 19b752a19d ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
After v4.12 commit e2460f2a4b ("dm: mark targets that pass integrity
data"), dm-multipath, e.g. on DIF+DIX SCSI disk paths, does not support
block integrity any more. So add it to the whitelist.
This is also a pre-requisite to use block integrity with other dm layer(s)
on top of multipath, such as kpartx partitions (dm-linear) or LVM.
Also, bump target version to reflect this fix.
Fixes: e2460f2a4b ("dm: mark targets that pass integrity data")
Cc: <stable@vger.kernel.org> #4.12+
Bisected-by: Fedor Loshakov <loshakov@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table. When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.
rvt_check_refs() seems to have a couple issues.
* It has a fast exit path which tests percpu_ref_is_zero(). However,
a percpu_ref reading zero doesn't mean that the object can be
released. In fact, the ->release() callback might not even have
started executing yet. Proceeding with freeing can lead to
use-after-free.
* lkey_table is RCU protected but there is no RCU grace period in the
free path. percpu_ref uses RCU internally but it's sched-RCU whose
grace periods are different from regular RCU. Also, it generally
isn't a good idea to depend on internal behaviors like this.
To address the above issues, this patch removes the fast exit and adds
an explicit synchronize_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
While converting ioctx index from a list to a table, db446a08c2
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table->table[] as an array of RCU pointers and using the
appropriate RCU accessors. This introduces a small window in the
lookup path where init and access may race.
Mark kioctx_table->table[] with __rcu and use the approriate RCU
accessors when using the field.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Fixes: db446a08c2 ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # v3.12+
While fixing refcounting, e34ecee2ae ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU. This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.
Fix it by updating free_ioctx() to go through call_rcu() explicitly.
v2: Comment added to explain double bouncing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Fixes: e34ecee2ae ("aio: Fix a trinity splat")
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # v3.13+
On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
force synchronization between the memory-mapped guest view and
the system-register view that the hypervisor uses.
This is incorrect, as the spec calls out the need for "a DSB whose
required access type is both loads and stores with any Shareability
attribute", while we're only synchronizing stores.
We also lack an isb after the dsb to ensure that the latter has
actually been executed before we start reading stuff from the sysregs.
The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
just after.
Cc: stable@vger.kernel.org
Fixes: f68d2b1b73 ("arm64: KVM: Implement vgic-v3 save/restore")
Acked-by: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The vgic code is trying to be clever when injecting GICv2 SGIs,
and will happily populate LRs with the same interrupt number if
they come from multiple vcpus (after all, they are distinct
interrupt sources).
Unfortunately, this is against the letter of the architecture,
and the GICv2 architecture spec says "Each valid interrupt stored
in the List registers must have a unique VirtualID for that
virtual CPU interface.". GICv3 has similar (although slightly
ambiguous) restrictions.
This results in guests locking up when using GICv2-on-GICv3, for
example. The obvious fix is to stop trying so hard, and inject
a single vcpu per SGI per guest entry. After all, pending SGIs
with multiple source vcpus are pretty rare, and are mostly seen
in scenario where the physical CPUs are severely overcomitted.
But as we now only inject a single instance of a multi-source SGI per
vcpu entry, we may delay those interrupts for longer than strictly
necessary, and run the risk of injecting lower priority interrupts
in the meantime.
In order to address this, we adopt a three stage strategy:
- If we encounter a multi-source SGI in the AP list while computing
its depth, we force the list to be sorted
- When populating the LRs, we prevent the injection of any interrupt
of lower priority than that of the first multi-source SGI we've
injected.
- Finally, the injection of a multi-source SGI triggers the request
of a maintenance interrupt when there will be no pending interrupt
in the LRs (HCR_NPIE).
At the point where the last pending interrupt in the LRs switches
from Pending to Active, the maintenance interrupt will be delivered,
allowing us to add the remaining SGIs using the same process.
Cc: stable@vger.kernel.org
Fixes: 0919e84c0f ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On my GICv3 system, the following is printed to the kernel log at boot:
kvm [1]: 8-bit VMID
kvm [1]: IDMAP page: d20e35000
kvm [1]: HYP VA range: 800000000000:ffffffffffff
kvm [1]: vgic-v2@2c020000
kvm [1]: GIC system register CPU interface enabled
kvm [1]: vgic interrupt IRQ1
kvm [1]: virtual timer IRQ4
kvm [1]: Hyp mode initialized successfully
The KVM IDMAP is a mapping of a statically allocated kernel structure,
and so printing its physical address leaks the physical placement of
the kernel when physical KASLR in effect. So change the kvm_info() to
kvm_debug() to remove it from the log output.
While at it, trim the output a bit more: IRQ numbers can be found in
/proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
informational either.
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We currently don't allow resetting mapped IRQs from userspace, because
their state is controlled by the hardware. But we do need to reset the
state when the VM is reset, so we provide a function for the 'owner' of
the mapped interrupt to reset the interrupt state.
Currently only the timer uses mapped interrupts, so we call this
function from the timer reset logic.
Cc: stable@vger.kernel.org
Fixes: 4c60e360d6 ("KVM: arm/arm64: Provide a get_input_level for the arch timer")
Signed-off-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Calling vcpu_load() registers preempt notifiers for this vcpu and calls
kvm_arch_vcpu_load(). The latter will soon be doing a lot of heavy
lifting on arm/arm64 and will try to do things such as enabling the
virtual timer and setting us up to handle interrupts from the timer
hardware.
Loading state onto hardware registers and enabling hardware to signal
interrupts can be problematic when we're not actually about to run the
VCPU, because it makes it difficult to establish the right context when
handling interrupts from the timer, and it makes the register access
code difficult to reason about.
Luckily, now when we call vcpu_load in each ioctl implementation, we can
simply remove the call from the non-KVM_RUN vcpu ioctls, and our
kvm_arch_vcpu_load() is only used for loading vcpu content to the
physical CPU when we're actually going to run the vcpu.
Cc: stable@vger.kernel.org
Fixes: 9b062471e5 ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl")
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Our irq_is_pending() helper function accesses multiple members of the
vgic_irq struct, so we need to hold the lock when calling it.
Add that requirement as a comment to the definition and take the lock
around the call in vgic_mmio_read_pending(), where we were missing it
before.
Fixes: 96b298000d ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Update the initcall ordering to satisfy the following dependency
ordering:
1. DCDBAS, ACPI_WMI
2. DELL_SMBIOS, DELL_RBTN
3. DELL_LAPTOP, DELL_WMI
By assigning them to the following initcall levels:
subsys_initcall: DCDBAS, ACPI_WMI
module_init: DELL_SMBIOS, DELL_RBTN
late_initcall: DELL_LAPTOP, DELL_WMI
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Mario.Limonciello@dell.com
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Similarly to DCDBAS for DELL_SMBIOS_SMM, if DELL_SMBIOS_WMI is enabled,
DELL_SMBIOS becomes dependent on ACPI_WMI. Update the depends lines to
prevent a configuration where DELL_SMBIOS=y and either backend
dependency =m. Update the comment accordingly.
Cc: Mario Limonciello <mario.limonciello@dell.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
The NETIF_F_GSO_SOFTWARE implies support for GSO on SCTP, but the
sunvnet driver does not support GSO for sctp. Here we remove the
NETIF_F_GSO_SOFTWARE feature flag and only report NETIF_F_ALL_TSO
instead.
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can 2018-03-14
this is a pull request of two patches for net/master.
Both patches are by Andri Yngvason and fix problems in the cc770 driver,
that show up quite fast on RT systems, but also on non RT setups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem was introduced in commit
506b0a395f ("[netdrv] tg3: APE heartbeat changes"). The bug occurs
because tp->lock spinlock is held which is obtained in tg3_start
by way of tg3_full_lock(), line 11571. The documentation for usleep_range()
specifically states it cannot be used inside a spinlock.
Fixes: 506b0a395f ("[netdrv] tg3: APE heartbeat changes")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to the rework of PMTU information storage in commit
2c8cec5c10 ("ipv4: Cache learned PMTU information in inetpeer."),
when a PMTU event advertising a PMTU smaller than
net.ipv4.route.min_pmtu was received, we would disable setting the DF
flag on packets by locking the MTU metric, and set the PMTU to
net.ipv4.route.min_pmtu.
Since then, we don't disable DF, and set PMTU to
net.ipv4.route.min_pmtu, so the intermediate router that has this link
with a small MTU will have to drop the packets.
This patch reestablishes pre-2.6.39 behavior by splitting
rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
and is checked in ip_dont_fragment().
One possible workaround is to set net.ipv4.route.min_pmtu to a value low
enough to accommodate the lowest MTU encountered.
Fixes: 2c8cec5c10 ("ipv4: Cache learned PMTU information in inetpeer.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur says:
====================
DPAA Ethernet fixes
This patch set is addressing several issues in the DPAA Ethernet
driver suite:
- module unload crash caused by wrong reference to device being left
in the cleanup code after the DSA related changes
- scheduling wile atomic bug in QMan code revealed during dpaa_eth
module unload
- a couple of error counter fixes, a duplicated init in dpaa_eth.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The fd_format has already been initialized at this point.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent changes that make the driver probing compatible with DSA
were not propagated in the dpa_remove() function, breaking the
module unload function. Using the proper device to address the issue.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wait_for_completion() call in qman_delete_cgr_safe()
was triggering a scheduling while atomic bug, replacing the
kthread with a smp_call_function_single() call to fix it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull USB fixes from Greg KH:
"Here are a small clump of USB fixes for 4.16-rc6.
Nothing major, just a number of fixes in lots of different drivers, as
well as a PHY driver fix that snuck into this tree. Full details are
in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
usb: musb: Fix external abort in musb_remove on omap2430
phy: qcom-ufs: add MODULE_LICENSE tag
usb: typec: tcpm: fusb302: Do not log an error on -EPROBE_DEFER
USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM
usbip: vudc: fix null pointer dereference on udc->lock
xhci: Fix front USB ports on ASUS PRIME B350M-A
usb: host: xhci-plat: revert "usb: host: xhci-plat: enable clk in resume timing"
usb: usbmon: Read text within supplied buffer size
usb: host: xhci-rcar: add support for r8a77965
USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h
usb: xhci: dbc: Fix lockdep warning
xhci: fix endpoint context tracer output
Revert "typec: tcpm: Only request matching pdos"
usb: musb: call pm_runtime_{get,put}_sync before reading vbus registers
usb: quirks: add control message delay for 1b1c:1b20
uas: fix comparison for error code
usb: gadget: udc: renesas_usb3: add binging for r8a77965
usb: renesas_usbhs: add binding for r8a77965
usb: dwc2: fix STM32F7 USB OTG HS compatible
dt-bindings: usb: fix the STM32F7 DWC2 OTG HS core binding
...
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty core and serial driver fixes for 4.16-rc6.
They resolve some newly reported bugs, as well as some very old ones,
which is always nice to see. There is also a new device id added in
here for good measure.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: imx: fix bogus dev_err
serial: sh-sci: prevent lockup on full TTY buffers
serial: 8250_pci: Add Brainboxes UC-260 4 port serial device
earlycon: add reg-offset to physical address before mapping
serial: core: mark port as initialized in autoconfig
serial: 8250_pci: Don't fail on multiport card class
tty/serial: atmel: add new version check for usart
tty: make n_tty_read() always abort if hangup is in progress
Pull staging fixes from Greg KH:
"Here are three staging driver fixes for 4.16-rc6
Two of them are lockdep fixes for the ashmem driver that have been
reported by a number of people recently. The last one is a fix for the
comedi driver core.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
staging: comedi: fix comedi_nsamples_left.
staging: android: ashmem: Fix lockdep issue during llseek
Andrei Vagin reported a KASAN: slab-out-of-bounds error in
skb_update_prio()
Since SYNACK might be attached to a request socket, we need to
get back to the listener socket.
Since this listener is manipulated without locks, add const
qualifiers to sock_cgroup_prioidx() so that the const can also
be used in skb_update_prio()
Also add the const qualifier to sock_cgroup_classid() for consistency.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull auxdisplay fixes from Miguel Ojeda:
"Silence a few warnings in auxdisplay.
- a couple of uninitialized warnings reported by the build service
- a doc comment warning under W=1
- three fall-through comments not recognized under W=1"
* tag 'auxdisplay-for-linus-v4.16-rc6' of git://github.com/ojeda/linux:
auxdisplay: img-ascii-lcd: Silence 2 uninitialized warnings
auxdisplay: img-ascii-lcd: Fix doc comment to silence warnings
auxdisplay: panel: Change comments to silence fallthrough warnings
The kbuild test robot reported the following warning on sparc64:
kernel/jump_label.c: In function '__jump_label_update':
kernel/jump_label.c:376:51: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
WARN_ONCE(1, "can't patch jump_label at %pS", (void *)entry->code);
On sparc64, the jump_label entry->code field is of type u32, but
pointers are 64-bit. Silence the warning by casting entry->code to an
unsigned long before casting it to a pointer. This is also what the
sparc jump label code does.
Fixes: dc1dd184c2 ("jump_label: Warn on failed jump_label patching attempt")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/c966fed42be6611254a62d46579ec7416548d572.1521041026.git.jpoimboe@redhat.com
Samsung explicitly states that queued TRIM is supported for Linux with
860 PRO and 860 EVO.
Make the previous blacklist to cover only 840 and 850 series.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
While waiting for the TX object to send an RTR, an external message with a
matching id can overwrite the TX data. In this case we must call the rx
routine and then try transmitting the message that was overwritten again.
The queue was being stalled because the RX event did not generate an
interrupt to wake up the queue again and the TX event did not happen
because the TXRQST flag is reset by the chip when new data is received.
According to the CC770 datasheet the id of a message object should not be
changed while the MSGVAL bit is set. This has been fixed by resetting the
MSGVAL bit before modifying the object in the transmit function and setting
it after. It is not enough to set & reset CPUUPD.
It is important to keep the MSGVAL bit reset while the message object is
being modified. Otherwise, during RTR transmission, a frame with matching
id could trigger an rx-interrupt, which would cause a race condition
between the interrupt routine and the transmit function.
Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When running virtualised the powerpc kernel is able to run the system
in "compat mode" - which means the kernel and hardware are pretending
to userspace that the CPU is an older version than it actually is.
AT_BASE_PLATFORM is an AUXV entry that we export to userspace for use
when we're running in that mode, which tells userspace the "platform"
string for the real CPU version, as opposed to the faked version.
Although we don't support compat mode when using DT CPU features, and
arguably don't need to set AT_BASE_PLATFORM, the existing cputable
based code always sets it even when we're running bare metal. That
means the lack of AT_BASE_PLATFORM is a user-visible artifact of the
fact that the kernel is using DT CPU features, which we don't want.
So set it in the DT CPU features code also.
This results in eg:
$ LD_SHOW_AUXV=1 /bin/true | grep "AT_.*PLATFORM"
AT_PLATFORM: power9
AT_BASE_PLATFORM:power9
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
When we removed the inclusion of skeleton.dtsi from the device trees, we
broke booting for systems with bootloaders that aren't device tre aware.
This can be seen, for example, when appending the device tree blob to
the kernel image.
The reason booting broke was that the kernel lacked the device_type
label in the memory node. Add in a default memory node wth the
device_type. It can contain the memory address as the location is fixed
for each SoC generation, but the size needs to be added by the
bootloader or the board specific dts.
Fixes: 73102d6fdc ("ARM: dts: aspeed: Remove skeleton.dtsi")
Cc: <stable@vger.kernel.org>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pointer dev is being assigned a value that is never read, it is being
re-assigned the same value later on, hence the initialization is redundant
and can be removed.
Cleans up clang warning:
drivers/nvdimm/pfn_devs.c:307:17: warning: Value stored to 'dev' during
its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This fixes a bug where the trap number that is returned by
__kvmppc_vcore_entry gets corrupted. The effect of the corruption
is that IPIs get ignored on POWER9 systems when the IPI is sent via
a doorbell interrupt to a CPU which is executing in a KVM guest.
The effect of the IPI being ignored is often that another CPU locks
up inside smp_call_function_many() (and if that CPU is holding a
spinlock, other CPUs then lock up inside raw_spin_lock()).
The trap number is currently held in register r12 for most of the
assembly-language part of the guest exit path. In that path, we
call kvmppc_subcore_exit_guest(), which is a C function, without
restoring r12 afterwards. Depending on the kernel config and the
compiler, it may modify r12 or it may not, so some config/compiler
combinations see the bug and others don't.
To fix this, we arrange for the trap number to be stored on the
stack from the 'guest_bypass:' label until the end of the function,
then the trap number is loaded and returned in r12 as before.
Cc: stable@vger.kernel.org # v4.8+
Fixes: fd7bacbca4 ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Found this by accident.
There are no usages of bare cancel_work() in current kernel source.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch validates user provided input to prevent integer overflow due
to integer manipulation in the mlx5_ib_create_srq function.
Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a check for the length of the qpin structure to prevent out-of-bounds reads
BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2
Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549
CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xd4
print_address_description+0x73/0x290
kasan_report+0x25c/0x370
? create_raw_packet_qp+0x114c/0x15e2
memcpy+0x1f/0x50
create_raw_packet_qp+0x114c/0x15e2
? create_raw_packet_qp_tis.isra.28+0x13d/0x13d
? lock_acquire+0x370/0x370
create_qp_common+0x2245/0x3b50
? destroy_qp_user.isra.47+0x100/0x100
? kasan_kmalloc+0x13d/0x170
? sched_clock_cpu+0x18/0x180
? fs_reclaim_acquire.part.15+0x5/0x30
? __lock_acquire+0xa11/0x1da0
? sched_clock_cpu+0x18/0x180
? kmem_cache_alloc_trace+0x17e/0x310
? mlx5_ib_create_qp+0x30e/0x17b0
mlx5_ib_create_qp+0x33d/0x17b0
? sched_clock_cpu+0x18/0x180
? create_qp_common+0x3b50/0x3b50
? lock_acquire+0x370/0x370
? __radix_tree_lookup+0x180/0x220
? uverbs_try_lock_object+0x68/0xc0
? rdma_lookup_get_uobject+0x114/0x240
create_qp.isra.5+0xce4/0x1e20
? ib_uverbs_ex_create_cq_cb+0xa0/0xa0
? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00
? ib_uverbs_cq_event_handler+0x160/0x160
? __might_fault+0x17c/0x1c0
ib_uverbs_create_qp+0x21b/0x2a0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
ib_uverbs_write+0x55a/0xad0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
? ib_uverbs_open+0x760/0x760
? futex_wake+0x147/0x410
? check_prev_add+0x1680/0x1680
? do_futex+0x3d3/0xa60
? sched_clock_cpu+0x18/0x180
__vfs_write+0xf7/0x5c0
? ib_uverbs_open+0x760/0x760
? kernel_read+0x110/0x110
? lock_acquire+0x370/0x370
? __fget+0x264/0x3b0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x4477b9
RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9
RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005
RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff
R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0
Allocated by task 549:
__kmalloc+0x15e/0x340
kvmalloc_node+0xa1/0xd0
create_user_qp.isra.46+0xd42/0x1610
create_qp_common+0x2e63/0x3b50
mlx5_ib_create_qp+0x33d/0x17b0
create_qp.isra.5+0xce4/0x1e20
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0x55a/0xad0
__vfs_write+0xf7/0x5c0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
entry_SYSCALL_64_fastpath+0x18/0x85
Freed by task 368:
kfree+0xeb/0x2f0
kernfs_fop_release+0x140/0x180
__fput+0x266/0x700
task_work_run+0x104/0x180
exit_to_usermode_loop+0xf7/0x110
syscall_return_slowpath+0x298/0x370
entry_SYSCALL_64_fastpath+0x83/0x85
The buggy address belongs to the object at ffff880066b99180 which
belongs to the cache kmalloc-512 of size 512 The buggy address is
located 272 bytes inside of 512-byte region [ffff880066b99180,
ffff880066b99380) The buggy address belongs to the page:
page:000000006040eedd count:1 mapcount:0 mapping: (null)
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019
raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized in this function instead.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Initialize all the scsi_dh related 'struct multipath' members regardless
of whether a scsi_dh is in use or not.
The subtle (and fragile) SCSI-assuming legacy code clearly needs further
decoupling from non-SCSI (and/or developer understanding).
Fixes: 8d47e65948 ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks")
Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The warnings are:
drivers/auxdisplay/img-ascii-lcd.c: warning: 'err' may be used
uninitialized in this function [-Wuninitialized]
At lines 109 and 207. Reported by Geert using the build service
several times, e.g.:
https://lkml.org/lkml/2018/2/19/303
They are two false positives, since num_chars > 0 in the three present
configurations (boston, malta, sead3). Initialize to 0 in order to
silence the warning.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Compiling with W=1 with gcc 7.2.0 gives 2 warnings:
drivers/auxdisplay/img-ascii-lcd.c:233: warning: Function parameter or
member 't' not described in 'img_ascii_lcd_scroll'
drivers/auxdisplay/img-ascii-lcd.c:233: warning: Excess function
parameter 'arg' description in 'img_ascii_lcd_scroll'
Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Compiling with W=1 with gcc 7.2.0 gives 3 warnings like:
drivers/auxdisplay/panel.c: In function ‘panel_process_inputs’:
drivers/auxdisplay/panel.c:1374:17: warning: this statement may fall
through [-Wimplicit-fallthrough=]
Cc: Willy Tarreau <w@1wt.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
This fixes an oops on unbind / module unload (on the musb omap2430
platform).
musb_remove function now calls musb_platform_exit before disabling
runtime pm.
Signed-off-by: Merlijn Wajer <merlijn@wizzup.org>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're dereferencing "p_hwfn->p_rdma_info" but that is freed on the line
before in qed_rdma_resc_free(p_hwfn).
Fixes: 9de506a547 ("qed: Free RoCE ILT Memory on rmmod qedr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2018-03-13
1) Refuse to insert 32 bit userspace socket policies on 64
bit systems like we do it for standard policies. We don't
have a compat layer, so inserting socket policies from
32 bit userspace will lead to a broken configuration.
2) Make the policy hold queue work without the flowcache.
Dummy bundles are not chached anymore, so we need to
generate a new one on each lookup as long as the SAs
are not yet in place.
3) Fix the validation of the esn replay attribute. The
The sanity check in verify_replay() is bypassed if
the XFRM_STATE_ESN flag is not set. Fix this by doing
the sanity check uncoditionally.
From Florian Westphal.
4) After most of the dst_entry garbage collection code
is removed, we may leak xfrm_dst entries as they are
neither cached nor tracked somewhere. Fix this by
reusing the 'uncached_list' to track xfrm_dst entries
too. From Xin Long.
5) Fix a rcu_read_lock/rcu_read_unlock imbalance in
xfrm_get_tos() From Xin Long.
6) Fix an infinite loop in xfrm_get_dst_nexthop. On
transport mode we fetch the child dst_entry after
we continue, so this pointer is never updated.
Fix this by fetching it before we continue.
7) Fix ESN sequence number gap after IPsec GSO packets.
We accidentally increment the sequence number counter
on the xfrm_state by one packet too much in the ESN
case. Fix this by setting the sequence number to the
correct value.
8) Reset the ethernet protocol after decapsulation only if a
mac header was set. Otherwise it breaks configurations
with TUN devices. From Yossi Kuperman.
9) Fix __this_cpu_read() usage in preemptible code. Use
this_cpu_read() instead in ipcomp_alloc_tfms().
From Greg Hackmann.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7d64c39e64310 fixed regression of FCP discovery when Nport Handle
is in-use and relogin is triggered. However, during FCP and FC-NVMe
discovery this resulted into only discovering NVMe LUNs.
This patch fixes issue where FCP and FC-NVMe protocol is used on same
port where assigning FC_NO_LOOP_ID will result into discovery failure
for FCP LUNs.
Fixes: a084fd68e1 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When ata device doing EH, some commands still attached with tasks are
not passed to libata when abort failed or recover failed, so libata did
not handle these commands. After these commands done, sas task is freed,
but ata qc is not freed. This will cause ata qc leak and trigger a
warning like below:
WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
......
Call trace:
[<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
[<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
[<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
[<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
[<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
[<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
[<ffff0000080ebd70>] process_one_work+0x144/0x390
[<ffff0000080ec100>] worker_thread+0x144/0x418
[<ffff0000080f2c98>] kthread+0x10c/0x138
[<ffff0000080855dc>] ret_from_fork+0x10/0x18
If ata qc leaked too many, ata tag allocation will fail and io blocked
for ever.
As suggested by Dan Williams, defer ata device commands to libata and
merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
ata qcs correctly after this.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: Xiaofei Tan <tanxiaofei@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During the conversion to dsa_is_user_port(), a condition ended up being
reversed, which would prevent the creation of any user port when using
the legacy binding and/or platform data, fix that.
Fixes: 4a5b85ffe2 ("net: dsa: use dsa_is_user_port everywhere")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function dev_get_regmap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.
Fixes: 81ac38847a ("clk: qcom: Add APCS clock controller support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
platform_get_resource() may return NULL, add proper check to
avoid potential NULL dereferencing.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);
Fixes: 4f16f7ff3b ("clk: hisilicon: Add support for Hi3660 stub clocks")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
If we try to determine the rate of a pass-through clock (a clock which
does not implement .round_rate() nor .determine_rate()),
clk_core_round_rate_nolock() will directly forward the call to the
parent clock. In the particular case where the pass-through actually
does not have a parent, clk_core_round_rate_nolock() will directly
return 0 with the requested rate still set to the initial request
structure. This is interpreted as if the rate could be exactly achieved
while it actually cannot be adjusted.
This become a real problem when this particular pass-through clock is
the parent of a mux with the flag CLK_SET_RATE_PARENT set. The
pass-through clock will always report an exact match, get picked and
finally error when the rate is actually getting set.
This is fixed by setting the rate inside the req to 0 when core is NULL
in clk_core_round_rate_nolock() (same as in __clk_determine_rate() when
hw is NULL)
Fixes: 0f6cc2b8e9 ("clk: rework calls to round and determine rate callbacks")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull TI SoC clock fixes for 4.16 from Tero Kristo:
* tag 'ti-clk-fixes-4.16' of https://github.com/t-kristo/linux-pm:
clk: ti: am43xx: add set-rate-parent support for display clkctrl clock
clk: ti: am33xx: add set-rate-parent support for display clkctrl clock
clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
Pull i.MX clock fixes for 4.16 from Shawn Guo:
- Update i.MX5 clock driver to register UART4/5 clock only on i.MX50
and i.MX53. It fixes a kernel warning seen on i.MX53, caused by
commit 59dc3d8c86 ("clk: imx51: uart4, uart5 gates only exist on
imx50, imx53").
* tag 'clk-imx-fixes-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
clk: imx51-imx53: Fix UART4/5 registration on i.MX50 and i.MX53
Pull Allwinner clock fixes for 4.16 from Chen-Yu Tsai:
A critical fix for the A31 sunxi-ng clock driver. The CLK_OUT clocks
had definitions paired with the incorrect type of clk ops. This results
in a serious oops starting with commit 946797aa3f ("clk: sunxi-ng:
Support fixed post-dividers on MP style clocks"), which exposed the
incorrect clk ops when it added a new field to the data structures,
which then nudged the underlying (compatible but incorrect) data
structures out of alignment.
* tag 'sunxi-clk-fixes-for-4.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-03-12
This series contains fixes to e1000e only.
Benjamin Poirier provides two fixes, first reverts commits that changed
what happens to the link status when there is an error. These commits
were to resolve a race condition, but in the process of fixing the race
condition, they changed the behavior when an error occurred. Second fix
resolves a race condition by not setting "get_link_status" to false
after checking the link.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
l2tp: fix races with ipv4-mapped ipv6 addresses
The syzbot reported an l2tp oops that uncovered some races in the l2tp xmit
path and a partially related issue in the generic ipv6 code.
We need to address them separately.
v1 -> v2:
- add missing fixes tag in patch 1
- fix several issues in patch 2
v2 -> v3:
- dropped some unneeded chunks in patch 2
====================
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2tp_tunnel_create() function checks for v4mapped ipv6
sockets and cache that flag, so that l2tp core code can
reusing it at xmit time.
If the socket is provided by the userspace, the connection
status of the tunnel sockets can change between the tunnel
creation and the xmit call, so that syzbot is able to
trigger the following splat:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192
[inline]
BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260
net/ipv6/ip6_output.c:264
Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448
CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
ip6_dst_idev include/net/ip6_fib.h:192 [inline]
ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline]
l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148
pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
__sys_sendmsg+0xe5/0x210 net/socket.c:2080
SYSC_sendmsg net/socket.c:2091 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2087
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x453e69
RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69
RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000
This change addresses the issues:
* explicitly checking for TCP_ESTABLISHED for user space provided sockets
* dropping the v4mapped flag usage - it can become outdated - and
explicitly invoking ipv6_addr_v4mapped() instead
The issue is apparently there since ancient times.
v1 -> v2: (many thanks to Guillaume)
- with csum issue introduced in v1
- replace pr_err with pr_debug
- fix build issue with IPV6 disabled
- move l2tp_sk_is_v4mapped in l2tp_core.c
v2 -> v3:
- don't update inet_daddr for v4mapped address, unneeded
- drop rendundant check at creation time
Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com
Fixes: 3557baabf2 ("[L2TP]: PPP over L2TP driver core")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On unsuccesful ip6_datagram_connect(), if the failure is caused by
ip6_datagram_dst_update(), the sk peer information are cleared, but
the sk->sk_state is preserved.
If the socket was already in an established status, the overall sk
status is inconsistent and fouls later checks in datagram code.
Fix this saving the old peer information and restoring them in
case of failure. This also aligns ipv6 datagram connect() behavior
with ipv4.
v1 -> v2:
- added missing Fixes tag
Fixes: 85cb73ff9b ("net: ipv6: reset daddr and dport in sk if connect() fails")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex reported the following race condition:
/* link goes up... interrupt... schedule watchdog */
\ e1000_watchdog_task
\ e1000e_has_link
\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
\ e1000e_phy_has_link_generic(..., &link)
link = true
/* link goes down... interrupt */
\ e1000_msix_other
hw->mac.get_link_status = true
/* link is up */
mac->get_link_status = false
link_active = true
/* link_active is true, wrongly, and stays so because
* get_link_status is false */
Avoid this problem by making sure that we don't set get_link_status = false
after having checked the link.
It seems this problem has been present since the introduction of e1000e.
Link: https://lkml.org/lkml/2018/1/29/338
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This reverts commit 19110cfbb3.
This reverts commit 4110e02eb4.
This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98.
Commit 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
changed what happens to the link status when there is an error which
happens after "get_link_status = false" in the copper check_for_link
callbacks. Previously, such an error would be ignored and the link
considered up. After that commit, any error implies that the link is down.
Revert commit 19110cfbb3 ("e1000e: Separate signaling for link check/link
up") and its followups. After reverting, the race condition described in
the log of commit 19110cfbb3 is reintroduced. It may still be triggered
by LSC events but this should keep the link down in case the link is
electrically unstable, as discussed. The race may no longer be
triggered by RXO events because commit 4aea7a5c5e ("e1000e: Avoid
receiver overrun interrupt bursts") restored reading icr in the Other
handler.
Link: https://lkml.org/lkml/2018/3/1/789
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, we only allow ourselves to prune the fences so long as
all the waits completed (i.e. all the fences we checked were signaled),
and that the reservation snapshot did not change across the wait.
However, if we only waited for a subset of the reservation object, i.e.
just waiting for the last writer to complete as opposed to all readers
as well, then we would erroneously conclude we could prune the fences as
indeed although all of our waits were successful, they did not represent
the totality of the reservation object.
v2: We only need to check the shared fences due to construction (i.e.
all of the shared fences will be later than the exclusive fence, if
any).
Fixes: e54ca97747 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307171303.29466-1-chris@chris-wilson.co.uk
(cherry picked from commit fa73055b84)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Currently, BXT_PP is hardcoded with value '0'.
It practically disabled eDP backlight on MRB (BXT) platform.
This patch will tell which BXT_PP registers (there are two set of
PP_CONTROL in the spec) to be used as defined in VBT (Video Bios Timing
table) and this will enabled eDP backlight controller on MRB (BXT)
platform.
v2:
- Remove unnecessary information in commit message.
- Assign vbt.backlight.controller to a backlight_controller variable and
return the variable value.
v3:
- Rebased to latest code base.
- updated commit title.
Signed-off-by: Mustamin B Mustaffa <mustamin.b.mustaffa@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180227030734.37901-1-mustamin.b.mustaffa@intel.com
(cherry picked from commit 73c0fcac97)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The orphan clocks reparents should migrate any existing count from the
orphan clock to its new acestor clocks, otherwise we may have
inconsistent counts in the tree and end-up with gated critical clocks
Assuming we have two clocks, A and B.
* Clock A has CLK_IS_CRITICAL flag set.
* Clock B is an ancestor of A which can gate. Clock B gate is left
enabled by the bootloader.
Step 1: Clock A is registered. Since it is a critical clock, it is
enabled. The clock being still an orphan, no parent are enabled.
Step 2: Clock B is registered and reparented to clock A (potentially
through several other clocks). We are now in situation where the enable
count of clock A is 1 while the enable count of its ancestors is 0, which
is not good.
Step 3: in lateinit, clk_disable_unused() is called, the enable_count of
clock B being 0, clock B is gated and and critical clock A actually gets
disabled.
This situation was found while adding fdiv_clk gates to the meson8b
platform. These clocks parent clk81 critical clock, which is the mother
of all peripheral clocks in this system. Because of the issue described
here, the system is crashing when clk_disable_unused() is called.
The situation is solved by reverting
commit f8f8f1d044 ("clk: Don't touch hardware when reparenting during registration").
To avoid breaking again the situation described in this commit
description, enabling critical clock should be done before walking the
orphan list. This way, a parent critical clock may not be accidentally
disabled due to the CLK_OPS_PARENT_ENABLE mechanism.
Fixes: f8f8f1d044 ("clk: Don't touch hardware when reparenting during registration")
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Pull NFS client bugfixes from Trond Myklebust:
"Hightlights include the following stable fixes:
- NFS: Fix an incorrect type in struct nfs_direct_req
- pNFS: Prevent the layout header refcount going to zero in
pnfs_roc()
- NFS: Fix unstable write completion"
* tag 'nfs-for-4.16-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix unstable write completion
pNFS: Prevent the layout header refcount going to zero in pnfs_roc()
NFS: Fix an incorrect type in struct nfs_direct_req
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree, they are:
1) Fixed hashtable representation doesn't support timeout flag, skip it
otherwise rules to add elements from the packet fail bogusly fail with
EOPNOTSUPP.
2) Fix bogus error with 32-bits ebtables userspace and 64-bits kernel,
patch from Florian Westphal.
3) Sanitize proc names in several x_tables extensions, also from Florian.
4) Add sanitization to ebt_among wormhash logic, from Florian.
5) Missing release of hook array in flowtable.
====================
ASoC: Fixes for v4.16
This is a fairly standard collection of fixes, there's no changes to the
core here just a bunch of small device specific changes for single
drivers plus an update to the MAINTAINERS file for the sgl5000.
Marc Kleine-Budde says:
====================
pull-request: can 2018-03-12
this is a pull reqeust of 6 patches for net/master.
The first patch is by Wolfram Sang and fixes a bitshift vs. comparison mistake
in the m_can driver. Two patches of Marek Vasut repair the error handling in
the ifi driver. The two patches by Stephane Grosjean fix a "echo_skb is
occupied!" bug in the peak/pcie_fd driver. Bich HEMON's patch adds pinctrl
select state calls to the m_can's driver to further improve power saving during
suspend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the kprobe which was optimized by jump can not change
the execution path, the kprobe for error-injection must not
be optimized. To prohibit it, set a dummy post-handler as
officially stated in Documentation/kprobes.txt.
Fixes: 4b1a29a7f5 ("error-injection: Support fault injection framework")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now when using 'ss' in iproute, kernel would try to load all _diag
modules, which also causes corresponding family and proto modules
to be loaded as well due to module dependencies.
Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
would be loaded.
For example:
$ lsmod|grep sctp
$ ss
$ lsmod|grep sctp
sctp_diag 16384 0
sctp 323584 5 sctp_diag
inet_diag 24576 4 raw_diag,tcp_diag,sctp_diag,udp_diag
libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
As these family and proto modules are loaded unintentionally, it
could cause some problems, like:
- Some debug tools use 'ss' to collect the socket info, which loads all
those diag and family and protocol modules. It's noisy for identifying
issues.
- Users usually expect to drop sctp init packet silently when they
have no sense of sctp protocol instead of sending abort back.
- It wastes resources (especially with multiple netns), and SCTP module
can't be unloaded once it's loaded.
...
In short, it's really inappropriate to have these family and proto
modules loaded unexpectedly when just doing debugging with inet_diag.
This patch is to introduce sock_load_diag_module() where it loads
the _diag module only when it's corresponding family or proto has
been already registered.
Note that we can't just load _diag module without the family or
proto loaded, as some symbols used in _diag module are from the
family or proto module.
v1->v2:
- move inet proto check to inet_diag to avoid a compiling err.
v2->v3:
- define sock_load_diag_module in sock.c and export one symbol
only.
- improve the changelog.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Bug fixes.
There are 3 bug fixes in this series to fix regressions recently
introduced when adding the new ring reservations scheme. 2 minor
fixes in the TC Flower code to return standard errno values and
to elide some unnecessary warning dmesg. One Fixes the VLAN TCI
value passed to the stack by including the entire 16-bit VLAN TCI,
and the last fix is to check for valid VNIC ID before setting up or
shutting down LRO/GRO.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization, if we encounter errors, there is a code path that
calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a
warning in firmware logs.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_restore_pf_fw_resources routine frees PF resources by calling
close_nic and allocates the resources back, by doing open_nic. However,
this is not needed, if the PF is already in closed state.
This bug causes the driver to call open the device and call request_irq()
when it is not needed. Ultimately, pci_disable_msix() will crash
when bnxt_en is unloaded.
This patch fixes the problem by skipping __bnxt_close_nic and
__bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the
interface is not running.
Fixes: 80fcaf46c0 ("bnxt_en: Restore MSIX after disabling SRIOV.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, internal error value is returned by the driver, when
hwrm_cfa_flow_alloc() fails due lack of resources. We should be returning
Linux errno value -ENOSPC instead.
This patch also converts other similar command errors to standard Linux errno
code (-EIO) in bnxt_tc.c
Fixes: db1d36a273 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes added the bnxt_init_int_mode() call in the driver's open
path whenever ring reservations are changed. This call was previously
only called in the probe path. In the open path, if MQPRIO TC has been
setup, the bnxt_init_int_mode() call would reset and mess up the MQPRIO
per TC rings.
Fix it by not re-initilizing bp->tx_nr_rings_per_tc in
bnxt_init_int_mode(). Instead, initialize it in the probe path only
after the bnxt_init_int_mode() call.
Fixes: 674f50a5b0 ("bnxt_en: Implement new method to reserve rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the
stack when calling __vlan_hwaccel_put_tag(). The current code is only
passing the 12-bit tag and it is missing the priority bits.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some conditions when the driver fails to add a flow in HW and returns
an error back to the stack, the stack continues to invoke get_flow_stats()
and/or del_flow() on it. The driver fails these APIs with an error message
"no flow_node for cookie". The message gets logged repeatedly as long as
the stack keeps invoking these functions.
Fix this by removing the corresponding netdev_info() calls from these
functions.
Fixes: d7bc730530 ("bnxt_en: add code to query TC flower offload stats")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of vnics to check must be determined ahead of time because
only standard RX rings require vnics to support RFS. The logic is
similar to the ring reservation logic and we can now use the
refactored common functions to do most of the work in setting up
the firmware message.
Fixes: 8f23d638b3 ("bnxt_en: Expand bnxt_check_rings() to check all resources.")
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bnxt_hwrm_reserve_{pf|vf}_rings() functions are very similar to
the bnxt_hwrm_check_{pf|vf}_rings() functions. Refactor the former
so that the latter can make use of common code in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull "Rockchip dts64 fixes for 4.16" from Heiko Stübner:
Pinctrl got a fix in 4.16-rc1, that exposed an issue with wifi-related
pinctrl hogs on rk3399-gru-kevin that broke suspend. This gets fixed
by moving the wifi pinctrl to the correct node.
Also revert the usb3 phy-port enablement, as a missing feature in the
type-c phy breaks usb on all non-gru rk3399 boards.
* tag 'v4.16-rockchip-dts64fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
Revert "arm64: dts: rockchip: add usb3-phy otg-port support for rk3399"
arm64: dts: rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset)
In 664fcf123a (net: phy: Threaded interrupts allow some simplification)
the phy_interrupt system was changed to use a traditional threaded
interrupt scheme instead of a workqueue approach.
With this change, the phy status check moved into phy_change, which
did not report back to the caller whether or not the interrupt was
handled. This means that, in the case of a shared phy interrupt,
only the first phydev's interrupt registers are checked (since
phy_interrupt() would always return IRQ_HANDLED). This leads to
interrupt storms when it is a secondary device that's actually the
interrupt source.
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull "Rockchip dts32 fixes for 4.16" from Heiko Stübner:
Fix for a new dtc warning.
* tag 'v4.16-rockchip-dts32fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: Add missing #sound-dai-cells on rk3288
Pull "mvebu fixes for 4.16 (part 2)" from Gregory CLEMENT:
- Updating my emails address but this time in mailmap (from
free-electrons to bootlin)
* tag 'mvebu-fixes-4.16-2' of git://git.infradead.org/linux-mvebu:
mailmap: Update email address for Gregory CLEMENT
Pull "DaVinci fixes for v4.16" from Sekhar Nori:
A patch fixing GPIO look-up for MMC/SD card detect and write-protect
pins on OMAP-L138 Hawkboard.
* tag 'davinci-fixes-for-v4.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: davinci: fix the GPIO lookup for omapl138-hawk
Pull "i.MX fixes for 4.16, 2nd round" from Shawn Guo:
- Fix a copy-paste error in imx7d-sdb device tree, which results in two
usb-otg regulators are both named "regulator-usb-otg1-vbus" and
override each other.
* tag 'imx-fixes-4.16-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx7d-sdb: Fix regulator-usb-otg2-vbus node name
With the commit 1ba8f9d308 ("ALSA: hda: Add a power_save
blacklist"), we changed the default value of power_save option to -1
for processing the power-save blacklist.
Unfortunately, this seems breaking user-space applications that
actually read the power_save parameter value via sysfs and judge /
adjust the power-saving status. They see the value -1 as if the
power-save is turned off, although the actual value is taken from
CONFIG_SND_HDA_POWER_SAVE_DEFAULT and it can be a positive.
So, overall, passing -1 there was no good idea. Let's partially
revert it -- at least for power_save option default value is restored
again to CONFIG_SND_HDA_POWER_SAVE_DEFAULT. Meanwhile, in this patch,
we keep the blacklist behavior and make is adjustable via the new
option, pm_blacklist.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199073
Fixes: 1ba8f9d308 ("ALSA: hda: Add a power_save blacklist")
Acked-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While the specific UFS PHY drivers (14nm and 20nm) have a module
license, the common base module does not, leading to a Kbuild
failure:
WARNING: modpost: missing MODULE_LICENSE() in drivers/phy/qualcomm/phy-qcom-ufs.o
FATAL: modpost: GPL-incompatible module phy-qcom-ufs.ko uses GPL-only symbol 'clk_enable'
This adds a module description and license tag to fix the build.
I added both Yaniv and Vivek as authors here, as Yaniv sent the initial
submission, while Vivek did most of the work since.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Make sure to apply the correct pin state in suspend/resume callbacks.
Putting pins in sleep state saves power.
Signed-off-by: Bich Hemon <bich.hemon@st.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Currently the exclusivity is enabled when the rate is set by
the mode setting functions. These functions are called by
mode_set_nofb callback of drm_crc_helper. Then exclusivity
is disabled when tcon is disabled by atomic_disable
callback.
What happens is that mode_set_nofb can be called once when
mode changes, and afterwards the system can call atomic_enable
and atomic_disable multiple times without further calls to
mode_set_nofb.
This happens:
mode_set_nofb - clk exclusivity is enabled
atomic_enable
atomic_disable - clk exclusivity is disabled
atomic_enable
atomic_disable - clk exclusivity is already disabled, leading to WARN
in clk_rate_exclusive_put
Solution is to enable exclusivity in sun4i_tcon_channel_set_status.
Signed-off-by: Ondrej Jirman <megous@megous.com>
Cc: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180310110511.14697-1-megous@megous.com
When an interface starts, the echo_skb array is empty and the network
queue should be started only. This patch replaces useless code and locks
when the internal RX_BARRIER message is received from the IP core, telling
the driver that tx may start.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch makes atomic the handling of the linux-can echo_skb array and
the network tx queue. This prevents from the "BUG! echo_skb is occupied!"
message to be printed by the linux-can core, in SMP environments.
Reported-by: Diana Burgess <diana@peloton-tech.com>
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The new version of the IFI CANFD core has significantly less complex
error state indication logic. In particular, the warning/error state
bits are no longer all over the place, but are all present in the
STATUS register. Moreover, there is a new IRQ register bit indicating
transition between error states (active/warning/passive/busoff).
This patch makes use of this bit to weed out the obscure selective
INTERRUPT register clearing, which was used to carry over the error
state indication into the poll function. While at it, this patch
fixes the handling of the ACTIVE state, since the hardware provides
indication of the core being in ACTIVE state and that in turn fixes
the state transition indication toward userspace. Finally, register
reads in the poll function are moved to the matching subfunctions
since those are also no longer needed in the poll function.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiko Schocher <hs@denx.de>
Cc: Markus Marb <markus@marb.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Older versions of the core are not compatible with the driver due
to various intrusive fixes of the core. Read out the VER register,
check the core revision bitfield and verify if the core in use is
new enough (rev 2.1 or newer) to work correctly with this driver.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiko Schocher <hs@denx.de>
Cc: Markus Marb <markus@marb.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Max delat_t should be the full_bucket/rate instead of the full_bucket.
Also report EINVAL if the rate is zero.
Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a macvlan device on top of a lowerdev that supports
the xfrm offloads fails with a new regression:
# ip link add link ens1f0 mv0 type macvlan
RTNETLINK answers: Operation not permitted
Tracing down the failure shows that the macvlan device inherits
the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags
from the lowerdev, but with no dev->xfrmdev_ops API filled
in, it doesn't actually support xfrm. When the request is
made to add the new macvlan device, the XFRM listener for
NETDEV_REGISTER calls xfrm_api_check() which fails the new
registration because dev->xfrmdev_ops is NULL.
The macvlan creation succeeds when we filter out the ESP
feature flags in macvlan_fix_features(), so let's filter them
out like we're already filtering out ~NETIF_F_NETNS_LOCAL.
When XFRM support is added in the future, we can add the flags
into MACVLAN_FEATURES.
This same problem could crop up in the future with any other
new feature flags, so let's filter out any flags that aren't
defined as supported in macvlan.
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's found that the final phase set by driver doesn't match that of
the output from clk_summary:
dwmmc_rockchip fe310000.dwmmc: Successfully tuned phase to 346
mmc0: new ultra high speed SDR104 SDIO card at address 0001
cat /sys/kernel/debug/clk/clk_summary | grep sdio_sample
sdio_sample 0 1 0 50000000 0 0
It seems the cached core->phase isn't updated after the clk was
registered. So fix this issue by updating the core->phase if setting
phase successfully.
Fixes: 9e4d04adeb ("clk: add clk_core_set_phase_nolock function")
Cc: Stable <stable@vger.kernel.org>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Pull x86/pti updates from Thomas Gleixner:
"Yet another pile of melted spectrum related updates:
- Drop native vsyscall support finally as it causes more trouble than
benefit.
- Make microcode loading more robust. There were a few issues
especially related to late loading which are now surfacing because
late loading of the IB* microcodes addressing spectre issues has
become more widely used.
- Simplify and robustify the syscall handling in the entry code
- Prevent kprobes on the entry trampoline code which lead to kernel
crashes when the probe hits before CR3 is updated
- Don't check microcode versions when running on hypervisors as they
are considered as lying anyway.
- Fix the 32bit objtool build and a coment typo"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix kernel crash when probing .entry_trampoline code
x86/pti: Fix a comment typo
x86/microcode: Synchronize late microcode loading
x86/microcode: Request microcode on the BSP
x86/microcode/intel: Look into the patch cache first
x86/microcode: Do not upload microcode if CPUs are offline
x86/microcode/intel: Writeback and invalidate caches before updating microcode
x86/microcode/intel: Check microcode revision before updating sibling threads
x86/microcode: Get rid of struct apply_microcode_ctx
x86/spectre_v2: Don't check microcode versions when running under hypervisors
x86/vsyscall/64: Drop "native" vsyscalls
x86/entry/64/compat: Save one instruction in entry_INT80_compat()
x86/entry: Do not special-case clone(2) in compat entry
x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
x86/syscalls: Use proper syscall definition for sys_ioperm()
x86/entry: Remove stale syscall prototype
x86/syscalls/32: Simplify $entry == $compat entries
objtool: Fix 32-bit build
Pull timer fix from Thomas Gleixner:
"Just a single fix which adds a missing Kconfig dependency to avoid
unmet dependency warnings"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependency
Pull RAS fixes from Thomas Gleixner:
"Two small fixes for RAS/MCE:
- Serialize sysfs changes to avoid concurrent modificaiton of
underlying data
- Add microcode revision to Machine Check records. This should have
been there forever, but now with the broken microcode versions in
the wild it has become important"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Serialize sysfs changes
x86/MCE: Save microcode revision in machine check records
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:
- Fix a Skylake Uncore event format declaration
- Prevent perf pipe mode from crahsing which was caused by a missing
buffer allocation
- Make the perf top popup message which tells the user that it uses
fallback mode on older kernels a debug message.
- Make perf context rescheduling work correcctly
- Robustify the jump error drawing in perf browser mode so it does
not try to create references to NULL initialized offset entries
- Make trigger_on() robust so it does not enable the trigger before
everything is set up correctly to handle it
- Make perf auxtrace respect the --no-itrace option so it does not
try to queue AUX data for decoding.
- Prevent having different number of field separators in CVS output
lines when a counter is not supported.
- Make the perf kallsyms man page usage behave like it does for all
other perf commands.
- Synchronize the kernel headers"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
Pull locking fix from Thomas Gleixner:
"rt_mutex_futex_unlock() grew a new irq-off call site, but the function
assumes that its always called from irq enabled context.
Use (un)lock_irqsafe() to handle the new call site correctly"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
Pull irqchip updates for 4.16-rc5 from Marc Zyngier
- IMX GPCv2 cleanup
- GICv3 iomem annontation fixes
- GICv3 ITS minimal ITE allocation now matching the LPIs'.
ebt_among is special, it has a dynamic match size and is exempt
from the central size checks.
commit c4585a2823 ("bridge: ebt_among: add missing match size checks")
added validation for pool size, but missed fact that the macros
ebt_among_wh_src/dst can already return out-of-bound result because
they do not check value of wh_src/dst_ofs (an offset) vs. the size
of the match that userspace gave to us.
v2:
check that offset has correct alignment.
Paolo Abeni points out that we should also check that src/dst
wormhash arrays do not overlap, and src + length lines up with
start of dst (or vice versa).
v3: compact wormhash_sizes_valid() part
NB: Fixes tag is intentionally wrong, this bug exists from day
one when match was added for 2.6 kernel. Tag is there so stable
maintainers will notice this one too.
Tested with same rules from the earlier patch.
Fixes: c4585a2823 ("bridge: ebt_among: add missing match size checks")
Reported-by: <syzbot+bdabab6f1983a03fc009@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The last rule in the blob has next_entry offset that is same as total size.
This made "ebtables32 -A OUTPUT -d de:ad:be:ef:01:02" fail on 64 bit kernel.
Fixes: b718121685 ("netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull dmaengine fixes from Vinod Koul:
"Two small fixes are for this cycle:
- fix max_chunk_size for rcar-dmac for R-Car Gen3
- fix clock resource of mv_xor_v2"
* tag 'dmaengine-fix-4.16-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
Pull GPIO fix from Linus Walleij:
"This is a single GPIO fix for the v4.16 series affecting the Renesas
driver, and fixes wakeup from external stuff"
* tag 'gpio-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: rcar: Use wakeup_path i.s.o. explicit clock handling
On the CP110 components which are present on the Armada 7K/8K SoC we need
to explicitly enable the clock for the registers. However it is not
needed for the AP8xx component, that's why this clock is optional.
With this patch both clock have now a name, but in order to be backward
compatible, the name of the first clock is not used. It allows to still
use this clock with a device tree using the old binding.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
imx_gpcv2_get_wakeup_source() is not used anywhere, so remove it.
This fixes the following sparse warning:
drivers/irqchip/irq-imx-gpcv2.c:34:5: warning: symbol 'imx_gpcv2_get_wakeup_source' was not declared. Should it be static?
Fixes: e324c4dc4a ("irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources")
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When struct its_device instances are created, the nr_ites member
will be set to a power of 2 that equals or exceeds the requested
number of MSIs passed to the msi_prepare() callback. At the same
time, the LPI map is allocated to be some multiple of 32 in size,
where the allocated size may be less than the requested size
depending on whether a contiguous range of sufficient size is
available in the global LPI bitmap.
This may result in the situation where the nr_ites < nr_lpis, and
since nr_ites is what we program into the hardware when we map the
device, the additional LPIs will be non-functional.
For bog standard hardware, this does not really matter. However,
in cases where ITS device IDs are shared between different PCIe
devices, we may end up allocating these additional LPIs without
taking into account that they don't actually work.
So let's make nr_ites at least 32. This ensures that all allocated
LPIs are 'live', and that its_alloc_device_irq() will fail when
attempts are made to allocate MSIs beyond what was allocated in
the first place.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[maz: updated comment]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
snd_pcm_oss_get_formats() has an obvious use-after-free around
snd_mask_test() calls, as spotted by syzbot. The passed format_mask
argument is a pointer to the hw_params object that is freed before the
loop. What a surprise that it has been present since the original
code of decades ago...
Reported-by: syzbot+4090700a4f13fccaf648@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull Kbuild fixes from Masahiro Yamada:
- make fixdep parse kconfig.h to fix missing rebuild
- replace hyphens with underscores in builtin DTB label names
- fix typos
* tag 'kbuild-fixes-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Handle builtin dtb file names containing hyphens
scripts/bloat-o-meter: fix typos in help
fixdep: do not ignore kconfig.h
fixdep: remove some false CONFIG_ matches
fixdep: remove stale references to uml-config.h
Pull block fixes from Jens Axboe:
- a xen-blkfront fix from Bhavesh with a multiqueue fix when
detaching/re-attaching
- a few important NVMe fixes, including a revert for a sysfs fix that
caused some user space confusion
- two bcache fixes by way of Michael Lyle
- a loop regression fix, fixing an issue with lost writes on DAX.
* tag 'for-linus-20180309' of git://git.kernel.dk/linux-block:
loop: Fix lost writes caused by missing flag
nvme_fc: rework sqsize handling
nvme-fabrics: Ignore nr_io_queues option for discovery controllers
xen-blkfront: move negotiate_mq to cover all cases of new VBDs
Revert "nvme: create 'slaves' and 'holders' entries for hidden controllers"
bcache: don't attach backing with duplicate UUID
bcache: fix crashes in duplicate cache device register
nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
nvme-pci: Fix EEH failure on ppc
Pull device mapper fixes from Mike Snitzer:
- Fix an uninitialized variable false warning in dm bufio
- Fix DM's passthrough ioctl support to be race free against an
underlying device being removed.
- Fix corner-case of DM raid resync reporting if/when the raid becomes
degraded during resync; otherwise automated raid repair will fail.
- A few DM multipath fixes to make non-SCSI optimizations, that were
introduced during the 4.16 merge, useful for all non-SCSI devices,
rather than narrowly define this non-SCSI mode in terms of "nvme".
This allows the removal of "queue_mode nvme" that really didn't need
to be introduced. Instead DM core will internalize whether
nvme-specific IO submission optimizations are doable and DM multipath
will only do SCSI-specific device handler operations if SCSI is in
use.
* tag 'for-4.16/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: allow upgrade from bio-based to specialized bio-based variant
dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks
dm table: fix "nvme" test
dm raid: fix incorrect sync_ratio when degraded
dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl
dm bufio: avoid false-positive Wmaybe-uninitialized warning
Pull rdma fixes from Doug Ledford:
- Various driver bug fixes in mlx5, mlx4, bnxt_re and qedr, ranging
from bugs under load to bad error case handling
- There in one largish patch fixing the locking in bnxt_re to avoid a
machine hard lock situation
- A few core bugs on error paths
- A patch to reduce stack usage in the new CQ API
- One mlx5 regression introduced in this merge window
- There were new syzkaller scripts written for the RDMA subsystem and
we are fixing issues found by the bot
- One of the commits (aa0de36a40 “RDMA/mlx5: Fix integer overflow
while resizing CQ”) is missing part of the commit log message and one
of the SOB lines. The original patch was from Leon Romanovsky, and a
cut-n-paste separator in the commit message confused patchworks which
then put the end of message separator in the wrong place in the
downloaded patch, and I didn’t notice in time. The patch made it into
the official branch, and the only way to fix it in-place was to
rebase. Given the pain that a rebase causes, and the fact that the
patch has relevant tags for stable and syzkaller, a revert of the
munged patch and a reapplication of the original patch with the log
message intact was done.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
RDMA/mlx5: Fix integer overflow while resizing CQ
Revert "RDMA/mlx5: Fix integer overflow while resizing CQ"
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/mlx5: Fix integer overflow while resizing CQ
RDMA/ucma: Limit possible option size
IB/core: Fix possible crash to access NULL netdev
RDMA/bnxt_re: Avoid Hard lockup during error CQE processing
RDMA/core: Reduce poll batch for direct cq polling
IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
IB/mlx5: When not in dual port RoCE mode, use provided port as native
IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
RDMA/qedr: Fix iWARP write and send with immediate
RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
RDMA/qedr: Fix iWARP connect with port mapper
RDMA/qedr: Fix ipv6 destination address resolution
IB/core : Add null pointer check in addr_resolve
RDMA/bnxt_re: Fix the ib_reg failure cleanup
RDMA/bnxt_re: Fix incorrect DB offset calculation
RDMA/bnxt_re: Unconditionly fence non wire memory operations
...
Pull x86 platform driver fixes from Darren Hart:
"Correct a module loading race condition between the DELL_SMBIOS
backend modules and the first user by converting them to bool features
of the DELL_SMBIOS driver. Fixup the resulting Kconfig dependency
issue with DCDBAS"
* tag 'platform-drivers-x86-v4.16-6' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-smbios: Resolve dependency error on DCDBAS
platform/x86: Allow for SMBIOS backend defaults
platform/x86: dell-smbios: Link all dell-smbios-* modules together
platform/x86: dell-smbios: Rename dell-smbios source to dell-smbios-base
platform/x86: dell-smbios: Correct some style warnings
When releasing a client, we need to clear the clienttab[] entry at
first, then call snd_seq_queue_client_leave(). Otherwise, the
in-flight cell in the queue might be picked up by the timer interrupt
via snd_seq_check_queue() before calling snd_seq_queue_client_leave(),
and it's delivered to another queue while the client is clearing
queues. This may eventually result in an uncleared cell remaining in
a queue, and the later snd_seq_pool_delete() may need to wait for a
long time until the event gets really processed.
By moving the clienttab[] clearance at the beginning of release, any
event delivery of a cell belonging to this client will fail at a later
point, since snd_seq_client_ptr() returns NULL. Thus the cell that
was picked up by the timer interrupt will be returned immediately
without further delivery, and the long stall of snd_seq_delete_pool()
can be avoided, too.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Although we've covered the races between concurrent write() and
ioctl() in the previous patch series, there is still a possible UAF in
the following scenario:
A: user client closed B: timer irq
-> snd_seq_release() -> snd_seq_timer_interrupt()
-> snd_seq_free_client() -> snd_seq_check_queue()
-> cell = snd_seq_prioq_cell_peek()
-> snd_seq_prioq_leave()
.... removing all cells
-> snd_seq_pool_done()
.... vfree()
-> snd_seq_compare_tick_time(cell)
... Oops
So the problem is that a cell is peeked and accessed without any
protection until it's retrieved from the queue again via
snd_seq_prioq_cell_out().
This patch tries to address it, also cleans up the code by a slight
refactoring. snd_seq_prioq_cell_out() now receives an extra pointer
argument. When it's non-NULL, the function checks the event timestamp
with the given pointer. The caller needs to pass the right reference
either to snd_seq_tick or snd_seq_realtime depending on the event
timestamp type.
A good news is that the above change allows us to remove the
snd_seq_prioq_cell_peek(), too, thus the patch actually reduces the
code size.
Reviewed-by: Nicolai Stange <nstange@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit 7383d44b added a pointer pdata which get set to the default
platform_data when non was defined in the device. But it did not
pass this pointer to the st_sensors_init_sensor call but still
used the maybe uninitialized platform_data from dev.
This breaks initialization when no platform_data is given and
the optional st,drdy-int-pin devicetree option is not set.
This commit fixes this.
Cc: stable@vger.kernel.org
Fixes: 7383d44b ("iio: st_pressure: st_accel: Initialise sensor platform data properly")
Signed-off-by: Michael Nosthoff <committed@heine.so>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This reverts commit 585ed27d06.
This removed code which was unused due to a bug in commit 7383d44b.
To fix this bug the code is needed. Thus this revert.
Signed-off-by: Michael Nosthoff <committed@heine.so>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The meson_sar_adc_lock() function is not supposed to hold the
"indio_dev->mlock" on the error path.
Fixes: 3adbf34273 ("iio: adc: add a driver for the SAR ADC found in Amlogic Meson SoCs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
s390:
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (e.g. nested KVM) with many CPUs
- Export forgotten io interrupt delivery statistics counter"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: fix memory overwrites when not using SCA entries
KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
KVM: s390: provide io interrupt kvm_stat
KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions
KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n
Pull xen fix from Juergen Gross:
"Just one fix for the correct error handling after a failed
device_register()"
* tag 'for-linus-4.16a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: xenbus: use put_device() instead of kfree()
Pull arm64 fixes from Catalin Marinas:
- The SMCCC firmware interface for the spectre variant 2 mitigation has
been updated to allow the discovery of whether the CPU needs the
workaround. This pull request relaxes the kernel check on the return
value from firmware.
- Fix the commit allowing changing from global to non-global page table
entries which inadvertently disallowed other safe attribute changes.
- Fix sleeping in atomic during the arm_perf_teardown_cpu() code.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
arm_pmu: Use disable_irq_nosync when disabling SPI in CPU teardown hook
arm64: mm: fix thinko in non-global page table attribute check
Pull Documentation build fix from Jonathan Corbet:
"The Sphinx 1.7 release broke the build process for reasons that are
mostly our fault.
This is a single fix cherry-picked from docs-next that restores docs
buildability for all supported Sphinx versions"
* tag 'docs-4.16-fix' of git://git.lwn.net/linux:
Documentation/sphinx: Fix Directive import error
Merge misc fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
lib/test_kmod.c: fix limit check on number of test devices created
selftests/vm/run_vmtests: adjust hugetlb size according to nr_cpus
mm/page_alloc: fix memmap_init_zone pageblock alignment
mm/memblock.c: hardcode the end_pfn being -1
mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
lib/bug.c: exclude non-BUG/WARN exceptions from report_bug()
bug: use %pB in BUG and stack protector failure
hugetlb: fix surplus pages accounting
As reported by Dan the parentheses is in the wrong place, and since
unlikely() call returns either 0 or 1 it's never less than zero. The
second issue is that signed integer overflows like "INT_MAX + 1" are
undefined behavior.
Since num_test_devs represents the number of devices, we want to stop
prior to hitting the max, and not rely on the wrap arround at all. So
just cap at num_test_devs + 1, prior to assigning a new device.
Link: http://lkml.kernel.org/r/20180224030046.24238-1-mcgrof@kernel.org
Fixes: d9c6a72d6f ("kmod: add test driver to stress test the module loader")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b92df1de5d ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the
same way as in move_freepages_block().
Seen in one of the RHEL reports:
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[<ffffffff8118833e>] [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688 EFLAGS: 00010087
--
Call Trace:
[<ffffffff811883b3>] move_freepages_block+0x73/0x80
[<ffffffff81189e63>] __rmqueue+0x263/0x460
[<ffffffff8118c781>] get_page_from_freelist+0x7e1/0x9e0
[<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
--
RIP [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP <ffff88054d727688>
crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8> 1000 - 9bfff System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640> 100000000 - 67fffffff System RAM ( 22.00 GiB)
crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct page 0xffffea0001ede200> 1fffff00000000 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000> 0 0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA 1 4095
<struct page 0xffffea0001edffc0> 1fffff00000400 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
BUG, zones differ!
Note that this range follows two not populated sections
68000000-77ffffff in this zone. 7b788000-7b7fffff is the first one
after a gap. This makes memmap_init_zone() skip all the pfns up to the
beginning of this range. But this range is not pageblock (2M) aligned.
In fact no range has to be.
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000
Top part of page flags should contain nodeid and zonenr, which is not
the case for page ffffea0001ed8000 here (<<<<).
crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0
crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0
Initialization of the whole beginning of the section is skipped up to
the start of the range due to the commit b92df1de5d. Now any code
calling move_freepages_block() (like reusing the page from a freelist as
in this example) with a page from the beginning of the range will get
the page rounded down to start_page ffffea0001ed8000 and passed to
move_freepages() which crashes on assertion getting wrong zonenr.
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Note, page_zone() derives the zone from page flags here.
From similar machine before commit b92df1de5d:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068 uptodate,lru,active,mappedtodisk
All the pages since the beginning of the section are initialized.
move_freepages()' not gonna blow up.
The same machine with this fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001e00000 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068 uptodate,lru,active,mappedtodisk
At least the bare minimum of pages is initialized preventing the crash
as well.
Customers started to report this as soon as 7.4 (where b92df1de5d was
merged in RHEL) was released. I remember reports from
September/October-ish times. It's not easily reproduced and happens on
a handful of machines only. I guess that's why. But that does not make
it less serious, I think.
Though there actually is a report here:
https://bugzilla.kernel.org/show_bug.cgi?id=196443
And there are reports for Fedora from July:
https://bugzilla.redhat.com/show_bug.cgi?id=1473242
and CentOS:
https://bugs.centos.org/view.php?id=13964
and we internally track several dozens reports for RHEL bug
https://bugzilla.redhat.com/show_bug.cgi?id=1525121
Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.neelx@redhat.com
Fixes: b92df1de5d ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KVM is hanging during postcopy live migration with userfaultfd because
get_user_pages_unlocked is not capable to handle FOLL_NOWAIT.
Earlier FOLL_NOWAIT was only ever passed to get_user_pages.
Specifically faultin_page (the callee of get_user_pages_unlocked caller)
doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the page fault
flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't actually
released (even if nonblocking is not NULL). So it sets *nonblocking to
zero and the caller won't release the mmap_sem thinking it was already
released, but it wasn't because of FOLL_NOWAIT.
Link: http://lkml.kernel.org/r/20180302174343.5421-2-aarcange@redhat.com
Fixes: ce53053ce3 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b8347c2196 ("x86/debug: Handle warnings before the notifier
chain, to fix KGDB crash") changed the ordering of fixups, and did not
take into account the case of x86 processing non-WARN() and non-BUG()
exceptions. This would lead to output of a false BUG line with no other
information.
In the case of a refcount exception, it would be immediately followed by
the refcount WARN(), producing very strange double-"cut here":
lkdtm: attempting bad refcount_inc() overflow
------------[ cut here ]------------
Kernel BUG at 0000000065f29de5 [verbose debug info unavailable]
------------[ cut here ]------------
refcount_t overflow at lkdtm_REFCOUNT_INC_OVERFLOW+0x6b/0x90 in cat[3065], uid/euid: 0/0
WARNING: CPU: 0 PID: 3065 at kernel/panic.c:657 refcount_error_report+0x9a/0xa4
...
In the prior ordering, exceptions were searched first:
do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
...
if (fixup_exception(regs, trapnr))
return 0;
- if (fixup_bug(regs, trapnr))
- return 0;
-
As a result, fixup_bugs()'s is_valid_bugaddr() didn't take into account
needing to search the exception list first, since that had already
happened.
So, instead of searching the exception list twice (once in
is_valid_bugaddr() and then again in fixup_exception()), just add a
simple sanity check to report_bug() that will immediately bail out if a
BUG() (or WARN()) entry is not found.
Link: http://lkml.kernel.org/r/20180301225934.GA34350@beast
Fixes: b8347c2196 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Rue has noticed that libhugetlbfs test suite fails counter test:
# mount_point="/mnt/hugetlb/"
# echo 200 > /proc/sys/vm/nr_hugepages
# mkdir -p "${mount_point}"
# mount -t hugetlbfs hugetlbfs "${mount_point}"
# export LD_LIBRARY_PATH=/root/libhugetlbfs/libhugetlbfs-2.20/obj64
# /root/libhugetlbfs/libhugetlbfs-2.20/tests/obj64/counters
Starting testcase "/root/libhugetlbfs/libhugetlbfs-2.20/tests/obj64/counters", pid 3319
Base pool size: 0
Clean...
FAIL Line 326: Bad HugePages_Total: expected 0, actual 1
The bug was bisected to 0c397daea1 ("mm, hugetlb: further simplify
hugetlb allocation API").
The reason is that alloc_surplus_huge_page() misaccounts per node
surplus pages. We should increase surplus_huge_pages_node rather than
nr_huge_pages_node which is already handled by alloc_fresh_huge_page.
Link: http://lkml.kernel.org/r/20180221191439.GM2231@dhcp22.suse.cz
Fixes: 0c397daea1 ("mm, hugetlb: further simplify hugetlb allocation API")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dan Rue <dan.rue@linaro.org>
Tested-by: Dan Rue <dan.rue@linaro.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original commit of this patch has a munged log message that is
missing several of the tags the original author intended to be on the
patch. This was due to patchworks misinterpreting a cut-n-paste
separator line as an end of message line and munging the mbox that was
used to import the patch:
https://patchwork.kernel.org/patch/10264089/
The original patch will be reapplied with a fixed commit message so the
proper tags are applied.
This reverts commit aa0de36a40.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull PCI fixes from Bjorn Helgaas:
- fix sparc build issue when OF_IRQ not enabled (Guenter Roeck)
- fix enumeration of devices below switches on DesignWare-based
controllers (Koen Vandeputte)
* tag 'pci-v4.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: dwc: Fix enumeration end when reaching root subordinate
PCI: Move of_irq_parse_and_map_pci() declaration under OF_IRQ
Pull fbdev fix from Bartlomiej Zolnierkiewicz:
"Just a single fix to close a kernel data leak in FBIOGETCMAP_SPARC
ioctl"
* tag 'fbdev-v4.16-rc5' of git://github.com/bzolnier/linux:
fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
Pull drm fixes from Dave Airlie:
"There are a small set of sun4i and i915 fixes, and many more amdgpu
fixes:
sun4i:
- divide by zero fix
- clock and LVDS fixes
i915:
- fix for perf
- race fix
amdgpu:
- a bit more than we are normally comfortable with at this point,
however it does fix a lot of display issues with the new DC code
which result in black screens in various configurations along with
some run of the mill gpu configuration fixes.
I'm happy enough that the fixes are limited to the DC code and
should fix a bunch of issues on the new raven ridge APUs that we
are seeing shipped now"
* tag 'drm-fixes-for-v4.16-rc5' of git://people.freedesktop.org/~airlied/linux: (42 commits)
drm/amd/display: validate plane format on primary plane
drm/amdgpu:Always save uvd vcpu_bo in VM Mode
drm/amdgpu:Correct max uvd handles
drm/amd/display: early return if not in vga mode in disable_vga
drm/amd/display: Fix takover from VGA mode
drm/amd/display: Fix memleaks when atomic check fails.
drm/amd/display: Return success when enabling interrupt
drm/amd/display: Use crtc enable/disable_vblank hooks
drm/amd/display: update infoframe after dig fe is turned on
drm/amd/display: fix boot-up on vega10
drm/amd/display: fix cursor related Pstate hang
drm/amd/display: Set irq state only on existing crtcs
drm/amd/display: Fixed non-native modes not lighting up
drm/amd/display: Call update_stream_signal directly from amdgpu_dm
drm/amd/display: Make create_stream_for_sink more consistent
drm/amd/display: Don't block dual-link DVI modes
drm/amd/display: Don't allow dual-link DVI on all ASICs.
drm/amd/display: Pass signal directly to enable_tmds_output
drm/amd/display: Remove unnecessary fail labels in create_stream_for_sink
drm/amd/display: Move MAX_TMDS_CLOCK define to header
...
Do not log an error if tcpm_register_port() fails with -EPROBE_DEFER.
Fixes: cf140a3569 ("typec: fusb302: Use dev_err during probe")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
William Tu says:
====================
a couple of erspan fixes
The series fixes a couple of erspan issues.
The first patch adds the erspan v2 proto type to the ip6 tunnel lookup.
The second patch improves the error handling when users screws the
version number in metadata. The final patch makes sure the skb has
enough headroom for pushing erspan header when xmit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds skb_cow_header() to ensure enough headroom
at ip6erspan_tunnel_xmit before pushing the erspan header
to the skb.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When users fill in incorrect erspan version number through
the struct erspan_metadata uapi, current code skips pushing
the erspan header but continue pushing the gre header, which
is incorrect. The patch fixes it by returning error.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds the erspan v2 proto in ip6gre_tunnel_lookup
so the erspan v2 tunnel can be found correctly.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: ACL and mirroring fixes
The first patch fixes offload of rules using the 'pass' action. Instead
of continuing to evaluate lower priority rules, the binding is
terminated and the packet proceeds to the bridge and router blocks on
ingress, or goes out of the port on egress.
Second patch prevents the user from mirroring more than once from a
given {Port, Direction} as this is not supported by the device.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum ASIC doesn't support mirroring more than once from a single
binding point (which is a port-direction pair). Therefore detect that a
second binding of a given binding point is attempted.
To that end, extend struct mlxsw_sp_span_inspected_port to track whether
a given binding point is bound or not. Extend
mlxsw_sp_span_entry_port_find() to look for ports based on the full
unique key: port number, direction, and boundness.
Besides fixing the overt bug where configured mirrors are not offloaded,
this also fixes a more subtle bug: mlxsw_sp_span_inspected_port_del()
just defers to mlxsw_sp_span_entry_bound_port_find(), and that used to
find the first port with the right number (disregarding the type). Thus
by adding and removing egress and ingress mirrors in the right order,
one could trick the system into believing it has no egress mirrors when
in fact it did have some. That then caused that
mlxsw_sp_span_port_mtu_update() didn't update mirroring buffer when MTU
was changed.
Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ok GACT action, TERMINATE binding_cmd should be used in action set
passed down to HW.
Fixes: b2925957ec ("mlxsw: spectrum_flower: Offload "ok" termination action")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sound fixes from Takashi Iwai:
"Two type of fixes:
- The usual stuff, a handful HD-audio quirks for various machines
- Further hardening against ALSA sequencer ioctl/write races that are
triggered by fuzzer"
* tag 'sound-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: add dock and led support for HP ProBook 640 G2
ALSA: hda: add dock and led support for HP EliteBook 820 G3
ALSA: hda/realtek - Make dock sound work on ThinkPad L570
ALSA: seq: Remove superfluous snd_seq_queue_client_leave_cells() call
ALSA: seq: More protection for concurrent write and ioctl races
ALSA: seq: Don't allow resizing pool in use
ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520
ALSA: hda/realtek: Limit mic boost on T480
ALSA: hda/realtek - Add headset mode support for Dell laptop
ALSA: hda/realtek - Add support headset mode for DELL WYSE
ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines
Currently the driver attempts to spin lock on udc->lock before a NULL
pointer check is performed on udc, hence there is a potential null
pointer dereference on udc->lock. Fix this by moving the null check
on udc before the lock occurs.
Fixes: ea6873a45a ("usbip: vudc: Add SysFS infrastructure for VUDC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Reviewed-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent update to the ARM SMCCC ARCH_WORKAROUND_1 specification
allows firmware to return a non zero, positive value to describe
that although the mitigation is implemented at the higher exception
level, the CPU on which the call is made is not affected.
Let's relax the check on the return value from ARCH_WORKAROUND_1
so that we only error out if the returned value is negative.
Fixes: b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull overlayfs fixes from Miklos Szeredi:
"This fixes a corner case for NFS exporting (introduced in this cycle)
as well as fixing miscellaneous bugs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update Kconfig texts
ovl: redirect_dir=nofollow should not follow redirect for opaque lower
ovl: fix ptr_ret.cocci warnings
ovl: check ERR_PTR() return value from ovl_lookup_real()
ovl: check lower ancestry on encode of lower dir file handle
ovl: hash non-dir by lower inode for fsnotify
Pull xfs fixes from Darrick Wong:
- Fix some iomap locking problems
- Don't allocate cow blocks when we're zeroing file data
* tag 'xfs-4.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't block on the ilock for RWF_NOWAIT
xfs: don't start out with the exclusive ilock for direct I/O
xfs: don't allocate COW blocks for zeroing holes or unwritten extents
When the DELL_SMBIOS_SMM backend is enabled, the DELL_SMBIOS symbol
depends on DELL_DCDBAS, and we must avoid the situation where
DELL_SMBIOS=y and DCDBAS=m.
Adding the conditional dependency to DELL_SMBIOS such as:
depends !DELL_SMBIOS_SMM || (DCDBAS || DCDBAS=n)
results in the Kconfig tooling complaining about a circular dependency,
although it appears to work in practice.
Avoid the errors by simplifying the dependency and forcing DELL_SMBIOS
to be <= DCDBAS if DCDBAS is enabled (thanks to Greg KH for the
suggestion).
Cc: Mario.Limonciello@dell.com
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Avoid accidental configurations by setting default y for DELL_SMBIOS
backends. Avoid this impacting the default build size, by making them
dependent on DELL_SMBIOS, so they only appear when DELL_SMBIOS is
manually selected, or by DELL_LAPTOP or DELL_WMI.
While DELL_SMBIOS does have a prompt, it does not have any dependencies.
Keeping DELL_SMBIOS visible, despite being "select"ed by DELL_LAPTOP and
DELL_WMI, is a deliberate choice to provide context for the WMI and SMM
backends, which would otherwise appear to float without context within
the menu.
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Some race conditions were raised due to dell-smbios and its backends
not being ready by the time that a consumer would call one of the
exported methods.
To avoid this problem, guarantee that all initialization has been
done by linking them all together and running init for them all.
As part of this change the Kconfig needs to be adjusted so that
CONFIG_DELL_SMBIOS_SMM and CONFIG_DELL_SMBIOS_WMI are boolean
rather than modules.
CONFIG_DELL_SMBIOS is a visually selectable option again and both
CONFIG_DELL_SMBIOS_WMI and CONFIG_DELL_SMBIOS_SMM are optional.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
[dvhart: Update prompt and help text for DELL_SMBIOS_* backends]
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
This is being done to faciliate a later change to link all the dell-smbios
drivers together.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
WARNING: function definition argument 'struct calling_interface_buffer *'
should also have an identifier name
+ int (*call_fn)(struct calling_interface_buffer *);
WARNING: Block comments use * on subsequent lines
+ /* 4 bytes of table header, plus 7 bytes of Dell header,
plus at least
+ 6 bytes of entry */
WARNING: Block comments use a trailing */ on a separate line
+ 6 bytes of entry */
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Pull powerpc fixes from Michael Ellerman:
"One notable fix to properly advertise our support for a new firmware
feature, caused by two series conflicting semantically but not
textually.
There's a new ioctl for the new ocxl driver, which is not a fix, but
needed to complete the userspace API and good to have before the
driver is in a released kernel.
Finally three minor selftest fixes, and a fix for intermittent build
failures for some obscure platforms, caused by a missing make
dependency.
Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck"
* tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix vector5 in ibm architecture vector table
ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL
ocxl: Add get_metadata IOCTL to share OCXL information to userspace
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
selftests/powerpc: Fix missing clean of pmu/lib.o
powerpc/boot: Fix random libfdt related build errors
selftests/powerpc: Skip tm-trap if transactional memory is not enabled
When a USB device gets plugged on ASUS PRIME B350M-A's front ports, the
xHC stops working:
[ 549.114587] xhci_hcd 0000:02:00.0: WARN: xHC CMD_RUN timeout
[ 549.114608] suspend_common(): xhci_pci_suspend+0x0/0xc0 returns -110
[ 549.114638] xhci_hcd 0000:02:00.0: can't suspend (hcd_pci_runtime_suspend returned -110)
Delay before running xHC command CMD_RUN can workaround the issue.
Use a new quirk to make the delay only targets to the affected xHC.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch reverts the commit 835e4241e7 ("usb: host: xhci-plat:
enable clk in resume timing") because this driver also has runtime PM
and the commit 560869100b ("clk: renesas: cpg-mssr: Restore module
clocks during resume") will restore the clock on R-Car H3 environment.
If the xhci_plat_suspend() disables the clk, the system cannot enable
the clk in resume like the following behavior:
< In resume >
- genpd_resume_noirq() runs and enable the clk (enable_count = 1)
- cpg_mssr_resume_noirq() restores the clk register.
-- Since the clk was disabled in suspend, cpg_mssr_resume_noirq()
will disable the clk and keep the enable_count.
- Even if xhci_plat_resume() calls clk_prepare_enable(), since
the enable_count is 1, the clk will be not enabled.
After this patch is applied, the cpg-mssr driver will save the clk
as enable, so the clk will be enabled in resume.
Fixes: 835e4241e7 ("usb: host: xhci-plat: enable clk in resume timing")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason Wang says:
====================
Several fixes for vhost_net ptr_ring usage
This small series try to fix several bugs of ptr_ring usage in
vhost_net. Please review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit fc72d1d54d ("tuntap: XDP transmission"), we can
actually queueing XDP pointers in the pointer ring, so we should
examine the pointer type before freeing the pointer.
Fixes: fc72d1d54d ("tuntap: XDP transmission")
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We get pointer ring from the exported sock, this means we should keep
rx_ring and vq->private synced during both vq stop and backend set,
otherwise we may see stale rx_ring.
Fixes: c67df11f6e ("vhost_net: try batch dequing from skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables AVE_GI_RXDROP interrupt factor. This factor indicates
depletion of Rx descriptors and the handler counts the number
of dropped packets.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As well as the basic conversion, I noticed that a lot of the
SCTP code checks gso_type without first checking skb_is_gso()
so I have added that where appropriate.
Also, document the helper.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following slab-out-of-bounds kasan report in
ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not
linear and the accessed data are not in the linear data region of orig_skb.
[ 1503.122508] ==================================================================
[ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990
[ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932
[ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124
[ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 1503.123527] Call Trace:
[ 1503.123579] <IRQ>
[ 1503.123638] print_address_description+0x6e/0x280
[ 1503.123849] kasan_report+0x233/0x350
[ 1503.123946] memcpy+0x1f/0x50
[ 1503.124037] ndisc_send_redirect+0x94e/0x990
[ 1503.125150] ip6_forward+0x1242/0x13b0
[...]
[ 1503.153890] Allocated by task 1932:
[ 1503.153982] kasan_kmalloc+0x9f/0xd0
[ 1503.154074] __kmalloc_track_caller+0xb5/0x160
[ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70
[ 1503.154324] __alloc_skb+0x130/0x3e0
[ 1503.154415] sctp_packet_transmit+0x21a/0x1810
[ 1503.154533] sctp_outq_flush+0xc14/0x1db0
[ 1503.154624] sctp_do_sm+0x34e/0x2740
[ 1503.154715] sctp_primitive_SEND+0x57/0x70
[ 1503.154807] sctp_sendmsg+0xaa6/0x1b10
[ 1503.154897] sock_sendmsg+0x68/0x80
[ 1503.154987] ___sys_sendmsg+0x431/0x4b0
[ 1503.155078] __sys_sendmsg+0xa4/0x130
[ 1503.155168] do_syscall_64+0x171/0x3f0
[ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.155436] Freed by task 1932:
[ 1503.155527] __kasan_slab_free+0x134/0x180
[ 1503.155618] kfree+0xbc/0x180
[ 1503.155709] skb_release_data+0x27f/0x2c0
[ 1503.155800] consume_skb+0x94/0xe0
[ 1503.155889] sctp_chunk_put+0x1aa/0x1f0
[ 1503.155979] sctp_inq_pop+0x2f8/0x6e0
[ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230
[ 1503.156164] sctp_inq_push+0x117/0x150
[ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0
[ 1503.156346] __release_sock+0x142/0x250
[ 1503.156436] release_sock+0x80/0x180
[ 1503.156526] sctp_sendmsg+0xbb0/0x1b10
[ 1503.156617] sock_sendmsg+0x68/0x80
[ 1503.156708] ___sys_sendmsg+0x431/0x4b0
[ 1503.156799] __sys_sendmsg+0xa4/0x130
[ 1503.156889] do_syscall_64+0x171/0x3f0
[ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.157158] The buggy address belongs to the object at ffff8800298ab600
which belongs to the cache kmalloc-1024 of size 1024
[ 1503.157444] The buggy address is located 176 bytes inside of
1024-byte region [ffff8800298ab600, ffff8800298aba00)
[ 1503.157702] The buggy address belongs to the page:
[ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 1503.158053] flags: 0x4000000000008100(slab|head)
[ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e
[ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000
[ 1503.158523] page dumped because: kasan: bad access detected
[ 1503.158698] Memory state around the buggy address:
[ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1503.159338] ^
[ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159785] ==================================================================
[ 1503.159964] Disabling lock debugging due to kernel taint
The test scenario to trigger the issue consists of 4 devices:
- H0: data sender, connected to LAN0
- H1: data receiver, connected to LAN1
- GW0 and GW1: routers between LAN0 and LAN1. Both of them have an
ethernet connection on LAN0 and LAN1
On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for
data from LAN0 to LAN1.
Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent
data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send
buffer size is set to 16K). While data streams are active flush the route
cache on HA multiple times.
I have not been able to identify a given commit that introduced the issue
since, using the reproducer described above, the kasan report has been
triggered from 4.14 and I have not gone back further.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved 16bit resolution condition check for stoney platform
to acp_hw_params.Depending upon substream required register
value need to be programmed rather than enabling 16bit resolution
support all time in acp init.
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The following commit:
commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")
replaced __do_lo_send_write(), which used ITER_KVEC iterators, with
lo_write_bvec() which uses ITER_BVEC iterators. In this change, though,
the WRITE flag was lost:
- iov_iter_kvec(&from, ITER_KVEC | WRITE, &kvec, 1, len);
+ iov_iter_bvec(&i, ITER_BVEC, bvec, 1, bvec->bv_len);
This flag is necessary for the DAX case because we make decisions based on
whether or not the iterator is a READ or a WRITE in dax_iomap_actor() and
in dax_iomap_rw().
We end up going through this path in configurations where we combine a PMEM
device with 4k sectors, a loopback device and DAX. The consequence of this
missed flag is that what we intend as a write actually turns into a read in
the DAX code, so no data is ever written.
The very simplest test case is to create a loopback device and try and
write a small string to it, then hexdump a few bytes of the device to see
if the write took. Without this patch you read back all zeros, with this
you read back the string you wrote.
For XFS this causes us to fail or panic during the following xfstests:
xfs/074 xfs/078 xfs/216 xfs/217 xfs/250
For ext4 we have a similar issue where writes never happen, but we don't
currently have any xfstests that use loopback and show this issue.
Fix this by restoring the WRITE flag argument to iov_iter_bvec(). This
causes the xfstests to all pass.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When populating shadow ctx from guest, we should handle oa related
registers in hw ctx, so that they will not be overlapped by guest oa
configs. This patch made it possible to capture oa data from host for
both host and guests.
Signed-off-by: Min He <min.he@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
If user continuously create vgpu, boot guest, shoutdown guest and destroy
vgpu from remote, the following calltrace exists in dmesg sometimes:
[ 6412.954721] RPM wakelock ref not held during HW access
[ 6412.954795] WARNING: CPU: 7 PID: 11941 at
linux/drivers/gpu/drm/i915/intel_drv.h:1800
intel_uncore_forcewake_get.part.7+0x96/0xa0 [i915]
[ 6412.954915] Call Trace:
[ 6412.954951] intel_uncore_forcewake_get+0x18/0x20 [i915]
[ 6412.954989] intel_gvt_switch_mmio+0x8e/0x770 [i915]
[ 6412.954996] ? __slab_free+0x14d/0x2c0
[ 6412.955001] ? __slab_free+0x14d/0x2c0
[ 6412.955006] ? __slab_free+0x14d/0x2c0
[ 6412.955041] intel_vgpu_stop_schedule+0x92/0xd0 [i915]
[ 6412.955073] intel_gvt_deactivate_vgpu+0x48/0x60 [i915]
[ 6412.955078] __intel_vgpu_release+0x55/0x260 [kvmgt]
when this happens, gvt_switch_mmio is called at vgpu destroy, host i915 is
idle and doesn't hold RPM wakelock, igd is in powersave mode, but
gvt_switch_mmio require igd power on to access register, so
intel_runtime_pm_get should be added to make sure igd power on before
gvt_switch_mmio.
v2: Move runtime_pm_get/put into gvt_switch_mmio.(Zhenyu)
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and
kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a
HARDIRQ-safe->HARDIRQ-unsafe deadlock:
================================
WARNING: inconsistent lock state
4.16.0-rc4+ #1 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
takes:
__schedule+0xbe/0xaf0
{IN-HARDIRQ-W} state was registered at:
_raw_spin_lock+0x2a/0x40
scheduler_tick+0x47/0xf0
...
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rq->lock);
<Interrupt>
lock(&rq->lock);
*** DEADLOCK ***
1 lock held by rcu_torture_rea/724:
rcu_torture_read_lock+0x0/0x70
stack backtrace:
CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
lock_acquire+0x90/0x200
? __schedule+0xbe/0xaf0
_raw_spin_lock+0x2a/0x40
? __schedule+0xbe/0xaf0
__schedule+0xbe/0xaf0
preempt_schedule_irq+0x2f/0x60
retint_kernel+0x1b/0x2d
RIP: 0010:rcu_read_unlock_special+0x0/0x680
? rcu_torture_read_unlock+0x60/0x60
__rcu_read_unlock+0x64/0x70
rcu_torture_read_unlock+0x17/0x60
rcu_torture_reader+0x275/0x450
? rcutorture_booster_init+0x110/0x110
? rcu_torture_stall+0x230/0x230
? kthread+0x10e/0x130
kthread+0x10e/0x130
? kthread_create_worker_on_cpu+0x70/0x70
? call_usermodehelper_exec_async+0x11a/0x150
ret_from_fork+0x3a/0x50
This happens with the following even sequence:
preempt_schedule_irq();
local_irq_enable();
__schedule():
local_irq_disable(); // irq off
...
rcu_note_context_switch():
rcu_note_preempt_context_switch():
rcu_read_unlock_special():
local_irq_save(flags);
...
raw_spin_unlock_irqrestore(...,flags); // irq remains off
rt_mutex_futex_unlock():
raw_spin_lock_irq();
...
raw_spin_unlock_irq(); // accidentally set irq on
<return to __schedule()>
rq_lock():
raw_spin_lock(); // acquiring rq->lock with irq on
which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause
deadlocks in scheduler code.
This problem was introduced by commit 02a7c234e5 ("rcu: Suppress
lockdep false-positive ->boost_mtx complaints"). That brought the user
of rt_mutex_futex_unlock() with irq off.
To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with
*lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock()
with irq off.
Fixes: 02a7c234e5 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints")
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.feng@gmail.com
The GPIO chip is called davinci_gpio.0 in legacy mode. Fix it, so that
mmc can correctly lookup the wp and cp gpios.
Note that it is the gpio-davinci driver that sets the gpiochip label to
davinci_gpio.0.
Fixes: c69f43fb4f ("ARM: davinci: hawk: use gpio descriptor for mmc pins")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[nsekhar@ti.com: add a note on where the chip label is set]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Disable the kprobe probing of the entry trampoline:
.entry_trampoline is a code area that is used to ensure page table
isolation between userspace and kernelspace.
At the beginning of the execution of the trampoline, we load the
kernel's CR3 register. This has the effect of enabling the translation
of the kernel virtual addresses to physical addresses. Before this
happens most kernel addresses can not be translated because the running
process' CR3 is still used.
If a kprobe is placed on the trampoline code before that change of the
CR3 register happens the kernel crashes because int3 handling pages are
not accessible.
To fix this, add the .entry_trampoline section to the kprobe blacklist
to prohibit the probing of code before all the kernel pages are
accessible.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: mathieu.desnoyers@efficios.com
Cc: mhiramat@kernel.org
Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The email address for the Tegra IOMMU maintainer, Hiroshi, is no longer
valid. Hiroshi has not been maintaining the Tegra IOMMU driver for some
time now and so remove Hiroshi as the maintainer and add Thierry Reding
instead. Also add the mailing list information for this driver as well.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is
added. However, ctx_resched() calculates ctx_event_type before checking
this condition. As a result, pinned events will NOT get higher priority
than flexible events.
The following shows this issue on an Intel CPU (where ref-cycles can
only use one hardware counter).
1. First start:
perf stat -C 0 -e ref-cycles -I 1000
2. Then, in the second console, run:
perf stat -C 0 -e ref-cycles:D -I 1000
The second perf uses pinned events, which is expected to have higher
priority. However, because it failed in ctx_resched(). It is never
run.
This patch fixes this by calculating ctx_event_type after re-evaluating
event_type.
Reported-by: Ephraim Park <ephiepark@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <jolsa@redhat.com>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
Link: http://lkml.kernel.org/r/20180306055504.3283731-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull NVMe fixes for this series from Keith:
"A few late fixes for 4.16:
* Reverting sysfs slave device links for native nvme multipathing.
The hidden disk attributes broke common user tools.
* A fix for a PPC pci error handling regression.
* Update pci interrupt count to consider the actual IRQ spread, fixing
potentially poor initial queue affinity.
* Off-by-one errors in nvme-fc queue sizes
* A fabrics discovery fix to be more tolerant with user tools."
* 'nvme-4.16-rc5' of git://git.infradead.org/nvme:
nvme_fc: rework sqsize handling
nvme-fabrics: Ignore nr_io_queues option for discovery controllers
Revert "nvme: create 'slaves' and 'holders' entries for hidden controllers"
nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
nvme-pci: Fix EEH failure on ppc
Fixes for 4.16. A bit bigger than I would have liked, but most of that
is DC fixes which Harry helped me pull together from the past few weeks.
Highlights:
- Fix DL DVI with DC
- Various RV fixes for DC
- Overlay fixes for DC
- Fix HDMI2 handling on boards without HBR tables in the vbios
- Fix crash with pass-through on SI on amdgpu
- Fix RB harvesting on KV
- Fix hibernation failures on UVD with certain cards
* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux: (35 commits)
drm/amd/display: validate plane format on primary plane
drm/amdgpu:Always save uvd vcpu_bo in VM Mode
drm/amdgpu:Correct max uvd handles
drm/amd/display: early return if not in vga mode in disable_vga
drm/amd/display: Fix takover from VGA mode
drm/amd/display: Fix memleaks when atomic check fails.
drm/amd/display: Return success when enabling interrupt
drm/amd/display: Use crtc enable/disable_vblank hooks
drm/amd/display: update infoframe after dig fe is turned on
drm/amd/display: fix boot-up on vega10
drm/amd/display: fix cursor related Pstate hang
drm/amd/display: Set irq state only on existing crtcs
drm/amd/display: Fixed non-native modes not lighting up
drm/amd/display: Call update_stream_signal directly from amdgpu_dm
drm/amd/display: Make create_stream_for_sink more consistent
drm/amd/display: Don't block dual-link DVI modes
drm/amd/display: Don't allow dual-link DVI on all ASICs.
drm/amd/display: Pass signal directly to enable_tmds_output
drm/amd/display: Remove unnecessary fail labels in create_stream_for_sink
drm/amd/display: Move MAX_TMDS_CLOCK define to header
...
sun4i fixes on clk, division by zero and LVDS.
* tag 'drm-misc-fixes-2018-03-07' of git://anongit.freedesktop.org/drm/drm-misc:
drm/sun4i: crtc: Call drm_crtc_vblank_on / drm_crtc_vblank_off
drm/sun4i: rgb: Fix potential division by zero
drm/sun4i: tcon: Reduce the scope of the LVDS error a bit
drm/sun4i: Release exclusive clock lock when disabling TCON
drm/sun4i: Fix dclk_set_phase
otherwise kernel can oops later in seq_release() due to dereferencing null
file->private_data which is only set if seq_open() succeeds.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: seq_release+0xc/0x30
Call Trace:
close_pdeo+0x37/0xd0
proc_reg_release+0x5d/0x60
__fput+0x9d/0x1d0
____fput+0x9/0x10
task_work_run+0x75/0x90
do_exit+0x252/0xa00
do_group_exit+0x36/0xb0
SyS_exit_group+0xf/0x10
Fixes: 516fb7f2e7 ("/proc/module: use the same logic as /proc/kallsyms for address exposure")
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Pull MIPS fixes from James Hogan:
"A miscellaneous pile of MIPS fixes for 4.16:
- move put_compat_sigset() to evade hardened usercopy warnings (4.16)
- select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16)
- fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0)
- fix disabling of IPIs during BMIPS suspend (3.19)"
* tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: BMIPS: Do not mask IPIs during suspend
MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_SERIO
MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_PARPORT
signals: Move put_compat_sigset to compat.h to silence hardened usercopy
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
MIPS: ath25: Check for kzalloc allocation failure
Pull chrome platform fix from Benson Leung:
"Revert a problematic patch that constified something imporperly"
* tag 'chrome-platform-4.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
Revert "platform/chrome: chromeos_laptop: make chromeos_laptop const"
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
ensure that all outstanding COMMIT requests to the inode in question are
complete. Currently we may exit early from both nfs_commit_inode() and
nfs_write_inode() even if there are COMMIT requests in flight, or unstable
writes on the commit list.
In order to get the right semantics w.r.t. sync_inode(), we don't need
to have nfs_commit_inode() reset the inode dirty flags when called from
nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
nfs_write_inode() leaves them in the right state if there are outstanding
commits, or stable pages.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: dc4fd9ab01 ("nfs: don't wait on commit in nfs_commit_inode()...")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Stephen Hemminger says:
====================
hv_netvsc: fix multicast flags and sync
This set of patches deals with the handling of multicast flags
and addresses in transparent VF mode. The recent set of patches
(in linux-net) had a couple of bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_uc/mc_sync calls need to have the device address list
locked. This was spotted by running with lockdep enabled.
Fixes: bee9d41b37 ("hv_netvsc: propagate rx filters to VF")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_mode operation handler is different than other callbacks
in that is not always called with rtnl held. Therefore use
RCU to ensure that references are valid.
Fixes: bee9d41b37 ("hv_netvsc: propagate rx filters to VF")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc driver can get repeated calls to netvsc_rx_mode during
network setup; each of these calls ends up scheduling the lower
layers to update tha packet filter. This update requires an
request/response to the host. So avoid doing this if we already
know that the correct packet filter value is set.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent change to not always enable all multicast and broadcast
was broken; meant to set filter, not change flags.
Fixes: 009f766ca2 ("hv_netvsc: filter multicast/broadcast")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corrected four outstanding issues in the transport around sqsize.
1: Create Connection LS is sending the 1's-based sqsize, should be
sending the 0's-based value.
2: allocation of hw queue is using the 0's-base size. It should be
using the 1's-based value.
3: normalization of ctrl.sqsize by MQES is using MQES+1 (1's-based
value). It should be MQES (0's-based value).
4: Missing clause to ensure queue_count not larger than ctrl->sqsize.
Corrected by:
Clean up routines that pass queue size around. The queue size value is
the actual count (1's-based) value and determined from ctrl->sqsize + 1.
Routines that send 0's-based value adapt from queue size.
Sset ctrl->sqsize properly for MQES.
Added clause to nsure queue_count not larger than ctrl->sqsize + 1.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Kalle Valo:
====================
wireless-drivers fixes for 4.16
Quote a few fixes as I have not been able to send a pull request
earlier. Most of the fixes for iwlwifi but also few others, nothing
really standing out though.
iwlwifi
* fix a bogus warning when freeing a TFD
* fix severe throughput problem with 9000 series
* fix for a bug that caused queue hangs in certain situations
* fix for an issue with IBSS
* fix an issue with rate-scaling in AP-mode
* fix Channel Switch Announcement (CSA) issues with count 0 and 1
* some firmware debugging fixes
* remov a wrong error message when removing keys
* fix a firmware sysassert most usually triggered in IBSS
* a couple of fixes on multicast queues
* a fix with CCMP 256
rtlwifi
* fix loss of signal for rtl8723be
brcmfmac
* add possibility to obtain firmware error
* fix P2P_DEVICE ethernet address generation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds missing initialisation for HP 2013 UltraSlim Dock
Line-In/Out PINs and activates keyboard mute/micmute leds
for HP ProBook 640 G2
Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch adds missing initialisation for HP 2013 UltraSlim Dock
Line-In/Out PINs and activates keyboard mute/micmute leds
for HP EliteBook 820 G3
Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pretty minor: just SKB_GSO_TCP -> SKB_GSO_TCPV4 and
SKB_GSO_TCP6 -> SKB_GSO_TCPV6.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull a xen_blkfront fix from Konrad:
"It has one simple fix for the multi-queue support not showing up after
a block device was detached/re-attached."
* 'stable/for-jens-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen-blkfront: move negotiate_mq to cover all cases of new VBDs
The __send_and_alloc_skb() receives a skb ptr as a parameter but in
case it fails the skb is not valid:
- Send failed and released the skb internally.
- Allocation failed.
The current code tries to release the skb in case of failure which
causes redundant freeing.
Fixes: 9b00cf2d10 ("team: implement multipart netlink messages for options transfers")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes a dependency on the order options are passed when creating
a fabrics controller. With the old code, if "nr_io_queues" appears before
an "nqn" option specifying the discovery controller, then nr_io_queues
is overridden with zero. If "nr_io_queues" appears after specifying the
discovery controller, then the nr_io_queues option is used to set the
number of queues, and the driver attempts to establish IO connections
to the discovery controller (which doesn't work).
It seems better to ignore (and warn about) the "nr_io_queues" option
if userspace has already asked to connect to the discovery controller.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
cmd_dt_S_dtb constructs the assembly source to incorporate a devicetree
FDT (that is, the .dtb file) as binary data in the kernel image. This
assembly source contains labels before and after the binary data. The
label names incorporate the file name of the corresponding .dtb file.
Hyphens are not legal characters in labels, so .dtb files built into the
kernel with hyphens in the file name result in errors like the
following:
bcm3368-netgear-cvg834g.dtb.S: Assembler messages:
bcm3368-netgear-cvg834g.dtb.S:5: Error: : no such section
bcm3368-netgear-cvg834g.dtb.S:5: Error: junk at end of line, first unrecognized character is `-'
bcm3368-netgear-cvg834g.dtb.S:6: Error: unrecognized opcode `__dtb_bcm3368-netgear-cvg834g_begin:'
bcm3368-netgear-cvg834g.dtb.S:8: Error: unrecognized opcode `__dtb_bcm3368-netgear-cvg834g_end:'
bcm3368-netgear-cvg834g.dtb.S:9: Error: : no such section
bcm3368-netgear-cvg834g.dtb.S:9: Error: junk at end of line, first unrecognized character is `-'
Fix this by updating cmd_dt_S_dtb to transform all hyphens from the file
name to underscores when constructing the labels.
As of v4.16-rc2, 1139 .dts files across ARM64, ARM, MIPS and PowerPC
contain hyphens in their names, but the issue only currently manifests
on Broadcom MIPS platforms, as that is the only place where such files
are built into the kernel. For example when CONFIG_DT_NETGEAR_CVG834G=y,
or on BMIPS kernels when the dtbs target is used (in the latter case it
admittedly shouldn't really build all the dtb.o files, but thats a
separate issue).
Fixes: 695835511f ("MIPS: BMIPS: rename bcm96358nb4ser to bcm6358-neufbox4-sercom")
Signed-off-by: James Hogan <jhogan@kernel.org>
Reviewed-by: Frank Rowand <frowand.list@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The bloat-o-meter script has two typos in the help, fix both.
Fixes: 192efb7a1f ("bloat-o-meter: provide 3 different arguments for data, function and All")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Felipe writes:
usb: fixes for v4.16-rc4
Just small fixes now. The two most important are a fix for a a lock up
on USB ID pin change during system suspend/resume on dwc3 and a
use-after-free fix in ffs_fs_kill_sb().
Apart from that, some DT compatible fixes.
The check_interval file in
/sys/devices/system/machinecheck/machinecheck<cpu number>
directory is a global timer value for MCE polling. If it is changed by one
CPU, mce_restart() broadcasts the event to other CPUs to delete and restart
the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the
mce_timer variable.
If more than one CPU writes a specific value to the check_interval file
concurrently, mce_timer is not protected from such concurrent accesses and
all kinds of explosions happen. Since only root can write to those sysfs
variables, the issue is not a big deal security-wise.
However, concurrent writes to these configuration variables is void of
reason so the proper thing to do is to serialize the access with a mutex.
Boris:
- Make store_int_with_restart() use device_store_ulong() to filter out
negative intervals
- Limit min interval to 1 second
- Correct locking
- Massage commit message
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
One version of Lenovo Thinkpad T570 did not use ALC298
(like other Kaby Lake devices). Instead it uses ALC292.
In order to make the Lenovo dock working with that codec
the dock quirk for ALC292 will be used.
Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Driver uses alias from Device Tree as an index of pin controller data
array. In case of a wrong DTB or an out-of-tree DTB, the alias could be
outside of this data array leading to out-of-bounds access.
Depending on binary and memory layout, this could be handled properly
(showing error like "samsung-pinctrl 3860000.pinctrl: driver data not
available") or could lead to exceptions.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Fixes: 30574f0db1 ("pinctrl: add samsung pinctrl and gpiolib driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
With the previous two fixes for the write / ioctl races:
ALSA: seq: Don't allow resizing pool in use
ALSA: seq: More protection for concurrent write and ioctl races
the cells aren't any longer in queues at the point calling
snd_seq_pool_done() in snd_seq_ioctl_set_client_pool(). Hence the
function call snd_seq_queue_client_leave_cells() can be dropped safely
from there.
Suggested-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch is an attempt for further hardening against races between
the concurrent write and ioctls. The previous fix d15d662e89
("ALSA: seq: Fix racy pool initializations") covered the race of the
pool initialization at writer and the pool resize ioctl by the
client->ioctl_mutex (CVE-2018-1000004). However, basically this mutex
should be applied more widely to the whole write operation for
avoiding the unexpected pool operations by another thread.
The only change outside snd_seq_write() is the additional mutex
argument to helper functions, so that we can unlock / relock the given
mutex temporarily during schedule() call for blocking write.
Fixes: d15d662e89 ("ALSA: seq: Fix racy pool initializations")
Reported-by: 范龙飞 <long7573@126.com>
Reported-by: Nicolai Stange <nstange@suse.de>
Reviewed-and-tested-by: Nicolai Stange <nstange@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Display driver assumes it can use clk_set_rate for the display clock
via set-rate-parent mechanism, so add the flag for this to id.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Display driver assumes it can use clk_set_rate for the display clock
via set-rate-parent mechanism, so add the flag for this to it.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Jyri Sarha <jsarha@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: Jyri Sarha <jsarha@ti.com>
Certain clkctrl clocks, notably the display ones, use the
CLK_SET_RATE_PARENT feature extensively. Add support for this flag
to the clkctrl clocks.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Jyri Sarha <jsarha@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: Jyri Sarha <jsarha@ti.com>
Original idea by Ashok, completely rewritten by Borislav.
Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.
Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.
[ Borislav: Rewrite completely. ]
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
The two usb-otg regulators for imx7d-sdb are both called
"regulator-usb-otg1-vbus" and they effectively override each other.
This is most likely a copy-paste error.
Fixes: b877039aa1 ("ARM: dts: imx7d-sdb: Adjust the regulator nodes")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
This is a fix for a (sort of) fallout in the recent commit
d15d662e89 ("ALSA: seq: Fix racy pool initializations") for
CVE-2018-1000004.
As the pool resize deletes the existing cells, it may lead to a race
when another thread is writing concurrently, eventually resulting a
UAF.
A simple workaround is not to allow the pool resizing when the pool is
in use. It's an invalid behavior in anyway.
Fixes: d15d662e89 ("ALSA: seq: Fix racy pool initializations")
Reported-by: 范龙飞 <long7573@126.com>
Reported-by: Nicolai Stange <nstange@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since Linux v3.2, vsyscalls have been deprecated and slow. From v3.2
on, Linux had three vsyscall modes: "native", "emulate", and "none".
"emulate" is the default. All known user programs work correctly in
emulate mode, but vsyscalls turn into page faults and are emulated.
This is very slow. In "native" mode, the vsyscall page is easily
usable as an exploit gadget, but vsyscalls are a bit faster -- they
turn into normal syscalls. (This is in contrast to vDSO functions,
which can be much faster than syscalls.) In "none" mode, there are
no vsyscalls.
For all practical purposes, "native" was really just a chicken bit
in case something went wrong with the emulation. It's been over six
years, and nothing has gone wrong. Delete it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit a376cd9160 because
chromeos_laptop instances should not be marked as "const" (at this
time), since i2c_peripheral is being modified (we change "state" and
"tries") when we instantiate devices.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
Pull virtio bugfix from Michael Tsirkin:
"A bugfix for error handling in virtio"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_ring: fix num_free handling in error case
Pull input fixes from Dmitry Torokhov:
- we are reverting patch that was switched touchpad on Lenovo T460P
over to native RMI because on these boxes BIOS messes up with SMBus
controller state. We might re-enable it later once SMBus issue is
resolved
- disabling interrupts in matrix_keypad driver was racy
- mms114 now has SPDX header and matching MODULE_LICENSE
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - Lenovo Thinkpad T460p devices should use RMI"
Input: matrix_keypad - fix race when disabling interrupts
Input: mms114 - add SPDX identifier
Input: mms114 - fix license module information
Daniel Borkmann says:
====================
pull-request: bpf 2018-03-08
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix various BPF helpers which adjust the skb and its GSO information
with regards to SCTP GSO. The latter is a special case where gso_size
is of value GSO_BY_FRAGS, so mangling that will end up corrupting
the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s).
2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined
due to too old kernel headers in the system, from Jiri.
3) Increase the number of x64 JIT passes in order to allow larger images
to converge instead of punting them to interpreter or having them
rejected when the interpreter is not built into the kernel, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4828296982 which
caused the following issues:
1. On T460p with BIOS version 2.22 touchpad and trackpoint stop working
after suspend-resume cycle. Due to strange state of the device another
suspend is impossible.
The following dmesg errors can be observed:
thinkpad_acpi: EC reports that Thermal Table has changed
rmi4_smbus 7-002c: failed to get SMBus version number!
rmi4_physical rmi4-00: rmi_driver_reset_handler: Failed to read current IRQ mask.
rmi4_f01 rmi4-00.fn01: Failed to restore normal operation: -16.
rmi4_f01 rmi4-00.fn01: Resume failed with code -16.
rmi4_physical rmi4-00: Failed to suspend functions: -16
rmi4_smbus 7-002c: Failed to resume device: -16
PM: resume devices took 0.640 seconds
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-16).
rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts!
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
psmouse: probe of serio3 failed with error -1
2. On another T460p with BIOS version 2.15 two finger scrolling gesture
on the touchpad stops working after suspend-resume cycle (about 75%
reproducibility, when it still works, the scrolling gesture becomes
laggy). Nothing suspicious appears in the dmesg.
Analysis form Richard Schütz:
"RMI is unreliable on the ThinkPad T460p because the device is affected
by the firmware behavior addressed in a7ae81952c ("i2c: i801: Allow
ACPI SystemIO OpRegion to conflict with PCI BAR")."
The affected devices often show:
i801_smbus 0000:00:1f.4: BIOS is accessing SMBus registers
i801_smbus 0000:00:1f.4: Driver SMBus register access inhibited
Reported-by: Richard Schütz <rschuetz@uni-koblenz.de>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Tested-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
In Cilium some of the main programs we run today are hitting 9 passes
on x64's JIT compiler, and we've had cases already where we surpassed
the limit where the JIT then punts the program to the interpreter
instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON
or insertion failures due to the prog array owner being JITed but the
program to insert not (both must have the same JITed/non-JITed property).
One concrete case the program image shrunk from 12,767 bytes down to
10,288 bytes where the image converged after 16 steps. I've measured
that this took 340us in the JIT until it converges on my i7-6600U. Thus,
increase the original limit we had from day one where the JIT covered
cBPF only back then before we run into the case (as similar with the
complexity limit) where we trip over this and hit program rejections.
Also add a cond_resched() into the compilation loop, the JIT process
runs without any locks and may sleep anyway.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Prior to 25520d55cd ("block: Inline blk_integrity in struct gendisk")
we needed to temporarily add a zero-capacity disk before registering for
blk-integrity. But adding a zero-capacity disk caused the partition
table scanning to bail early, and this resulted in partitions not coming
up after a probe of the BTT or blk namespaces.
We can now register for integrity before the disk has been added, and
this fixes the rescan problems.
Fixes: 25520d55cd ("block: Inline blk_integrity in struct gendisk")
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In dce110, the plane configuration is such that plane 0
or the primary plane should be rendered with only RGB data.
This patch adds the validation to ensure that no video data
is rendered on plane 0.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
HW Engineer's Notes:
During switch from vga->extended, if we set the VGA_TEST_ENABLE and then
hit the VGA_TEST_RENDER_START, then the DCHUBP timing gets updated correctly.
Then vBIOS will have it poll for the VGA_TEST_RENDER_DONE and unset
VGA_TEST_ENABLE, to leave it in the same state as before.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
While checking plane states for updates during atomic check, we create
dc_plane_states in preparation. These dc states should be freed if
something errors.
Although the input transfer function is also freed by
dc_plane_state_release(), we should free it (on error) under the same
scope as where it is created.
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Before dig fe is enabled, infoframe can't be programmed. So in
suspend resume case our infoframe programmming was not going through.
This change changes the sequence so that infoframe is programmed
after.
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixing null-deref on Vega10 due to regression after
'fix cursor related Pstate hang' change.
Added null checks in setting cursor position.
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move cursor programming to inside the OTG_MASTER_UPDATE_LOCK
If graphics plane go from 1 pipe to hsplit, the cursor updates
after mpc programming and unlock. Which means there is a window
of time where cursor is enabled on the wrong pipe if it's on
the right side of the screen (i.e. case where cursor need to
move from pipe 0 to pipe 3 post split). This will cause pstate hang.
Solution is to program the cursor while still locked.
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Because AMDGPU_CRTC_IRQ_VLINE1 = 6, it expected 6 more crtcs to be
programed with disabled irq state in amdgpu_irq_disable_all. That caused errors and accessed
the wrong memory location.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need to call drm_mode_set_crtcinfo() again once
crtc timing is decided. Otherwise non-native/unsupported timing
might get overwritten.
Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's no good place in DC to cover all place where stream signal should
be updated. update_stream_signal depends on timing which comes from DM.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We've got a helper function to call dc_create_stream_for_sink and one
other place that calls it directly. Make sure we call the helper
functions always since we need to update a bunch of things in stream and
don't want to miss that.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When topology changed and rehook up MST display to the same DP
connector, need to take care of drm_dp_mst_port object.
Due to the topology is changed, drm_dp_mst_port and corresponding
i2c_algorithm object could be NULL in such situation.
Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The below commit
"drm/atomic: Try to preserve the crtc enabled state in drm_atomic_remove_fb, v2"
introduces a slight behavioral change to rmfb. Instead of disabling a crtc
when the primary plane is disabled, it now preserves it.
This change leads to BUG hit while performing atomic commit on amd driver.
As a fix this patch ensures that we disable the CRTC's with NULL FB by returning
-EINVAL and hence triggering fall back to the old behavior and turning off the
crtc in atomic_remove_fb().
V2: Added error check for plane_state and removed sanity check for crtc.
Signed-off-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch updates the dc's plane state with the parameters set by the
user side.
This is needed to validate the plane capabilities with the parameters
user space wants to set.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CZ & ST support uptil a limit 2:1 downscaling, this patch
adds validate_plane hook, that shall be used to validate
the plane attributes sent by the user space based
on dce110 capabilities.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_dm_atomic_check() is used to validate the entire configuration of
planes and crtc's that the user space wants to commit.
However amdgpu_dm_atomic_check() depends upon DRM_MODE_ATOMIC_ALLOW_MODESET
flag else its mostly dummy.
Its not mandatory for the user space to set DRM_MODE_ATOMIC_ALLOW_MODESET,
and in general its not set either along with DRM_MODE_ATOMIC_TEST_ONLY.
Considering its importantance, this patch defers the allow_modeset check
in dm_update_planes_state(), so that there shall be scope to validate
the configuration sent from user space, without impacting the population
of dc/dm related data structures.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
it is required if a platform supports PCIe root complex
core voltage reduction. After receiving this notification,
SBIOS can apply default PCIe root complex power policy.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
negotiate_mq should happen in all cases of a new VBD being discovered by
xen-blkfront, whether called through _probe() or a hot-attached new VBD
from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
configured without multi-queue.
Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Do not set 'needs_free_netdev' as we do call free_netdev
for mgmt net devices, doing both hits BUG_ON.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
instantiation of VF's on different adapters fails, copy
adapter index and chip type to PF0-3 adapter instances
to fix the issue.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:
Signed-off-by: Doug Ledford <dledford@redhat.com>
Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.
This patch takes simplest possible approach and prevents providing
values more than possible to allocate.
Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409ad ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
resolved_dev returned might be NULL as ifindex is transient number.
Ignoring NULL check of resolved_dev might crash the kernel.
Therefore perform NULL check before accessing resolved_dev.
Additionally rdma_resolve_ip_route() invokes addr_resolve() which
performs check and address translation for loopback ifindex.
Therefore, checking it again in rdma_resolve_ip_route() is not helpful.
Therefore, the code is simplified to avoid IFF_LOOPBACK check.
Fixes: 200298326b ("IB/core: Validate route when we init ah")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Starting with v4.16-rc1 we've been seeing a higher than usual number
of requests for the kernel to load networking modules, even on events
which shouldn't trigger a module load (e.g. ioctl(TCGETS)). Stephen
Smalley suggested the problem may lie in commit 44c02a2c3d
("dev_ioctl(): move copyin/copyout to callers") which moves changes
the network dev_ioctl() function to always call dev_load(),
regardless of the requested ioctl.
This patch moves the dev_load() calls back into the individual ioctls
while preserving the rest of the original patch.
Reported-by: Dominick Grift <dac.override@gmail.com>
Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull gfs2 fix from Bob Peterson:
"An additional patch from Andreas Gruenbacher that fixes another
unfortunate GFS2 regression"
* tag 'gfs2-4.16.rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fixes to "Implement iomap for block_map" (2)
When the connection is aborted, there is no point in
keeping the packets on the write queue until the connection
is closed.
Similar to a27fd7a8ed ('tcp: purge write queue upon RST'),
this is essential for a correct MSG_ZEROCOPY implementation,
because userspace cannot call close(fd) before receiving
zerocopy signals even when the connection is aborted.
Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 fixes from Martin Schwidefsky:
"Nine bug fixes for s390:
- Three fixes for the expoline code, one of them is strictly speaking
a cleanup but as it relates to code added with 4.16 I would like to
include the patch.
- Three timer related fixes in the common I/O layer
- A fix for the handling of internal DASD request which could cause
panics.
- One correction in regard to the accounting of pud page tables vs.
compat tasks.
- The register scrubbing in entry.S caused spurious crashes, this is
fixed now as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/entry.S: fix spurious zeroing of r0
s390: Fix runtime warning about negative pgtables_bytes
s390: do not bypass BPENTER for interrupt system calls
s390/cio: clear timer when terminating driver I/O
s390/cio: fix return code after missing interrupt
s390/cio: fix ccw_device_start_timeout API
s390/clean-up: use CFI_* macros in entry.S
s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
s390/dasd: fix handling of internal requests
Pull regulator fixes from Mark Brown:
"A couple of fixes here:
- another half of the supend to idle fix from Geert that went in
earlier, both he and I are confused as to why he didn't notice that
this was missing when his earlier fix was merged.
- a simple fix for a test done the wrong way round in the
stm32-vrefbuf driver"
* tag 'regulator-fix-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix resume from suspend to idle
regulator: stm32-vrefbuf: fix check on ready flag
Pull SCSI fixes from James Bottomley:
"This is mostly fixes for driver specific issues (nine of them) and the
storvsc performance improvement with interrupt handling which was
dropped from the previous fixes pull request.
We also have two regressions: one is a double call_rcu() in ATA error
handling and the other is a missed conversion to BLK_STS_OK in
__scsi_error_from_host_byte()"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Fix kernel crash during port toggle
scsi: qla2xxx: Fix FC-NVMe LUN discovery
scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()
scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops
scsi: qla2xxx: ensure async flags are reset correctly
scsi: qla2xxx: do not check login_state if no loop id is assigned
scsi: qla2xxx: Fixup locking for session deletion
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
scsi: mpt3sas: wait for and flush running commands on shutdown/unload
scsi: mpt3sas: fix oops in error handlers after shutdown/unload
scsi: storvsc: Spread interrupts when picking a channel for I/O requests
scsi: megaraid_sas: Do not use 32-bit atomic request descriptor for Ventura controllers
It turns out that commit 3229c18c0d6b2 'Fixes to "Implement iomap for
block_map"' introduced another bug in gfs2_iomap_begin that can cause
gfs2_block_map to set bh->b_size of an actual buffer to 0. This can
lead to arbitrary incorrect behavior including crashes or disk
corruption. Revert the incorrect part of that commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL,
therefore if DCCP socket is disconnected and dccp_sendmsg() is
called after it, it will cause a NULL pointer dereference in
dccp_write_xmit().
This crash and the reproducer was reported by syzbot. Looks like
it is reproduced if commit 69c64866ce ("dccp: CVE-2017-8824:
use-after-free in DCCP code") is applied.
Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller found an issue caused by lack of sufficient checks
in l2tp_tunnel_create()
RAW sockets can not be considered as UDP ones for instance.
In another patch, we shall replace all pr_err() by less intrusive
pr_debug() so that syzkaller can find other bugs faster.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Acked-by: James Chapman <jchapman@katalix.com>
==================================================================
BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
dst_release: dst:00000000d53d0d0f refcnt:-1
Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242
CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23b/0x360 mm/kasan/report.c:412
__asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435
setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596
pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707
SYSC_connect+0x213/0x4a0 net/socket.c:1640
SyS_connect+0x24/0x30 net/socket.c:1621
do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_evict_bucket() iterates global list, and
several tasks may call it in parallel. All of
them hash the same fq->list_evictor to different
lists, which leads to list corruption.
This patch makes fq be hashed to expired list
only if this has not been made yet by another
task. Since inet_frag_alloc() allocates fq
using kmem_cache_zalloc(), we may rely on
list_evictor is initially unhashed.
The problem seems to exist before async
pernet_operations, as there was possible to have
exit method to be executed in parallel with
inet_frags::frags_work, so I add two Fixes tags.
This also may go to stable.
Fixes: d1fe19444d "inet: frag: don't re-use chainlist for evictor"
Fixes: f84c6821aa "net: Convert pernet_subsys, registered from inet_init()"
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The smsc911x driver will crash if it is rmmod'ed while the netdev
is up like:
Call trace:
phy_detach+0x94/0x150
phy_disconnect+0x40/0x50
smsc911x_stop+0x104/0x128 [smsc911x]
__dev_close_many+0xb4/0x138
dev_close_many+0xbc/0x190
rollback_registered_many+0x140/0x460
rollback_registered+0x68/0xb0
unregister_netdevice_queue+0x100/0x118
unregister_netdev+0x28/0x38
smsc911x_drv_remove+0x58/0x130 [smsc911x]
platform_drv_remove+0x30/0x50
device_release_driver_internal+0x15c/0x1f8
driver_detach+0x54/0x98
bus_remove_driver+0x64/0xe8
driver_unregister+0x34/0x60
platform_driver_unregister+0x20/0x30
smsc911x_cleanup_module+0x14/0xbca8 [smsc911x]
SyS_delete_module+0x1e8/0x238
__sys_trace_return+0x0/0x4
This is caused by the mdiobus being unregistered/free'd
and the code in phy_detach() attempting to manipulate mdio
related structures from unregister_netdev() calling close()
To fix this, we delay the mdiobus teardown until after
the netdev is deregistered.
Reported-by: Matt Sealey <matt.sealey@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, administrative MTU changes on a given netdevice are
not reflected on route exceptions for MTU-less routes, with a
set PMTU value, for that device:
# ip -6 route get 2001:db8::b
2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
# ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
# ip netns exec a ip -6 route get 2001:db8::b
2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
cache expires 571sec mtu 4926 pref medium
# ip link set dev vti_a mtu 3000
# ip -6 route get 2001:db8::b
2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
cache expires 571sec mtu 4926 pref medium
# ip link set dev vti_a mtu 9000
# ip -6 route get 2001:db8::b
2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
cache expires 571sec mtu 4926 pref medium
The first issue is that since commit fb56be83e4 ("net-ipv6: on
device mtu change do not add mtu to mtu-less routes") we don't
call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
which handles administrative MTU changes, if the regular route
is MTU-less.
However, PMTU exceptions should be always updated, as long as
RTAX_MTU is not locked. Keep the check for MTU-less main route,
as introduced by that commit, but, for exceptions,
call rt6_exceptions_update_pmtu() regardless of that check.
Once that is fixed, one problem remains: MTU changes are not
reflected if the new MTU is higher than the previous one,
because rt6_exceptions_update_pmtu() doesn't allow that. We
should instead allow PMTU increase if the old PMTU matches the
local MTU, as that implies that the old MTU was the lowest in the
path, and PMTU discovery might lead to different results.
The existing check in rt6_mtu_change_route() correctly took that
case into account (for regular routes only), so factor it out
and re-use it also in rt6_exceptions_update_pmtu().
While at it, fix comments style and grammar, and try to be a bit
more descriptive.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: fb56be83e4 ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
Fixes: f5bbe7ee79 ("ipv6: prepare rt6_mtu_change() for exception table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the warning messages/call traces seen if DMA debug is
enabled, In case of fragmented skb's memory was allocated using
dma_map_page but freed using dma_unmap_single. This patch modifies buffer
allocations in TX path to use dma_map_page in all the places and
dma_unmap_page while freeing the buffers.
Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
Acked-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rdma requires ILT Memory to be allocated for it's QPs.
Each ILT entry points to a page used by several Rdma QPs.
To avoid allocating all the memory in advance, the rdma
implementation dynamically allocates memory as more QPs are
added, however it does not dynamically free the memory.
The memory should have been freed on rmmod qedr, but isn't.
This patch adds the memory freeing on rmmod qedr (currently
it will be freed with qed is removed).
An outcome of this bug, is that if qedr is unloaded and loaded
without unloaded qed, there will be no more RoCE traffic.
The reason these are related, is that the logic of detecting the
first QP ever opened is by asking whether ILT memory for RoCE has
been allocated.
In addition, this patch modifies freeing of the Task context to
always use the PROTOCOLID_ROCE and not the protocol passed,
this is because task context for iWARP and ROCE both use the
ROCE protocol id, as opposed to the connection context.
Fixes: dbb799c397 ("qed: Initialize hardware for new protocols")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-03-05
This series contains fixes to e1000e only.
Benjamin Poirier provides all but one fix in this series, starting with
workaround for a VMWare e1000e emulation issue where ICR reads 0x0 on
the emulated device. Partially reverted a previous commit dealing with
the "Other" interrupt throttling to avoid unforeseen fallout from these
changes that are not strictly necessary. Restored the ICS write for
receive and transmit queue interrupts in the case that txq or rxq bits
were set in ICR and the Other interrupt handler read and cleared ICR
before the queue interrupt was raised. Fixed an bug where interrupts
may be missed if ICR is read while INT_ASSERTED is not set, so avoid the
problem by setting all bits related to events that can trigger the Other
interrupt in IMS. Fixed the return value for check_for_link() when
auto-negotiation is off.
Pierre-Yves Kerbrat fixes e1000e to use dma_zalloc_coherent() to make
sure the ring is memset to 0 to prevent the area from containing
garbage.
v2: added an additional e1000e fix to the series
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The subordinate value indicates the highest bus number which can be
reached downstream though a certain device.
Commit a20c7f36bd ("PCI: Do not allocate more buses than available in
parent") ensures that downstream devices cannot assign busnumbers higher
than the upstream device subordinate number, which was indeed illogical.
By default, dw_pcie_setup_rc() inits the Root Complex subordinate to a
value of 0x01.
Due to this combined with above commit, enumeration stops digging deeper
downstream as soon as bus num 0x01 has been assigned, which is always the
case for a bridge device.
This results in all devices behind a bridge bus remaining undetected, as
these would be connected to bus 0x02 or higher.
Fix this by initializing the RC to a subordinate value of 0xff, which is
not altering hardware behaviour in any way, but informs probing function
pci_scan_bridge() later on which reads this value back from register.
The following nasty errors during boot are also fixed by this:
pci_bus 0000:02: busn_res: can not insert [bus 02-ff] under [bus 01] (conflicts with (null) [bus 01])
...
pci_bus 0000:03: [bus 03] partially hidden behind bridge 0000:01 [bus 01]
...
pci_bus 0000:04: [bus 04] partially hidden behind bridge 0000:01 [bus 01]
...
pci_bus 0000:05: [bus 05] partially hidden behind bridge 0000:01 [bus 01]
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 05
pci_bus 0000:02: busn_res: can not insert [bus 02-05] under [bus 01] (conflicts with (null) [bus 01])
pci_bus 0000:02: [bus 02-05] partially hidden behind bridge 0000:01 [bus 01]
Fixes: a20c7f36bd ("PCI: Do not allocate more buses than available in
parent")
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Tested-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Cc: stable@vger.kernel.org # v4.15+
Cc: Binghui Wang <wangbinghui@hisilicon.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jianguo Sun <sunjianguo1@huawei.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Minghuan Lian <minghuan.Lian@freescale.com>
Cc: Mingkai Hu <mingkai.hu@freescale.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Pratyush Anand <pratyush.anand@gmail.com>
Cc: Richard Zhu <hongxing.zhu@nxp.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Stanimir Varbanov <svarbanov@mm-sol.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Xiaowei Song <songxiaowei@hisilicon.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
When we exceed current packets limit and we have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:
unreferenced object 0xffff880b5d23b600 (size 1024):
comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
hex dump (first 32 bytes):
00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
[<000000001709b32f>] skb_segment+0x8c8/0x3710
[<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
[<00000000c921cba1>] inet_gso_segment+0x476/0x1370
[<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
[<000000002182660a>] __skb_gso_segment+0x1dd/0x620
[<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
[<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
[<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
[<00000000d309e9d3>] ip_output+0x1aa/0x2c0
[<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
[<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
[<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
[<0000000013d06d02>] tasklet_action+0x1ca/0x250
[<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
[<00000000e7ed027c>] irq_exit+0x1e2/0x210
Fix it by adding the rest of the segments, if any, to skb 'to_free'
list. Add new __qdisc_drop_all() and qdisc_drop_all() functions
because they can be useful in the future if we need to drop segmented
GSO packets in other places.
Fixes: 6071bd1aa1 ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2018-03-05
Here are a few more Bluetooth fixes for the 4.16 kernel:
- btusb: reset/resume fixes for Yoga 920 and Dell OptiPlex 3060
- Fix for missing encryption refresh with the Security Manager protocol
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey says:
====================
rhashtable: Fix rhltable duplicates insertion
On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.
The first patch fixes it, and the second one adds a test that show
it is now working.
Paul.
v3 --> v2 changes:
* added missing fix in rhashtable_lookup_one code path as well.
v1 --> v2 changes:
* Changed commit messages to better reflect the change
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tries to insert duplicates in the middle of bucket's chain:
bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]]
Reuses tid to distinguish the elements insertion order.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.
Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.
Issue: 1241076
Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
Fixes: ca26893f05 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2b05f6ae1e ("ARM: ux500: remove PMU IRQ bouncer")
deleted some code to bounce and work around the weird PMU
IRQs in the DB8500 ASIC, but did a semantic mistake:
since the auxdata was now unused, the call to
of_platform_populate() was removed, but this does not
work: the default platform population will only kick in
if .init_machine() is assigned NULL, and since the U8540
was still using the callback that was not the case.
Fix this by reinstating the call to of_platform_populate(),
but pass NULL as auxdata.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 2b05f6ae1e ("ARM: ux500: remove PMU IRQ bouncer")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The Stream Buffer for EtherAVB-IF (STBE) is an optional component, and
is not present on all SoCs.
Document this in the DT bindings, including a list of SoCs that do have
it.
Fixes: 785ec87483 ("ravb: document R8A77970 bindings")
Fixes: f231c4178a ("dt-bindings: net: renesas-ravb: Add support for R8A77995 RAVB")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware has a requirement that the P2P_DEVICE address should
be different from the address of the primary interface. When not
specified by user-space, the driver generates the MAC address for
the P2P_DEVICE interface using the MAC address of the primary
interface and setting the locally administered bit. However, the MAC
address of the primary interface may already have that bit set causing
the creation of the P2P_DEVICE interface to fail with -EBUSY. Fix this
by using a random address instead to determine the P2P_DEVICE address.
Cc: stable@vger.kernel.org # 3.10.y
Reported-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The feature module needs to evaluate the actual firmware error return
upon a control command. This adds a flag to struct brcmf_if that the
caller can set. This flag is checked to determine the error code that
needs to be returned.
Fixes: b69c1df472 ("brcmfmac: separate firmware errors from i/o errors")
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in
sbusfb_ioctl_helper().
'index' is defined as an int in sbusfb_ioctl_helper().
We retrieve this from the user:
if (get_user(index, &c->index) ||
__get_user(count, &c->count) ||
__get_user(ured, &c->red) ||
__get_user(ugreen, &c->green) ||
__get_user(ublue, &c->blue))
return -EFAULT;
and then we use 'index' in the following way:
red = cmap->red[index + i] >> 8;
green = cmap->green[index + i] >> 8;
blue = cmap->blue[index + i] >> 8;
This is a classic information leak vulnerability. 'index' should be
an unsigned int, given its usage above.
This patch is straight-forward; it changes 'index' to unsigned int
in two switch-cases: FBIOGETCMAP_SPARC && FBIOPUTCMAP_SPARC.
This patch fixes CVE-2018-6412.
Signed-off-by: Peter Malone <peter.malone@gmail.com>
Acked-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Add some hints about overlayfs kernel config options.
Enabling NFS export by default is especially recommended against, as it
incurs a performance penalty even if the filesystem is not actually
exported.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This reverts commit e9a48034d7.
The slaves and holders link for the hidden gendisks confuse lsblk so that
it errors out on, or doesn't report the nvme multipath devices. Given
that we don't need holder relationships for something that can't even be
directly accessed we should just stop creating those links.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <keith.busch@intel.com>
Artem Savkov reported that commit 5efec5c655 leads to a packet loss under
IPSec configuration. It appears that his setup consists of a TUN device,
which does not have a MAC header.
Make sure MAC header exists.
Note: TUN device sets a MAC header pointer, although it does not have one.
Fixes: 5efec5c655 ("xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version")
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Tested-by: Artem Savkov <artem.savkov@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Be more robust when drawing arrows in the annotation TUI, avoiding a
segfault when jump instructions have as a target addresses in functions
other that the one currently being annotated. The full fix will come in
the following days, when jumping to other functions will work as call
instructions (Arnaldo Carvalho de Melo)
- Prevent auxtrace_queues__process_index() from queuing AUX area data for
decoding when the --no-itrace option has been used (Adrian Hunter)
- Sync copy of kvm UAPI headers and x86's cpufeatures.h (Arnaldo Carvalho de Melo)
- Fix 'perf stat' CSV output format for non-supported counters (Ilya Pronin)
- Fix crash in 'perf record|perf report' pipe mode (Jiri Olsa)
- Fix annoying 'perf top' overwrite fallback message on older kernels (Kan Liang)
- Fix the usage on the 'perf kallsyms' man page (Sangwon Hong)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hitting the following hardlockup due to a race condition in
error CQE processing.
[26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req
[26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa
[26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
[26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded
[26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8,
[26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[26156.472639] Call Trace:
[26156.475379] <NMI> [<ffffffff98d0d722>] dump_stack+0x19/0x1b
[26156.481833] [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140
[26156.489341] [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100
[26156.496256] [<ffffffff98787c24>] perf_event_overflow+0x14/0x20
[26156.502887] [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510
[26156.509813] [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50
[26156.516738] [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150
[26156.523273] [<ffffffff98d17be8>] do_nmi+0x218/0x460
[26156.528834] [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e
[26156.534980] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.543268] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.551556] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.559842] <EOE> [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf
[26156.567555] [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30
[26156.573696] [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re]
[26156.581789] [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re]
[26156.589493] [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re]
The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to
put the error QP in flush list. Since SQ and RQ of each error qp are added
to two different flush list, we need to protect it using locks of
corresponding CQs. Difference in order of acquiring the lock in
SQ poll_cq and RQ poll_cq can cause a hard lockup.
Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock.
Instead of this lock, introduces qplib_cq.flush_lock to handle
addition/deletion of QPs in flush list. Also, always invoke the flush_lock
in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential
deadlock.
Other than the poll_cq context, the movement of QP to/from flush list can
be done in modify_qp context or from an async error event from HW.
Synchronize these operations using the bnxt_re verbs layer CQ locks.
To achieve this, adds a call back to the HW abstraction layer(qplib) to
bnxt_re ib_verbs layer in case of async error event. Also, removes the
buddy cq functions as it is no longer required.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix warning limit for kernel stack consumption:
drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
Using smaller ib_wc array on the stack brings us comfortably below that
limit again.
Fixes: 246d8b184c ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Gorenko <sergeygo@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
"err" is either zero or possibly uninitialized here. It should be
-EINVAL.
Fixes: 427c1e7bcd ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The series that introduced dual port RoCE mode assumed that we don't have
a dual port HCA that use the mlx5 driver, this is not the case for
Connect-IB HCAs. This reasoning led to assigning 1 as the native port
index which causes issue when the second port is used.
For example query_pkey() when called on the second port will return values
of the first port. Make sure that we assign the right port index as the
native port index.
Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
to GID properties.
When adding GIDs, this gid_type field was copied over to the
hardware gid table. However, when deleting GIDs, the gid_type field
was not copied over to the hardware gid table.
As a result, when running RoCEv2, all RoCEv2 gids in the
hardware gid table were set to type RoCEv1 when any gid was deleted.
This problem would persist until the next gid was added (which would again
restore the gid_type field for all the gids in the hardware gid table).
Fix this by copying over the gid_type field to the hardware gid table
when deleting gids, so that the gid_type of all remaining gids is
preserved when a gid is deleted.
Fixes: b699a859d1 ("IB/mlx4: Add gid_type to GID properties")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When using IPv4 addresses in RoCEv2, the GID format for the mapped
IPv4 address should be: ::ffff:<4-byte IPv4 address>.
In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords
zeroed out by memset, which resulted in deleting the ffff field.
However, since procedure ipv6_addr_v4mapped() already verifies that the
gid has format ::ffff:<ipv4 address>, no change is needed for the gid,
and the memset can simply be removed.
Fixes: 7e57b85c44 ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the read-only flag is true on a SCSI disk, re-reading the partition
table sets the flag back to false.
To observe this bug, you can run:
1. blockdev --setro /dev/sda
2. blockdev --rereadpt /dev/sda
3. blockdev --getro /dev/sda
This commit reads the disk's old state and combines it with the device
disk-reported state rather than unconditionally marking it as RW.
Reported-by: Li Ning <lining916740672@icloud.com>
Signed-off-by: Jeremy Cline <jeremy@jcline.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Because of the shifting around of code in qla2x00_probe_one recently,
failures during adapter initialization can lead to problems, i.e. NULL
pointer crashes and doubly freed data structures which cause eventual
panics.
This V2 version makes the relevant memory free routines idempotent, so
repeat calls won't cause any harm. I also removed the problematic
probe_init_failed exit point as it is not needed.
Fixes: d64d6c5671 ("scsi: qla2xxx: Fix NULL pointer crash due to probe failure")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
iWARP does not support RDMA WRITE or SEND with immediate data.
Driver should check this before submitting to FW and return an
immediate error
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Race in qedr_poll_cq, lastest_cqe wasn't protected by lock,
leading to a case where two context's accessing poll_cq at
the same time lead to one of them having a pointer to an old
latest_cqe and reading an invalid cqe element
Signed-off-by: Amit Radzi <Amit.Radzi@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix iWARP connect and listen to use the mapped port for
ipv4 and ipv6. Without this fixed, running on a server
that has iwpmd enabled will not use the correct port
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In practice this is really only meaningful in the context of the DM
multipath target (which uses dm_table_set_type() to set the type of
device DM should create via its "queue_mode" option).
So this change allows a DM multipath device with "queue_mode bio" to be
upgraded from DM_TYPE_BIO_BASED to DM_TYPE_NVME_BIO_BASED -- iff the
underlying device(s) are NVMe.
DM_TYPE_NVME_BIO_BASED is just a DM core implementation detail that
allows for NVMe-specific optimizations (e.g. use direct_make_request
instead of generic_make_request). If in the future there is no benefit
or need to distinguish NVMe vs not: then it will be removed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This eliminates the "queue_mode" configuration's "nvme" mode. There
wasn't anything NVMe-specific about that mode. It was named "nvme"
because it was a short name for the mode. But the entire point of the
mode was to optimize the multipath target for underlying devices that
are _not_ SCSI-based. Devices that aren't SCSI have no need for the
various SCSI device handler (scsi_dh) specific code in DM multipath.
But rather than narrowly define this scsi_dh vs not branching in terms
of "nvme": invert the logic so that we're just checking whether a
multipath device is layered on SCSI devices with scsi_dh attached.
This allows any future storage technology to avoid scsi_dh specific code
in the multipath target too.
Fixes: 848b8aefd4 ("dm mpath: optimize NVMe bio-based support")
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The strncmp function should compare 4 bytes.
Fixes: 22c11858e8 ("dm: introduce DM_TYPE_NVME_BIO_BASED")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Upstream commit 4102d9de6d ("dm raid: fix rs_get_progress()
synchronization state/ratio") in combination with commit 7c29744ecc
("dm raid: simplify rs_get_progress()") introduced a regression by
incorrectly reporting a sync_ratio of 0 for degraded raid sets. This
caused lvm2 to fail to repair raid legs automatically.
Fix by identifying the degraded state by checking the MD_RECOVERY_INTR
flag and returning mddev->recovery_cp in case it is set.
MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR
MD_RECOVERY_NEEDED ] when a RAID member fails. It then shuts down any
sync thread that is running and leaves us with all MD_RECOVERY_* flags
cleared. The bug occurs if a status is requested in the short time it
takes to shut down any sync thread and clear the flags, because we were
keying in on the MD_RECOVERY_NEEDED - understanding it to be the initial
phase of a “recover” sync thread. However, this is an incorrect
interpretation if MD_RECOVERY_INTR is also set.
This also explains why the bug only happened when automatic repair was
enabled and not a normal ‘manual’ method. It is impossible to react
quick enough to hit the problematic window without it being automated.
Fix passes automatic repair tests.
Fixes: 7c29744ecc ("dm raid: simplify rs_get_progress()")
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Otherwise an underlying device's teardown (e.g. SCSI) may race with the
DM ioctl or persistent reservation and result in dereferencing driver
memory that gets freed when the underlying device's final blkdev_put()
occurs.
bdgrab() only increases the refcount for the block_device's inode to
ensure the block_device struct itself will not be freed, but does not
guarantee the block_device will remain associated with the gendisk or
its storage.
Cc: stable@vger.kernel.org # 4.8+
Reported-by: David Jeffery <djeffery@redhat.com>
Suggested-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ben Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
gcc-6.3 and earlier show a new warning after a seemingly unrelated
change to the arm64 PAGE_KERNEL definition:
In file included from drivers/md/dm-bufio.c:14:0:
drivers/md/dm-bufio.c: In function 'alloc_buffer':
include/linux/sched/mm.h:182:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]
current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
^
The same warning happened earlier on linux-3.18 for MIPS and I did a
workaround for that, but now it's come back.
gcc-7 and newer are apparently smart enough to figure this out, and
other architectures don't show it, so the best I could come up with is
to rework the caller slightly in a way that makes it obvious enough to
all arm64 compilers what is happening here.
Fixes: 41acec6240 ("arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()")
Link: https://patchwork.kernel.org/patch/9692829/
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[snitzer: moved declarations inside conditional, altered vmalloc return]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Compilation of bpftool on a distro that lacks eBPF support in the installed
kernel headers fails with:
common.c: In function ‘is_bpffs’:
common.c:96:40: error: ‘BPF_FS_MAGIC’ undeclared (first use in this function)
return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
^
Fix this the same way it is already in tools/lib/bpf/libbpf.c and
tools/lib/api/fs/fs.c.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings
if the index is actually set in the message.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sigingo fix from Eric Biederman:
"The kbuild test robot found that I accidentally moved si_pkey when I
was cleaning up siginfo_t. A short followed by an int with the int
having 8 byte alignment. Sheesh siginfo_t is a weird structure.
I have now corrected it and added build time checks that with a little
luck will catch any similar future mistakes. The build time checks
were sufficient for me to verify the bug and to verify my fix. So they
are at least useful this once."
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal/x86: Include the field offsets in the build time checks
signal: Correct the offset of si_pkey in struct siginfo
devm_memremap_pages() was re-worked in e8d5134833 "memremap: change
devm_memremap_pages interface to use struct dev_pagemap" to take a
caller allocated struct dev_pagemap as a function parameter. A call to
devres_free() was left in the error cleanup path which results in a
kernel panic if the remap fails for some reason. Remove it to fix the
panic and let devm_memremap_pages() fail gracefully.
Fixes: e8d5134833 ("memremap: change devm_memremap_pages interface...")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix bugs in signaling the Hyper-V host when freeing space in the
host->guest ring buffer:
1. The interrupt_mask must not be used to determine whether to signal
on the host->guest ring buffer
2. The ring buffer write_index must be read (via hv_get_bytes_to_write)
*after* pending_send_sz is read in order to avoid a race condition
3. Comparisons with pending_send_sz must treat the "equals" case as
not-enough-space
4. Don't signal if the pending_send_sz feature is not present. Older
versions of Hyper-V that don't implement this feature will poll.
Fixes: 03bad714a1 ("vmbus: more host signalling avoidance")
Cc: Stable <stable@vger.kernel.org> # 4.14 and above
Signed-off-by: Michael Kelley <mhkelley@outlook.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 57e6f0d7b8 ("typec: tcpm: Only request matching pdos") is causing
a regression, before this commit e.g. the GPD win and GPD pocket devices
were charging at 9V 3A with a PD charger, now they are instead slowly
discharging at 5V 0.4A, as this commit causes the ports max_snk_mv/ma/mw
settings to be completely ignored.
Arguably the way to fix this would be to add a PDO_VAR() describing the
voltage range to the snk_caps of boards which can handle any voltage in
their range, but the "typec: tcpm: Only request matching pdos" commit
looks at the type of PDO advertised by the source/charger and if that
is fixed (as it typically is) only compairs against PDO_FIXED entries
in the snk_caps so supporting a range of voltage would require adding a
PDO_FIXED entry for *every possible* voltage to snk_caps.
AFAICT there is no reason why a fixed source_cap cannot be matched against
a variable snk_cap, so at a minimum the commit should be rewritten to
support that.
For now lets revert the "typec: tcpm: Only request matching pdos" commit,
fixing the regression.
Fixes: 57e6f0d7b8 ("typec: tcpm: Only request matching pdos")
Cc: Badhri Jagan Sridharan <badhri@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without pm_runtime_{get,put}_sync calls in place, reading
vbus status via /sys causes the following error:
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0ab060
pgd = b333e822
[fa0ab060] *pgd=48011452(bad)
[<c05261b0>] (musb_default_readb) from [<c0525bd0>] (musb_vbus_show+0x58/0xe4)
[<c0525bd0>] (musb_vbus_show) from [<c04c0148>] (dev_attr_show+0x20/0x44)
[<c04c0148>] (dev_attr_show) from [<c0259f74>] (sysfs_kf_seq_show+0x80/0xdc)
[<c0259f74>] (sysfs_kf_seq_show) from [<c0210bac>] (seq_read+0x250/0x448)
[<c0210bac>] (seq_read) from [<c01edb40>] (__vfs_read+0x1c/0x118)
[<c01edb40>] (__vfs_read) from [<c01edccc>] (vfs_read+0x90/0x144)
[<c01edccc>] (vfs_read) from [<c01ee1d0>] (SyS_read+0x3c/0x74)
[<c01ee1d0>] (SyS_read) from [<c0106fe0>] (ret_fast_syscall+0x0/0x54)
Solution was suggested by Tony Lindgren <tony@atomide.com>.
Signed-off-by: Merlijn Wajer <merlijn@wizzup.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Corsair Strafe RGB keyboard does not respond to usb control messages
sometimes and hence generates timeouts.
Commit de3af5bf25 ("usb: quirks: add delay init quirk for Corsair
Strafe RGB keyboard") tried to fix those timeouts by adding
USB_QUIRK_DELAY_INIT.
Unfortunately, even with this quirk timeouts of usb_control_msg()
can still be seen, but with a lower frequency (approx. 1 out of 15):
[ 29.103520] usb 1-8: string descriptor 0 read error: -110
[ 34.363097] usb 1-8: can't set config #1, error -110
Adding further delays to different locations where usb control
messages are issued just moves the timeouts to other locations,
e.g.:
[ 35.400533] usbhid 1-8:1.0: can't add hid device: -110
[ 35.401014] usbhid: probe of 1-8:1.0 failed with error -110
The only way to reliably avoid those issues is having a pause after
each usb control message. In approx. 200 boot cycles no more timeouts
were seen.
Addionaly, keep USB_QUIRK_DELAY_INIT as it turned out to be necessary
to have the delay in hub_port_connect() after hub_port_init().
The overall boot time seems not to be influenced by these additional
delays, even on fast machines and lightweight distributions.
Fixes: de3af5bf25 ("usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard")
Cc: stable@vger.kernel.org
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KVM: s390: Fixes
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (nested KVM) with many CPUs (e.g. a nested
guest with 200 vcpu)
- io interrupt delivery counter was not exported
Fixes for PPC KVM:
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
Correct spelling of National Semi-conductor (no hyphen)
in drivers/net/ethernet/.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: Use strlcpy() for ethtool::get_strings
After turning on KASAN on one of my systems, I started getting lots of out of
bounds errors while fetching a given port's statistics, and indeed using
memcpy() is unsafe for copying strings which have not been declared as an array
of ETH_GSTRING_LEN bytes, so let's use strlcpy() instead. This allows the best
of both worlds: we still keep the efficient memory usage of variably sized
strings, but we don't copy more than we need to.
Changes in v2:
- dropped the 3 other patches that were not necessary
- use strlcpy() instead of strncpy()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Our statistics strings are allocated at initialization without being
bound to a specific size, yet, we would copy ETH_GSTRING_LEN bytes using
memcpy() which would create out of bounds accesses, this was flagged by
KASAN. Replace this with strlcpy() to make sure we are bound the source
buffer size and we also always NUL-terminate strings.
Fixes: 820ee17b8d ("net: phy: broadcom: Add support code for reading PHY counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our statistics strings are allocated at initialization without being
bound to a specific size, yet, we would copy ETH_GSTRING_LEN bytes using
memcpy() which would create out of bounds accesses, this was flagged by
KASAN. Replace this with strlcpy() to make sure we are bound the source
buffer size and we also always NUL-terminate strings.
Fixes: 2b2427d064 ("phy: micrel: Add ethtool statistics counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our statistics strings are allocated at initialization without being
bound to a specific size, yet, we would copy ETH_GSTRING_LEN bytes using
memcpy() which would create out of bounds accesses, this was flagged by
KASAN. Replace this with strlcpy() to make sure we are bound the source
buffer size and we also always NUL-terminate strings.
Fixes: d2fa47d9dd ("phy: marvell: Add ethtool statistics counters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our statistics strings are allocated at initialization without being
bound to a specific size, yet, we would copy ETH_GSTRING_LEN bytes using
memcpy() which would create out of bounds accesses, this was flagged by
KASAN. Replace this with strlcpy() to make sure we are bound the source
buffer size and we also always NUL-terminate strings.
Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ashmem_mutex may create a chain of dependencies like:
CPU0 CPU1
mmap syscall ioctl syscall
-> mmap_sem (acquired) -> ashmem_ioctl
-> ashmem_mmap -> ashmem_mutex (acquired)
-> ashmem_mutex (try to acquire) -> copy_from_user
-> mmap_sem (try to acquire)
There is a lock odering problem between mmap_sem and ashmem_mutex causing
a lockdep splat[1] during a syzcaller test. This patch fixes the problem
by move copy_from_user out of ashmem_mutex.
[1] https://www.spinics.net/lists/kernel/msg2733200.html
Fixes: ce8a3a9e76 (staging: android: ashmem: Fix a race condition in pin ioctls)
Reported-by: syzbot+d7a918a7a8e1c952bc36@syzkaller.appspotmail.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Second batch of iwlwifi fixes intended for 4.16:
* Fix CSA issues with count 0 and 1;
* Some firmware debugging fixes;
* Removed a wrong error message when removing keys;
* Fix a firmware sysassert most usually triggered in IBSS;
* A couple of fixes on multicast queues;
* A fix with CCMP 256;
trigger_on() means that the trigger is available but not ready, however
trigger_on() was making it ready. That can segfault if the signal comes
before trigger_ready(). e.g. (USR2 signal delivery not shown)
$ perf record -e intel_pt//u -S sleep 1
perf: Segmentation fault
Obtained 16 stack frames.
/home/ahunter/bin/perf(sighandler_dump_stack+0x40) [0x4ec550]
/lib/x86_64-linux-gnu/libc.so.6(+0x36caf) [0x7fa76411acaf]
/home/ahunter/bin/perf(perf_evsel__disable+0x26) [0x4b9dd6]
/home/ahunter/bin/perf() [0x43a45b]
/lib/x86_64-linux-gnu/libc.so.6(+0x36caf) [0x7fa76411acaf]
/lib/x86_64-linux-gnu/libc.so.6(__xstat64+0x15) [0x7fa7641d2cc5]
/home/ahunter/bin/perf() [0x4ec6c9]
/home/ahunter/bin/perf() [0x4ec73b]
/home/ahunter/bin/perf() [0x4ec73b]
/home/ahunter/bin/perf() [0x4ec73b]
/home/ahunter/bin/perf() [0x4eca15]
/home/ahunter/bin/perf(machine__create_kernel_maps+0x257) [0x4f0b77]
/home/ahunter/bin/perf(perf_session__new+0xc0) [0x4f86f0]
/home/ahunter/bin/perf(cmd_record+0x722) [0x43c132]
/home/ahunter/bin/perf() [0x4a11ae]
/home/ahunter/bin/perf(main+0x5d4) [0x427fb4]
Note, for testing purposes, this is hard to hit unless you add some sleep()
in builtin-record.c before record__open().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 3dcc4436fa ("perf tools: Introduce trigger class")
Link: http://lkml.kernel.org/r/1519807144-30694-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When printing stats in CSV mode, 'perf stat' appends extra separators
when a counter is not supported:
<not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,,
Which causes a failure when parsing fields. The numbers of separators
should be the same for each line, no matter if the counter is or not
supported.
Signed-off-by: Ilya Pronin <ipronin@twitter.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com
Fixes: 92a61f6412 ("perf stat: Implement CSV metrics output")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The dock line-out pin (NID 0x17 of ALC3254 codec) on Dell Precision
7520 may route to three different DACs, 0x02, 0x03 and 0x06. The
first two DACS have the volume amp controls while the last one
doesn't. And unfortunately, the auto-parser assigns this pin to DAC3,
resulting in the non-working volume control for the line out.
Fix it by disabling the routing to DAC3 on the corresponding pin.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199029
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Even if we don't have extended SCA support, we can have more than 64 CPUs
if we don't enable any HW features that might use the SCA entries.
Now, this works just fine, but we missed a return, which is why we
would actually store the SCA entries. If we have more than 64 CPUs, this
means writing outside of the basic SCA - bad.
Let's fix this. This allows > 64 CPUs when running nested (under vSIE)
without random crashes.
Fixes: a6940674c3 ("KVM: s390: allow 255 VCPUs when sca entries aren't used")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180306132758.21034-1-david@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With ibm,dynamic-memory-v2 and ibm,drc-info coming around the same
time, byte22 in vector5 of ibm architecture vector table got set twice
separately. The end result is that guest kernel isn't advertising
support for ibm,dynamic-memory-v2.
Fix this by removing the duplicate assignment of byte22.
Fixes: 02ef6dd810 ("powerpc: Enable support for ibm,drc-info devtree property")
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The internal mic boost on the T480 is too high. Fix this by applying the
ALC269_FIXUP_LIMIT_INT_MIC_BOOST fixup to the machine to limit the gain.
Signed-off-by: Benjamin Berg <bberg@redhat.com>
Tested-by: Benjamin Berg <bberg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This platform was only one phone Jack.
Add dummy lineout verb to fix automute mode disable.
This just the workaround.
[ More background information:
since the platform has only a headphone jack without speaker, the
driver doesn't create the auto-mute control. Meanwhile we do update
the headset mode via the automute hook in the driver, thus with this
setup, the headset won't be updated any longer.
By adding a dummy line-out pin here, the auto-mute is added by the
driver, and the headset update is triggered properly.
Note that this is different from the other
ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, which has the real line-out pin,
while this quirk adds a dummy line-out pin. -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
when a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:
.Lcleanup_system_call:
[...]
0: # update accounting time stamp
mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
# set up saved register r11
lg %r15,__LC_KERNEL_STACK
la %r9,STACK_FRAME_OVERHEAD(%r15)
stg %r9,24(%r11) # r11 pt_regs pointer
# fill pt_regs
mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
---> stmg %r0,%r7,__PT_R0(%r9)
The problem is now, that we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.
Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Due to an oversight when refactoring siginfo_t si_pkey has been in the
wrong position since 4.16-rc1. Add an explicit check of the offset of
every user space field in siginfo_t and compat_siginfo_t to make a
mistake like this hard to make in the future.
I have run this code on 4.15 and 4.16-rc1 with the position of si_pkey
fixed and all of the fields show up in the same location.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The change moving addr_lsb into the _sigfault union failed to take
into account that _sigfault._addr_bnd._lower being a pointer forced
the entire union to have pointer alignment. In practice this only
mattered for the offset of si_pkey which is why this has taken so long
to discover.
To correct this change _dummy_pkey and _dummy_bnd to have pointer type.
Reported-by: kernel test robot <shun.hao@intel.com>
Fixes: b68a68d3dc ("signal: Move addr_lsb into the _sigfault union for clarity")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull ia64 cleanups from Tony Luck:
- More atomic cleanup from willy
- Fix a python script to work with version 3
- Some other small cleanups
* tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities"
ia64/err-inject: Use get_user_pages_fast()
ia64: doc: tweak whitespace for 'console=' parameter
ia64: Convert remaining atomic operations
ia64: convert unwcheck.py to python3
Since commit 4670d610d5 ("PCI: Move OF-related PCI functions into
PCI core"), sparc:allmodconfig fails to build with the following error.
pcie-cadence-host.c:(.text+0x4c4): undefined reference to `of_irq_parse_and_map_pci'
pcie-cadence-host.c:(.text+0x4c8): undefined reference to `of_irq_parse_and_map_pci'
of_irq_parse_and_map_pci() is now only available if OF_IRQ is enabled.
Make its declaration and its dummy function dependent on OF_IRQ to solve
the problem.
Fixes: 4670d610d5 ("PCI: Move OF-related PCI functions into PCI core")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rob Herring <robh@kernel.org>
Commit a3e6c1eff5 ("MIPS: IRQ: Fix disable_irq on CPU IRQs") fixes an
issue where disable_irq did not actually disable the irq. The bug caused
our IPIs to not be disabled, which actually is the correct behavior.
With the addition of commit a3e6c1eff5 ("MIPS: IRQ: Fix disable_irq on
CPU IRQs"), the IPIs were getting disabled going into suspend, thus
schedule_ipi() was not being called. This caused deadlocks where
schedulable task were not being scheduled and other cpus were waiting
for them to do something.
Add the IRQF_NO_SUSPEND flag so an irq_disable will not be called on the
IPIs during suspend.
Signed-off-by: Justin Chen <justinpopo6@gmail.com>
Fixes: a3e6c1eff5 ("MIPS: IRQ: Fix disabled_irq on CPU IRQs")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17385/
[jhogan@kernel.org: checkpatch: wrap long lines and fix commit refs]
Signed-off-by: James Hogan <jhogan@kernel.org>
At the point of sysfs callback, the call to gup is
done without mmap_sem (or any lock for that matter).
This is racy. As such, use the get_user_pages_fast()
alternative and safely avoid taking the lock, if possible.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While we've only seen inlining problems with atomic_sub_return(),
the other atomic operations could have the same problem. Convert all
remaining operations to use the same solution as atomic_sub_return().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since my system use python3 as default, arch/ia64/scripts/unwcheck.py no
longer run.
This patch convert it to the python3 syntax.
I have ran it with python2/python3 while printing values of
start/end/rlen_sum which could be impacted by this change and I see no difference.
Fixes: 94a4708352 ("scripts: change scripts to use system python instead of env")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This can happen e.g. during disk cloning.
This is an incomplete fix: it does not catch duplicate UUIDs earlier
when things are still unattached. It does not unregister the device.
Further changes to cope better with this are planned but conflict with
Coly's ongoing improvements to handling device errors. In the meantime,
one can manually stop the device after this has happened.
Attempts to attach a duplicate device result in:
[ 136.372404] loop: module loaded
[ 136.424461] bcache: register_bdev() registered backing device loop0
[ 136.424464] bcache: bch_cached_dev_attach() Tried to attach loop0 but duplicate UUID already attached
My test procedure is:
dd if=/dev/sdb1 of=imgfile bs=1024 count=262144
losetup -f imgfile
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Descriptor rings were not initialized at zero when allocated
When area contained garbage data, it caused skb_over_panic in
e1000_clean_rx_irq (if data had E1000_RXD_STAT_DD bit set)
This patch makes use of dma_zalloc_coherent to make sure the
ring is memset at 0 to prevent the area from containing garbage.
Following is the signature of the panic:
IODDR0@0.0: skbuff: skb_over_panic: text:80407b20 len:64010 put:64010 head:ab46d800 data:ab46d842 tail:0xab47d24c end:0xab46df40 dev:eth0
IODDR0@0.0: BUG: failure at net/core/skbuff.c:105/skb_panic()!
IODDR0@0.0: Kernel panic - not syncing: BUG!
IODDR0@0.0:
IODDR0@0.0: Process swapper/0 (pid: 0, threadinfo=81728000, task=8173cc00 ,cpu: 0)
IODDR0@0.0: SP = <815a1c0c>
IODDR0@0.0: Stack: 00000001
IODDR0@0.0: b2d89800 815e33ac
IODDR0@0.0: ea73c040 00000001
IODDR0@0.0: 60040003 0000fa0a
IODDR0@0.0: 00000002
IODDR0@0.0:
IODDR0@0.0: 804540c0 815a1c70
IODDR0@0.0: b2744000 602ac070
IODDR0@0.0: 815a1c44 b2d89800
IODDR0@0.0: 8173cc00 815a1c08
IODDR0@0.0:
IODDR0@0.0: 00000006
IODDR0@0.0: 815a1b50 00000000
IODDR0@0.0: 80079434 00000001
IODDR0@0.0: ab46df40 b2744000
IODDR0@0.0: b2d89800
IODDR0@0.0:
IODDR0@0.0: 0000fa0a 8045745c
IODDR0@0.0: 815a1c88 0000fa0a
IODDR0@0.0: 80407b20 b2789f80
IODDR0@0.0: 00000005 80407b20
IODDR0@0.0:
IODDR0@0.0:
IODDR0@0.0: Call Trace:
IODDR0@0.0: [<804540bc>] skb_panic+0xa4/0xa8
IODDR0@0.0: [<80079430>] console_unlock+0x2f8/0x6d0
IODDR0@0.0: [<80457458>] skb_put+0xa0/0xc0
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<804079c8>] e1000_clean_rx_irq+0x188/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80468b48>] __dev_kfree_skb_any+0x88/0xa8
IODDR0@0.0: [<804101ac>] e1000e_poll+0x94/0x288
IODDR0@0.0: [<8046e9d4>] net_rx_action+0x19c/0x4e8
IODDR0@0.0: ...
IODDR0@0.0: Maximum depth to print reached. Use kstack=<maximum_depth_to_print> To specify a custom value (where 0 means to display the full backtrace)
IODDR0@0.0: ---[ end Kernel panic - not syncing: BUG!
Signed-off-by: Pierre-Yves Kerbrat <pkerbrat@kalray.eu>
Signed-off-by: Marius Gligor <mgligor@kalray.eu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Clean fake sink flag after detecting link on downstream port.
Fixing display light-up after "hot-unplug&plug again" downstream
of an active dongle.
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull kselftest fixes from Shuah Khan:
"A fix for regression in memory-hotplug install script that prevents
the test from running on the target"
* tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: memory-hotplug: fix emit_tests regression
Pull networking fixes from David Miller:
1) Use an appropriate TSQ pacing shift in mac80211, from Toke
Høiland-Jørgensen.
2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
in ip6_route_me_harder, from Eric Dumazet.
3) Fix several shutdown races and similar other problems in l2tp, from
James Chapman.
4) Handle missing XDP flush properly in tuntap, for real this time.
From Jason Wang.
5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
Borkmann.
6) Fix phy_resume() locking, from Andrew Lunn.
7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
from Xin Long.
8) Revert F-RTO middle box workarounds, they only handle one dimension
of the problem. From Yuchung Cheng.
9) Fix socket refcounting in RDS, from Ka-Cheong Poon.
10) Don't allow ppp unit registration to an unregistered channel, from
Guillaume Nault.
11) Various hv_netvsc fixes from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
hv_netvsc: propagate rx filters to VF
hv_netvsc: filter multicast/broadcast
hv_netvsc: defer queue selection to VF
hv_netvsc: use napi_schedule_irqoff
hv_netvsc: fix race in napi poll when rescheduling
hv_netvsc: cancel subchannel setup before halting device
hv_netvsc: fix error unwind handling if vmbus_open fails
hv_netvsc: only wake transmit queue if link is up
hv_netvsc: avoid retry on send during shutdown
virtio-net: re enable XDP_REDIRECT for mergeable buffer
ppp: prevent unregistered channels from connecting to PPP units
tc-testing: skbmod: fix match value of ethertype
mlxsw: spectrum_switchdev: Check success of FDB add operation
net: make skb_gso_*_seglen functions private
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
rds: Incorrect reference counting in TCP socket creation
net: ethtool: don't ignore return from driver get_fecparam method
vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
...
When autoneg is off, the .check_for_link callback functions clear the
get_link_status flag and systematically return a "pseudo-error". This means
that the link is not detected as up until the next execution of the
e1000_watchdog_task() 2 seconds later.
Fixes: 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The 82574 specification update errata 12 states that interrupts may be
missed if ICR is read while INT_ASSERTED is not set. Avoid that problem by
setting all bits related to events that can trigger the Other interrupt in
IMS.
The Other interrupt is raised for such events regardless of whether or not
they are set in IMS. However, only when they are set is the INT_ASSERTED
bit also set in ICR.
By doing this, we ensure that INT_ASSERTED is always set when we read ICR
in e1000_msix_other() and steer clear of the errata. This also ensures that
ICR will automatically be cleared on read, therefore we no longer need to
clear bits explicitly.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Restores the ICS write for Rx/Tx queue interrupts which was present before
commit 16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
but was not restored in commit 4aea7a5c5e
("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1).
This re-raises the queue interrupts in case the txq or rxq bits were set in
ICR and the Other interrupt handler read and cleared ICR before the queue
interrupt was raised.
Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This partially reverts commit 4aea7a5c5e.
We keep the fix for the first part of the problem (1) described in the log
of that commit, that is to read ICR in the other interrupt handler. We
remove the fix for the second part of the problem (2), Other interrupt
throttling.
Bursts of "Other" interrupts may once again occur during rxo (receive
overflow) traffic conditions. This is deemed acceptable in the interest of
avoiding unforeseen fallout from changes that are not strictly necessary.
As discussed, the e1000e driver should be in "maintenance mode".
Link: https://www.spinics.net/lists/netdev/msg480675.html
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently we can crash perf record when running in pipe mode, like:
$ perf record ls | perf report
# To display the perf.data header info, please use --header/--header-only options.
#
perf: Segmentation fault
Error:
The - file has no samples!
The callstack of the crash is:
0x0000000000515242 in perf_event__synthesize_event_update_name
3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
(gdb) bt
#0 0x0000000000515242 in perf_event__synthesize_event_update_name
#1 0x00000000005158a4 in perf_event__synthesize_extra_attr
#2 0x0000000000443347 in record__synthesize
#3 0x00000000004438e3 in __cmd_record
#4 0x000000000044514e in cmd_record
#5 0x00000000004cbc95 in run_builtin
#6 0x00000000004cbf02 in handle_internal_command
#7 0x00000000004cc054 in run_argv
#8 0x00000000004cc422 in main
The reason of the crash is that the evsel does not have ids array
allocated and the pipe's synthesize code tries to access it.
We don't force evsel ids allocation when we have single event, because
it's not needed. However we need it when we are in pipe mode even for
single event as a key for evsel update event.
Fixing this by forcing evsel ids allocation event for single event, when
we are in pipe mode.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This first happened with a gcc function, _cpp_lex_token, that has the
usual jumps:
│1159e6c: ↓ jne 115aa32 <_cpp_lex_token@@Base+0xf92>
I.e. jumps to a label inside that function (_cpp_lex_token), and those
works, but also this kind:
│1159e8b: ↓ jne c469be <cpp_named_operator2name@@Base+0xa72>
I.e. jumps to another function, outside _cpp_lex_token, which are not
being correctly handled generating as a side effect references to
ab->offset[] entries that are set to NULL, so to make this code more
robust, check that here.
A proper fix for will be put in place, looking at the function name
right after the '<' token and probably treating this like a 'call'
instruction.
For now just don't draw the arrow.
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: https://lkml.kernel.org/n/tip-5tzvb875ep2sel03aeefgmud@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On older (e.g. v4.4) kernels, an annoying fallback message can be
observed in 'perf top':
┌─Warning:──────────────────────┐
│fall back to non-overwrite mode│
│ │
│ │
│Press any key... │
└───────────────────────────────┘
The 'perf top' utility has been changed to overwrite mode since commit
ebebbf0823 ("perf top: Switch default mode to overwrite mode").
For older kernels which don't have overwrite mode support, 'perf top'
will fall back to non-overwrite mode and print out the fallback message
using ui__warning(), which needs user's input to close.
The fallback message is not critical for end users. Turning it to debug
message which is printed when running with -vv.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Fixes: ebebbf0823 ("perf top: Switch default mode to overwrite mode")
Link: http://lkml.kernel.org/r/1519669030-176549-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kconfig.h was excluded from consideration by fixdep by
6a5be57f0f (fixdep: fix extraneous dependencies) to avoid some false
positive hits
(1) include/config/.h
(2) include/config/h.h
(3) include/config/foo.h
(1) occurred because kconfig.h contains the string CONFIG_ in a
comment. However, since dee81e9886 (fixdep: faster CONFIG_ search), we
have a check that the part after CONFIG_ is non-empty, so this does not
happen anymore (and CONFIG_ appears by itself elsewhere, so that check
is worthwhile).
(2) comes from the include guard, __LINUX_KCONFIG_H. But with the
previous patch, we no longer match that either.
That leaves (3), which amounts to one [1] false dependency (aka stat() call
done by make), which I think we can live with:
We've already had one case [2] where the lack of include/linux/kconfig.h in
the .o.cmd file caused a missing rebuild, and while I originally thought
we should just put kconfig.h in the dependency list without parsing it
for the CONFIG_ pattern, we actually do have some real CONFIG_ symbols
mentioned in it, and one can imagine some translation unit that just
does '#ifdef __BIG_ENDIAN' but doesn't through some other header
actually depend on CONFIG_CPU_BIG_ENDIAN - so changing the target
endianness could end up rebuilding the world, minus that small
TU. Quoting Linus,
... when missing dependencies cause a missed re-compile, the resulting
bugs can be _really_ subtle.
[1] well, two, we now also have CONFIG_BOOGER/booger.h - we could change
that to FOO if we care
[2] https://lkml.org/lkml/2018/2/22/838
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The string CONFIG_ quite often appears after other alphanumerics,
meaning that that instance cannot be referencing a Kconfig
symbol. Omitting these means make has fewer files to stat() when
deciding what needs to be rebuilt - for a defconfig build, this seems to
remove about 2% of the (wildcard ...) lines from the .o.cmd files.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
uml-config.h hasn't existed in this decade (87e299e5c7 - x86, um: get
rid of uml-config.h). The few remaining UML_CONFIG instances are defined
directly in terms of their real CONFIG symbol in common-offsets.h, so
unlike when the symbols got defined via a sed script, anything that uses
UML_CONFIG_FOO now should also automatically pick up a dependency on
CONFIG_FOO via the normal fixdep mechanism (since common-offsets.h
should at least recursively be a dependency). Hence I believe we should
actually be able to ignore the HELLO_CONFIG_BOOM cases.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
dtc now gives the following warning:
arch/arm/boot/dts/rk3288-tinker.dtb: Warning (sound_dai_property): /sound/simple-audio-card,codec: Missing property '#sound-dai-cells' in node /hdmi@ff980000 or bad phandle (referred from sound-dai[0])
Add the missing #sound-dai-cells property.
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
If a start bit is detected, then reset the receive buffer counter to 0.
This ensures that no stale data is in the buffer if a message is
broken off midstream due to e.g. a Low Drive condition and then
retransmitted.
The only Rx interrupts we need to listen to are RX_REGISTER_FULL (i.e.
a valid byte was received) and RX_START_BIT_DETECTED (i.e. a new
message starts and we need to reset the counter).
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org> # for v4.15 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Since commit 02d742f4b2 ("media: lirc: lirc daemon fails to detect raw
IR device"), the feature LIRC_CAN_SEND_SCANCODE is no longer used as it
tripped up lircd. The ability to send scancodes for IR Tx is implied by
LIRC_CAN_SEND_PULSE (i.e. any device that can send can use IR Tx encoders).
So, remove LIRC_CAN_SEND_SCANCODE since it never used. This fixes:
Documentation/output/lirc.h.rst:6: WARNING: undefined label:
lirc-can-send-scancode (if the link has no caption the label must precede
a section header
As this flag was added for kernel 4.16, let's remove it, while not too
late.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
When I debug a kernel crash issue in funcitonfs, found ffs_data.ref
overflowed, While functionfs is unmounting, ffs_data is put twice.
Commit 43938613c6 ("drivers, usb: convert ffs_data.ref from atomic_t to
refcount_t") can avoid refcount overflow, but that is risk some situations.
So no need put ffs data in ffs_fs_kill_sb, already put in ffs_data_closed.
The issue can be reproduced in Mediatek mt6763 SoC, ffs for ADB device.
KASAN enabled configuration reports use-after-free errro.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x14/0xe0 at addr ffffffc0579386a0
Read of size 4 by task umount/4650
====================================================
BUG kmalloc-512 (Tainted: P W O ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in ffs_fs_mount+0x194/0x844 age=22856 cpu=2 pid=566
alloc_debug_processing+0x1ac/0x1e8
___slab_alloc.constprop.63+0x640/0x648
__slab_alloc.isra.57.constprop.62+0x24/0x34
kmem_cache_alloc_trace+0x1a8/0x2bc
ffs_fs_mount+0x194/0x844
mount_fs+0x6c/0x1d0
vfs_kern_mount+0x50/0x1b4
do_mount+0x258/0x1034
INFO: Freed in ffs_data_put+0x25c/0x320 age=0 cpu=3 pid=4650
free_debug_processing+0x22c/0x434
__slab_free+0x2d8/0x3a0
kfree+0x254/0x264
ffs_data_put+0x25c/0x320
ffs_data_closed+0x124/0x15c
ffs_fs_kill_sb+0xb8/0x110
deactivate_locked_super+0x6c/0x98
deactivate_super+0xb0/0xbc
INFO: Object 0xffffffc057938600 @offset=1536 fp=0x (null)
......
Call trace:
[<ffffff900808cf5c>] dump_backtrace+0x0/0x250
[<ffffff900808d3a0>] show_stack+0x14/0x1c
[<ffffff90084a8c04>] dump_stack+0xa0/0xc8
[<ffffff900826c2b4>] print_trailer+0x158/0x260
[<ffffff900826d9d8>] object_err+0x3c/0x40
[<ffffff90082745f0>] kasan_report_error+0x2a8/0x754
[<ffffff9008274f84>] kasan_report+0x5c/0x60
[<ffffff9008273208>] __asan_load4+0x70/0x88
[<ffffff90084cd81c>] refcount_dec_and_test+0x14/0xe0
[<ffffff9008d98f9c>] ffs_data_put+0x80/0x320
[<ffffff9008d9d904>] ffs_fs_kill_sb+0xc8/0x110
[<ffffff90082852a0>] deactivate_locked_super+0x6c/0x98
[<ffffff900828537c>] deactivate_super+0xb0/0xbc
[<ffffff90082af0c0>] cleanup_mnt+0x64/0xec
[<ffffff90082af1b0>] __cleanup_mnt+0x10/0x18
[<ffffff90080d9e68>] task_work_run+0xcc/0x124
[<ffffff900808c8c0>] do_notify_resume+0x60/0x70
[<ffffff90080866e4>] work_pending+0x10/0x14
Cc: stable@vger.kernel.org
Signed-off-by: Xinyong <xinyong.fang@linux.alibaba.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The commit 9d9491a7da ("mmc: dw_mmc: Fix the DTO timeout calculation")
and commit 4c2357f57d ("mmc: dw_mmc: Fix the CTO timeout calculation")
made changes, which cause multiply overflow for 32-bit systems. The broken
timeout calculations leads to unexpected ETIMEDOUT errors and causes
stacktrace splat (such as below) during normal data exchange with SD-card.
| Running : 4M-check-reassembly-tcp-cmykw2-rotatew2.out -v0 -w1
| - Info: Finished target initialization.
| mmcblk0: error -110 transferring data, sector 320544, nr 2048, cmd
| response 0x900, card status 0x0
DIV_ROUND_UP_ULL helps to escape usage of __udivdi3() from libgcc and so
code gets compiled on all 32-bit platforms as opposed to usage of
DIV_ROUND_UP when we may only compile stuff on a very few arches.
Lets cast this multiply to u64 type to prevent the overflow.
Fixes: 9d9491a7da ("mmc: dw_mmc: Fix the DTO timeout calculation")
Fixes: 4c2357f57d ("mmc: dw_mmc: Fix the CTO timeout calculation")
Tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> # ARC STAR 9001306872 HSDK, sdio: board crashes when copying big files
Signed-off-by: Evgeniy Didin <Evgeniy.Didin@synopsys.com>
Cc: <stable@vger.kernel.org> # 4.14
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Since commit ab82fa7da4 ("gpio: rcar: Prevent module clock disable
when wake-up is enabled"), when a GPIO is used for wakeup, the GPIO
block's module clock (if exists) is manually kept running during system
suspend, to make sure the device stays active.
However, this explicit clock handling is merely a workaround for a
failure to properly communicate wakeup information to the device core.
Instead, set the device's power.wakeup_path field, to indicate this
device is part of the wakeup path. Depending on the PM Domain's
active_wakeup configuration, the genpd core code will keep the device
enabled (and the clock running) during system suspend when needed.
This allows for the removal of all explicit clock handling code from the
driver.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Stephen Hemminger says:
====================
hv_netvsc: minor fixes
These are improvements to netvsc driver. They aren't functionality
changes so not targeting net-next; and they are not show stopper
bugs that need to go to stable either.
v2
- drop the irq flags patch, defer it to net-next
- split the multicast filter flag patch out
- change propogate rx mode patch to handle startup of vf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc device should propagate filters to the SR-IOV VF
device (if present). The flags also need to be propagated to the
VF device as well. This only really matters on local Hyper-V
since Azure does not support multiple addresses.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc driver was always enabling all multicast and broadcast
even if netdevice flag had not enabled it.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VF is used for accelerated networking it will likely have
more queues (and different policy) than the synthetic NIC.
This patch defers the queue policy to the VF so that all the
queues can be used. This impacts workloads like local generate UDP.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the netvsc_channel_cb is already called in interrupt
context from vmbus, there is no need to do irqsave/restore.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race between napi_reschedule and re-enabling interrupts
which could lead to missed host interrrupts. This occurs when
interrupts are re-enabled (hv_end_read) and vmbus irq callback
(netvsc_channel_cb) has already scheduled NAPI.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Block setup of multiple channels earlier in the teardown
process. This avoids possible races between halt and subchannel
initialization.
Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the initialization order so that the device is ready to transmit
(ie connect vsp is completed) before setting the internal reference
to the device with RCU.
This avoids any races on initialization and prevents retry issues
on shutdown.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPP units don't hold any reference on the channels connected to it.
It is the channel's responsibility to ensure that it disconnects from
its unit before being destroyed.
In practice, this is ensured by ppp_unregister_channel() disconnecting
the channel from the unit before dropping a reference on the channel.
However, it is possible for an unregistered channel to connect to a PPP
unit: register a channel with ppp_register_net_channel(), attach a
/dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel
with ppp_unregister_channel() and finally connect the /dev/ppp file to
a PPP unit with ioctl(PPPIOCCONNECT).
Once in this situation, the channel is only held by the /dev/ppp file,
which can be released at anytime and free the channel without letting
the parent PPP unit know. Then the ppp structure ends up with dangling
pointers in its ->channels list.
Prevent this scenario by forbidding unregistered channels from
connecting to PPP units. This maintains the code logic by keeping
ppp_unregister_channel() responsible from disconnecting the channel if
necessary and avoids modification on the reference counting mechanism.
This issue seems to predate git history (successfully reproduced on
Linux 2.6.26 and earlier PPP commits are unrelated).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
iproute2 print_skbmod() prints the configured ethertype using format 0x%X:
therefore, test 9aa8 systematically fails, because it configures action #4
using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing
the expected value to 0x31 lets the test result 'not ok' become 'ok'.
tested with:
# ./tdc.py -e 9aa8
Test 9aa8: Get a single skbmod action from a list
All test results:
1..1
ok 1 9aa8 Get a single skbmod action from a list
Fixes: cf797ac49b ("tc-testing: Add test cases for police and skbmod")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens says:
====================
GSO_BY_FRAGS correctness improvements
As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.
I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.
Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f04872 ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.
Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.
Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.
Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.
Regards,
Daniel
[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html
PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.
v2: split out bpf stuff
fix review comments from Dave Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.
Make the seglen functions static, which stops people using them
outside of skbuff.c
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.
Use skb_gso_validate_mac_len() instead, as it does consider that
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?
skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the Highpoint Rocketraid 642L and cards using a Marvel 88SE9235
controller in general, this RAID card also supports AHCI mode and short
of a custom driver, this is the only way to make it work under Linux.
Note that even though the card is called to 644L, it has a product-id
of 0x0645.
Cc: stable@vger.kernel.org
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1534106
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
As the kernel doc describes too the code is supposed to skip adding
multicast TT entries if both the WANT_ALL_IPV4 and WANT_ALL_IPV6 flags
are present.
Unfortunately, the current code even skips adding multicast TT entries
if only either the WANT_ALL_IPV4 or WANT_ALL_IPV6 is present.
This could lead to IPv6 multicast packet loss if only an IGMP but not an
MLD querier is present for instance or vice versa.
Fixes: 687937ab34 ("batman-adv: Add multicast optimization support for bridged setups")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix skb checksum issues, by Matthias Schiffer (2 patches)
- fix exception handling when dumping data objects through netlink,
by Sven Eckelmann (4 patches)
- fix handling of interface indices, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
unconditionally mangling of that will result in nonsense value
and would corrupt the skb later on.
Therefore, i) add two helpers skb_increase_gso_size() and
skb_decrease_gso_size() that would throw a one time warning and
bail out for such skbs and ii) refuse and return early with an
error in those BPF helpers that are affected. We do need to bail
out as early as possible from there before any changes on the
skb have been performed.
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Gen8 and prior Proliant systems supported the "CRU" interface
to firmware. This interfaces allows linux to "call back" into firmware
to source the cause of an NMI. This feature isn't fully utilized
as the actual source of the NMI isn't printed, the driver only
indicates that the source couldn't be determined when the call
fails.
With the advent of Gen9, iCRU replaces the CRU. The call back
feature is no longer available in firmware. To be compatible and
not attempt to call back into firmware on system not supporting CRU,
the SMBIOS table is consulted to determine if it is safe to
make the call back or not.
This results in about half of the driver code being devoted
to either making CRU calls or determing if it is safe to make
CRU calls. As noted, the driver isn't really using the results of
the CRU calls.
Furthermore, as a consequence of the Spectre security issue, the
BIOS/EFI calls are being wrapped into Spectre-disabling section.
Removing the call back in hpwdt_pretimeout assists in this effort.
As the CRU sourcing of the NMI isn't required for handling the
NMI and there are security concerns with making the call back, remove
the legacy (pre Gen9) NMI sourcing and the DMI code to determine if
the system had the CRU interface.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
According to SBSA spec v3.1 section 5.3:
All registers are 32 bits in size and should be accessed using
32-bit reads and writes. If an access size other than 32 bits
is used then the results are IMPLEMENTATION DEFINED.
[...]
The Generic Watchdog is little-endian
The current code uses readq to read the watchdog compare register
which does a 64-bit access. This fails on ThunderX2 which does not
implement 64-bit access to this register.
Fix this by using lo_hi_readq() that does two 32-bit reads.
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Watchdog close is "expected" when any byte is 'V' not just the last one.
Writing "V" to the device fails because the last byte is the end of string.
$ echo V > /dev/watchdog
f71808e_wdt: Unexpected close, not stopping watchdog!
Signed-off-by: Igor Pylypiv <igor.pylypiv@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Since commit 8b24e69fc4 ("KVM: PPC: Book3S HV: Close race with testing
for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the
guest time is not accounted to guest time and user time, but instead to
system time.
This is because guest_enter()/guest_exit() are called while interrupts
are disabled and the tick counter cannot be updated between them.
To fix that, move guest_exit() after local_irq_enable(), and as
guest_enter() is called with IRQ disabled, call guest_enter_irqoff()
instead.
Fixes: 8b24e69fc4 ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Put back reference on CLUSTERIP configuration structure from the
error path, patch from Florian Westphal.
2) Put reference on CLUSTERIP configuration instead of freeing it,
another cpu may still be walking over it, also from Florian.
3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given
packet manipulation may reallocation the skbuff header, from Florian.
4) Missing match size sanity checks in ebt_among, from Florian.
5) Convert BUG_ON to WARN_ON in ebtables, from Florian.
6) Sanity check userspace offsets from ebtables kernel, from Florian.
7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix
Fietkau.
8) Bump the right stats on checksum error from bridge netfilter,
from Taehee Yoo.
9) Unset interface flag in IPv6 fib lookups otherwise we get
misleading routing lookup results, from Florian.
10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet.
11) Don't allow devices to be part of multiple flowtables at the same
time, this may break setups.
12) Missing netlink attribute validation in flowtable deletion.
13) Wrong array index in nf_unregister_net_hook() call from error path
in flowtable addition path.
14) Fix FTP IPVS helper when NAT mangling is in place, patch from
Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit afcc90f862 ("usercopy: WARN() on slab cache usercopy
region violations"), MIPS systems booting with a compat root filesystem
emit a warning when copying compat siginfo to userspace:
WARNING: CPU: 0 PID: 953 at mm/usercopy.c:81 usercopy_warn+0x98/0xe8
Bad or missing usercopy whitelist? Kernel memory exposure attempt
detected from SLAB object 'task_struct' (offset 1432, size 16)!
Modules linked in:
CPU: 0 PID: 953 Comm: S01logging Not tainted 4.16.0-rc2 #10
Stack : ffffffff808c0000 0000000000000000 0000000000000001 65ac85163f3bdc4a
65ac85163f3bdc4a 0000000000000000 90000000ff667ab8 ffffffff808c0000
00000000000003f8 ffffffff808d0000 00000000000000d1 0000000000000000
000000000000003c 0000000000000000 ffffffff808c8ca8 ffffffff808d0000
ffffffff808d0000 ffffffff80810000 fffffc0000000000 ffffffff80785c30
0000000000000009 0000000000000051 90000000ff667eb0 90000000ff667db0
000000007fe0d938 0000000000000018 ffffffff80449958 0000000020052798
ffffffff808c0000 90000000ff664000 90000000ff667ab0 00000000100c0000
ffffffff80698810 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 ffffffff8010d02c 65ac85163f3bdc4a
...
Call Trace:
[<ffffffff8010d02c>] show_stack+0x9c/0x130
[<ffffffff80698810>] dump_stack+0x90/0xd0
[<ffffffff80137b78>] __warn+0x100/0x118
[<ffffffff80137bdc>] warn_slowpath_fmt+0x4c/0x70
[<ffffffff8021e4a8>] usercopy_warn+0x98/0xe8
[<ffffffff8021e68c>] __check_object_size+0xfc/0x250
[<ffffffff801bbfb8>] put_compat_sigset+0x30/0x88
[<ffffffff8011af24>] setup_rt_frame_n32+0xc4/0x160
[<ffffffff8010b8b4>] do_signal+0x19c/0x230
[<ffffffff8010c408>] do_notify_resume+0x60/0x78
[<ffffffff80106f50>] work_notifysig+0x10/0x18
---[ end trace 88fffbf69147f48a ]---
Commit 5905429ad8 ("fork: Provide usercopy whitelisting for
task_struct") noted that:
"While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), it is always
copied with a static length (i.e. sizeof(sigset_t))."
However, this is not true in the case of compat signals, whose sigset
is copied by put_compat_sigset and receives size as an argument.
At most call sites, put_compat_sigset is copying a sigset from the
current task_struct. This triggers a warning when
CONFIG_HARDENED_USERCOPY is active. However, by marking this function as
static inline, the warning can be avoided because in all of these cases
the size is constant at compile time, which is allowed. The only site
where this is not the case is handling the rt_sigpending syscall, but
there the copy is being made from a stack local variable so does not
trigger the warning.
Move put_compat_sigset to compat.h, and mark it static inline. This
fixes the WARN on MIPS.
Fixes: afcc90f862 ("usercopy: WARN() on slab cache usercopy region violations")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Dmitry V . Levin" <ldv@altlinux.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18639/
Signed-off-by: James Hogan <jhogan@kernel.org>
Commit 16c513b134
("selftests: memory-hotplug: silence test command echo")
introduced regression in emit_tests and results in the following
failure when selftests are installed and run. Fix it.
Running tests in memory-hotplug
========================================
./run_kselftest.sh: line 121: @./mem-on-off-test.sh: No such file or
directory
selftests: memory-hotplug [FAIL]
Fixes: 16c513b134 (selftests: memory-hotplug: silence test command echo")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Johannes Berg says:
====================
Three more patches:
* fix for a regression in 4-addr mode with fast-RX
* fix for a Kconfig problem with the new regdb
* fix for the long-standing TCP performance issue in
wifi using the new sk_pacing_shift_update()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0933a578cd ("rds: tcp: use sock_create_lite() to create the
accept socket") has a reference counting issue in TCP socket creation
when accepting a new connection. The code uses sock_create_lite() to
create a kernel socket. But it does not do __module_get() on the
socket owner. When the connection is shutdown and sock_release() is
called to free the socket, the owner's reference count is decremented
and becomes incorrect. Note that this bug only shows up when the socket
owner is configured as a kernel module.
v2: Update comments
Fixes: 0933a578cd ("rds: tcp: use sock_create_lite() to create the accept socket")
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running s390 images with 'compat' processes, the following
BUG is seen repeatedly.
BUG: non-zero pgtables_bytes on freeing mm: -16384
Bisect points to commit b4e98d9ac7 ("mm: account pud page tables").
Analysis shows that init_new_context() is called with
mm->context.asce_limit set to _REGION3_SIZE. In this situation,
pgtables_bytes remains set to 0 and is not increased. The message is
displayed when the affected process dies and mm_dec_nr_puds() is called.
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: b4e98d9ac7 ("mm: account pud page tables")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With the alc289, the Pin 0x1b is Headphone-Mic, so we should assign
ALC269_FIXUP_DELL4_MIC_NO_PRESENCE rather than
ALC225_FIXUP_DELL1_MIC_NO_PRESENCE to it. And this change is suggested
by Kailang of Realtek and is verified on the machine.
Fixes: 3f2f7c553d ("ALSA: hda - Fix headset mic detection problem for two Dell machines")
Cc: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In the scheduler config command, the meaning of tid == 0xf was intended
to indicate the configuration is for management frames. However,
tid == 0xf was also used for the multicast queue that was meant only
for multicast data frames, which resulted with the FW not encrypting
multicast data frames.
As multicast frames do not have a QoS header, fix this by setting
tid == 0, to indicate that this is a data queue and not management
one.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Multicast frames for NL80211_IFTYPE_AP and NL80211_IFTYPE_ADHOC were
directed to the broadcast station, however, as the broadcast station
did not have keys configured, these frames were sent unencrypted.
Fix this by using the multicast station which is the station for which
encryption keys are configured.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
When the GTK is installed, we install it to HW with the
station ID of the AP.
Mac80211 will try to remove it only after the AP sta is
removed, which will result in a failure to remove key
since we do not have any station for it.
This is a valid situation, but a previous commit removed
the early return and added a return with error value, which
resulted in an error message that is confusing to users.
Remove the error return value.
Fixes: 85aeb58cec ("iwlwifi: mvm: Enable security on new TX API")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Trying to collect firmware debug data while firmware
is not loaded causes various errors (e.g. failing NIC access).
This causes even a bigger issue if at that time the
HW radio is off.
In that case, when later turning the radio on, the Driver
fails to read the HW (registers contain garbage values).
(It may be that the CSR_GP_CNTRL_REG_FLAG_RFKILL_WAKE_L1A_EN
bit is cleared on faulty NIC access - since the same behavior
was seen in HW RFKILL toggling before setting that bit.)
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We should add the multicast station before adding the
broadcast station.
However, in older FW, the firmware will start beaconing
when we add the multicast station, and since the broadcast
station is not added at this point so the transmission
of the beacon will fail on assert 0x2b00.
This is fixed in later firmware, so make the order
of addition depend on the TLV.
Fixes: 26d6c16bed ("iwlwifi: mvm: add multicast station")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
It was assumed that apply_time==0 implies immediate scheduling, which is
wrong. Instead, the fw expects the START_IMMEDIATELY flag to be set.
Otherwise, this resulted in 0x3063 assert.
Fix that.
While at it rename the T2_V2_START_IMMEDIATELY to
TE_V2_START_IMMEDIATELY.
Fixes: f5d8f50f27 ("iwlwifi: mvm: Fix channel switch in case of count <= 1")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We don't have enough room in the TX command for a CCMP 256
key, and need to use key from table.
Fixes: 3264bf032bd9 ("[BUGFIX] iwlwifi: mvm: Fix CCMP IV setting")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
While entering to D3 mode there is a gap between the time the
driver handles the D3_CONFIG_CMD response to the time the host is going
to sleep.
In between there might be cases which MARKER_CMD can tailgate.
Also during resume flow the MARKER_CMD might get sent while D0I3_CMD
is being handled in the FW.
Cancel MARKER_CMD timer and set it again properly during suspend
resume flows to prevent this command from being sent accidentlly.
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
This reverts commit c301b327ae.
While this works splendidly on rk3399-gru devices using the cros-ec
extcon, other rk3399-based devices using the fusb302 or no power-delivery
controller at all don't probe at all anymore, as the typec-phy currently
always expects the extcon to be available and therefore defers probing
indefinitly on these.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The current code for initializing the VRMA (virtual real memory area)
for HPT guests requires the page size of the backing memory to be one
of 4kB, 64kB or 16MB. With a radix host we have the possibility that
the backing memory page size can be 2MB or 1GB. In these cases, if the
guest switches to HPT mode, KVM will not initialize the VRMA and the
guest will fail to run.
In fact it is not necessary that the VRMA page size is the same as the
backing memory page size; any VRMA page size less than or equal to the
backing memory page size is acceptable. Therefore we now choose the
largest page size out of the set {4k, 64k, 16M} which is not larger
than the backing memory page size.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages. Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.
Fixing this exposed bugs in kvmppc_create_pte(). We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.
This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true. Instead we
mark the page dirty explicitly with set_page_dirty_lock(). This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Daniel Borkmann says:
====================
pull-request: bpf 2018-02-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Add schedule points and reduce the number of loop iterations
the test_bpf kernel module is performing in order to not hog
the CPU for too long, from Eric.
2) Fix an out of bounds access in tail calls in the ppc64 BPF
JIT compiler, from Daniel.
3) Fix a crash on arm64 on unaligned BPF xadd operations that
could be triggered via interpreter and JIT, from Daniel.
Please not that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If ethtool_ops->get_fecparam returns an error, pass that error on to the
user, rather than ignoring it.
Fixes: 1a5f3da20b ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ip_error() is called the device is the l3mdev master instead of the
original device. So the forwarding check should be on the original one.
Changes from v2:
- Handle the original device disappearing (per David Ahern)
- Minimize the change in code order
Changes from v1:
- Only need to reset the device on which __in_dev_get_rcu() is done (per
David Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting an interface into a VRF fails with 'RTNETLINK answers: File
exists' if one of its VLAN interfaces is already in the same VRF.
As the VRF is an upper device of the VLAN interface, it is also showing
up as an upper device of the interface itself. The solution is to
restrict this check to devices other than master. As only one master
device can be linked to a device, the check in this case is that the
upper device (VRF) being linked to is not the same as the master device
instead of it not being any one of the upper devices.
The following example shows an interface ens12 (with a VLAN interface
ens12.10) being set into VRF green, which behaves as expected:
# ip link add link ens12 ens12.10 type vlan id 10
# ip link set dev ens12 master vrfgreen
# ip link show dev ens12
3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
master vrfgreen state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
But if the VLAN interface has previously been set into the same VRF,
then setting the interface into the VRF fails:
# ip link set dev ens12 nomaster
# ip link set dev ens12.10 master vrfgreen
# ip link show dev ens12.10
39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue master vrfgreen state UP mode DEFAULT group default
qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
# ip link set dev ens12 master vrfgreen
RTNETLINK answers: File exists
The workaround is to move the VLAN interface back into the default VRF
beforehand, but it has to be shut first so as to avoid the risk of
traffic leaking from the VRF. This fix avoids needing this workaround.
Signed-off-by: Mike Manning <mmanning@att.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some required information is not exposed to userspace currently (eg. the
PASID), pass this information back, along with other information which
is currently communicated via sysfs, which saves some parsing effort in
userspace.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
commit a4239945b8 ("scsi: qla2xxx: Add switch command to simplify
fabric discovery") introduced regression when it did not consider
FC-NVMe code path which broke NVMe LUN discovery.
Fixes: a4239945b8 ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Signed-off-by: Darren Trapp <darren.trapp@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When converting __scsi_error_from_host_byte() to BLK_STS error codes the
case DID_OK was forgotten, resulting in it always returning an error.
Fixes: 2a842acab1 ("block: introduce new block status code type")
Cc: Doug Gilbert <dgilbert@interlog.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The fcport flags FCF_ASYNC_ACTIVE and FCF_ASYNC_SENT are used to
throttle the state machine, so we need to ensure to always set and unset
them correctly. Not doing so will lead to the state machine getting
confused and no login attempt into remote ports.
Cc: Quinn Tran <quinn.tran@cavium.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Fixes: 3dbec59bdf ("scsi: qla2xxx: Prevent multiple active discovery commands per session")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit d8630bb95f ('Serialize session deletion by using work_lock')
tries to fixup a deadlock when deleting sessions, but fails to take into
account the locking rules. This patch resolves the situation by
introducing a separate lock for processing the GNLIST response, and
ensures that sess_lock is released before calling
qlt_schedule_sess_delete().
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: Quinn Tran <quinn.tran@cavium.com>
Fixes: d8630bb95f ("scsi: qla2xxx: Serialize session deletion by using work_lock")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The subpage_prot syscall is only functional when the system is using
the Hash MMU. Since commit 5b2b807147 ("powerpc/mm: Invalidate
subpage_prot() system call on radix platforms") it returns ENOENT when
the Radix MMU is active. Currently this just makes the test fail.
Additionally the syscall is not available if the kernel is built with
4K pages, or if CONFIG_PPC_SUBPAGE_PROT=n, in which case it returns
ENOSYS because the syscall is missing entirely.
So check explicitly for ENOENT and ENOSYS and skip if we see either of
those.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix xfs_file_iomap_begin to trylock the ilock if IOMAP_NOWAIT is passed,
so that we don't block io_submit callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
There is no reason to take the ilock exclusively at the start of
xfs_file_iomap_begin for direct I/O, given that it will be demoted
just before calling xfs_iomap_write_direct anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The iomap zeroing interface is smart enough to skip zeroing holes or
unwritten extents. Don't subvert this logic for reflink files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We've got a kernel panic when using sata disk with sas controller:
[115946.152283] Unable to handle kernel NULL pointer dereference at virtual address 000007d8
[115946.223963] CPU: 0 PID: 22175 Comm: kworker/0:1 Tainted: G W OEL 4.14.0 #1
[115946.232925] Workqueue: events ata_scsi_hotplug
[115946.237938] task: ffff8021ee50b180 task.stack: ffff00000d5d0000
[115946.244717] PC is at sas_find_dev_by_rphy+0x44/0x114
[115946.250224] LR is at sas_find_dev_by_rphy+0x3c/0x114
......
[115946.355701] Process kworker/0:1 (pid: 22175, stack limit = 0xffff00000d5d0000)
[115946.363369] Call trace:
[115946.456356] [<ffff000008878a9c>] sas_find_dev_by_rphy+0x44/0x114
[115946.462908] [<ffff000008878b8c>] sas_target_alloc+0x20/0x5c
[115946.469408] [<ffff00000885a31c>] scsi_alloc_target+0x250/0x308
[115946.475781] [<ffff00000885ba30>] __scsi_add_device+0xb0/0x154
[115946.481991] [<ffff0000088b520c>] ata_scsi_scan_host+0x180/0x218
[115946.488367] [<ffff0000088b53d8>] ata_scsi_hotplug+0xb0/0xcc
[115946.494801] [<ffff0000080ebd70>] process_one_work+0x144/0x390
[115946.501115] [<ffff0000080ec100>] worker_thread+0x144/0x418
[115946.507093] [<ffff0000080f2c98>] kthread+0x10c/0x138
[115946.512792] [<ffff0000080855dc>] ret_from_fork+0x10/0x18
We found that Ding Xiang has reported a similar bug before:
https://patchwork.kernel.org/patch/9179817/
And this bug still exists in mainline. Since libsas handles hotplug and
device adding/removing itself, do not need to schedule ata hot plug task
here if it is a sas host.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Ding Xiang <dingxiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 1fdb926974 ("Bluetooth: btusb: Use DMI matching for QCA
reset_resume quirking"), added the Lenovo Yoga 920 to the
btusb_needs_reset_resume_table.
Testing has shown that this is a false positive and the problems where
caused by issues with the initial fix: commit fd865802c6 ("Bluetooth:
btusb: fix QCA Rome suspend/resume"), which has already been reverted.
So the QCA Rome BT in the Yoga 920 does not need a reset-resume quirk at
all and this commit removes it from the btusb_needs_reset_resume_table.
Note that after this commit the btusb_needs_reset_resume_table is now
empty. It is kept around on purpose, since this whole series of commits
started for a reason and there are actually broken platforms around,
which need to be added to it.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836
Fixes: 1fdb926974 ("Bluetooth: btusb: Use DMI matching for QCA ...")
Cc: stable@vger.kernel.org
Cc: Brian Norris <briannorris@chromium.org>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kevin Fenzi <kevin@scrye.com>
Suggested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The newly introudced ip_min_valid_pmtu variable is only used when
CONFIG_SYSCTL is set:
net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable]
This moves it to the other variables like it, to avoid the harmless
warning.
Fixes: c7272c2f12 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
has switched to do irq vectors spread among all possible CPUs, so
pass num_possible_cpus() as max vecotrs to be assigned.
For example, in a 8 cores system, 0~3 online, 4~8 offline/not present,
see 'lscpu':
[ming@box]$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
NUMA node(s): 2
...
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s):
...
1) before this patch, follows the allocated vectors and their affinity:
irq 47, cpu list 0,4
irq 48, cpu list 1,6
irq 49, cpu list 2,5
irq 50, cpu list 3,7
2) after this patch, follows the allocated vectors and their affinity:
irq 43, cpu list 0
irq 44, cpu list 1
irq 45, cpu list 2
irq 46, cpu list 3
irq 47, cpu list 4
irq 48, cpu list 6
irq 49, cpu list 5
irq 50, cpu list 7
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
ashmem_mutex create a chain of dependencies like so:
(1)
mmap syscall ->
mmap_sem -> (acquired)
ashmem_mmap
ashmem_mutex (try to acquire)
(block)
(2)
llseek syscall ->
ashmem_llseek ->
ashmem_mutex -> (acquired)
inode_lock ->
inode->i_rwsem (try to acquire)
(block)
(3)
getdents ->
iterate_dir ->
inode_lock ->
inode->i_rwsem (acquired)
copy_to_user ->
mmap_sem (try to acquire)
There is a lock ordering created between mmap_sem and inode->i_rwsem
causing a lockdep splat [2] during a syzcaller test, this patch fixes
the issue by unlocking the mutex earlier. Functionally that's Ok since
we don't need to protect vfs_llseek.
[1] https://patchwork.kernel.org/patch/10185031/
[2] https://lkml.org/lkml/2018/1/10/48
Acked-by: Todd Kjos <tkjos@google.com>
Cc: Arve Hjonnevag <arve@android.com>
Cc: stable@vger.kernel.org
Reported-by: syzbot+8ec30bb7bf1a981a2012@syzkaller.appspotmail.com
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Triggering PPC EEH detection and handling requires a memory mapped read
failure. The NVMe driver removed the periodic health check MMIO, so
there's no early detection mechanism to trigger the recovery. Instead,
the detection now happens when the nvme driver handles an IO timeout
event. This takes the pci channel offline, so we do not want the driver
to proceed with escalating its own recovery efforts that may conflict
with the EEH handler.
This patch ensures the driver will observe the channel was set to offline
after a failed MMIO read and resets the IO timer so the EEH handler has
a chance to recover the device.
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
[updated change log]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Back in the early days when gru devices were still under development
we found an issue where the WiFi reset line needed to be configured as
early as possible during the boot process to avoid the WiFi module
being in a bad state.
We found that the way to get the kernel to do this in the earliest
possible place was to configure this line in the pinctrl hogs, so
that's what we did. For some history here you can see
<http://crosreview.com/368770>. After the time that change landed in
the kernel, we landed a firmware change to configure this line even
earlier. See <http://crosreview.com/399919>. However, even after the
firmware change landed we kept the kernel change to deal with the fact
that some people working on devices might take a little while to
update their firmware.
At this there are definitely zero devices out in the wild that have
firmware without the fix in it. Specifically looking in the firmware
branch several critically important fixes for memory stability landed
after the patch in coreboot and I know we didn't ship without those.
Thus, by now, everyone should have the new firmware and it's safe to
not have the kernel set this up in a pinctrl hog.
Historically, even though it wasn't needed to have this in a pinctrl
hog, we still kept it since it didn't hurt. Pinctrl would apply the
default hog at bootup and then would never touch things again. That
all changed with commit 981ed1bfbc ("pinctrl: Really force states
during suspend/resume"). After that commit then we'll re-apply the
default hog at resume time and that can screw up the reset state of
WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way
then the whole system can go haywire. That's what was happening.
Specifically you'd resume a rk3399-gru-* device and it would mostly
resume, then would crash with some crazy weird crash.
One could say, perhaps, that the recent pinctrl change was at fault
(and should be fixed) since it changed behavior. ...but that's not
really true. The device tree for rk3399-gru is really to blame.
Specifically since the pinctrl is defined in the hog and not in the
"wlan-pd-n" node then the actual user of this pin doesn't have a
pinctrl entry for it. That's bad.
Let's fix our problems by just moving the control of
"wlan_module_reset_l pinctrl" out of the hog and put them in the
proper place.
NOTE: in theory, I think it should actually be possible to have a pin
controlled _both_ by the hog and by an actual device. Once the device
claims the pin I think the hog is supposed to let go. I'm not 100%
sure that this works and in any case this solution would be more
complex than is necessary.
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Fixes: 48f4d9796d ("arm64: dts: rockchip: add Gru/Kevin DTS")
Fixes: 981ed1bfbc ("pinctrl: Really force states during suspend/resume")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
When IPsec offloading was introduced, we accidentally incremented
the sequence number counter on the xfrm_state by one packet
too much in the ESN case. This leads to a sequence number gap of
one packet after each GSO packet. Fix this by setting the sequence
number to the correct value.
Fixes: d7dbefc45c ("xfrm: Add xfrm_replay_overflow functions for offloading")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Release the netdev references in the cleanup path. Invokes the cleanup
routines if bnxt_re_ib_reg fails.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
To support host systems with non 4K page size, l2_db_size shall be
calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size
to FW during initialization.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
HW requires an unconditonal fence for all non-wire memory operations
through SQ. This guarantees the completions of these memory operations.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During IB device registration process, if query_device() fails or if
ib_core fails to registers sysfs entries, rdma cgroup cleanup is
skipped.
Cc: <stable@vger.kernel.org> # v4.2+
Fixes: 4be3a4fa51 ("IB/core: Fix kernel crash during fail to initialize device")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16() validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.
Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.
All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.
Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.
Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.
Fixes: 09a7d9eca1 ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The IPS_NAT_MASK check in 4.12 replaced previous check for nfct_nat()
which was needed to fix a crash in 2.6.36-rc, see
commit 7bcbf81a22 ("ipvs: avoid oops for passive FTP").
But as IPVS does not set the IPS_SRC_NAT and IPS_DST_NAT bits,
checking for IPS_NAT_MASK prevents PASV response to be properly
mangled and blocks the transfer. Remove the check as it is not
needed after 3.12 commit 41d73ec053 ("netfilter: nf_conntrack:
make sequence number adjustments usuable without NAT") which
changes nfct_nat() with nfct_seqadj() and especially after 3.13
commit b25adce160 ("ipvs: correct usage/allocation of seqadj
ext in ipvs").
Thanks to Li Shuang and Florian Westphal for reporting the problem!
Reported-by: Li Shuang <shuali@redhat.com>
Fixes: be7be6e161 ("netfilter: ipvs: fix incorrect conflict resolution")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jiri Pirko says:
====================
mlxsw: couple of fixes
Couple of unrelated fixes for mlxsw.
---
v1->v2:
-patch 2:
- rebase on top of current -net tree
- removed forgotten empty line
-patch 3:
- new patch
-patch 4:
- new patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the basic construct in the device is a port-VLAN pair, which can
be bound to a FID or a RIF in order to direct packets to the bridge or
the router, respectively.
Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
sw1p1.10), VID 1 is used to represent these and thus this VID can be
used by both upper devices of mlxsw ports and by the driver itself.
However, this VID is not reference counted and therefore might be freed
prematurely, which can result in various WARNINGs. For example:
$ ip link add name br0 type bridge vlan_filtering 1
$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev team0 master br0
$ ip link set dev enp1s0np1 master team0
$ ip address add 192.0.2.1/24 dev enp1s0np1
The enslavement to team0 will fail because team0 already has an upper
and thus vlan_vids_del_by_dev() will be executed as part of team's error
path which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
WARNING will be generated when the driver will realize it can't find VID
1 on the port and bind it to a RIF.
Fix this by adding a reference count to the VLAN entries on the port, in
a similar fashion to the reference counting used by the corresponding
'vlan_vid_info' structure in the 8021q driver.
Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Reported-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multicast snooping is enabled, the Linux bridge resorts to flooding
unregistered multicast packets to all ports only in case it did not
detect a querier in the network.
The above condition is not reflected to underlying drivers, which is
especially problematic in IPv6 environments, as multicast snooping is
enabled by default and since neighbour solicitation packets might be
treated as unregistered multicast packets in case there is no
corresponding MDB entry.
Until the Linux bridge reflects its querier state to underlying drivers,
simply treat unregistered multicast packets as broadcast and allow them
to reach their destination.
Fixes: 9df552ef3e ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code uses global variables, adjusts them and passes pointer down
to devlink. With every other mlxsw_core instance, the previously passed
pointer values are rewritten. Fix this by de-globalize the variables and
also memcpy size_params during devlink resource registration.
Also, introduce a convenient size_param_init helper.
Fixes: ef3116e540 ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
scratchpad as L4 ports. Fix this by shifting all up.
Fixes: 5f57e09091 ("mlxsw: acl: Add ip ttl acl element")
Fixes: i80d0fe4710c ("mlxsw: acl: Add ip tos acl element")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
net/smc: fixes 2018-02-28
here are 3 smc bug fixes for the net-tree. Karsten's first patch is
the reworked version of last week's
"[PATCH net-next 2/5] net/smc: fix structure size"
patch, now solved without using __packed, and now targetted for net
instead of net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIRM LINK reply message must contain the link_id sent
by the server. And set the link_id explicitly when
initializing the link.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sizeof(struct smc_cdc_msg) evaluates to 48 bytes instead of the
required 44 bytes. We need to use the constant value of
SMC_WR_TX_SIZE to set and check the control message length.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to disable NAPI to prevent a single XDP TX queue being used by
multiple cpus. But we don't check if device is up (NAPI is enabled),
this could result stall because of infinite wait in
napi_disable(). Fixing this by checking device state through
netif_running() before.
Fixes: 4941d472bf ("virtio-net: do not reset during XDP set")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For tests that are using the maximal number of BPF instruction, each
run takes 20 usec. Looping 10,000 times on them totals 200 ms, which
is bad when the loop is not preemptible.
test_bpf: #264 BPF_MAXINSNS: Call heavy transformations jited:1 19248
18548 PASS
test_bpf: #269 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 20896 PASS
Lets divide by ten the number of iterations, so that max latency is
20ms. We could use need_resched() to break the loop earlier if we
believe 20 ms is too much.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
Moreover, this is essential for a correct MSG_ZEROCOPY
implementation, because userspace cannot call close(fd)
before receiving zerocopy signals even when the connection
is reset.
Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
tcp: revert a F-RTO extension due to broken middle-boxes
This patch series reverts a (non-standard) TCP F-RTO extension that aimed
to detect more spurious timeouts. Unfortunately it could result in poor
performance due to broken middle-boxes that modify TCP packets. E.g.
https://www.spinics.net/lists/netdev/msg484154.html
We believe the best and simplest solution is to just revert the change.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 89fe18e44f.
While the patch could detect more spurious timeouts, it could cause
poor TCP performance on broken middle-boxes that modifies TCP packets
(e.g. receive window, SACK options). Since the performance gain is
much smaller compared to the potential loss. The best solution is
to fully revert the change.
Fixes: 89fe18e44f ("tcp: extend F-RTO to catch more spurious timeouts")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cc663f4d4c. While fixing
some broken middle-boxes that modifies receive window fields, it does not
address middle-boxes that strip off SACK options. The best solution is
to fully revert this patch and the root F-RTO enhancement.
Fixes: cc663f4d4c ("tcp: restrict F-RTO to work-around broken middle-boxes")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: fixes 2018-02-27
please apply some more qeth patches for -net and stable.
One patch fixes a performance bug in the TSO path. Then there's several
more fixes for IP management on L3 devices - including a revert, so that
the subsequent fix cleanly applies to earlier kernels.
The final patch takes care of a race in the control IO code that causes
qeth to miss the cmd response, and subsequently trigger device recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If multiple IPA commands are build & sent out concurrently,
fill_ipacmd_header() may assign a seqno value to a command that's
different from what send_control_data() later assigns to this command's
reply.
This is due to other commands passing through send_control_data(),
and incrementing card->seqno.ipa along the way.
So one IPA command has no reply that's waiting for its seqno, while some
other IPA command has multiple reply objects waiting for it.
Only one of those waiting replies wins, and the other(s) times out and
triggers a recovery via send_ipa_cmd().
Fix this by making sure that the same seqno value is assigned to
a command and its reply object.
Do so immediately before submitting the command & while holding the
irq_pending "lock", to produce nicely ascending seqnos.
As a side effect, *all* IPA commands now use a reply object that's
waiting for its actual seqno. Previously, early IPA commands that were
submitted while the card was still DOWN used the "catch-all" IDX seqno.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code ("qeth_l3_ip_from_hash()") matches a queried address object
against objects in the IP table by IP address, Mask/Prefix Length and
MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
require is either
a) "is this IP address registered" (ie. match by IP address only),
before adding a new address.
b) or "is this address object registered" (ie. match all relevant
attributes), before deleting an address.
Right now
1. the ADD path is too strict in its lookup, and eg. doesn't detect
conflicts between an existing NORMAL address and a new VIPA address
(because the NORMAL address will have mask != 0, while VIPA has
a mask == 0),
2. the DELETE path is not strict enough, and eg. allows del_rxip() to
delete a VIPA address as long as the IP address matches.
Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
that do the appropriate checking.
Note that the ADD path for NORMAL addresses is special, as qeth keeps
track of how many times such an address is in use (and there is no
immediate way of returning errors to the caller). So when a requested
NORMAL address _fully_ matches an existing one, it's not considered a
conflict and we merely increment the refcount.
Fixes: 5f78e29cee ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cb816192d9.
The issue this attempted to fix never actually occurs.
l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
was previously added to the card. If so, it returns -EEXIST and doesn't
call l3_add_ip().
As a result, the "address exists" path in l3_add_ip() is never taken
for rxip addresses, and this patch had no effect.
Fixes: cb816192d9 ("s390/qeth: fix using of ref counter for rxip addresses")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Registering an IPv4 address with the HW takes quite a while, so we
temporarily drop the ip_htable lock. Any concurrent add/remove of the
same IP adjusts the IP's use count, and (on remove) is then blocked by
addr->in_progress.
After the register call has completed, we check the use count for
concurrently attempted add/remove calls - and possibly straight-away
deregister the IP again. This happens via l3_delete_ip(), which
1) looks up the queried IP in the htable (getting a reference to the
*same* queried object),
2) deregisters the IP from the HW, and
3) frees the IP object.
The caller in l3_add_ip() then does a second free on the same object.
For this case, skip all the extra checks and lookups in l3_delete_ip()
and just deregister & free the IP object ourselves.
Fixes: 5f78e29cee ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HW is not reachable, then none of the IPs in qeth's internal
table has been registered with the HW yet. So when deleting such an IP,
there's no need to stage it for deregistration - just drop it from
the table.
This fixes the "add-delete-add" scenario on an offline card, where the
the second "add" merely increments the IP's use count. But as the IP is
still set to DISP_ADDR_DELETE from the previous "delete" step,
l3_recover_ip() won't register it with the HW when the card goes online.
Fixes: 5f78e29cee ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_get_elements_for_range() doesn't know how to handle a 0-length
range (ie. start == end), and returns 1 when it should return 0.
Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
of the skb's linear data) are skipped when mapping the skb into regular
buffer elements.
This overestimation may cause several performance-related issues:
1. sub-optimal IO buffer selection, where the next buffer gets selected
even though the skb would actually still fit into the current buffer.
2. forced linearization, if the element count for a non-linear skb
exceeds QETH_MAX_BUFFER_ELEMENTS.
Rather than modifying qeth_get_elements_for_range() and adding overhead
to every caller, fix up those callers that are in risk of passing a
0-length range.
Fixes: 2863c61334 ("qeth: refactor calculation of SBALE count")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't include in the Rx bytecount of the packet sent up the stack:
the FCB (frame control block), and the padding bytes inserted by
the controller into the frame payload, nor the FCS. All these are
being pulled out of the skb by gfar_process_frame().
This issue is old, likely from the driver's beginnings, however
it was amplified by recent:
commit d903ec7711 ("gianfar: simplify FCS handling and fix memory leak")
which basically added the FCS to the Rx bytecount, and so brought
this to my attention.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6de3f79112 ("arm_pmu: explicitly enable/disable SPIs at hotplug")
moved all of the arm_pmu IRQ enable/disable calls to the CPU hotplug hooks,
regardless of whether they are implemented as PPIs or SPIs. This can
lead to us sleeping from atomic context due to disable_irq blocking:
| BUG: sleeping function called from invalid context at kernel/irq/manage.c:112
| in_atomic(): 1, irqs_disabled(): 128, pid: 15, name: migration/1
| no locks held by migration/1/15.
| irq event stamp: 192
| hardirqs last enabled at (191): [<00000000803c2507>]
| _raw_spin_unlock_irq+0x2c/0x4c
| hardirqs last disabled at (192): [<000000007f57ad28>] multi_cpu_stop+0x9c/0x140
| softirqs last enabled at (0): [<0000000004ee1b58>]
| copy_process.isra.77.part.78+0x43c/0x1504
| softirqs last disabled at (0): [< (null)>] (null)
| CPU: 1 PID: 15 Comm: migration/1 Not tainted 4.16.0-rc3-salvator-x #1651
| Hardware name: Renesas Salvator-X board based on r8a7796 (DT)
| Call trace:
| dump_backtrace+0x0/0x140
| show_stack+0x14/0x1c
| dump_stack+0xb4/0xf0
| ___might_sleep+0x1fc/0x218
| __might_sleep+0x70/0x80
| synchronize_irq+0x40/0xa8
| disable_irq+0x20/0x2c
| arm_perf_teardown_cpu+0x80/0xac
Since the interrupt is always CPU-affine and this code is running with
interrupts disabled, we can just use disable_irq_nosync as we know there
isn't a concurrent invocation of the handler to worry about.
Fixes: 6de3f79112 ("arm_pmu: explicitly enable/disable SPIs at hotplug")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Only one of the two is really required, not both:
* have_rtscts or
* have_rtsgpio
In imx_rs485_config() this is done correctly, so RS485 is working,
just the error message is false.
Signed-off-by: Phil Eichinger <phil@zankapfel.net>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Fixes: b8f3bff057 ("serial: imx: Support common rs485 binding for RTS polarity"
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the TTY buffers fill up to the configured maximum, a system lockup
occurs:
[ 598.820128] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 598.825796] 0-...!: (1 GPs behind) idle=5a6/2/0 softirq=1974/1974 fqs=1
[ 598.832577] (detected by 3, t=62517 jiffies, g=296, c=295, q=126)
[ 598.838755] Task dump for CPU 0:
[ 598.841977] swapper/0 R running task 0 0 0 0x00000022
[ 598.849023] Call trace:
[ 598.851476] __switch_to+0x98/0xb0
[ 598.854870] (null)
This can be prevented by doing a dummy read of the RX data register.
This issue affects both HSCIF and SCIF ports. Reported for R-Car H3 ES2.0;
reproduced and fixed on H3 ES1.1. Probably affects other R-Car platforms
as well.
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ulrich Hecht <ulrich.hecht+renesas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable <stable@vger.kernel.org>
Tested-by: Nguyen Viet Dung <dung.nguyen.aj@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not fail on multiport cards in serial_pci_is_class_communication().
It restores behaviour for SUNIX multiport cards, that enumerated by
class and have a custom board data.
Moreover it allows users to reenumerate port-by-port from user space.
Fixes: 7d8905d064 ("serial: 8250_pci: Enable device after we check black list")
Reported-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A tty is hung up by __tty_hangup() setting file->f_op to
hung_up_tty_fops, which is skipped on ttys whose write operation isn't
tty_write(). This means that, for example, /dev/console whose write
op is redirected_tty_write() is never actually marked hung up.
Because n_tty_read() uses the hung up status to decide whether to
abort the waiting readers, the lack of hung-up marking can lead to the
following scenario.
1. A session contains two processes. The leader and its child. The
child ignores SIGHUP.
2. The leader exits and starts disassociating from the controlling
terminal (/dev/console).
3. __tty_hangup() skips setting f_op to hung_up_tty_fops.
4. SIGHUP is delivered and ignored.
5. tty_ldisc_hangup() is invoked. It wakes up the waits which should
clear the read lockers of tty->ldisc_sem.
6. The reader wakes up but because tty_hung_up_p() is false, it
doesn't abort and goes back to sleep while read-holding
tty->ldisc_sem.
7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup()
and is now stuck in D sleep indefinitely waiting for
tty->ldisc_sem.
The following is Alan's explanation on why some ttys aren't hung up.
http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop
1. It broke the serial consoles because they would hang up and close
down the hardware. With tty_port that *should* be fixable properly
for any cases remaining.
2. The console layer was (and still is) completely broken and doens't
refcount properly. So if you turn on console hangups it breaks (as
indeed does freeing consoles and half a dozen other things).
As neither can be fixed quickly, this patch works around the problem
by introducing a new flag, TTY_HUPPING, which is used solely to tell
n_tty_read() that hang-up is in progress for the console and the
readers should be aborted regardless of the hung-up status of the
device.
The following is a sample hung task warning caused by this issue.
INFO: task agetty:2662 blocked for more than 120 seconds.
Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
0 2662 1 0x00000086
Call Trace:
__schedule+0x267/0x890
schedule+0x36/0x80
schedule_timeout+0x23c/0x2e0
ldsem_down_write+0xce/0x1f6
tty_ldisc_lock+0x16/0x30
tty_ldisc_hangup+0xb3/0x1b0
__tty_hangup+0x300/0x410
disassociate_ctty+0x6c/0x290
do_exit+0x7ef/0xb00
do_group_exit+0x3f/0xa0
get_signal+0x1b3/0x5d0
do_signal+0x28/0x660
exit_to_usermode_loop+0x46/0x86
do_syscall_64+0x9c/0xb0
entry_SYSCALL64_slow_path+0x25/0x25
The following is the repro. Run "$PROG /dev/console". The parent
process hangs in D state.
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <termios.h>
int main(int argc, char **argv)
{
struct sigaction sact = { .sa_handler = SIG_IGN };
struct timespec ts1s = { .tv_sec = 1 };
pid_t pid;
int fd;
if (argc < 2) {
fprintf(stderr, "test-hung-tty /dev/$TTY\n");
return 1;
}
/* fork a child to ensure that it isn't already the session leader */
pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid > 0) {
/* top parent, wait for everyone */
while (waitpid(-1, NULL, 0) >= 0)
;
if (errno != ECHILD)
perror("waitpid");
return 0;
}
/* new session, start a new session and set the controlling tty */
if (setsid() < 0) {
perror("setsid");
return 1;
}
fd = open(argv[1], O_RDWR);
if (fd < 0) {
perror("open");
return 1;
}
if (ioctl(fd, TIOCSCTTY, 1) < 0) {
perror("ioctl");
return 1;
}
/* fork a child, sleep a bit and exit */
pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid > 0) {
nanosleep(&ts1s, NULL);
printf("Session leader exiting\n");
exit(0);
}
/*
* The child ignores SIGHUP and keeps reading from the controlling
* tty. Because SIGHUP is ignored, the child doesn't get killed on
* parent exit and the bug in n_tty makes the read(2) block the
* parent's control terminal hangup attempt. The parent ends up in
* D sleep until the child is explicitly killed.
*/
sigaction(SIGHUP, &sact, NULL);
printf("Child reading tty\n");
while (1) {
char buf[1024];
if (read(fd, buf, sizeof(buf)) < 0) {
perror("read");
return 1;
}
}
return 0;
}
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@llwyncelyn.cymru>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tm-resched-dscr test links against pmu/lib.o, but we don't have a
rule to clean pmu/lib.o. This can lead to a build break if you build
for big endian and then little, or vice versa.
Fix it by making tm-resched-dscr depend on pmu/lib.c, causing the code
to be built directly in, meaning no .o is generated.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Once in a while I see build errors similar to the following
when building images from a clean tree.
Building powerpc:virtex-ml507:44x/virtex5_defconfig ... failed
------------
Error log:
arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error:
libfdt.h: No such file or directory
Building powerpc:bamboo:smpdev:44x/bamboo_defconfig ... failed
------------
Error log:
arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error:
libfdt.h: No such file or directory
arch/powerpc/boot/treeboot-currituck.c:35:20: fatal error:
libfdt.h: No such file or directory
Rebuilds will succeed.
Turns out that several source files in arch/powerpc/boot/ include
libfdt.h, but Makefile dependencies are incomplete. Let's fix that.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Normal 512-byte get/set of a TLV isn't supported but we were
registering the normal get/set anyway and relying on omitting
the SNDRV_CTL_ELEM_ACCESS_[READ|WRITE] flags to prevent them
being called.
Trouble is if this gets broken in the core ALSA code - as it has
been since at least 4.14 - the standard get/set can be called
unexpectedly and corrupt memory.
There's no point providing functions that won't be called and
it's a trivial change. The benefit is that if the ALSA core gets
broken again we get a big fat immediate NULL dereference instead
of a memory corruption timebomb.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
To reproduce the lock up do the following
- connect otg host adapter and a USB device to the dual-role port
so that it is in host mode.
- suspend to mem.
- disconnect otg adapter.
- resume the system.
If we call dwc3_host_exit() before tasks are thawed
xhci_plat_remove() seems to lock up at the second usb_remove_hcd() call.
To work around this we queue the _dwc3_set_mode() work on
the system_freezable_wq.
Fixes: 41ce1456e1 ("usb: dwc3: core: make dwc3_set_mode() work properly")
Cc: <stable@vger.kernel.org> # v4.12+
Suggested-by: Manu Gautam <mgautam@codeaurora.org>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The Cinterion PL8 is an LTE modem with 2 possible WWAN interfaces.
The modem is controlled via AT commands through the exposed TTYs.
AT^SWWAN write command can be used to activate or deactivate a WWAN
connection for a PDP context defined with AT+CGDCONT. UE supports
two WWAN adapter. Both WWAN adapters can be activated a the same time
Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tls ulp overrides sk->prot with a new tls specific proto structs.
The tls specific structs were previously based on the ipv4 specific
tcp_prot sturct.
As a result, attaching the tls ulp to an ipv6 tcp socket replaced
some ipv6 callback with the ipv4 equivalents.
This patch adds ipv6 tls proto structs and uses them when
attached to ipv6 sockets.
Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have uninlined the sh_eth_{read|write}() functions introduced in the
commit 4a55530f38 ("net: sh_eth: modify the definitions of register").
Now remove *inline* from sh_eth_tsu_{read|write}() as well and move
these functions from the header to the driver itself. This saves 684
more bytes of object code (ARM gcc 4.8.5)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
net: fix IFLA_MTU ignored on NEWLINK for some ip and ipv6 tunnels
The fix for ip_gre follows the way other ip tunnels do: not to
set mtu in ndo_init, as ip_tunnel_newlink will take care of it
properly.
The fix for ip6_tunnel and sit follows the way ipv6 tunenls do:
to set mtu again according to IFLA_MTU after, as all bind_dev
are called in ndo_init where it can't get the tb[IFLA_MTU].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 128bb975dc ("ip6_gre: init dev->mtu and dev->hard_header_len
correctly") fixed IFLA_MTU ignored on NEWLINK for ip6_gre. The same
mtu fix is also needed for sit.
Note that dev->hard_header_len setting for sit works fine, no need to
fix it. sit is actually ipv4 tunnel, it can't call ip6_tnl_change_mtu
to set mtu.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 128bb975dc ("ip6_gre: init dev->mtu and dev->hard_header_len
correctly") fixed IFLA_MTU ignored on NEWLINK for ip6_gre. The same
mtu fix is also needed for ip6_tunnel.
Note that dev->hard_header_len setting for ip6_tunnel works fine,
no need to fix it.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's safe to remove the setting of dev's needed_headroom and mtu in
__gre_tunnel_init, as discussed in [1], ip_tunnel_newlink can do it
properly.
Now Eric noticed that it could cover the mtu value set in do_setlink
when creating a ip_gre dev. It makes IFLA_MTU param not take effect.
So this patch is to remove them to make IFLA_MTU work, as in other
ipv4 tunnels.
[1]: https://patchwork.ozlabs.org/patch/823504/
Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Reported-by: Eric Garver <e@erig.me>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit f5e64032a7 ("net: phy: fix resume handling") changes the
locking semantics for phy_resume() such that the caller now needs to
hold the phy mutex. Not all call sites were adopted to this new
semantic, resulting in warnings from the added
WARN_ON(!mutex_is_locked(&phydev->lock)). Rather than change the
semantics, add a __phy_resume() and restore the old behavior of
phy_resume().
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: f5e64032a7 ("net: phy: fix resume handling")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the right loop index, not the number of devices in the array that we
need to remove, the following message uncovered the problem:
[ 5437.044119] hook not found, pf 5 num 0
[ 5437.044140] WARNING: CPU: 2 PID: 24983 at net/netfilter/core.c:376 __nf_unregister_net_hook+0x250/0x280
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In commit 60c2530696 ("tipc: fix race between poll() and
setsockopt()") we introduced a pointer from struct tipc_group to the
'group_is_connected' flag in struct tipc_sock, so that this field can
be checked without dereferencing the group pointer of the latter struct.
The initial value for this flag is correctly set to 'false' when a
group is created, but we miss the case when no group is created at
all, in which case the initial value should be 'true'. This has the
effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
POLLOUT if they request so.
This commit corrects this bug.
Fixes: 60c2530696 ("tipc: fix race between poll() and setsockopt()")
Reported-by: Hoang Le <hoang.h.le@dektek.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC 1191 sections 3 and 4, ICMP frag-needed messages
indicating an MTU below 68 should be rejected:
A host MUST never reduce its estimate of the Path MTU below 68
octets.
and (talking about ICMP frag-needed's Next-Hop MTU field):
This field will never contain a value less than 68, since every
router "must be able to forward a datagram of 68 octets without
fragmentation".
Furthermore, by letting net.ipv4.route.min_pmtu be set to negative
values, we can end up with a very large PMTU when (-1) is cast into u32.
Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to
unsigned ints.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2018-02-26
Here are a two Bluetooth driver fixes for the 4.16 kernel.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation checks the combined size of the children with
the 'size' of the parent. The correct behavior is to check the combined
size vs the pending change and to compare vs the 'size_new'.
Fixes: d9f9b9a4d0 ("devlink: Add support for resource abstraction")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Tested-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to R-Car Gen3 Rev.0.80 manual, the DMATCR can be set to
16,777,215 as maximum. So, this patch fixes the max_chunk_size for
safety on all of SoCs. Otherwise, a system may hang if the DMATCR
is set to 0 on R-Car Gen3.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
New options introduced by the patch this fixes are still
enabled even if CFG80211 is disabled.
.config:
# CONFIG_CFG80211 is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
# CONFIG_LIB80211 is not set
When CFG80211_REQUIRE_SIGNED_REGDB is enabled, it selects
SYSTEM_DATA_VERIFICATION which selects SYSTEM_TRUSTED_KEYRING
that need extract-cert tool. extract-cert needs some openssl
headers to be installed on the build machine.
Instead of adding missing "depends on CFG80211", it's
easier to use a 'if' block around all options related
to CFG80211, so do that.
Fixes: 90a53e4432 ("cfg80211: implement regdb signature checking")
Signed-off-by: Romain Naour <romain.naour@gmail.com>
[touch up commit message a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the netdevice is already part of a flowtable, return EBUSY. I cannot
find a valid usecase for having two flowtables bound to the same
netdevice. We can still have two flowtable where the device set is
disjoint.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While working on 16338a9b3a ("bpf, arm64: fix out of bounds access in
tail call") I noticed that ppc64 JIT is partially affected as well. While
the bound checking is correctly performed as unsigned comparison, the
register with the index value however, is never truncated into 32 bit
space, so e.g. a index value of 0x100000000ULL with a map of 1 element
would pass with PPC_CMPLW() whereas we later on continue with the full
64 bit register value. Therefore, as we do in interpreter and other JITs
truncate the value to 32 bit initially in order to fix access.
Fixes: ce0761419f ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
r8152 driver handles TSO packets (limited to ~16KB) quite well,
but pretends each TSO logical packet is a single packet on the wire.
There is also some error since headers are accounted once, but
error rate is small enough that we do not care.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5c38bd1b82.
skb->mark contains the mark the encapsulated traffic which
can result in incorrect routing decisions being made such
as routing loops if the route chosen is via tunnel itself.
The correct method should be to use tunnel->fwmark.
Signed-off-by: Thomas Winter <thomas.winter@alliedtelesis.co.nz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VLAN is added on a port, a reference is taken on the
corresponding master VLAN entry. If it does not already exist, then it
is created and a reference taken.
However, in the second case a reference is not really taken when
CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by
refcount_inc_not_zero().
Fix this by using refcount_set() on a newly created master VLAN entry.
Fixes: 2512775985 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added MODULE_ALIAS("rpmsg:IPCRTR") to ensure qrtr-smd and qrtr will load
when IPCRTR channel is detected.
Signed-off-by: Ramon Fried <rfried@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_bpf() is taking 1.6 seconds nowadays, it is time
to add a schedule point in it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sometimes when physical lines have a just good noise to make the protocol
handshaking fail, but the carrier detect still good. Then after remove of
the noise, nobody will trigger this protocol to be start again to cause
the link to never come back. The fix is when the carrier is still on, not
terminate the protocol handshaking.
Signed-off-by: Denis Du <dudenis2000@yahoo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().
Note, this in fact won't try to batch packets through devmap, we could
address in the future.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 761876c857 ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for tuntap, all other drivers' XDP was implemented at NAPI
poll() routine in a bh. This guarantees all XDP operation were done at
the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and we try to protect XDP
processing by RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
breaks the assumption that all XDP were processed in the same CPU.
Fixing this by simply disabling preemption during XDP processing.
Fixes: 761876c857 ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 762c330d67. The
reason is we try to batch packets for devmap which causes calling
xdp_do_flush() in the process context. Simply disabling preemption
may not work since process may move among processors which lead
xdp_do_flush() to miss some flushes on some processors.
So simply revert the patch, a follow-up patch will add the xdp flush
correctly.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d67 ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not valid for orion5x to use mac_pton().
First of all, the orion5x buffer is not NULL terminated. mac_pton()
has no business operating on non-NULL terminated buffers because
only the caller can know that this is valid and in what manner it
is ok to parse this NULL'less buffer.
Second of all, orion5x operates on an __iomem pointer, which cannot
be dereferenced using normal C pointer operations. Accesses to
such areas much be performed with the proper iomem accessors.
Fixes: 4904dbda41 ("ARM: orion5x: use mac_pton() helper")
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman says:
====================
l2tp: fix API races discovered by syzbot
This patch series addresses several races with L2TP APIs discovered by
syzbot. There are no functional changes.
The set of patches 1-5 in combination fix the following syzbot reports.
19c09769f WARNING in debug_print_object
347bd5acd KASAN: use-after-free Read in inet_shutdown
6e6a5ec8d general protection fault in pppol2tp_connect
9df43faf0 KASAN: use-after-free Read in pppol2tp_connect
My first attempts to fix these issues were as net-next patches but
the series included other refactoring and cleanup work. I was asked to
separate out the bugfixes and redo for the net tree, which is what
these patches are.
The changes are:
1. Fix inet_shutdown races when L2TP tunnels and sessions close. (patches 1-2)
2. Fix races with tunnel and its socket. (patch 3)
3. Fix race in pppol2tp_release with session and its socket. (patch 4)
4. Fix tunnel lookup use-after-free. (patch 5)
All of the syzbot reproducers hit races in the tunnel and pppol2tp
session create and destroy paths. These tests create and destroy
pppol2tp tunnels and sessions rapidly using multiple threads,
provoking races in several tunnel/session create/destroy paths. The
key problem was that each tunnel/session socket could be destroyed
while its associated tunnel/session object still existed (patches 3,
4). Patch 5 addresses a problem with the way tunnels are removed from
the tunnel list. Patch 5 is tagged that it addresses all four syzbot
issues, though all 5 patches are needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pppol2tp_release uses call_rcu to put the final ref on its socket. But
the session object doesn't hold a ref on the session socket so may be
freed while the pppol2tp_put_sk RCU callback is scheduled. Fix this by
having the session hold a ref on its socket until the session is
destroyed. It is this ref that is dropped via call_rcu.
Sessions are also deleted via l2tp_tunnel_closeall. This must now also put
the final ref via call_rcu. So move the call_rcu call site into
pppol2tp_session_close so that this happens in both destroy paths. A
common destroy path should really be implemented, perhaps with
l2tp_tunnel_closeall calling l2tp_session_delete like pppol2tp_release
does, but this will be looked at later.
ODEBUG: activate active (active state 1) object type: rcu_head hint: (null)
WARNING: CPU: 3 PID: 13407 at lib/debugobjects.c:291 debug_print_object+0x166/0x220
Modules linked in:
CPU: 3 PID: 13407 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #38
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:debug_print_object+0x166/0x220
RSP: 0018:ffff880013647a00 EFLAGS: 00010082
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff814d3333
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001a59f6d0
RBP: ffff880013647a40 R08: 0000000000000000 R09: 0000000000000001
R10: ffff8800136479a8 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff86161420 R14: ffffffff85648b60 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000006022000 CR4: 00000000000006e0
Call Trace:
debug_object_activate+0x38b/0x530
? debug_object_assert_init+0x3b0/0x3b0
? __mutex_unlock_slowpath+0x85/0x8b0
? pppol2tp_session_destruct+0x110/0x110
__call_rcu.constprop.66+0x39/0x890
? __call_rcu.constprop.66+0x39/0x890
call_rcu_sched+0x17/0x20
pppol2tp_release+0x2c7/0x440
? fcntl_setlk+0xca0/0xca0
? sock_alloc_file+0x340/0x340
sock_release+0x92/0x1e0
sock_close+0x1b/0x20
__fput+0x296/0x6e0
____fput+0x1a/0x20
task_work_run+0x127/0x1a0
do_exit+0x7f9/0x2ce0
? SYSC_connect+0x212/0x310
? mm_update_next_owner+0x690/0x690
? up_read+0x1f/0x40
? __do_page_fault+0x3c8/0xca0
do_group_exit+0x10d/0x330
? do_group_exit+0x330/0x330
SyS_exit_group+0x22/0x30
do_syscall_64+0x1e0/0x730
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f362e471259
RSP: 002b:00007ffe389abe08 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f362e471259
RDX: 00007f362e471259 RSI: 000000000000002e RDI: 0000000000000000
RBP: 00007ffe389abe30 R08: 0000000000000000 R09: 00007f362e944270
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffe389abf50 R14: 0000000000000000 R15: 0000000000000000
Code: 8d 3c dd a0 8f 64 85 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 8b 14 dd a0 8f 64 85 4c 89 f6 48 c7 c7 20 85 64 85 e
8 2a 55 14 ff <0f> 0b 83 05 ad 2a 68 04 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41
Fixes: ee40fb2e1e ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tunnel socket tunnel->sock (struct sock) is accessed when
preparing a new ppp session on a tunnel at pppol2tp_session_init. If
the socket is closed by a thread while another is creating a new
session, the threads race. In pppol2tp_connect, the tunnel object may
be created if the pppol2tp socket is associated with the special
session_id 0 and the tunnel socket is looked up using the provided
fd. When handling this, pppol2tp_connect cannot sock_hold the tunnel
socket to prevent it being destroyed during pppol2tp_connect since
this may itself may race with the socket being destroyed. Doing
sockfd_lookup in pppol2tp_connect isn't sufficient to prevent
tunnel->sock going away either because a given tunnel socket fd may be
reused between calls to pppol2tp_connect. Instead, have
l2tp_tunnel_create sock_hold the tunnel socket before it does
sockfd_put. This ensures that the tunnel's socket is always extant
while the tunnel object exists. Hold a ref on the socket until the
tunnel is destroyed and ensure that all tunnel destroy paths go
through a common function (l2tp_tunnel_delete) since this will do the
final sock_put to release the tunnel socket.
Since the tunnel's socket is now guaranteed to exist if the tunnel
exists, we no longer need to use sockfd_lookup via l2tp_sock_to_tunnel
to derive the tunnel from the socket since this is always
sk_user_data.
Also, sessions no longer sock_hold the tunnel socket since sessions
already hold a tunnel ref and the tunnel sock will not be freed until
the tunnel is freed. Removing these sock_holds in
l2tp_session_register avoids a possible sock leak in the
pppol2tp_connect error path if l2tp_session_register succeeds but
attaching a ppp channel fails. The pppol2tp_connect error path could
have been fixed instead and have the sock ref dropped when the session
is freed, but doing a sock_put of the tunnel socket when the session
is freed would require a new session_free callback. It is simpler to
just remove the sock_hold of the tunnel socket in
l2tp_session_register, now that the tunnel socket lifetime is
guaranteed.
Finally, some init code in l2tp_tunnel_create is reordered to ensure
that the new tunnel object's refcount is set and the tunnel socket ref
is taken before the tunnel socket destructor callbacks are set.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4360 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #34
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:pppol2tp_session_init+0x1d6/0x500
RSP: 0018:ffff88001377fb40 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88001636a940 RCX: ffffffff84836c1d
RDX: 0000000000000045 RSI: 0000000055976744 RDI: 0000000000000228
RBP: ffff88001377fb60 R08: ffffffff84836bc8 R09: 0000000000000002
R10: ffff88001377fab8 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001636aac8 R14: ffff8800160f81c0 R15: 1ffff100026eff76
FS: 00007ffb3ea66700(0000) GS:ffff88001a400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000016261000 CR4: 00000000000006f0
Call Trace:
pppol2tp_connect+0xd18/0x13c0
? pppol2tp_session_create+0x170/0x170
? __might_fault+0x115/0x1d0
? lock_downgrade+0x860/0x860
? __might_fault+0xe5/0x1d0
? security_socket_connect+0x8e/0xc0
SYSC_connect+0x1b6/0x310
? SYSC_bind+0x280/0x280
? __do_page_fault+0x5d1/0xca0
? up_read+0x1f/0x40
? __do_page_fault+0x3c8/0xca0
SyS_connect+0x29/0x30
? SyS_accept+0x40/0x40
do_syscall_64+0x1e0/0x730
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7ffb3e376259
RSP: 002b:00007ffeda4f6508 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000020e77012 RCX: 00007ffb3e376259
RDX: 000000000000002e RSI: 0000000020e77000 RDI: 0000000000000004
RBP: 00007ffeda4f6540 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffeda4f6660 R14: 0000000000000000 R15: 0000000000000000
Code: 80 3d b0 ff 06 02 00 0f 84 07 02 00 00 e8 13 d6 db fc 49 8d bc 24 28 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f
a 48 c1 ea 03 <80> 3c 02 00 0f 85 ed 02 00 00 4d 8b a4 24 28 02 00 00 e8 13 16
Fixes: 80d84ef3ff ("l2tp: prevent l2tp_tunnel_delete racing with userspace close")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, if a ppp session was closed, we called inet_shutdown to mark
the socket as unconnected such that userspace would get errors and
then close the socket. This could race with userspace closing the
socket. Instead, leave userspace to close the socket in its own time
(our session will be detached anyway).
BUG: KASAN: use-after-free in inet_shutdown+0x5d/0x1c0
Read of size 4 at addr ffff880010ea3ac0 by task syzbot_347bd5ac/8296
CPU: 3 PID: 8296 Comm: syzbot_347bd5ac Not tainted 4.16.0-rc1+ #91
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x101/0x157
? inet_shutdown+0x5d/0x1c0
print_address_description+0x78/0x260
? inet_shutdown+0x5d/0x1c0
kasan_report+0x240/0x360
__asan_load4+0x78/0x80
inet_shutdown+0x5d/0x1c0
? pppol2tp_show+0x80/0x80
pppol2tp_session_close+0x68/0xb0
l2tp_tunnel_closeall+0x199/0x210
? udp_v6_flush_pending_frames+0x90/0x90
l2tp_udp_encap_destroy+0x6b/0xc0
? l2tp_tunnel_del_work+0x2e0/0x2e0
udpv6_destroy_sock+0x8c/0x90
sk_common_release+0x47/0x190
udp_lib_close+0x15/0x20
inet_release+0x85/0xd0
inet6_release+0x43/0x60
sock_release+0x53/0x100
? sock_alloc_file+0x260/0x260
sock_close+0x1b/0x20
__fput+0x19f/0x380
____fput+0x1a/0x20
task_work_run+0xd2/0x110
exit_to_usermode_loop+0x18d/0x190
do_syscall_64+0x389/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x7fe240a45259
RSP: 002b:00007fe241132df8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe240a45259
RDX: 00007fe240a45259 RSI: 0000000000000000 RDI: 00000000000000a5
RBP: 00007fe241132e20 R08: 00007fe241133700 R09: 0000000000000000
R10: 00007fe241133700 R11: 0000000000000297 R12: 0000000000000000
R13: 00007ffc49aff84f R14: 0000000000000000 R15: 00007fe241141040
Allocated by task 8331:
save_stack+0x43/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x144/0x3e0
sock_alloc_inode+0x22/0x130
alloc_inode+0x3d/0xf0
new_inode_pseudo+0x1c/0x90
sock_alloc+0x30/0x110
__sock_create+0xaa/0x4c0
SyS_socket+0xbe/0x130
do_syscall_64+0x128/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
Freed by task 8314:
save_stack+0x43/0xd0
__kasan_slab_free+0x11a/0x170
kasan_slab_free+0xe/0x10
kmem_cache_free+0x88/0x2b0
sock_destroy_inode+0x49/0x50
destroy_inode+0x77/0xb0
evict+0x285/0x340
iput+0x429/0x530
dentry_unlink_inode+0x28c/0x2c0
__dentry_kill+0x1e3/0x2f0
dput.part.21+0x500/0x560
dput+0x24/0x30
__fput+0x2aa/0x380
____fput+0x1a/0x20
task_work_run+0xd2/0x110
exit_to_usermode_loop+0x18d/0x190
do_syscall_64+0x389/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
redirect_dir=nofollow should not follow a redirect. But in a specific
configuration it can still follow it. For example try this.
$ mkdir -p lower0 lower1/foo upper work merged
$ touch lower1/foo/lower-file.txt
$ setfattr -n "trusted.overlay.opaque" -v "y" lower1/foo
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=on none merged
$ cd merged
$ mv foo foo-renamed
$ umount merged
# mount again. This time with redirect_dir=nofollow
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=nofollow none merged
$ ls merged/foo-renamed/
# This lists lower-file.txt, while it should not have.
Basically, we are doing redirect check after we check for d.stop. And
if this is not last lower, and we find an opaque lower, d.stop will be
set.
ovl_lookup_single()
if (!d->last && ovl_is_opaquedir(this)) {
d->stop = d->opaque = true;
goto out;
}
To fix this, first check redirect is allowed. And after that check if
d.stop has been set or not.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: 438c84c2f0 ("ovl: don't follow redirects if redirect_dir=off")
Cc: <stable@vger.kernel.org> #v4.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The routine pgattr_change_is_safe() was extended in commit 4e60205655
("arm64: mm: Permit transitioning from Global to Non-Global without BBM")
to permit changing the nG attribute from not set to set, but did so in a
way that inadvertently disallows such changes if other permitted attribute
changes take place at the same time. So update the code to take this into
account.
Fixes: 4e60205655 ("arm64: mm: Permit transitioning from Global to ...")
Cc: <stable@vger.kernel.org> # 4.14.x-
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
fs/overlayfs/export.c:459:10-16: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 4b91c30a5a ("ovl: lookup connected ancestor of dir in inode cache")
CC: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Some processor revisions do not support transactional memory, and
additionally kernel support can be disabled. In either case the
tm-trap test should be skipped, otherwise it will fail with a SIGILL.
Fixes: a08082f8e4 ("powerpc/selftests: Check endianness on trap in TM")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
According to the devicetree binding the shutdown and device wake
GPIOs are optional. Since commit 3e81a4ca51 ("Bluetooth: hci_bcm:
Mandate presence of shutdown and device wake GPIO") this driver
won't probe anymore on Raspberry Pi 3 and Zero W (no device wake GPIO
connected). So fix this regression by reverting this commit partially.
Fixes: 3e81a4ca51 ("Bluetooth: hci_bcm: Mandate presence of shutdown and device wake GPIO")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For some reason, Florian forgot to apply to ip6_route_me_harder
the fix that went in commit 29e09229d9 ("netfilter: use
skb_to_full_sk in ip_route_me_harder")
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
batman-adv uses internal indices for each enabled and active interface.
It is currently used by the B.A.T.M.A.N. IV algorithm to identifify the
correct position in the ogm_cnt bitmaps.
The type for the number of enabled interfaces (which defines the next
interface index) was set to char. This type can be (depending on the
architecture) either signed (limiting batman-adv to 127 active slave
interfaces) or unsigned (limiting batman-adv to 255 active slave
interfaces).
This limit was not correctly checked when an interface was enabled and thus
an overflow happened. This was only catched on systems with the signed char
type when the B.A.T.M.A.N. IV code tried to resize its counter arrays with
a negative size.
The if_num interface index was only a s16 and therefore significantly
smaller than the ifindex (int) used by the code net code.
Both &batadv_hard_iface->if_num and &batadv_priv->num_ifaces must be
(unsigned) int to support the same number of slave interfaces as the net
core code. And the interface activation code must check the number of
active slave interfaces to avoid integer overflows.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
"fib" starts to behave strangely when an ipv6 default route is
added - the FIB lookup returns a route using 'oif' in this case.
This behaviour was inherited from ip6tables rpfilter so change
this as well.
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1221
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In the ip_rcv, IPSTATS_MIB_CSUMERRORS is increased when
checksum error is occurred.
bridge netfilter routine should increase IPSTATS_MIB_CSUMERRORS.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The function batadv_bla_backbone_dump_bucket must be able to handle
non-complete dumps of a single bucket. It tries to do that by saving the
latest dumped index in *idx_skip to inform the caller about the current
state.
But the caller only assumes that buckets were not completely dumped when
the return code is non-zero. This function must therefore also return a
non-zero index when the dumping of an entry failed. Otherwise the caller
will just skip all remaining buckets.
And the function must also reset *idx_skip back to zero when it finished a
bucket. Otherwise it will skip the same number of entries in the next
bucket as the previous one had.
Fixes: ea4152e117 ("batman-adv: add backbone table netlink support")
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The function batadv_bla_claim_dump_bucket must be able to handle
non-complete dumps of a single bucket. It tries to do that by saving the
latest dumped index in *idx_skip to inform the caller about the current
state.
But the caller only assumes that buckets were not completely dumped when
the return code is non-zero. This function must therefore also return a
non-zero index when the dumping of an entry failed. Otherwise the caller
will just skip all remaining buckets.
And the function must also reset *idx_skip back to zero when it finished a
bucket. Otherwise it will skip the same number of entries in the next
bucket as the previous one had.
Fixes: 04f3f5bf18 ("batman-adv: add B.A.T.M.A.N. Dump BLA claims via netlink")
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The function batadv_v_gw_dump stops the processing loop when
batadv_v_gw_dump_entry returns a non-0 return code. This should only
happen when the buffer is full. Otherwise, an empty message may be
returned by batadv_gw_dump. This empty message will then stop the netlink
dumping of gateway entries. At worst, not a single entry is returned to
userspace even when plenty of possible gateways exist.
Fixes: b71bb6f924 ("batman-adv: add B.A.T.M.A.N. V bat_gw_dump implementations")
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The function batadv_iv_gw_dump stops the processing loop when
batadv_iv_gw_dump_entry returns a non-0 return code. This should only
happen when the buffer is full. Otherwise, an empty message may be
returned by batadv_gw_dump. This empty message will then stop the netlink
dumping of gateway entries. At worst, not a single entry is returned to
userspace even when plenty of possible gateways exist.
Fixes: efb766af06 ("batman-adv: add B.A.T.M.A.N. IV bat_gw_dump implementations")
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
We need to make sure the offsets are not out of range of the
total size.
Also check that they are in ascending order.
The WARN_ON triggered by syzkaller (it sets panic_on_warn) is
changed to also bail out, no point in continuing parsing.
Briefly tested with simple ruleset of
-A INPUT --limit 1/s' --log
plus jump to custom chains using 32bit ebtables binary.
Reported-by: <syzbot+845a53d13171abf8bf29@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
All of these conditions are not fatal and should have
been WARN_ONs from the get-go.
Convert them to WARN_ONs and bail out.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebt_among is special, it has a dynamic match size and is exempt
from the central size checks.
Therefore it must check that the size of the match structure
provided from userspace is sane by making sure em->match_size
is at least the minimum size of the expected structure.
The module has such a check, but its only done after accessing
a structure that might be out of bounds.
tested with: ebtables -A INPUT ... \
--among-dst fe:fe:fe:fe:fe:fe
--among-dst fe:fe:fe:fe:fe:fe --among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fb,fe:fe:fe:fe:fc:fd,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe
--among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fa,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe,fe:fe:fe:fe:fe:fe
Reported-by: <syzbot+fe0b19af568972814355@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Once struct is added to per-netns list it becomes visible to other cpus,
so we cannot use kfree().
Also delay setting entries refcount to 1 until after everything is
initialised so that when we call clusterip_config_put() in this spot
entries is still zero.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This needs to put() the entry to avoid a resource leak in error path.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A more sophisticated implementation could try to combine fragment checksums
when all fragments have CHECKSUM_COMPLETE and are split at even offsets.
For now, we just set ip_summed to CHECKSUM_NONE to avoid "hw csum failure"
warnings in the kernel log when fragmented frames are received. In
consequence, skb_pull_rcsum() can be replaced with skb_pull().
Note that in usual setups, packets don't reach batman-adv with
CHECKSUM_COMPLETE (I assume NICs bail out of checksumming when they see
batadv's ethtype?), which is why the log messages do not occur on every
system using batman-adv. I could reproduce this issue by stacking
batman-adv on top of a VXLAN interface.
Fixes: 610bfc6bc9 ("batman-adv: Receive fragmented packets and merge")
Tested-by: Maximilian Wilhelm <max@sdn.clinic>
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
eth_type_trans() internally calls skb_pull(), which does not adjust the
skb checksum; skb_postpull_rcsum() is necessary to avoid log spam of the
form "bat0: hw csum failure" when packets with CHECKSUM_COMPLETE are
received.
Note that in usual setups, packets don't reach batman-adv with
CHECKSUM_COMPLETE (I assume NICs bail out of checksumming when they see
batadv's ethtype?), which is why the log messages do not occur on every
system using batman-adv. I could reproduce this issue by stacking
batman-adv on top of a VXLAN interface.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Tested-by: Maximilian Wilhelm <max@sdn.clinic>
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
i_dir_seq is subject to concurrent modification by a cmpxchg or
store-release operation, so ensure that the relaxed access in
d_alloc_parallel uses READ_ONCE.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If d_alloc_parallel runs concurrently with __d_add, it is possible for
d_alloc_parallel to continuously retry whilst i_dir_seq has been
incremented to an odd value by __d_add:
CPU0:
__d_add
n = start_dir_add(dir);
cmpxchg(&dir->i_dir_seq, n, n + 1) == n
CPU1:
d_alloc_parallel
retry:
seq = smp_load_acquire(&parent->d_inode->i_dir_seq) & ~1;
hlist_bl_lock(b);
bit_spin_lock(0, (unsigned long *)b); // Always succeeds
CPU0:
__d_lookup_done(dentry)
hlist_bl_lock
bit_spin_lock(0, (unsigned long *)b); // Never succeeds
CPU1:
if (unlikely(parent->d_inode->i_dir_seq != seq)) {
hlist_bl_unlock(b);
goto retry;
}
Since the simple bit_spin_lock used to implement hlist_bl_lock does not
provide any fairness guarantees, then CPU1 can starve CPU0 of the lock
and prevent it from reaching end_dir_add(dir), therefore CPU1 cannot
exit its retry loop because the sequence number always has the bottom
bit set.
This patch resolves the livelock by not taking hlist_bl_lock in
d_alloc_parallel if the sequence counter is odd, since any subsequent
masked comparison with i_dir_seq will fail anyway.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Naresh Madhusudana <naresh.madhusudana@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In commit c713fb071e ("rtlwifi: rtl8821ae: Fix connection lost problem
correctly") a problem in rtl8821ae that caused loss of signal was fixed.
That same problem has now been reported for rtl8723be. Accordingly,
the ASPM L1 latency has been increased from 0 to 7 to fix the instability.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
Tested-by: James Cameron <quozl@laptop.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
First batch of iwlwifi fixes intended for 4.16:
* A couple of bugzilla fixes
- Fix a bogus warning when freeing a TFD;
- Fix severe throughput problem with 9000 series;
* Fix for a bug that caused queue hangs in certain situations;
* Fix for an issue with IBSS;
* Fix an issue with rate-scalingin AP-mode.
io-channel-cells should be <0> since sigma delta modulator exports only
one channel, as described in ../iio/iio-bindings.txt "IIO providers"
section. Only the phandle is necessary for IIO consumers in this case.
Fixes: af11143757 ("IIO: Add DT bindings for sigma delta adc modulator")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When several channels are registered (e.g. via st,adc-channels property):
- channels array is wrongly filled in. Only 1st element in array is being
initialized with last registered channel.
Fix it by passing reference to relevant channel (e.g. array[index]).
- only last initialized channel can work properly (e.g. unique 'ch_id'
is used). Converting any other channel result in conversion timeout.
Fix it by getting rid of 'ch_id', use chan->channel instead.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
CCS811 has different I2C register maps in boot and application mode. When
CCS811 is in boot mode, register APP_START (0xF4) is used to transit the
firmware state from boot to application mode. However, APP_START is not a
valid register location when CCS811 is in application mode (refer to
"CCS811 Bootloader Register Map" and "CCS811 Application Register Map" in
CCS811 datasheet). The driver should not attempt to perform a write to
APP_START while CCS811 is in application mode, as this is not a valid or
documented register location.
When prob function is being called, the driver assumes the CCS811 sensor
is in boot mode, and attempts to perform a write to APP_START. Although
CCS811 powers-up in boot mode, it may have already been transited to
application mode by previous instances, e.g. unload and reload device
driver by the system, or explicitly by user. Depending on the system
design, CCS811 sensor may be permanently connected to system power source
rather than power controlled by GPIO, hence it is possible that the sensor
is never power reset, thus the firmware could be in either boot or
application mode at any given time when driver prob function is being
called.
This patch checks the STATUS register before attempting to send a write to
APP_START. Only if the firmware is not in application mode and has valid
firmware application loaded, then it will continue to start transiting the
firmware boot to application mode.
Signed-off-by: Richard Lai <richard@richardman.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent. We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point. Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.
Cc: stable@vger.kernel.org # 3.12+, counting backports
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:
[ 830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
[...]
[ 830.916161] Internal error: Oops: 96000021 [#1] SMP
[ 830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
[ 830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
[ 830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 831.003793] pc : __ll_sc_atomic_add+0x4/0x18
[ 831.008055] lr : ___bpf_prog_run+0x1198/0x1588
[ 831.012485] sp : ffff00001ccabc20
[ 831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
[ 831.021087] x27: 0000000000000001 x26: 0000000000000000
[ 831.026387] x25: 000000c168d9db98 x24: 0000000000000000
[ 831.031686] x23: ffff000008203878 x22: ffff000009488000
[ 831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
[ 831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
[ 831.047585] x17: 0000000000000000 x16: 0000000000000000
[ 831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
[ 831.058184] x13: 0000000000000000 x12: 0000000000000000
[ 831.063484] x11: 0000000000000001 x10: 0000000000000000
[ 831.068783] x9 : 0000000000000000 x8 : 0000000000000000
[ 831.074083] x7 : 0000000000000000 x6 : 000580d428000000
[ 831.079383] x5 : 0000000000000018 x4 : 0000000000000000
[ 831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
[ 831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
[ 831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
[ 831.102577] Call trace:
[ 831.105012] __ll_sc_atomic_add+0x4/0x18
[ 831.108923] __bpf_prog_run32+0x4c/0x70
[ 831.112748] bpf_test_run+0x78/0xf8
[ 831.116224] bpf_prog_test_run_xdp+0xb4/0x120
[ 831.120567] SyS_bpf+0x77c/0x1110
[ 831.123873] el0_svc_naked+0x30/0x34
[ 831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)
Reason for this is because memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.
xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense in
fact are map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a large BPF percpu map is destroyed, I have seen
pcpu_balance_workfn() holding cpu for hundreds of milliseconds.
On KASAN config and 112 hyperthreads, average time to destroy a chunk
is ~4 ms.
[ 2489.841376] destroy chunk 1 in 4148689 ns
...
[ 2490.093428] destroy chunk 32 in 4072718 ns
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes rx for 4-addr packets in AP mode. These may be used for setting
up a 4-addr link for stations that are allowed to do so.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since we now have the convenient helper to do so, actually adjust the
TSQ pacing shift for packets going out over a WiFi interface. This
significantly improves throughput for locally-originated TCP
connections. The default pacing shift of 10 corresponds to ~1ms of
queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
1-hop throughput for ath9k by a factor of 3, whereas increasing it more
has diminishing returns.
Achieved throughput for different values of sk_pacing_shift (average of
5 iterations of 10-sec netperf runs to a host on the other side of the
WiFi hop):
sk_pacing_shift 10: 43.21 Mbps (pre-patch)
sk_pacing_shift 9: 78.17 Mbps
sk_pacing_shift 8: 123.94 Mbps
sk_pacing_shift 7: 128.31 Mbps
Latency for competing flows increases from ~3 ms to ~10 ms with this
change. This is about the same magnitude of queueing latency induced by
flows that are not originated on the WiFi device itself (and so are not
limited by TSQ).
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to enable PM runtime on omap1 also as otherwise we
will get errors:
omap_timer omap_timer.1: omap_dm_timer_probe: pm_runtime_get_sync failed!
omap_timer: probe of omap_timer.1 failed with error -13
...
We are checking for OMAP_TIMER_NEEDS_RESET flag elsewhere so this is
safe to do.
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The system call path can be interrupted before the switch back to the
standard branch prediction with BPENTER has been done. The critical
section cleanup code skips forward to .Lsysc_do_svc and bypasses the
BPENTER. In this case the kernel and all subsequent code will run with
the limited branch prediction.
Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we terminate driver I/O (because we need to stop using a certain
channel path) we also need to ensure that a timer (which may have been
set up using ccw_device_start_timeout) is cleared.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When a timeout occurs for users of ccw_device_start_timeout
we will stop the IO and call the drivers int handler with
the irb pointer set to ERR_PTR(-ETIMEDOUT). Sometimes
however we'd set the irb pointer to ERR_PTR(-EIO) which is
not intended. Just set the correct value in all codepaths.
Reported-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There are cases a device driver can't start IO because the device is
currently in use by cio. In this case the device driver is notified
when the device is usable again.
Using ccw_device_start_timeout we would set the timeout (and change
an existing timeout) before we test for internal usage. Worst case
this could lead to an unexpected timer deletion.
Fix this by setting the timeout after we test for internal usage.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit f19fbd5ed6 ("s390: introduce execute-trampolines for
branches") introduces .cfi_* assembler directives. Instead of
using the directives directly, use the macros from asm/dwarf.h.
This also ensures that the dwarf debug information are created
in the .debug_frame section.
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch finishes all outstanding SCSI IO commands (but not other commands,
e.g., task management) in the shutdown and unload paths.
It first waits for the commands to complete (this is done after setting
'ioc->remove_host = 1 ', which prevents new commands to be queued) then it
flushes commands that might still be running.
This avoids triggering error handling (e.g., abort command) for all commands
possibly completed by the adapter after interrupts disabled.
[mauricfo: introduced something in commit message.]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds checks for 'ioc->remove_host' in the SCSI error handlers, so
not to access pointers/resources potentially freed in the PCI shutdown/module
unload path. The error handlers may be invoked after shutdown/unload,
depending on other components.
This problem was observed with kexec on a system with a mpt3sas based adapter
and an infiniband adapter which takes long enough to shutdown:
The mpt3sas driver finished shutting down / disabled interrupt handling, thus
some commands have not finished and timed out.
Since the system was still running (waiting for the infiniband adapter to
shutdown), the scsi error handler for task abort of mpt3sas was invoked, and
hit an oops -- either in scsih_abort() because 'ioc->scsi_lookup' was NULL
without commit dbec4c9040 ("scsi: mpt3sas: lockless command submission"), or
later up in scsih_host_reset() (with or without that commit), because it
eventually called mpt3sas_base_get_iocstate().
After the above commit, the oops in scsih_abort() does not occur anymore
(_scsih_scsi_lookup_find_by_scmd() is no longer called), but that commit is
too big and out of the scope of linux-stable, where this patch might help, so
still go for the changes.
Also, this might help to prevent similar errors in the future, in case code
changes and possibly tries to access freed stuff.
Note the fix in scsih_host_reset() is still important anyway.
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the algorithm in storvsc_do_io to look for a channel
starting with the current CPU + 1 and wrap around (within the
current NUMA node). This spreads VMbus interrupts more evenly
across CPUs. Previous code always started with first CPU in
the current NUMA node, skewing the interrupt load to that CPU.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A domain cgroup isn't allowed to be turned threaded if its subtree is
populated or domain controllers are enabled. cgroup_enable_threaded()
depended on cgroup_can_be_thread_root() test to enforce this rule. A
parent which has populated domain descendants or have domain
controllers enabled can't become a thread root, so the above rules are
enforced automatically.
However, for the root cgroup which can host mixed domain and threaded
children, cgroup_can_be_thread_root() doesn't check any of those
conditions and thus first level cgroups ends up escaping those rules.
This patch fixes the bug by adding explicit checks for those rules in
cgroup_enable_threaded().
Reported-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8cfd8147df ("cgroup: implement cgroup v2 thread support")
Cc: stable@vger.kernel.org # v4.14+
After Laptop Mode Tools starts to use min_power for LPM, a user found
out Crucial BX100 SSD can't get mounted.
Crucial BX100 SSD 500GB drive don't work well with min_power. This also
happens to med_power_with_dipm.
So let's disable LPM for Crucial BX100 SSD 500GB drive.
BugLink: https://bugs.launchpad.net/bugs/1726930
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
The hcp->chmap_info must not be freed up in the hdmi_codec_remove()
function as it leads to kernel crash due ALSA core's
pcm_chmap_ctl_private_free() is trying to free it up again when the card
destroyed via snd_card_free.
Commit cd6111b262 ("ASoC: hdmi-codec: add channel mapping control")
should not have added the kfree(hcp->chmap_info); to the hdmi_codec_remove
function.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
Tested-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When resuming from idle with the new suspend mode configuration support
we go through the resume callbacks with a state of PM_SUSPEND_TO_IDLE
which we don't have regulator constraints for, causing an error:
dpm_run_callback(): regulator_resume_early+0x0/0x64 returns -22
PM: Device regulator.0 failed to resume early: error -22
Avoid this and similar errors by treating missing constraints as a noop.
See also commit 57a0dd1879 ("regulator: Fix suspend to idle"),
which fixed the suspend part.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
On transport mode we forget to fetch the child dst_entry
before we continue the while loop, this leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.
Fixes: 0f6c480f23 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef98a@syzkaller.appspotmail.com
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The ALC5651 does not like multi-write accesses, avoid them. This fixes:
rt5651 i2c-10EC5651:00: Unable to sync registers 0x27-0x28. -121
Errors on resume (and all registers after the registers in the error not
being synced).
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
When support for the A31/A31s CCU was first added, the clock ops for
the CLK_OUT_* clocks was set to the wrong type. The clocks are MP-type,
but the ops was set for div (M) clocks. This went unnoticed until now.
This was because while they are different clocks, their data structures
aligned in a way that ccu_div_ops would access the second ccu_div_internal
and ccu_mux_internal structures, which were valid, if not incorrect.
Furthermore, the use of these CLK_OUT_* was for feeding a precise 32.768
kHz clock signal to the WiFi chip. This was achievable by using the parent
with the same clock rate and no divider. So the incorrect divider setting
did not affect this usage.
Commit 946797aa3f ("clk: sunxi-ng: Support fixed post-dividers on MP
style clocks") added a new field to the ccu_mp structure, which broke
the aforementioned alignment. Now the system crashes as div_ops tries
to look up a nonexistent table.
Reported-by: Philipp Rossak <embed3d@gmail.com>
Tested-by: Philipp Rossak <embed3d@gmail.com>
Fixes: c6e6c96d8f ("clk: sunxi-ng: Add A31/A31s clocks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
When xfrm_policy_get_afinfo returns NULL, it will not hold rcu
read lock. In this case, rcu_read_unlock should not be called
in xfrm_get_tos, just like other places where it's calling
xfrm_policy_get_afinfo.
Fixes: f5e2bb4f5b ("xfrm: policy: xfrm_get_tos cannot fail")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which
obviously always evaluate to false. Fix this.
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Internal DASD device driver I/O such as query host access count or
path verification is started using the _sleep_on() function.
To mark a request as started or ended the callback_data is set to either
DASD_SLEEPON_START_TAG or DASD_SLEEPON_END_TAG.
In cases where the request has to be stopped unconditionally the status is
set to DASD_SLEEPON_END_TAG as well which leads to immediate clearing of
the request.
But the request might still be on a device request queue for normal
operation which might lead to a panic because of a BUG() statement in
__dasd_device_process_final_queue() or a list corruption of the device
request queue.
Fix by removing the setting of DASD_SLEEPON_END_TAG in the
dasd_cancel_req() and dasd_generic_requeue_all_requests() functions and
ensure that the request is not deleted in the requeue function.
Trigger the device tasklet in the requeue function and let the normal
processing cleanup the request.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The prior patch added support for passing gfp flags through to the
underlying allocators. This patch allows users to pass along gfp flags
(currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
allocators. This should allow users to decide if they are ok with
failing allocations recovering in a more graceful way.
Additionally, gfp passing was done as additional flags in the previous
patch. Instead, change this to caller passed semantics. GFP_KERNEL is
also removed as the default flag. It continues to be used for internally
caused underlying percpu allocations.
V2:
Removed gfp_percpu_mask in favor of doing it inline.
Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Percpu memory using the vmalloc area based chunk allocator lazily
populates chunks by first requesting the full virtual address space
required for the chunk and subsequently adding pages as allocations come
through. To ensure atomic allocations can succeed, a workqueue item is
used to maintain a minimum number of empty pages. In certain scenarios,
such as reported in [1], it is possible that physical memory becomes
quite scarce which can result in either a rather long time spent trying
to find free pages or worse, a kernel panic.
This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
through to the underlying allocators. This should prevent any
unnecessary panics potentially caused by the workqueue item. The passing
of gfp around is as additional flags rather than a full set of flags.
The next patch will change these to caller passed semantics.
V2:
Added const modifier to gfp flags in the balance path.
Removed an extra whitespace.
[1] https://lkml.org/lkml/2018/2/12/551
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
At some point the function declaration parameters got out of sync with
the function definitions in percpu-vm.c and percpu-km.c. This patch
makes them match again.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Various people have reported the Crucial MX100 512GB model not working
with LPM set to min_power. I've now received a report that it also does
not work with the new med_power_with_dipm level.
It does work with medium_power, but that has no measurable power-savings
and given the amount of people being bitten by the other levels not
working, this commit just disables LPM altogether.
Note all reporters of this have either the 512GB model (max capacity), or
are not specifying their SSD's size. So for now this quirk assumes this is
a problem with the 512GB model only.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=89261
Buglink: https://github.com/linrunner/TLP/issues/84
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Save 26 lines worth of Sparse complaints by fixing up this minor
mishap. The pointee lies in the __iomem space; the pointer does not.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit 8419caa727 ("ASoC: sgtl5000: Do not disable regulators in
SND_SOC_BIAS_OFF") causes the sgtl5000 to fail after a suspend/resume
sequence:
Playing WAVE '/media/a2002011001-e02.wav' : Signed 16 bit Little
Endian, Rate 44100 Hz, Stereo
aplay: pcm_write:2051: write error: Input/output error
The problem is caused by the fact that the aforementioned commit
dropped the cache handling, so re-introduce the register map
resync to fix the problem.
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
I would like helping maintaining and reviewing/testing sgtl5000
related patches.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In AP mode, when a new station associates, rs is initialized immediately
upon association completion, before the phy context is updated with the
association parameters, so the sta bandwidth might be wider than the phy
context allows.
To avoid this issue, always initialize rs with 20mhz bandwidth rate, and
after authorization, when the phy context is already up-to-date, re-init
rs with the correct bw.
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Canceling the periodic timestamp work should be
done in the opposite flow to where it was started.
This also prevents from sending the MARKER command
during the mac_stop flow - causing a false queue hang
(FW is no longer there to send a response).
Fixes: 93b167c13a ("iwlwifi: runtime: sync FW and host clocks for logs")
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
This change relaxes copy up on encode of merge dir with lower layer > 1
and handles the case of encoding a merge dir with lower layer 1, where an
ancestor is a non-indexed merge dir. In that case, decode of the lower
file handle will not have been possible if the non-indexed ancestor is
redirected before or after encode.
Before encoding a non-upper directory file handle from real layer N, we
need to check if it will be possible to reconnect an overlay dentry from
the real lower decoded dentry. This is done by following the overlay
ancestry up to a "layer N connected" ancestor and verifying that all
parents along the way are "layer N connectable". If an ancestor that is
NOT "layer N connectable" is found, we need to copy up an ancestor, which
is "layer N connectable", thus making that ancestor "layer N connected".
For example:
layer 1: /a
layer 2: /a/b/c
The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
copied up and renamed, upper dir /a will be indexed by lower dir /a from
layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
overlay dentry from the connected lower dentry /a/b/c.
To avoid this problem on decode time, we need to copy up an ancestor of
/a/b/c, which is "layer 2 connectable", on encode time. That ancestor is
/a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
and when the time comes to decode the file handle from lower dentry /a/b/c,
ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
a connected overlay dentry will be accomplished.
(*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
an entry /a in the lower layers above layer N and find the indexed dir /a
from layer 1. If that improvement is made, then the check for "layer N
connected" will need to verify there are no redirects in lower layers above
layer N. In the example above, /a will be "layer 2 connectable". However,
if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
"layer 2 connectable":
layer 1: /A (redirect = /a)
layer 2: /a/b/c
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Commit 31747eda41 ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.
A similar issue exists for non-dir non-upper files, for example:
$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/foo
inotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.
Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
- A non-upper overlay mount
- A lower non-hardlink when index=off
A helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.
The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Our Transmit Frame Descriptor (TFD) is a DMA descriptor that
includes several pointers to be able to transmit a packet
which is not physically contiguous.
Depending on the hardware being use, we can have 20 or 25
pointers in a single TFD. In both cases, it is more than
enough and it is quite hard to hit this limit.
It has been reported that when using specific applications
(Ktorrent), we can actually use all the pointers and then
a long standing bug showed up.
When we free the TFD, we check its number of valid pointers
and make sure it doesn't exceed the number of pointers the
hardware support.
This check had an off by one bug: it is perfectly valid to
free the 20 pointers if the TFD has 20 pointers.
Fix that.
https://bugzilla.kernel.org/show_bug.cgi?id=197981
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In IBSS, the mac80211 sets the cab_queue to be invalid.
However, the multicast station uses it, so we need to override it.
A previous patch did it, but it was nested inside the if's and was
applied only for legacy FWs that don't support the new station type
API, instead of being applied for all paths.
In addition, add a missing NL80211_IFTYPE_ADHOC to the initialization
of the queues in iwl_mvm_mac_ctxt_init()
Fixes: ee48b72211 ("iwlwifi: mvm: support ibss in dqa mode")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
A previous patch allowed the same PN for packets originating from the
same AMSDU by copying PN only for the last packet in the series.
This however is bogus since we cannot assume the last frame will be
received on the same queue, and if it is received on a different ueue
we will end up not incrementing the PN and possibly let the next
packet to have the same PN and pass through.
Change the logic instead to driver explicitly indicate for the second
sub frame and on to be allowed to have the same PN as the first
subframe. Indicate it to mac80211 as well for the fallback queue.
Fixes: f1ae02b186 ("iwlwifi: mvm: allow same PN for de-aggregated AMSDU")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In early time, when freeing a xdst, it would be inserted into
dst_garbage.list first. Then if it's refcnt was still held
somewhere, later it would be put into dst_busy_list in
dst_gc_task().
When one dev was being unregistered, the dev of these dsts in
dst_busy_list would be set with loopback_dev and put this dev.
So that this dev's removal wouldn't get blocked, and avoid the
kmsg warning:
kernel:unregister_netdevice: waiting for veth0 to become \
free. Usage count = 2
However after Commit 52df157f17 ("xfrm: take refcnt of dst
when creating struct xfrm_dst bundle"), the xdst will not be
freed with dst gc, and this warning happens.
To fix it, we need to find these xdsts that are still held by
others when removing the dev, and free xdst's dev and set it
with loopback_dev.
But unfortunately after flow_cache for xfrm was deleted, no
list tracks them anymore. So we need to save these xdsts
somewhere to release the xdst's dev later.
To make this easier, this patch is to reuse uncached_list to
track xdsts, so that the dev refcnt can be released in the
event NETDEV_UNREGISTER process of fib_netdev_notifier.
Thanks to Florian, we could move forward this fix quickly.
Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
Reported-by: Jianlin Shi <jishi@redhat.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Problem Statement: Sending I/O through 32 bit descriptors to Ventura series of
controller results in IO timeout on certain conditions.
This error only occurs on systems with high I/O activity on Ventura series
controllers.
Changes in this patch will prevent driver from using 32 bit descriptor and use
64 bit Descriptors.
Cc: <stable@vger.kernel.org>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Free Electrons is now Bootlin, change my email address accordingly.
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This patch fixes a bootproblem with the Bananapi M2 board. Since there
are some regulators missing we add them right now. Those values come
from the schematic, below you can find a small overview:
* reg_aldo1: 3,3V, powers the wifi
* reg_aldo2: 2,5V, powers the IO of the RTL8211E
* reg_aldo3: 3,3V, powers the audio
* reg_dldo1: 3,0V, powers the RTL8211E
* reg_dldo2: 2,8V, powers the analog part of the csi
* reg_dldo3: 3,3V, powers misc
* reg_eldo1: 1,8V, powers the csi
* reg_ldo_io1:1,8V, powers the gpio
* reg_dc5ldo: needs to be always on
This patch updates also the vmmc-supply properties on the mmc0 and mmc2
node to use the allready existent regulators.
We can now remove the sunxi-common-regulators.dtsi include since we
don't need it anymore.
Fixes: 7daa213700 ("ARM: dts: sunxi: Add regulators for Sinovoip BPI-M2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Philipp Rossak <embed3d@gmail.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The eldoin is supplied from the dcdc1 regulator. The N_VBUSEN pin is
connected to an external power regulator (SY6280AAC).
With this commit we update the pmic binding properties to support
those features.
Fixes: 7daa213700 ("ARM: dts: sunxi: Add regulators for Sinovoip BPI-M2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Philipp Rossak <embed3d@gmail.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This patch adds missing DT binding files to the Samsung ASoC
drivers entry.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Current implementation mute codec in global way (DAC block).
That means when user routes sound not from I2S but from
AUX source (LINE_IN) it also will be muted by alsa core.
This should not happen.
Signed-off-by: Michal Oleszczyk <oleszczyk.m@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Dcoumentation has been added by parsing through git commit history and
reading code. This might be useful for scripting and tracking changes in
the ABI.
I do not have complete descriptions for the following 3 attributes; they
have been annotated with the comment [to be documented] -
/sys/class/scsi_host/hostX/ahci_port_cmd
/sys/class/scsi_host/hostX/ahci_host_caps
/sys/class/scsi_host/hostX/ahci_host_cap2
Signed-off-by: Aishwarya Pant <aishpant@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Clean-up the documentation of sysfs interfaces to be in the same format
as described in Documentation/ABI/README. This will be useful for
tracking changes in the ABI. Attributes are grouped by function (device,
link or port) and then by date added.
This patch also adds documentation for one attribute -
/sys/class/ata_port/ataX/port_no
Signed-off-by: Aishwarya Pant <aishpant@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The sanity test added in ecd7918745 can be bypassed, validation
only occurs if XFRM_STATE_ESN flag is set, but rest of code doesn't care
and just checks if the attribute itself is present.
So always validate. Alternative is to reject if we have the attribute
without the flag but that would change abi.
Reported-by: syzbot+0ab777c27d2bb7588f73@syzkaller.appspotmail.com
Cc: Mathias Krause <minipli@googlemail.com>
Fixes: ecd7918745 ("xfrm_user: ensure user supplied esn replay window is valid")
Fixes: d8647b79c3 ("xfrm: Add user interface for esn and big anti-replay windows")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Now that the flowcache is removed we need to generate
a new dummy bundle every time we check if the needed
SAs are in place because the dummy bundle is not cached
anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
to xfrm_lookup(). This makes sure that we get a dummy
bundle in case the SAs are not yet in place.
Fixes: 3ca28286ea ("xfrm_policy: bypass flow_cache_lookup")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Some versions of gcc generate a warning that the variable "emulated"
may be used uninitialized in function kvmppc_handle_load128_by2x64().
It would be used uninitialized if kvmppc_handle_load128_by2x64 was
ever called with vcpu->arch.mmio_vmx_copy_nums == 0, but neither of
the callers ever do that, so there is no actual bug. When gcc
generates a warning, it causes the build to fail because arch/powerpc
is compiled with -Werror.
This silences the warning by initializing "emulated" to EMULATE_DONE.
Fixes: 09f984961c ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Commit accb757d79 ("KVM: Move vcpu_load to arch-specific
kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out"
statement and an "out:" label to kvm_arch_vcpu_ioctl_run().
Since the only "goto out" is inside a CONFIG_VSX block,
compiling with CONFIG_VSX=n gives a warning that label "out"
is defined but not used, and because arch/powerpc is compiled
with -Werror, that becomes a compile error that makes the kernel
build fail.
Merge commit 1ab03c072f ("Merge tag 'kvm-ppc-next-4.16-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc",
2018-02-09) added a similar block of code inside a #ifdef
CONFIG_ALTIVEC, with a "goto out" statement.
In order to make the build succeed, this adds a #ifdef around the
"out:" label. This is a minimal, ugly fix, to be replaced later
by a refactoring of the code. Since CONFIG_VSX depends on
CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here.
Fixes: accb757d79 ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Dennis rewrote the percpu area allocator some months ago, understands
most of the code base and has been responsive with the bug reports and
questions. Let's add him as a co-maintainer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christopher Lameter <cl@linux.com>
While adding cgroup2 interface for the cpu controller, 0d5936344f
("sched: Implement interface for cgroup unified hierarchy") forgot to
update input validation and left it to reject cpu.max config if any
descendant has set a higher value.
cgroup2 officially supports delegation and a descendant must not be
able to restrict what its ancestors can configure. For absolute
limits such as cpu.max and memory.max, this means that the config at
each level should only act as the upper limit at that level and
shouldn't interfere with what other cgroups can configure.
This patch updates config validation on cgroup2 so that the cpu
controller follows the same convention.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 0d5936344f ("sched: Implement interface for cgroup unified hierarchy")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # v4.15+
Because power of Salvator-X board is cut off in suspend,
it needs to reset SATA PHY state in resume.
Otherwise, SATA partition could not be accessed anymore.
Signed-off-by: Khiem Nguyen <khiem.nguyen.xt@rvc.renesas.com>
Signed-off-by: Hien Dang <hien.dang.eb@rvc.renesas.com>
[reinit phy in sata_rcar_resume() function on R-Car Gen3 only]
[factor out SATA module init sequence]
[fixed the prefix for the subject]
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
syzkaller hit a WARN() in ata_bmdma_qc_issue() when writing to /dev/sg0.
This happened because it issued an ATA pass-through command (ATA_16)
where the protocol field indicated that NCQ should be used -- but the
device did not support NCQ.
We could just remove the WARN() from libata-sff.c, but the real problem
seems to be that the SCSI -> ATA translation code passes through NCQ
commands without verifying that the device actually supports NCQ.
Fix this by adding the appropriate check to ata_scsi_pass_thru().
Here's reproducer that works in QEMU when /dev/sg0 refers to a disk of
the default type ("82371SB PIIX3 IDE"):
#include <fcntl.h>
#include <unistd.h>
int main()
{
char buf[53] = { 0 };
buf[36] = 0x85; /* ATA_16 */
buf[37] = (12 << 1); /* FPDMA */
buf[38] = 0x1; /* Has data */
buf[51] = 0xC8; /* ATA_CMD_READ */
write(open("/dev/sg0", O_RDWR), buf, sizeof(buf));
}
Fixes: ee7fb331c3 ("libata: add support for NCQ commands for SG interface")
Reported-by: syzbot+2f69ca28df61bdfc77cd36af2e789850355a221e@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
syzkaller hit a WARN() in ata_qc_issue() when writing to /dev/sg0. This
happened because it issued a READ_6 command with no data buffer.
Just remove the WARN(), as it doesn't appear indicate a kernel bug. The
expected behavior is to fail the command, which the code does.
Here's a reproducer that works in QEMU when /dev/sg0 refers to a disk of
the default type ("82371SB PIIX3 IDE"):
#include <fcntl.h>
#include <unistd.h>
int main()
{
char buf[42] = { [36] = 0x8 /* READ_6 */ };
write(open("/dev/sg0", O_RDWR), buf, sizeof(buf));
}
Fixes: f92a26365a ("libata: change ATA_QCFLAG_DMAMAP semantics")
Reported-by: syzbot+f7b556d1766502a69d85071d2ff08bd87be53d0f@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org> # v2.6.25+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
syzkaller reported a crash in ata_bmdma_fill_sg() when writing to
/dev/sg1. The immediate cause was that the ATA command's scatterlist
was not DMA-mapped, which causes 'pi - 1' to underflow, resulting in a
write to 'qc->ap->bmdma_prd[0xffffffff]'.
Strangely though, the flag ATA_QCFLAG_DMAMAP was set in qc->flags. The
root cause is that when __ata_scsi_queuecmd() is preparing to relay a
SCSI command to an ATAPI device, it doesn't correctly validate the CDB
length before copying it into the 16-byte buffer 'cdb' in 'struct
ata_queued_cmd'. Namely, it validates the fixed CDB length expected
based on the SCSI opcode but not the actual CDB length, which can be
larger due to the use of the SG_NEXT_CMD_LEN ioctl. Since 'flags' is
the next member in ata_queued_cmd, a buffer overflow corrupts it.
Fix it by requiring that the actual CDB length be <= 16 (ATAPI_CDB_LEN).
[Really it seems the length should be required to be <= dev->cdb_len,
but the current behavior seems to have been intentionally introduced by
commit 607126c2a2 ("libata-scsi: be tolerant of 12-byte ATAPI commands
in 16-byte CDBs") to work around a userspace bug in mplayer. Probably
the workaround is no longer needed (mplayer was fixed in 2007), but
continuing to allow lengths to up 16 appears harmless for now.]
Here's a reproducer that works in QEMU when /dev/sg1 refers to the
CD-ROM drive that qemu-system-x86_64 creates by default:
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#define SG_NEXT_CMD_LEN 0x2283
int main()
{
char buf[53] = { [36] = 0x7e, [52] = 0x02 };
int fd = open("/dev/sg1", O_RDWR);
ioctl(fd, SG_NEXT_CMD_LEN, &(int){ 17 });
write(fd, buf, sizeof(buf));
}
The crash was:
BUG: unable to handle kernel paging request at ffff8cb97db37ffc
IP: ata_bmdma_fill_sg drivers/ata/libata-sff.c:2623 [inline]
IP: ata_bmdma_qc_prep+0xa4/0xc0 drivers/ata/libata-sff.c:2727
PGD fb6c067 P4D fb6c067 PUD 0
Oops: 0002 [#1] SMP
CPU: 1 PID: 150 Comm: syz_ata_bmdma_q Not tainted 4.15.0-next-20180202 #99
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[...]
Call Trace:
ata_qc_issue+0x100/0x1d0 drivers/ata/libata-core.c:5421
ata_scsi_translate+0xc9/0x1a0 drivers/ata/libata-scsi.c:2024
__ata_scsi_queuecmd drivers/ata/libata-scsi.c:4326 [inline]
ata_scsi_queuecmd+0x8c/0x210 drivers/ata/libata-scsi.c:4375
scsi_dispatch_cmd+0xa2/0xe0 drivers/scsi/scsi_lib.c:1727
scsi_request_fn+0x24c/0x530 drivers/scsi/scsi_lib.c:1865
__blk_run_queue_uncond block/blk-core.c:412 [inline]
__blk_run_queue+0x3a/0x60 block/blk-core.c:432
blk_execute_rq_nowait+0x93/0xc0 block/blk-exec.c:78
sg_common_write.isra.7+0x272/0x5a0 drivers/scsi/sg.c:806
sg_write+0x1ef/0x340 drivers/scsi/sg.c:677
__vfs_write+0x31/0x160 fs/read_write.c:480
vfs_write+0xa7/0x160 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0x4d/0xc0 fs/read_write.c:581
do_syscall_64+0x5e/0x110 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x21/0x86
Fixes: 607126c2a2 ("libata-scsi: be tolerant of 12-byte ATAPI commands in 16-byte CDBs")
Reported-by: syzbot+1ff6f9fcc3c35f1c72a95e26528c8e7e3276e4da@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org> # v2.6.24+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Indent the numbered item with one space like all other items in the same
list.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Tejun Heo <tj@kernel.org>
Exit directly with ENODEV, if the AHCI controller is not available
anymore. Otherwise a delay of 500ms for each port is added to the remove
function while trying to issue a command on the non-existent controller.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
This fixs the following comile warnings with ATA_DEBUG enabled,
which detected by Linaro GCC 5.2-2015.11:
drivers/ata/libata-scsi.c: In function 'ata_scsi_dump_cdb':
./include/linux/kern_levels.h:5:18: warning: format '%d' expects
argument of type 'int', but argument 6 has type 'u64 {aka long
long unsigned int}' [-Wformat=]
tj: Patch hand-applied and description trimmed.
Signed-off-by: Dong Bo <dongbo4@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If matrix_keypad_stop() is executing and the keypad interrupt is triggered,
disable_row_irqs() may be called by both matrix_keypad_interrupt() and
matrix_keypad_stop() at the same time, causing interrupts to be disabled
twice and the keypad being "stuck" after resuming.
Take lock when setting keypad->stopped to ensure that ISR will not race
with matrix_keypad_stop() disabling interrupts.
Signed-off-by: Zhang Bo <zbsdta@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
stm32_vrefbuf_enable() wrongly checks VRR bit: 0 stands for not ready,
1 for ready. It currently checks the opposite.
This makes enable routine to exit immediately without waiting for ready
flag.
Fixes: 0cdbf481e9 ("regulator: Add support for stm32-vrefbuf")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Replace the original license statement with the SPDX identifier.
Add also one line of description as recommended by the COPYING
file.
Signed-off-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The driver has been released with GNU Public License v2 as stated
in the header, but the module license information has been tagged
as "GPL" (GNU Public License v2 or later).
Fix the module license information so that it matches the one in
the header as "GPL v2".
Fixes: 07b8481d4a ("Input: add MELFAS mms114 touchscreen driver")
Reported-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
when trying to insert from 32 bit userspace as we do it already
with policies inserted via netlink.
Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-02 09:23:23 +01:00
889 changed files with 9562 additions and 6150 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.