The contact point for the kernel's Code of Conduct should now be the
Code of Conduct Committee, not the full TAB. Change the email address
in the file to properly reflect this.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was a blank <URL> reference for how to find the Code of Conduct
Committee. Fix that up by pointing it to the correct kernel.org website
page location.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Create a link between the Code of Conduct and the Code of Conduct
Interpretation so that people can see that they are related.
Acked-by: Chris Mason <clm@fb.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wolfram writes:
"i2c for 4.19
Another driver bugfix and MAINTAINERS addition from I2C."
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rcar: cleanup DMA for all kinds of failure
MAINTAINERS: Add entry for Broadcom STB I2C controller
David writes:
"Networking:
A few straggler bug fixes:
1) Fix indexing of multi-pass dumps of ipv6 addresses, from David
Ahern.
2) Revert RCU locking change for bonding netpoll, causes worse
problems than it solves.
3) pskb_trim_rcsum_slow() doesn't handle odd trim offsets, resulting
in erroneous bad hw checksum triggers with CHECKSUM_COMPLETE
devices. From Dimitris Michailidis.
4) a revert to some neighbour code changes that adjust notifications
in a way that confuses some apps."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
net: fix pskb_trim_rcsum_slow() with odd trim offset
Revert "bond: take rcu lock in netpoll_send_skb_on_dev"
This reverts commit 8e326289e3.
This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The loop wants to skip previously dumped addresses, so loops until
current index >= saved index. If the message fills it wants to save
the index for the next address to dump - ie., the one that did not
fit in the current message.
Currently, it is incrementing the index counter before comparing to the
saved index, and then the saved index is off by 1 - it assumes the
current address is going to fit in the message.
Change the index handling to increment only after a succesful dump.
Fixes: 502a2ffd73 ("ipv6: convert idev_list to list macros")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA needs to be cleaned up not only on timeout, but on all errors where
it has been setup before.
Fixes: 73e8b05283 ("i2c: rcar: add DMA support")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Add an entry for the Broadcom STB I2C controller in the MAINTAINERS file.
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
[wsa: fixed sorting and a whitespace error]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Ingo writes:
"x86 fixes:
It's 4 misc fixes, 3 build warning fixes and 3 comment fixes.
In hindsight I'd have left out the 3 comment fixes to make the pull
request look less scary at such a late point in the cycle. :-/"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context switch if there is an FPU
x86/fpu: Remove second definition of fpu in __fpu__restore_sig()
x86/entry/64: Further improve paranoid_entry comments
x86/entry/32: Clear the CS high bits
x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS
x86/time: Correct the attribute on jiffies' definition
x86/entry: Add some paranoid entry/exit CR3 handling comments
x86/percpu: Fix this_cpu_read()
x86/tsc: Force inlining of cyc2ns bits
Ingo writes:
"scheduler fixes:
Two fixes: a CFS-throttling bug fix, and an interactivity fix."
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the min_vruntime update logic in dequeue_entity()
sched/fair: Fix throttle_list starvation with low CFS quota
Ingo writes:
"perf fixes:
Misc perf tooling fixes."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Stop fallbacking to kallsyms for vdso symbols lookup
perf tools: Pass build flags to traceevent build
perf report: Don't crash on invalid inline debug information
perf cpu_map: Align cpu map synthesized events properly.
perf tools: Fix tracing_path_mount proper path
perf tools: Fix use of alternatives to find JDIR
perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus
perf vendor events intel: Fix wrong filter_band* values for uncore events
Revert "perf tools: Fix PMU term format max value calculation"
tools headers uapi: Sync kvm.h copy
tools arch uapi: Sync the x86 kvm.h copy
We've been getting checksum errors involving small UDP packets, usually
59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
has been complaining that HW is providing bad checksums. Turns out the
problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1
("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
The source of the problem is that when the bytes we are trimming start
at an odd address, as in the case of the 1 padding byte above,
skb_checksum() returns a byte-swapped value. We cannot just combine this
with skb->csum using csum_sub(). We need to use csum_block_sub() here
that takes into account the parity of the start address and handles the
swapping.
Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave writes:
"drm fixes for 4.19 final (part 2)
Looked like two stragglers snuck in, one very urgent the pageflipping
was missing a reference that could result in a GPF on non-i915
drivers, the other is an overflow in the sun4i dotclock calcs
resulting in a mode not getting set."
* tag 'drm-fixes-2018-10-20-1' of git://anongit.freedesktop.org/drm/drm:
drm/sun4i: Fix an ulong overflow in the dotclock driver
drm: Get ref on CRTC commit object when waiting for flip_done
Steven writes:
"tracing: A few small fixes to synthetic events
Masami found some issues with the creation of synthetic events. The
first two patches fix handling of unsigned type, and handling of a
space before an ending semi-colon.
The third patch adds a selftest to test the processing of synthetic
events."
* tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests: ftrace: Add synthetic event syntax testcase
tracing: Fix synthetic event to allow semicolon at end
tracing: Fix synthetic event to accept unsigned modifier
Dmitry writes:
"Input updates for 4.19-rc8
Just an addition to elan touchpad driver ACPI table."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM
Fix synthetic event to allow independent semicolon at end.
The synthetic_events interface accepts a semicolon after the
last word if there is no space.
# echo "myevent u64 var;" >> synthetic_events
But if there is a space, it returns an error.
# echo "myevent u64 var ;" > synthetic_events
sh: write error: Invalid argument
This behavior is difficult for users to understand. Let's
allow the last independent semicolon too.
Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix synthetic event to accept unsigned modifier for its field type
correctly.
Currently, synthetic_events interface returns error for "unsigned"
modifiers as below;
# echo "myevent unsigned long var" >> synthetic_events
sh: write error: Invalid argument
This is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expected it.
With this fix, synthetic_events can handle the "unsigned long"
correctly like as below;
# echo "myevent unsigned long var" >> synthetic_events
# cat synthetic_events
myevent unsigned long var
Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This reverts commit 6fe9487892.
It is causing more serious regressions than the RCU warning
it is fixing.
Signed-off-by: David S. Miller <davem@davemloft.net>
I wrote:
"USB fixes for 4.19-final
Here are a small number of last-minute USB driver fixes
Included here are:
- spectre fix for usb storage gadgets
- xhci fixes
- cdc-acm fixes
- usbip fixes for reported problems
All of these have been in linux-next with no reported issues."
* tag 'usb-4.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: storage: Fix Spectre v1 vulnerability
USB: fix the usbfs flag sanitization for control transfers
usb: xhci: pci: Enable Intel USB role mux on Apollo Lake platforms
usb: roles: intel_xhci: Fix Unbalanced pm_runtime_enable
cdc-acm: correct counting of UART states in serial state notification
cdc-acm: do not reset notification buffer index upon urb unlinking
cdc-acm: fix race between reset and control messaging
usb: usbip: Fix BUG: KASAN: slab-out-of-bounds in vhci_hub_control()
selftests: usbip: add wait after attach and before checking port status
Jens writes:
"Block fixes for 4.19-final
Two small fixes that should go into this release."
* tag 'for-linus-20181019' of git://git.kernel.dk/linux-block:
block: don't deal with discard limit in blkdev_issue_discard()
nvme: remove ns sibling before clearing path
David writes:
"Networking
1) Fix gro_cells leak in xfrm layer, from Li RongQing.
2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From
Eric Dumazet.
3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn
Töpel.
4) Out of bounds packet access in _decode_session6(), from Alexei
Starovoitov.
5) Several ethtool bugs, where we copy a struct into the kernel twice
and our validations of the values in the first copy can be
invalidated by the second copy due to asynchronous updates to the
memory by the user. From Wenwen Wang.
6) Missing netlink attribute validation in cls_api, from Davide
Caratti.
7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang.
8) rxrpc operates on wrong kvec, from Yue Haibing.
9) A regression was introduced by the disassosciation of route
neighbour references in rt6_probe(), causing probe for
neighbourless routes to not be properly rate limited. Fix from
Sabrina Dubroca.
10) Unsafe RCU locking in tipc, from Tung Nguyen.
11) Use after free in inet6_mc_check(), from Eric Dumazet.
12) PMTU from icmp packets should update the SCTP transport pathmtu,
from Xin Long.
13) Missing peer put on error in rxrpc, from David Howells.
14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren.
15) Fix overflowing shift statement in qla3xxx driver, from Nathan
Chancellor.
16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva.
17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return
value in an inverted manner, fix from Paolo Abeni.
18) Fix missed unresolved entries in ipmr dumps, from Nikolay
Aleksandrov.
19) Fix NAPI handling under high load, we can completely miss events
when NAPI has to loop more than one time in a cycle. From Heiner
Kallweit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
ip6_tunnel: Fix encapsulation layout
tipc: fix info leak from kernel tipc_event
net: socket: fix a missing-check bug
net: sched: Fix for duplicate class dump
r8169: fix NAPI handling under high load
net: ipmr: fix unresolved entry dumps
net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
sctp: fix the data size calculation in sctp_data_size
virtio_net: avoid using netif_tx_disable() for serializing tx routine
udp6: fix encap return code for resubmitting
mlxsw: core: Fix use-after-free when flashing firmware during init
sctp: not free the new asoc when sctp_wait_for_connect returns err
sctp: fix race on sctp_id2asoc
r8169: re-enable MSI-X on RTL8168g
net: bpfilter: use get_pid_task instead of pid_task
ptp: fix Spectre v1 vulnerability
net: qla3xxx: Remove overflowing shift statement
geneve, vxlan: Don't set exceptions if skb->len < mtu
geneve, vxlan: Don't check skb_dst() twice
sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead
...
David writes:
"Sparc fixes:
The main bit here is fixing how fallback system calls are handled in
the sparc vDSO.
Unfortunately, I fat fingered the commit and some perf debugging
hacks slipped into the vDSO fix, which I revert in the very next
commit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Revert unintended perf changes.
sparc: vDSO: Silence an uninitialized variable warning
sparc: Fix syscall fallback bugs in VDSO.
Dave writes:
"drm fixes for 4.19 final
Just a last set of misc core fixes for final.
4 fixes, one use after free, one fb integration fix, one EDID fix,
and one laptop panel quirk,"
* tag 'drm-fixes-2018-10-19' of git://anongit.freedesktop.org/drm/drm:
drm/edid: VSDB yCBCr420 Deep Color mode bit definitions
drm: fix use of freed memory in drm_mode_setcrtc
drm: fb-helper: Reject all pixel format changing requests
drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl
Doug writes:
"Really final for-rc pull request for 4.19
Ok, so last week I thought we had sent our final pull request for
4.19. Well, wouldn't ya know someone went and found a couple Spectre
v1 fixes were needed :-/. So, a couple *very* small specter patches
for this (hopefully) final -rc week."
* tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Fix Spectre v1 vulnerability
IB/ucm: Fix Spectre v1 vulnerability
Commit 058214a4d1 ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b52 ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:
.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -
Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.
Turn this into a more reasonable:
.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -
With this, and with 84dad55951 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.
Fixes: 058214a4d1 ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.
This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dumping classes by parent, kernel would return classes twice:
| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.
Fixes: cb395b2010 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.
Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.
Fixes: da78dbff2e ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.
Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:
...
1. W does flip work
2. W runs commit_hw_done()
3. W waits for flip_done on CRTC 1
4. > flip_done for CRTC 1 completes
5. W finishes waiting for CRTC 1
6. W waits for flip_done on CRTC 2
7. > Preempted by X
8. > flip_done for CRTC 2 completes
9. X atomic_check: hw_done and flip_done are complete on all CRTCs
10. X updates cursor on both CRTCs
11. X destroys atomic state
12. X done
13. > Switch back to W
14. W waits for flip_done on CRTC 2
15. W raises general protection fault
The error looks like so:
general protection fault: 0000 [#1] PREEMPT SMP PTI
**snip**
Call Trace:
lock_acquire+0xa2/0x1b0
_raw_spin_lock_irq+0x39/0x70
wait_for_completion_timeout+0x31/0x130
drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
commit_tail+0x3d/0x70 [drm_kms_helper]
process_one_work+0x212/0x650
worker_thread+0x49/0x420
kthread+0xfb/0x130
ret_from_fork+0x3a/0x50
Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm(O) drm(O)
Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.
v2: The reference on the commit object needs to be obtained before
hw_done() is signaled, since that's the point where another commit
is allowed to modify the state. Assuming that the
new_crtc_state->commit object still exists within flip_done() is
incorrect.
Fix by getting a reference in setup_commit(), and releasing it
during default_clear().
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1539611200-6184-1-git-send-email-sunpeng.li@amd.com
Steffen Klassert says:
====================
pull request (net): ipsec 2018-10-18
1) Free the xfrm interface gro_cells when deleting the
interface, otherwise we leak it. From Li RongQing.
2) net/core/flow.c does not exist anymore, so remove it
from the MAINTAINERS file.
3) Fix a slab-out-of-bounds in _decode_session6.
From Alexei Starovoitov.
4) Fix RCU protection when policies inserted into
thei bydst lists. From Florian Westphal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).
More importantly, this patch fixes one issue introduced in a22c4d7e34
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.
Fixes: a22c4d7e34 ("block: re-add discard_granularity and alignment checks")
Cc: stable@vger.kernel.org
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.
Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.
Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.
Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them. We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.
This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).
Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.
Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.
The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.
Cc: stable@vger.kernel.org
Fixes: 9ae326a690 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case. What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages". No, mremap() obviously just _moves_ the page from one page
table location to another.
That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.
As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).
This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using non-GPL licenses for our documentation is rather problematic,
as it can directly include other files, which generally are GPLv2
licensed and thus not compatible.
Remove this license now that the only user (idr.rst) is gone to avoid
people semi-accidentally using it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matthew writes:
"IDA/IDR fixes for 4.19
I have two tiny fixes, one for the IDA test-suite and one for the IDR
documentation license."
* 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax:
idr: Change documentation license
test_ida: Fix lockdep warning
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Stop falling back to kallsyms for vDSO symbols lookup, this wasn't
being really used and is not valid in arches such as Sparc, where
user and kernel space don't share the address space, relying only on
cpumode to figure out what DSOs to lookup (Arnaldo Carvalho de Melo)
- Align CPU map synthesized events properly, fixing SIGBUS in
CPUs like Sparc (David Miller)
- Fix use of alternatives to find JDIR (Jarod Wilson)
- Store IDs for events with their own CPUs when synthesizing user
level event details (scale, unit, etc) events, fixing a crash
when recording a PMU event with a cpumask defined (Jiri Olsa)
- Fix wrong filter_band* values for uncore Intel vendor events (Jiri Olsa)
- Fix detection of tracefs path in systems without tracefs, where
that path should be the debugfs mountpoint plus "/tracing/" (Jiri Olsa)
- Pass build flags to traceevent build, allowing using alternative
flags in distro packages, RPM, for instance (Jiri Olsa)
- Fix 'perf report' crash on invalid inline debug information (Milian Wolff)
- Synch KVM UAPI copies (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the skb space ends in an unresolved entry while dumping we'll miss
some unresolved entries. The reason is due to zeroing the entry counter
between dumping resolved and unresolved mfc entries. We should just
keep counting until the whole table is dumped and zero when we move to
the next as we have a separate table counter.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: 8fb472c09b ("ipmr: improve hash scalability")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot_vlant_wait_for_completion() function is very similar to the
ocelot_mact_wait_for_completion(). It seemed to have be copied but the
comment was not updated, so let's fix it.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp data size should be calculated by subtracting data chunk header's
length from chunk_hdr->length, not just data header.
Fixes: 668c9beb90 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 713a98d90c ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.
1) Its operation is redundant with netif_device_detach() in case the
interface is running.
2) In case of the interface is not running before suspending and
resuming, the tx does not get resumed by netif_device_attach().
This results in losing network connectivity.
It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().
Fixes commit 713a98d90c ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven writes:
"tracing: Two fixes for 4.19
This fixes two bugs:
- Fix size mismatch of tracepoint array
- Have preemptirq test module use same clock source of the selftest"
* tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
tracepoint: Fix tracepoint array element size mismatch
The commit eb63f2964d ("udp6: add missing checks on edumux packet
processing") used the same return code convention of the ipv4 counterpart,
but ipv6 uses the opposite one: positive values means resubmit.
This change addresses the issue, using positive return value for
resubmitting. Also update the related comment, which was broken, too.
Fixes: eb63f2964d ("udp6: add missing checks on edumux packet processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the switch driver (e.g., mlxsw_spectrum) determines it needs to
flash a new firmware version it resets the ASIC after the flashing
process. The bus driver (e.g., mlxsw_pci) then registers itself again
with mlxsw_core which means (among other things) that the device
registers itself again with the hwmon subsystem again.
Since the device was registered with the hwmon subsystem using
devm_hwmon_device_register_with_groups(), then the old hwmon device
(registered before the flashing) was never unregistered and was
referencing stale data, resulting in a use-after free.
Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().
Fixes: c86d62cc41 ("mlxsw: spectrum: Reset FW after flash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sctp_wait_for_connect is called to wait for connect ready
for sp->strm_interleave in sctp_sendmsg_to_asoc, a panic could
be triggered if cpu is scheduled out and the new asoc is freed
elsewhere, as it will return err and later the asoc gets freed
again in sctp_sendmsg.
[ 285.840764] list_del corruption, ffff9f0f7b284078->next is LIST_POISON1 (dead000000000100)
[ 285.843590] WARNING: CPU: 1 PID: 8861 at lib/list_debug.c:47 __list_del_entry_valid+0x50/0xa0
[ 285.846193] Kernel panic - not syncing: panic_on_warn set ...
[ 285.846193]
[ 285.848206] CPU: 1 PID: 8861 Comm: sctp_ndata Kdump: loaded Not tainted 4.19.0-rc7.label #584
[ 285.850559] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 285.852164] Call Trace:
...
[ 285.872210] ? __list_del_entry_valid+0x50/0xa0
[ 285.872894] sctp_association_free+0x42/0x2d0 [sctp]
[ 285.873612] sctp_sendmsg+0x5a4/0x6b0 [sctp]
[ 285.874236] sock_sendmsg+0x30/0x40
[ 285.874741] ___sys_sendmsg+0x27a/0x290
[ 285.875304] ? __switch_to_asm+0x34/0x70
[ 285.875872] ? __switch_to_asm+0x40/0x70
[ 285.876438] ? ptep_set_access_flags+0x2a/0x30
[ 285.877083] ? do_wp_page+0x151/0x540
[ 285.877614] __sys_sendmsg+0x58/0xa0
[ 285.878138] do_syscall_64+0x55/0x180
[ 285.878669] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is a similar issue with the one fixed in Commit ca3af4dd28
("sctp: do not free asoc when it is already dead in sctp_sendmsg").
But this one can't be fixed by returning -ESRCH for the dead asoc
in sctp_wait_for_connect, as it will break sctp_connect's return
value to users.
This patch is to simply set err to -ESRCH before it returns to
sctp_sendmsg when any err is returned by sctp_wait_for_connect
for sp->strm_interleave, so that no asoc would be freed due to
this.
When users see this error, they will know the packet hasn't been
sent. And it also makes sense to not free asoc because waiting
connect fails, like the second call for sctp_wait_for_connect in
sctp_sendmsg_to_asoc.
Fixes: 668c9beb90 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported an use-after-free involving sctp_id2asoc. Dmitry Vyukov
helped to root cause it and it is because of reading the asoc after it
was freed:
CPU 1 CPU 2
(working on socket 1) (working on socket 2)
sctp_association_destroy
sctp_id2asoc
spin lock
grab the asoc from idr
spin unlock
spin lock
remove asoc from idr
spin unlock
free(asoc)
if asoc->base.sk != sk ... [*]
This can only be hit if trying to fetch asocs from different sockets. As
we have a single IDR for all asocs, in all SCTP sockets, their id is
unique on the system. An application can try to send stuff on an id
that matches on another socket, and the if in [*] will protect from such
usage. But it didn't consider that as that asoc may belong to another
socket, it may be freed in parallel (read: under another socket lock).
We fix it by moving the checks in [*] into the protected region. This
fixes it because the asoc cannot be freed while the lock is held.
Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to d49c88d767 ("r8169: Enable MSI-X on RTL8106e") after
e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume")
we can safely assume that this also fixes the root cause of
the issue worked around by 7c53a72245 ("r8169: don't use MSI-X on
RTL8168g"). So let's revert it.
Fixes: 7c53a72245 ("r8169: don't use MSI-X on RTL8168g")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pid_task() dereferences rcu protected tasks array.
But there is no rcu_read_lock() in shutdown_umh() routine so that
rcu_read_lock() is needed.
get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock()
then calls pid_task(). if task isn't NULL, it increases reference count
of task.
test commands:
%modprobe bpfilter
%modprobe -rv bpfilter
splat looks like:
[15102.030932] =============================
[15102.030957] WARNING: suspicious RCU usage
[15102.030985] 4.19.0-rc7+ #21 Not tainted
[15102.031010] -----------------------------
[15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
[15102.031063]
other info that might help us debug this:
[15102.031332]
rcu_scheduler_active = 2, debug_locks = 1
[15102.031363] 1 lock held by modprobe/1570:
[15102.031389] #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
[15102.031552]
stack backtrace:
[15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
[15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[15102.031628] Call Trace:
[15102.031676] dump_stack+0xc9/0x16b
[15102.031723] ? show_regs_print_info+0x5/0x5
[15102.031801] ? lockdep_rcu_suspicious+0x117/0x160
[15102.031855] pid_task+0x134/0x160
[15102.031900] ? find_vpid+0xf0/0xf0
[15102.032017] shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
[15102.032055] stop_umh+0x46/0x52 [bpfilter]
[15102.032092] __x64_sys_delete_module+0x47e/0x570
[ ... ]
Fixes: d2ba09c17a ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pin_index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
'ops->pin_config' [r] (local cap)
Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that "val" would be uninitialized if kstrtoul() fails.
Fixes: 9a08862a5d ("vDSO for sparc")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang currently warns:
drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
~~~~~~~~~~~~~~ ^ ~~
1 warning generated.
The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.
Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statenent right above this). Only the maintainers can
confirm this since this statment has been here since the driver was
first added to the kernel.
Link: https://github.com/ClangBuiltLinux/linux/issues/127
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio says:
====================
geneve, vxlan: Don't set exceptions if skb->len < mtu
This series fixes the exception abuse described in 2/2, and 1/2
is just a preparatory change to make 2/2 less ugly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't abuse exceptions: if the destination MTU is already higher
than what we're transmitting, no exception should be created.
Fixes: 52a589d51f ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff44 ("vxlan: update skb dst pmtu on tx path")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f15ca723c1 ("net: don't call update_pmtu unconditionally") avoids
that we try updating PMTU for a non-existent destination, but didn't clean
up cases where the check was already explicit. Drop those redundant checks.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
First, the trap number for 32-bit syscalls is 0x10.
Also, only negate the return value when syscall error is indicated by
the carry bit being set.
Signed-off-by: David S. Miller <davem@davemloft.net>
The preemptirq_delay_test module is used for the ftrace selftest code that
tests the latency tracers. The problem is that it uses ktime for the delay
loop, and then checks the tracer to see if the delay loop is caught, but the
tracer uses trace_clock_local() which uses various different other clocks to
measure the latency. As ktime uses the clock cycles, and the code then
converts that to nanoseconds, it causes rounding errors, and the preemptirq
latency tests are failing due to being off by 1 (it expects to see a delay
of 500000 us, but the delay is only 499999 us). This is happening due to a
rounding error in the ktime (which is totally legit). The purpose of the
test is to see if it can catch the delay, not to test the accuracy between
trace_clock_local() and ktime_get(). Best to use apples to apples, and have
the delay loop use the same clock as the latency tracer does.
Cc: stable@vger.kernel.org
Fixes: f96e8577da ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
commit 46e0c9be20 ("kernel: tracepoints: add support for relative
references") changes the layout of the __tracepoint_ptrs section on
architectures supporting relative references. However, it does so
without turning struct tracepoint * const into const int elsewhere in
the tracepoint code, which has the following side-effect:
Setting mod->num_tracepoints is done in by module.c:
mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
sizeof(*mod->tracepoints_ptrs),
&mod->num_tracepoints);
Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size
(rather than sizeof(int)), num_tracepoints is erroneously set to half the
size it should be on 64-bit arch. So a module with an odd number of
tracepoints misses the last tracepoint due to effect of integer
division.
So in the module going notifier:
for_each_tracepoint_range(mod->tracepoints_ptrs,
mod->tracepoints_ptrs + mod->num_tracepoints,
tp_module_going_check_quiescent, NULL);
the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually
evaluates to something within the bounds of the array, but miss the
last tracepoint if the number of tracepoints is odd on 64-bit arch.
Fix this by introducing a new typedef: tracepoint_ptr_t, which
is either "const int" on architectures that have PREL32 relocations,
or "struct tracepoint * const" on architectures that does not have
this feature.
Also provide a new tracepoint_ptr_defer() static inline to
encapsulate deferencing this type rather than duplicate code and
ugly idefs within the for_each_tracepoint_range() implementation.
This issue appears in 4.19-rc kernels, and should ideally be fixed
before the end of the rc cycle.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
num can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/usb/gadget/function/f_mass_storage.c:3177 fsg_lun_make() warn:
potential spectre issue 'fsg_opts->common->luns' [r] (local cap)
Fix this by sanitizing num before using it to index
fsg_opts->common->luns
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Felipe Balbi <felipe.balbi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David reports that:
<quote>
Perf has this hack where it uses the kernel symbol map as a backup when
a symbol can't be found in the user's symbol table(s).
This causes problems because the tests driving this code path use
machine__kernel_ip(), and that is completely meaningless on Sparc. On
sparc64 the kernel and user live in physically separate virtual address
spaces, rather than a shared one. And the kernel lives at a virtual
address that overlaps common userspace addresses. So this test passes
almost all the time when a user symbol lookup fails.
The consequence of this is that, if the unfound user virtual address in
the sample doesn't match up to a kernel symbol either, we trigger things
like this code in builtin-top.c:
if (al.sym == NULL && al.map != NULL) {
const char *msg = "Kernel samples will not be resolved.\n";
/*
* As we do lazy loading of symtabs we only will know if the
* specified vmlinux file is invalid when we actually have a
* hit in kernel space and then try to load it. So if we get
* here and there are _no_ symbols in the DSO backing the
* kernel map, bail out.
*
* We may never get here, for instance, if we use -K/
* --hide-kernel-symbols, even if the user specifies an
* invalid --vmlinux ;-)
*/
if (!machine->kptr_restrict_warned && !top->vmlinux_warned &&
__map__is_kernel(al.map) && map__has_symbols(al.map)) {
if (symbol_conf.vmlinux_name) {
char serr[256];
dso__strerror_load(al.map->dso, serr, sizeof(serr));
ui__warning("The %s file can't be used: %s\n%s",
symbol_conf.vmlinux_name, serr, msg);
} else {
ui__warning("A vmlinux file was not found.\n%s",
msg);
}
if (use_browser <= 0)
sleep(5);
top->vmlinux_warned = true;
}
}
When I fire up a compilation on sparc, this triggers immediately.
I'm trying to figure out what the "backup to kernel map" code is
accomplishing.
I see some language in the current code and in the changes that have
happened in this area talking about vdso. Does that really happen?
The vdso is mapped into userspace virtual addresses, not kernel ones.
More history. This didn't cause problems on sparc some time ago,
because the kernel IP check used to be "ip < 0" :-) Sparc kernel
addresses are not negative. But now with machine__kernel_ip(), which
works using the symbol table determined kernel address range, it does
trigger.
What it all boils down to is that on architectures like sparc,
machine__kernel_ip() should always return false in this scenerio, and
therefore this kind of logic:
if (cpumode == PERF_RECORD_MISC_USER && machine &&
mg != &machine->kmaps &&
machine__kernel_ip(machine, al->addr)) {
is basically invalid. PERF_RECORD_MISC_USER implies no kernel address
can possibly match for the sample/event in question (no matter how
hard you try!) :-)
</>
So, I thought something had changed and in the past we would somehow
find that address in the kallsyms, but I couldn't find anything to back
that up, the patch introducing this is over a decade old, lots of things
changed, so I was just thinking I was missing something.
I tried a gtod busy loop to generate vdso activity and added a 'perf
probe' at that branch, on x86_64 to see if it ever gets hit:
Made thread__find_map() noinline, as 'perf probe' in lines of inline
functions seems to not be working, only at function start. (Masami?)
# perf probe -x ~/bin/perf -L thread__find_map:57
<thread__find_map@/home/acme/git/perf/tools/perf/util/event.c:57>
57 if (cpumode == PERF_RECORD_MISC_USER && machine &&
58 mg != &machine->kmaps &&
59 machine__kernel_ip(machine, al->addr)) {
60 mg = &machine->kmaps;
61 load_map = true;
62 goto try_again;
}
} else {
/*
* Kernel maps might be changed when loading
* symbols so loading
* must be done prior to using kernel maps.
*/
69 if (load_map)
70 map__load(al->map);
71 al->addr = al->map->map_ip(al->map, al->addr);
# perf probe -x ~/bin/perf thread__find_map:60
Added new event:
probe_perf:thread__find_map (on thread__find_map:60 in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:thread__find_map -aR sleep 1
#
Then used this to see if, system wide, those probe points were being hit:
# perf trace -e *perf:thread*/max-stack=8/
^C[root@jouet ~]#
No hits when running 'perf top' and:
# cat gtod.c
#include <sys/time.h>
int main(void)
{
struct timeval tv;
while (1)
gettimeofday(&tv, 0);
return 0;
}
[root@jouet c]# ./gtod
^C
Pressed 'P' in 'perf top' and the [vdso] samples are there:
62.84% [vdso] [.] __vdso_gettimeofday
8.13% gtod [.] main
7.51% [vdso] [.] 0x0000000000000914
5.78% [vdso] [.] 0x0000000000000917
5.43% gtod [.] _init
2.71% [vdso] [.] 0x000000000000092d
0.35% [kernel] [k] native_io_delay
0.33% libc-2.26.so [.] __memmove_avx_unaligned_erms
0.20% [vdso] [.] 0x000000000000091d
0.17% [i2c_i801] [k] i801_access
0.06% firefox [.] free
0.06% libglib-2.0.so.0.5400.3 [.] g_source_iter_next
0.05% [vdso] [.] 0x0000000000000919
0.05% libpthread-2.26.so [.] __pthread_mutex_lock
0.05% libpixman-1.so.0.34.0 [.] 0x000000000006d3a7
0.04% [kernel] [k] entry_SYSCALL_64_trampoline
0.04% libxul.so [.] style::dom_apis::query_selector_slow
0.04% [kernel] [k] module_get_kallsym
0.04% firefox [.] malloc
0.04% [vdso] [.] 0x0000000000000910
I added a 'perf probe' to thread__find_map:69, and that surely got tons
of hits, i.e. for every map found, just to make sure the 'perf probe'
command was really working.
In the process I noticed a bug, we're only have records for '[vdso]' for
pre-existing commands, i.e. ones that are running when we start 'perf top',
when we will generate the PERF_RECORD_MMAP by looking at /perf/PID/maps.
I.e. like this, for preexisting processes with a vdso map, again,
tracing for all the system, only pre-existing processes get a [vdso] map
(when having one):
[root@jouet ~]# perf probe -x ~/bin/perf __machine__addnew_vdso
Added new event:
probe_perf:__machine__addnew_vdso (on __machine__addnew_vdso in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:__machine__addnew_vdso -aR sleep 1
[root@jouet ~]# perf trace -e probe_perf:__machine__addnew_vdso/max-stack=8/
0.000 probe_perf:__machine__addnew_vdso:(568eb3)
__machine__addnew_vdso (/home/acme/bin/perf)
map__new (/home/acme/bin/perf)
machine__process_mmap2_event (/home/acme/bin/perf)
machine__process_event (/home/acme/bin/perf)
perf_event__process (/home/acme/bin/perf)
perf_tool__process_synth_event (/home/acme/bin/perf)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
__event__synthesize_thread (/home/acme/bin/perf)
The kernel is generating a PERF_RECORD_MMAP for vDSOs, but somehow
'perf top' is not getting those records while 'perf record' is:
# perf record ~acme/c/gtod
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.076 MB perf.data (1499 samples) ]
# perf report -D | grep PERF_RECORD_MMAP2
71293612401913 0x11b48 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x400000(0x1000) @ 0 fd:02 1137 541179306]: r-xp /home/acme/c/gtod
71293612419012 0x11be0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a2783000(0x227000) @ 0 fd:00 3146370 854107250]: r-xp /usr/lib64/ld-2.26.so
71293612432110 0x11c50 [0x60]: PERF_RECORD_MMAP2 25484/25484: [0x7ffcdb53a000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
71293612509944 0x11cb0 [0x70]: PERF_RECORD_MMAP2 25484/25484: [0x7fa4a23cd000(0x3b6000) @ 0 fd:00 3149723 262067164]: r-xp /usr/lib64/libc-2.26.so
#
# perf script | grep vdso | head
gtod 25484 71293.612768: 2485554 cycles:ppp: 7ffcdb53a914 [unknown] ([vdso])
gtod 25484 71293.613576: 2149343 cycles:ppp: 7ffcdb53a917 [unknown] ([vdso])
gtod 25484 71293.614274: 1814652 cycles:ppp: 7ffcdb53aca8 __vdso_gettimeofday+0x98 ([vdso])
gtod 25484 71293.614862: 1669070 cycles:ppp: 7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
gtod 25484 71293.615404: 1451589 cycles:ppp: 7ffcdb53acc5 __vdso_gettimeofday+0xb5 ([vdso])
gtod 25484 71293.615999: 1269941 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
gtod 25484 71293.616405: 1177946 cycles:ppp: 7ffcdb53a914 [unknown] ([vdso])
gtod 25484 71293.616775: 1121290 cycles:ppp: 7ffcdb53ac47 __vdso_gettimeofday+0x37 ([vdso])
gtod 25484 71293.617150: 1037721 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
gtod 25484 71293.617478: 994526 cycles:ppp: 7ffcdb53ace6 __vdso_gettimeofday+0xd6 ([vdso])
#
The patch is the obvious one and with it we also continue to resolve
vdso symbols for pre-existing processes in 'perf top' and for all
processes in 'perf record' + 'perf report/script'.
Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David Miller <davem@davemloft.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-cs7skq9pp0kjypiju6o7trse@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Helge writes:
"parisc fix:
Fix an unitialized variable usage in the parisc unwind code."
* 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix uninitialized variable usage in unwind.c
Stephen writes:
"clk fixes for v4.19-rc8
One fix for the Allwinner A10 SoC's audio PLL that wasn't properly
set and generating noise."
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: sun4i: Set VCO and PLL bias current to lowest setting
Booting an i486 with "no387 nofxsr" ends with with the following crash:
math_emulate: 0060:c101987d
Kernel panic - not syncing: Math emulation needed in kernel
on the first context switch in user land.
The reason is that copy_fpregs_to_fpstate() tries FNSAVE which does not work
as the FPU is turned off.
This bug was introduced in:
f1c8cd0176 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active")
Add a check for X86_FEATURE_FPU before trying to save FPU registers (we
have such a check in switch_fpu_finish() already).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: f1c8cd0176 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active")
Link: http://lkml.kernel.org/r/20181016202525.29437-4-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:
#0 0x0000555555895c01 in bfd_demangle ()
#1 0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
#2 dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
#3 0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
#4 inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
#5 0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
#6 0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
at util/srcline.c:358
So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.
While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:
$ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
main
/usr/include/c++/8.2.1/complex:610
??
/usr/include/c++/8.2.1/complex:618
??
/usr/include/c++/8.2.1/complex:675
??
/usr/include/c++/8.2.1/complex:685
main
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
I've reported this bug upstream and also attached a patch there which
should fix this issue:
https://sourceware.org/bugzilla/show_bug.cgi?id=23715
Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a64489c56c ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
introduced, i.e. using a2l->funcname without checking if it is NULL,
but this current patch fixes the current codebase, i.e. multiple csets
were applied after a64489c56c before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
According to rfc7496 section 4.3 or 4.4:
sprstat_policy: This parameter indicates for which PR-SCTP policy
the user wants the information. It is an error to use
SCTP_PR_SCTP_NONE in sprstat_policy. If SCTP_PR_SCTP_ALL is used,
the counters provided are aggregated over all supported policies.
We change to dump pr_assoc and pr_stream all status by SCTP_PR_SCTP_ALL
instead, and return error for SCTP_PR_SCTP_NONE, as it also said "It is
an error to use SCTP_PR_SCTP_NONE in sprstat_policy. "
Fixes: 826d253d57 ("sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt")
Fixes: d229d48d18 ("sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David writes:
"Sparc fixes
1) Revert the %pOF change, it causes regressions.
2) Wire up io_pgetevents().
3) Fix perf events on single-PCR sparc64 cpus.
4) Do proper perf event throttling like arm and x86."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
Revert "sparc: Convert to using %pOFn instead of device_node.name"
sparc64: Set %l4 properly on trap return after handling signals.
sparc64: Make proc_id signed.
sparc: Throttle perf events properly.
sparc: Fix single-pcr perf event counter management.
sparc: Wire up io_pgetevents system call.
sunvdc: Remove VLA usage
Paul writes:
"SELinux fixes for v4.19
We've got one SELinux "fix" that I'd like to get into v4.19 if
possible. I'm using double quotes on "fix" as this is just an update
to the MAINTAINERS file and not a code change. From my perspective,
MAINTAINERS updates generally don't warrant inclusion during the -rcX
phase, but this is a change to the mailing list location so it seemed
prudent to get this in before v4.19 is released"
* tag 'selinux-pr-20181015' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
MAINTAINERS: update the SELinux mailing list location
hdr.cmd can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
spectre issue 'ucma_cmd_table' [r] (local cap)
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
hdr.cmd can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
spectre issue 'ucm_cmd_table' [r] (local cap)
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If there's no tracefs (RHEL7) support the tracing_path_mount
returns debugfs path which results in following fail:
# perf probe sys_write
kprobe_events file does not exist - please rebuild kernel with CONFIG_KPROBE_EVENTS.
Error: Failed to add events.
In tracing_path_debugfs_mount function we need to return the
'tracing' path instead of just the mount to make it work:
# perf probe sys_write
Added new event:
probe:sys_write (on sys_write)
You can now use it in all perf tools, such as:
perf record -e probe:sys_write -aR sleep 1
Adding the 'return tracing_path;' also to tracing_path_tracefs_mount
function just for consistency with tracing_path_debugfs_mount.
Upstream keeps working, because it has the tracefs support.
Link: http://lkml.kernel.org/n/tip-yiwkzexq9fk1ey1xg3gnjlw4@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: 23773ca18b ("perf tools: Make perf aware of tracefs")
Link: http://lkml.kernel.org/r/20181016114818.3595-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a build is run from something like a cron job, the user's $PATH is
rather minimal, of note, not including /usr/sbin in my own case. Because
of that, an automated rpm package build ultimately fails to find
libperf-jvmti.so, because somewhere within the build, this happens...
/bin/sh: alternatives: command not found
/bin/sh: alternatives: command not found
Makefile.config:849: No openjdk development package found, please install
JDK package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel
...and while the build continues, libperf-jvmti.so isn't built, and
things fall down when rpm tries to find all the %files specified. Exact
same system builds everything just fine when the job is launched from a
login shell instead of a cron job, since alternatives is in $PATH, so
openjdk is actually found.
The test required to get into this section of code actually specifies
the full path, as does a block just above it, so let's do that here too.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: William Cohen <wcohen@redhat.com>
Fixes: d4dfdf00d4 ("perf jvmti: Plug compilation into perf build")
Link: http://lkml.kernel.org/r/20180906221812.11167-1-jarod@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
John reported crash when recording on an event under PMU with cpumask defined:
root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1
perf: Segmentation fault
Obtained 9 stack frames.
./perf_debug_() [0x4c5ef8]
[0xffff82ba267c]
./perf_debug_() [0x4bc5a8]
./perf_debug_() [0x419550]
./perf_debug_() [0x41a928]
./perf_debug_() [0x472f58]
./perf_debug_() [0x473210]
./perf_debug_() [0x4070f4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0]
Segmentation fault (core dumped)
We synthesize an update event that needs to touch the evsel id array, which is
not defined at that time. Fixing this by forcing the id allocation for events
with their own cpus.
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Fixes: bfd8f72c27 ("perf record: Synthesize unit/scale/... in event update")
Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 7a68d9fb85 ("USB: usbdevfs: sanitize flags more") checks the
transfer flags for URBs submitted from userspace via usbfs. However,
the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
allowed for a control transfer was added in the wrong place, before
the code has properly determined the direction of the control
transfer. (Control transfers are special because for them, the
direction is set by the bRequestType byte of the Setup packet rather
than direction bit of the endpoint address.)
This patch moves code which sets up the allow_short flag for control
transfers down after is_in has been set to the correct value.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+24a30223a4b609bb802e@syzkaller.appspotmail.com
Fixes: 7a68d9fb85 ("USB: usbdevfs: sanitize flags more")
CC: Oliver Neukum <oneukum@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As noticed by Dave Anglin, the last commit introduced a small bug where
the potentially uninitialized r struct is used instead of the regs
pointer as input for unwind_frame_init(). Fix it.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: John David Anglin <dave.anglin@bell.net>
Jakub Kicinski says:
====================
nfp: fix pedit set action offloads
Pieter says:
This set fixes set actions when using multiple pedit actions with
partial masks and with multiple keys per pedit action. Additionally
it fixes set ipv6 pedit action offloads when using it in combination
with other header keys.
The problem would only trigger if one combines multiple pedit actions
of the same type with partial masks, e.g.:
$ tc filter add dev netdev protocol ip parent ffff: \
flower indev netdev \
ip_proto tcp \
action pedit ex munge \
ip src set 11.11.11.11 retain 65535 munge \
ip src set 22.22.22.22 retain 4294901760 pipe \
csum ip and tcp pipe \
mirred egress redirect dev netdev
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we only allowed a single header key per pedit action to
change the header. This used to result in the last header key in the
pedit action to overwrite previous headers. We now keep track of them
and allow multiple header keys per pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Fixes: f8b7b0a6b1 ("nfp: add set tcp and udp header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a missing call to rxrpc_put_peer() on the main path through the
rxrpc_error_report() function. This manifests itself as a ref leak
whenever an ICMP packet or other error comes in.
In commit f334430316, the hand-off of the ref to a work item was removed
and was not replaced with a put.
Fixes: f334430316 ("rxrpc: Fix error distribution")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
is also processing transport pmtu_pending by icmp packets. But it's
meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.
The right pmtu value should come from the icmp packet, and it would
be saved into transport->mtu_info in this patch and used later when
the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.
Besides, without this patch, as pmtu can only be updated correctly
when receiving a icmp packet and no place is holding sock lock, it
will take long time if the sock is busy with sending packets.
Note that it doesn't process transport->mtu_info in .release_cb(),
as there is no enough information for pmtu update, like for which
asoc or transport. It is not worth traversing all asocs to check
pmtu_pending. So unlike tcp, sctp does this in tx path, for which
mtu_info needs to be atomic_t.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit db65f35f50 ("net: fec: add support of ethtool get_regs") introduce
ethool "--register-dump" interface to dump all FEC registers.
But not all silicon implementations of the Freescale FEC hardware module
have the FRBR (FIFO Receive Bound Register) and FRSR (FIFO Receive Start
Register) register, so we should not be trying to dump them on those that
don't.
To fix it we create a quirk flag, FEC_QUIRK_HAS_RFREG, and check it before
dump those RX FIFO registers.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The binding table's 'cluster_scope' list is rcu protected to handle
races between threads changing the list and those traversing the list at
the same moment. We have now found that the function named_distribute()
uses the regular list_for_each() macro to traverse the said list.
Likewise, the function tipc_named_withdraw() is removing items from the
same list using the regular list_del() call. When these two functions
execute in parallel we see occasional crashes.
This commit fixes this by adding the missing _rcu() suffixes.
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The udpv6_encap_enable() function is part of the ipv6 code, and if that is
configured as a loadable module and rxrpc is built in then a build failure
will occur because the conditional check is wrong:
net/rxrpc/local_object.o: In function `rxrpc_lookup_local':
local_object.c:(.text+0x2688): undefined reference to `udpv6_encap_enable'
Use the correct config symbol (CONFIG_AF_RXRPC_IPV6) in the conditional
check rather than CONFIG_IPV6 as that will do the right thing.
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Reported-by: kbuild-all@01.org
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When commit 270972554c ("[IPV6]: ROUTE: Add Router Reachability
Probing (RFC4191).") introduced router probing, the rt6_probe() function
required that a neighbour entry existed. This neighbour entry is used to
record the timestamp of the last probe via the ->updated field.
Later, commit 2152caea71 ("ipv6: Do not depend on rt->n in rt6_probe().")
removed the requirement for a neighbour entry. Neighbourless routes skip
the interval check and are not rate-limited.
This patch adds rate-limiting for neighbourless routes, by recording the
timestamp of the last probe in the fib6_info itself.
Fixes: 2152caea71 ("ipv6: Do not depend on rt->n in rt6_probe().")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On GENETv5, there is a hardware issue which prevents the GENET hardware
from generating a link UP interrupt when the link is operating at
10Mbits/sec. Since we do not have any way to configure the link
detection logic, fallback to polling in that case.
Fixes: 421380856d ("net: bcmgenet: add support for the GENETv5 hardware")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
net/rxrpc/output.c: In function 'rxrpc_reject_packets':
net/rxrpc/output.c:527:11: warning:
variable 'ioc' set but not used [-Wunused-but-set-variable]
'ioc' is the correct kvec num when sending a BUSY (or an ABORT) response
packet.
Fixes: ece64fec16 ("rxrpc: Emit BUSY packets when supposed to rather than ABORTs")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix an uninitialised variable introduced by the last patch. This can cause
a crash when a new call comes in to a local service, such as when an AFS
fileserver calls back to the local cache manager.
Fixes: c1e15b4944 ("rxrpc: Fix the packet reception routine")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the commit referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
unicast links to the affected node.
Unfortunately, this 'improvement' introduced two bugs, which each and
one alone cause only limited problems, but combined lead to seemingly
stochastic unicast link resets, depending on the amount of broadcast
traffic transmitted.
The first issue, a missing initialization of the 'tolerance' field of
the receiver broadcast link, was recently fixed by commit 047491ea33
("tipc: set link tolerance correctly in broadcast link").
Ths second issue, where we omit to reset the 'stale_cnt' field of
the same link after a 'stale' period is over, leads to this counter
accumulating over time, and in the absence of the 'tolerance' criteria
leads to the above described symptoms. This commit adds the missing
initialization.
Fixes: a4dc70d46c ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WHen an llc sock is added into the sk_laddr_hash of an llc_sap,
it is not marked with SOCK_RCU_FREE.
This causes that the sock could be freed while it is still being
read by __llc_lookup_established() with RCU read lock. sock is
refcounted, but with RCU read lock, nothing prevents the readers
getting a zero refcnt.
Fix it by setting SOCK_RCU_FREE in llc_sap_add_socket().
Reported-by: syzbot+11e05f04c15e03be5254@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-10-10
This pull request includes some fixes to mlx5 driver,
Please pull and let me know if there's any problem.
For -stable v4.11:
('net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type')
For -stable v4.17:
('net/mlx5: Fix memory leak when setting fpga ipsec caps')
For -stable v4.18:
('net/mlx5: WQ, fixes for fragmented WQ buffers API')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to what has been done in 8b4c3cdd9d ("net: sched: Add policy
validation for tc attributes"), fix classifier code to add validation of
TCA_CHAIN and TCA_KIND netlink attributes.
tested with:
# ./tdc.py -c filter
v2: Let sch_api and cls_api share nla_policy they have in common, thanks
to David Ahern.
v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done
by TC modules, thanks to Cong Wang.
While at it, restore the 'Delete / get qdisc' comment to its orginal
position, just above tc_get_qdisc() function prototype.
Fixes: 5bc1701881 ("net: sched: introduce multichain support for filters")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
use-space buffer 'useraddr' and checked to see whether it is
ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
according to 'sub_cmd', a permission check is enforced through the function
ns_capable(). For example, the permission check is required if 'sub_cmd' is
ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
done by anyone". The following execution invokes different handlers
according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
object 'per_queue_opt' is copied again from the user-space buffer
'useraddr' and 'per_queue_opt.sub_command' is used to determine which
operation should be performed. Given that the buffer 'useraddr' is in the
user space, a malicious user can race to change the sub-command between the
two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
before ethtool_set_per_queue() is called, the attacker changes
ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
bypass the permission check and execute ETHTOOL_SCOALESCE.
This patch enforces a check in ethtool_set_per_queue() after the second
copy from 'useraddr'. If the sub-command is different from the one obtained
in the first copy in dev_ethtool(), an error code EINVAL will be returned.
Fixes: f38d138a7d ("net/ethtool: support set coalesce per queue")
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ethtool_get_rxnfc(), the eth command 'cmd' is compared against
'ETHTOOL_GRXFH' to see whether it is necessary to adjust the variable
'info_size'. Then the whole structure of 'info' is copied from the
user-space buffer 'useraddr' with 'info_size' bytes. In the following
execution, 'info' may be copied again from the buffer 'useraddr' depending
on the 'cmd' and the 'info.flow_type'. However, after these two copies,
there is no check between 'cmd' and 'info.cmd'. In fact, 'cmd' is also
copied from the buffer 'useraddr' in dev_ethtool(), which is the caller
function of ethtool_get_rxnfc(). Given that 'useraddr' is in the user
space, a malicious user can race to change the eth command in the buffer
between these copies. By doing so, the attacker can supply inconsistent
data and cause undefined behavior because in the following execution 'info'
will be passed to ops->get_rxnfc().
This patch adds a necessary check on 'info.cmd' and 'cmd' to confirm that
they are still same after the two copies in ethtool_get_rxnfc(). Otherwise,
an error code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally, we have an issue where r8169 MSI-X interrupt is broken after
S3 suspend/resume on RTL8106e of ASUS X441UAR.
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169
We found the all of the values in PCI BAR=4 of the ethernet adapter
become 0xFF after system resumes. That breaks the MSI-X interrupt.
Therefore, we can only fall back to MSI interrupt to fix the issue at
that time.
However, there is a commit which resolves the drivers getting nothing in
PCI BAR=4 after system resumes. It is 04cb3ae895d7 "PCI: Reprogram
bridge prefetch registers on resume" by Daniel Drake.
After apply the patch, the ethernet adapter works fine before suspend
and after resume. So, we can revert the workaround after the commit
"PCI: Reprogram bridge prefetch registers on resume" is merged into main
tree.
This patch reverts commit 7bb05b85bc
"r8169: don't use MSI-X on RTL8106e".
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201181
Fixes: 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 0b9871a3a8.
Causes crashes with qemu, interacts badly with commit commit
6d0a70a284 ("vsprintf: print OF node name using full_name")
etc.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This documentation was inadvertently released under the CC-BY-SA-4.0
license. It was intended to be released under GPL-2.0 or later.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
The IDA was declared on the stack instead of statically, so lockdep
triggered a warning that it was improperly initialised.
Reported-by: 0day bot
Tested-by: Rong Chen <rong.a.chen@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
The recent patch to fix the afs_server struct leak didn't actually fix the
bug, but rather fixed some of the symptoms. The problem is that an
asynchronous call that holds a resource pointed to by call->reply[0] will
find the pointer cleared in the call destructor, thereby preventing the
resource from being cleaned up.
In the case of the server record leak, the afs_fs_get_capabilities()
function in devel code sets up a call with reply[0] pointing at the server
record that should be altered when the result is obtained, but this was
being cleared before the destructor was called, so the put in the
destructor does nothing and the record is leaked.
Commit f014ffb025 removed the additional ref obtained by
afs_install_server(), but the removal of this ref is actually used by the
garbage collector to mark a server record as being defunct after the record
has expired through lack of use.
The offending clearance of call->reply[0] upon completion in
afs_process_async_call() has been there from the origin of the code, but
none of the asynchronous calls actually use that pointer currently, so it
should be safe to remove (note that synchronous calls don't involve this
function).
Fix this by the following means:
(1) Revert commit f014ffb025.
(2) Remove the clearance of reply[0] from afs_process_async_call().
Without this, afs_manage_servers() will suffer an assertion failure if it
sees a server record that didn't get used because the usage count is not 1.
Fixes: f014ffb025 ("afs: Fix afs_server struct leak")
Fixes: 08e0e7c82e ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we did some signal processing, we have to reload the pt_regs
tstate register because it's value may have changed.
In doing so we also have to extract the %pil value contained in there
anre load that into %l4.
This value is at bit 20 and thus needs to be shifted down before we
later write it into the %pil register.
Most of the time this is harmless as we are returning to userspace
and the %pil is zero for that case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-10-14
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix xsk map update and delete operation to not call synchronize_net()
but to piggy back on SOCK_RCU_FREE for sockets instead as we are not
allowed to sleep under RCU, from Björn.
2) Do not change RLIMIT_MEMLOCK in reuseport_bpf selftest if the process
already has unlimited RLIMIT_MEMLOCK, from Eric.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.
In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
^~~~~
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
^
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20181013010713.6999-1-natechancellor@gmail.com
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:
arch/x86/kernel/time.c:29:42: warning: section does not match previous declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
__section__(".data..cacheline_aligned")))
^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;
^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro '__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) __aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro '__section'
#define __section(S) __attribute__((__section__(#S)))
^
1 warning generated.
The declaration was changed in commit 7c30f352c8 ("jiffies.h: declare
jiffies and jiffies_64 with ____cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.
Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181013005311.28617-1-natechancellor@gmail.com
It is important to clear the hw->state value for non-stopped events
when they are added into the PMU. Otherwise when the event is
scheduled out, we won't read the counter because HES_UPTODATE is still
set. This breaks 'perf stat' and similar use cases, causing all the
events to show zero.
This worked for multi-pcr because we make explicit sparc_pmu_start()
calls in calculate_multiple_pcrs(). calculate_single_pcr() doesn't do
this because the idea there is to accumulate all of the counter
settings into the single pcr value. So we have to add explicit
hw->state handling there.
Like x86, we use the PERF_HES_ARCH bit to track truly stopped events
so that we don't accidently start them on a reload.
Related to all of this, sparc_pmu_start() is missing a userpage update
so add it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael reported that he could not stat following event:
$ perf stat -e unc_p_freq_ge_1200mhz_cycles -a -- ls
event syntax error: '..e_1200mhz_cycles'
\___ value too big for format, maximum is 255
Run 'perf list' for a list of valid events
The event is unwrapped into:
uncore_pcu/event=0xb,filter_band0=1200/
where filter_band0 format says it's one byte only:
# cat uncore_pcu/format/filter_band0
config1:0-7
while JSON files specifies bigger number:
"Filter": "filter_band0=1200",
all the filter_band* formats show 1 byte width:
# cat uncore_pcu/format/filter_band1
config1:8-15
# cat uncore_pcu/format/filter_band2
config1:16-23
# cat uncore_pcu/format/filter_band3
config1:24-31
The reason of the issue is that filter_band* values are supposed to be
in 100Mhz units.. it's stated in the JSON help for the events, like:
filter_band3=XXX, with XXX in 100Mhz units
This patch divides the filter_band* values by 100, plus there's couple
of changes that actually change the number completely, like:
- "Filter": "edge=1,filter_band2=4000",
+ "Filter": "edge=1,filter_band2=30",
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181010080339.GB15790@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
distribute_cfs_runtime may not empty the throttled_list before it runs
out of runtime to distribute. In that case, due to the change from
c06f04c704 to put throttled entries at the head of the list, later entries
on the list will starve. Essentially, the same X processes will get pulled
off the list, given CPU time and then, when expired, get put back on the
head of the list where distribute_cfs_runtime will give runtime to the same
set of processes leaving the rest.
Fix the issue by setting a bit in struct cfs_bandwidth when
distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
decide to put the throttled entry on the tail or the head of the list. The
bit is set/cleared by the callers of distribute_cfs_runtime while they hold
cfs_bandwidth->lock.
This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
the live system. In some cases you can simply look at the throttled list and
see the later entries are not changing:
crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3
1 ffff90b56cb2d200 -976050
2 ffff90b56cb2cc00 -484925
3 ffff90b56cb2bc00 -658814
4 ffff90b56cb2ba00 -275365
5 ffff90b166a45600 -135138
6 ffff90b56cb2da00 -282505
7 ffff90b56cb2e000 -148065
8 ffff90b56cb2fa00 -872591
9 ffff90b56cb2c000 -84687
10 ffff90b56cb2f000 -87237
11 ffff90b166a40a00 -164582
crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3
1 ffff90b56cb2d200 -994147
2 ffff90b56cb2cc00 -306051
3 ffff90b56cb2bc00 -961321
4 ffff90b56cb2ba00 -24490
5 ffff90b166a45600 -135138
6 ffff90b56cb2da00 -282505
7 ffff90b56cb2e000 -148065
8 ffff90b56cb2fa00 -872591
9 ffff90b56cb2c000 -84687
10 ffff90b56cb2f000 -87237
11 ffff90b166a40a00 -164582
Sometimes it is easier to see by finding a process getting starved and looking
at the sched_info:
crash> task ffff8eb765994500 sched_info
PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest"
sched_info = {
pcount = 8,
run_delay = 697094208,
last_arrival = 240260125039,
last_queued = 240260327513
},
crash> task ffff8eb765994500 sched_info
PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest"
sched_info = {
pcount = 8,
run_delay = 697094208,
last_arrival = 240260125039,
last_queued = 240260327513
},
Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c06f04c704 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The XSKMAP update and delete functions called synchronize_net(), which
can sleep. It is not allowed to sleep during an RCU read section.
Instead we need to make sure that the sock sk_destruct (xsk_destruct)
function is asynchronously called after an RCU grace period. Setting
the SOCK_RCU_FREE flag for XDP sockets takes care of this.
Fixes: fbfc504a24 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
mlx5e netdevice used to calculate fragment edges by a call to
mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct
indication for queues smaller than a PAGE_SIZE, (broken by default on
PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new
calls/API.
Since (TX/RX) Work Queues buffers are fragmented, here we introduce
changes to the API in core driver, so that it gets a stride index and
returns the index of last stride on same fragment, and an additional
wrapping function that returns the number of physically contiguous
strides that can be written contiguously to the work queue.
This obsoletes the following API functions, and their buggy
usage in EN driver:
* mlx5_wq_cyc_get_frag_size()
* mlx5_wq_cyc_ctr2fragix()
The new API improves modularity and hides the details of such
calculation for mlx5e netdevice and mlx5_ib rdma drivers.
New calculation is also more efficient, and improves performance
as follows:
Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size.
Before: 16,477,619 pps
After: 17,085,793 pps
3.7% improvement
Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The HW spec defines only bits 24-26 of pftype_wq as the page fault type,
use the required mask to ensure that.
Fixes: d9aaed8387 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add missing pm_runtime_disable() to remove(), in order to avoid
an Unbalanced pm_runtime_enable when the module is removed and
re-probed.
Error log:
root@intel-corei7-64:~# modprobe -r intel_xhci_usb_role_switch
root@intel-corei7-64:~# modprobe intel_xhci_usb_role_switch
intel_xhci_usb_sw intel_xhci_usb_sw: Unbalanced pm_runtime_enable!
Fixes: cb29684686 (usb: roles: intel_xhci: Enable runtime PM)
Cc: <stable@vger.kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Wan Ahmad Zainie <wan.ahmad.zainie.wan.mohamad@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The usb standard ("Universal Serial Bus Class Definitions for Communication
Devices") distiguishes between "consistent signals" (DSR, DCD), and
"irregular signals" (break, ring, parity error, framing error, overrun).
The bits of "irregular signals" are set, if this error/event occurred on
the device side and are immeadeatly unset, if the serial state notification
was sent.
Like other drivers of real serial ports do, just the occurence of those
events should be counted in serial_icounter_struct (but no 1->0
transitions).
Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Resetting the write index of the notification buffer on urb unlink (e.g.
closing a cdc-acm device from userspace) may lead to wrong interpretation
of further received notifications, in case the index is not 0 when urb
unlink happens (i.e. when parts of a notification already have been
transferred). On the device side there is no "reset" of the notification
transimission and thus we would get out of sync with the device.
Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a device splits up a control message and a reset() happens
between the parts, the message is lost and already recieved parts
must be dropped.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes: 1aba579f3c ("cdc-acm: handle read pipe errors")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add sleep between attach and "usbip port" check to make sure status is
updated. Running attach and query back shows incorrect status.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit ac0e2cd555.
Michael reported an issue with oversized terms values assignment
and I noticed there was actually a misunderstanding of the max
value check in the past.
The above commit's changelog says:
If bit 21 is set, there is parsing issues as below.
$ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
event syntax error: '..pi_0/event=0x200002,umask=0x8/'
\___ value too big for format, maximum is 511
But there's no issue there, because the event value is distributed
along the value defined by the format. Even if the format defines
separated bit, the value is treated as a continual number, which
should follow the format definition.
In above case it's 9-bit value with last bit separated:
$ cat uncore_qpi_0/format/event
config:0-7,21
Hence the value 0x200002 is correctly reported as format violation,
because it exceeds 9 bits. It should have been 0x102 instead, which
sets the 9th bit - the bit 21 of the format.
$ perf stat -vv -a -e uncore_qpi_0/event=0x102,umask=0x8/
Using CPUID GenuineIntel-6-2D
...
------------------------------------------------------------
perf_event_attr:
type 10
size 112
config 0x200802
sample_type IDENTIFIER
...
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: ac0e2cd555 ("perf tools: Fix PMU term format max value calculation")
Link: http://lkml.kernel.org/r/20181003072046.29276-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes introduced in:
6fbbde9a19 ("KVM: x86: Control guest reads of MSR_PLATFORM_INFO")
That is not yet used in tools such as 'perf trace'.
The type of the change in this file, a simple integer parameter to the
KVM_CHECK_EXTENSION ioctl should be easier to implement tho, adding to
the libbeauty TODO list.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Drew Schmitt <dasch@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-67h1bio5bihi1q6dy7hgwwx8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
d176620277 ("x86/kvm/lapic: always disable MMIO interface in x2APIC mode")
That at this time will not generate changes in tools such as 'perf trace',
that still needs more work in tools/perf/examples/bpf/augmented_syscalls.c
to need such id -> string tables.
This silences the following perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-yadntj2ok6zpzjwi656onuh0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The code had been clearing a namespace being deleted as the current
path while that namespace was still in the path siblings list. It is
possible a new IO could set that namespace back to the current path
since it appeared to be an eligable path to select, which may result in
a use-after-free error.
This patch ensures a namespace being removed is not eligable to be reset
as a current path prior to clearing it as the current path.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
drm_mode_setcrtc() retries modesetting in case one of the functions it
calls returns -EDEADLK. connector_set, mode and fb are freed before
retrying, but they are not set to NULL. This can cause
drm_mode_setcrtc() to use those variables.
For example: On the first try __drm_mode_set_config_internal() returns
-EDEADLK. connector_set, mode and fb are freed. Next retry starts, and
drm_modeset_lock_all_ctx() returns -EDEADLK, and we jump to 'out'. The
code will happily try to release all three again.
This leads to crashes of different kinds, depending on the sequence the
EDEADLKs happen.
Fix this by setting the three variables to NULL at the start of the
retry loop.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180917110054.4053-1-tomi.valkeinen@ti.com
net/core/flow.c does not exist anymore, so remove it
from the IPSEC NETWORKING section of the MAINTAINERS
file.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull Allwinner clk fixes for 4.19 from Maxime Ripard:
One fix for the Audio PLL that were not properly set and generating noise
on the A10 SoCs.
* tag 'sunxi-clk-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: sun4i: Set VCO and PLL bias current to lowest setting
The default mid-level PLL bias current setting interferes with sigma
delta modulation. This manifests as decreased audio quality at lower
sampling rates, which sounds like radio broadcast quality, and
distortion noises at sampling rates at 48 kHz or above.
Changing the bias current settings to the lowest gets rid of the
noise.
Fixes: de34485191 ("clk: sunxi-ng: sun4i: Use sigma-delta modulation
for audio PLL")
Cc: <stable@vger.kernel.org> # 4.15.x
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-09-07 10:20:50 +02:00
143 changed files with 993 additions and 972 deletions
@@ -2846,7 +2853,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
rc=ethtool_get_phy_stats(dev,useraddr);
break;
caseETHTOOL_PERQUEUE:
rc=ethtool_set_per_queue(dev,useraddr);
rc=ethtool_set_per_queue(dev,useraddr,sub_cmd);
break;
caseETHTOOL_GLINKSETTINGS:
rc=ethtool_get_link_ksettings(dev,useraddr);
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.