Commit Graph

3798 Commits

Author SHA1 Message Date
Filipe Manana
2e438442ba btrfs: remove not needed mod_start and mod_len from struct extent_map
The mod_start and mod_len fields of struct extent_map were introduced by
commit 4e2f84e63d ("Btrfs: improve fsync by filtering extents that we
want") in order to avoid too low performance when fsyncing a file that
keeps getting extent maps merge, because it resulted in each fsync logging
again csum ranges that were already merged before.

We don't need this anymore as extent maps in the list of modified extents
are never merged with other extent maps and once we log an extent map we
remove it from the list of modified extent maps, so it's never logged
twice.

So remove the mod_start and mod_len fields from struct extent_map and use
instead the start and len fields when logging checksums in the fast fsync
path. This also makes EXTENT_FLAG_FILLING unused so remove it as well.

Running the reproducer from the commit mentioned before, with a larger
number of extents and against a null block device, so that IO is fast
and we can better see any impact from searching checksums items and
logging them, gave the following results from dd:

Before this change:

   409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s

After this change:

   409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s

So no changes in throughput.
The test was done in a release kernel (non-debug, Debian's default kernel
config) and its steps are the following:

   $ mkfs.btrfs -f /dev/nullb0
   $ mount /dev/sdb /mnt
   $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync
   $ umount /mnt

This also reduces the size of struct extent_map from 128 bytes down to 112
bytes, so now we can have 36 extents maps per 4K page instead of 32.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:02 +02:00
Trond Myklebust
939cb14d51 NFS/knfsd: Remove the invalid NFS error 'NFSERR_OPNOTSUPP'
NFSERR_OPNOTSUPP is not described by any RFC, and should not be used.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06 12:47:24 -04:00
Takashi Sakamoto
5a5dc48083 firewire: core: remove flag and width from u64 formats of tracepoints events
The pointer to fw_packet structure is passed to ring buffer of tracepoints
framework as the value of u64 type. '0x%016llx' is used for the print
format of value, while the flag and width are useless in the case.

This commit removes them.

Link: https://lore.kernel.org/r/20240506082154.396077-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 17:53:35 +09:00
Takashi Sakamoto
87144bbc99 firewire: core: fix type of timestamp for async_inbound_template tracepoints events
The type of time stamp should be u16, instead of u8.

Link: https://lore.kernel.org/r/20240506082154.396077-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 17:53:34 +09:00
Takashi Sakamoto
6b0b708f12 firewire: core: add tracepoint event for handling bus reset
The core function expects hardware drivers to call
fw_core_handle_bus_reset() when changing bus topology. The 1394 OHCI
driver calls it when handling selfID event as a result of any bus-reset.

This commit adds a tracepoints event for it.

Link: https://lore.kernel.org/r/20240501073238.72769-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:06 +09:00
Takashi Sakamoto
08dd8602aa firewire: core: add tracepoints events for initiating bus reset
At a commit 673249124304 ("firewire: core: option to log bus reset
initiation"), some kernel log messages were added to trace initiation of
bus reset. The kernel log messages are really helpful, while nowadays it
is not preferable just for debugging purpose. For the purpose, Linux
kernel tracepoints is more preferable.

This commit adds some alternative tracepoints events.

Link: https://lore.kernel.org/r/20240501073238.72769-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:06 +09:00
Takashi Sakamoto
eec045c571 firewire: core: add tracepoints event for asynchronous inbound phy packet
At the former commit, a pair of tracepoints events is added to trace
asynchronous outbound phy packet. This commit adds a tracepoints event
to trace inbound phy packet. It includes transaction status as well as
the content of phy packet.

This is an example for Remote Reply Packet as a response to Remote Access
Packet sent by lsfirewirephy command in linux-firewire-utils:

async_phy_inbound: \
  packet=0xffff955fc02b4e10 generation=1 status=1 timestamp=0x0619 \
  first_quadlet=0x001c8208 second_quadlet=0xffe37df7

Link: https://lore.kernel.org/r/20240430001404.734657-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
1a4c53cf35 firewire: core/cdev: add tracepoints events for asynchronous phy packet
In IEEE 1394 bus, the type of asynchronous packet without any offset to
node address space is called as phy packet. The destination of packet is
IEEE 1394 phy itself. This type of packet is used for several purposes,
mainly for selfID at the state of bus reset, to force selection of root
node, and to adjust gap count.

This commit adds tracepoints events for the type of asynchronous outbound
packet. Like asynchronous outbound transaction packets, a pair of events
are added to trace initiation and completion of transmission.

In the case that the phy packet is sent by kernel API, the match between
the initiation and completion is not so easy, since the data of
'struct fw_packet' is allocated statically. In the case that it is sent by
userspace applications via cdev, the match is easy, since the data is
allocated per each.

This example is for Remote Access Packet by lsfirewirephy command in
linux-firewire-utils:

async_phy_outbound_initiate: \
  packet=0xffff89fb34e42e78 generation=1 first_quadlet=0x00148200 \
  second_quadlet=0xffeb7dff
async_phy_outbound_complete: \
  packet=0xffff89fb34e42e78 generation=1 status=1 timestamp=0x0619

Link: https://lore.kernel.org/r/20240430001404.734657-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
624a8535f7 firewire: core: add tracepoints events for asynchronous outbound response
In a view of core transaction service, the asynchronous outbound response
consists of two stages; initiation and completion.

This commit adds a pair of events for the asynchronous outbound response.
The following example is for asynchronous write quadlet request as IEC
61883-1 FCP response to node 0xffc1.

async_response_outbound_initiate: \
  transaction=0xffff89fa08cf16c0 generation=4 scode=2 dst_id=0xffc1 \
  tlabel=25 tcode=2 src_id=0xffc0 rcode=0 \
  header={0xffc16420,0xffc00000,0x0,0x0} data={}
async_response_outbound_complete: \
  transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=1 \
  timestamp=0x0000

Link: https://lore.kernel.org/r/20240429043218.609398-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
2c945b10d7 firewire: core: add tracepoint event for asynchronous inbound request
This commit adds an event for asynchronous inbound request.

The following example is for asynchronous block write request as IEC
61883-1 FCP request from node 0xffc1.

async_request_inbound: \
  transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=2 \
  timestamp=0x00b3 dst_id=0xffc0 tlabel=19 tcode=1 src_id=0xffc1 \
  offset=0xfffff0000d00 header={0xffc04d10,0xffc1ffff,0xf0000d00,0x80000} \
  data={0x19ff08,0xffff0090}

Link: https://lore.kernel.org/r/20240429043218.609398-5-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
06cc078c07 firewire: core: add tracepoints event for asynchronous inbound response
In the transaction of IEEE 1394, the node to receive the asynchronous
request transfers any response packet to the requester except for the
unified transaction.

This commit adds an event for the inbound packet. Note that the code to
decode the packet header is moved, against the note about the sanity
check.

The following example is for asynchronous lock response with
compare_and_swap code.

async_response_inbound: \
  transaction=0xffff955fc6a07a10 generation=5 scode=2 status=1 \
  timestamp=0x0089 dst_id=0xffc1 tlabel=54 tcode=11 src_id=0xffc0 \
  rcode=0 header={0xffc1d9b0,0xffc00000,0x0,0x40002} data={0x50800080}

Link: https://lore.kernel.org/r/20240429043218.609398-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
944b06840a firewire: core: add tracepoints events for asynchronous outbound request
In a view of core transaction service, the asynchronous outbound request
consists of two stages; initiation and completion. This commit adds a pair
of event for them.

The following example is for asynchronous lock request with compare_swap
code to offset 0x'ffff'f000'0904 in node 0xffc0.

async_request_outbound_initiate: \
  transaction=0xffff955fc6a07a10 generation=5 scode=2 dst_id=0xffc0 \
  tlabel=54 tcode=9 src_id=0xffc1 offset=0xfffff0000904 \
  header={0xffc0d990,0xffc1ffff,0xf0000904,0x80002}
  data={0x80,0x940181}
async_request_outbound_complete: \
  transaction=0xffff955fc6a07a10 generation=5 scode=2 status=2 \
  timestamp=0xd887

Link: https://lore.kernel.org/r/20240429043218.609398-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
Takashi Sakamoto
57614c2884 firewire: core: add support for Linux kernel tracepoints
The Linux Kernel Tracepoints framework is enough useful to trace
packet data inbound to and outbound from core.

This commit adds firewire subsystem to use the framework.

Link: https://lore.kernel.org/r/20240429043218.609398-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06 11:06:05 +09:00
David Hildenbrand
6eca325674 trace/events/page_ref: trace the raw page mapcount value
We want to limit the use of page_mapcount() to the places where it is
absolutely necessary.  We already trace raw page->refcount, raw
page->flags and raw page->mapping, and don't involve any folios.  Let's
also trace the raw mapcount value that does not consider the entire
mapcount of large folios, and we don't add "1" to it.

When dealing with typed folios, this makes a lot more sense.  ...  and
it's for debugging purposes only either way.

Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:31 -07:00
Jakub Kicinski
e958da0ddb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/linux/filter.h
kernel/bpf/core.c
  66e13b615a ("bpf: verifier: prevent userspace memory access")
  d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02 12:06:25 -07:00
David Howells
69c3c023af cifs: Implement netfslib hooks
Provide implementation of the netfslib hooks that will be used by netfslib
to ask cifs to set up and perform operations.  Of particular note are

 (*) cifs_clamp_length() - This is used to negotiate the size of the next
     subrequest in a read request, taking into account the credit available
     and the rsize.  The credits are attached to the subrequest.

 (*) cifs_req_issue_read() - This is used to issue a subrequest that has
     been set up and clamped.

 (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a
     channel, reopening the file and requesting credits so that we can set
     the maximum size of the subrequest and also sets the maximum number of
     segments if we're doing RDMA.

 (*) cifs_issue_write() - This releases any unneeded credits and issues an
     asynchronous data write for the contiguous slice of file covered by
     the subrequest.  This should possibly be folded in to all
     ->async_writev() ops and that called directly.

 (*) cifs_begin_writeback() - This gets the cached writable handle through
     which we do writeback (this does not affect writethrough, unbuffered
     or direct writes).

At this point, cifs is not wired up to actually *use* netfslib; that will
be done in a subsequent patch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:21 +01:00
David Howells
d41ca44c20 netfs: Miscellaneous tidy ups
Do a couple of miscellaneous tidy ups:

 (1) Add a qualifier into a file banner comment.

 (2) Put the writeback folio traces back into alphabetical order.

 (3) Remove some unused folio traces.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
2024-05-01 18:07:38 +01:00
David Howells
288ace2f57 netfs: New writeback implementation
The current netfslib writeback implementation creates writeback requests of
contiguous folio data and then separately tiles subrequests over the space
twice, once for the server and once for the cache.  This creates a few
issues:

 (1) Every time there's a discontiguity or a change between writing to only
     one destination or writing to both, it must create a new request.
     This makes it harder to do vectored writes.

 (2) The folios don't have the writeback mark removed until the end of the
     request - and a request could be hundreds of megabytes.

 (3) In future, I want to support a larger cache granularity, which will
     require aggregation of some folios that contain unmodified data (which
     only need to go to the cache) and some which contain modifications
     (which need to be uploaded and stored to the cache) - but, currently,
     these are treated as discontiguous.

There's also a move to get everyone to use writeback_iter() to extract
writable folios from the pagecache.  That said, currently writeback_iter()
has some issues that make it less than ideal:

 (1) there's no way to cancel the iteration, even if you find a "temporary"
     error that means the current folio and all subsequent folios are going
     to fail;

 (2) there's no way to filter the folios being written back - something
     that will impact Ceph with it's ordered snap system;

 (3) and if you get a folio you can't immediately deal with (say you need
     to flush the preceding writes), you are left with a folio hanging in
     the locked state for the duration, when really we should unlock it and
     relock it later.

In this new implementation, I use writeback_iter() to pump folios,
progressively creating two parallel, but separate streams and cleaning up
the finished folios as the subrequests complete.  Either or both streams
can contain gaps, and the subrequests in each stream can be of variable
size, don't need to align with each other and don't need to align with the
folios.

Indeed, subrequests can cross folio boundaries, may cover several folios or
a folio may be spanned by multiple folios, e.g.:

         +---+---+-----+-----+---+----------+
Folios:  |   |   |     |     |   |          |
         +---+---+-----+-----+---+----------+

           +------+------+     +----+----+
Upload:    |      |      |.....|    |    |
           +------+------+     +----+----+

         +------+------+------+------+------+
Cache:   |      |      |      |      |      |
         +------+------+------+------+------+

The progressive subrequest construction permits the algorithm to be
preparing both the next upload to the server and the next write to the
cache whilst the previous ones are already in progress.  Throttling can be
applied to control the rate of production of subrequests - and, in any
case, we probably want to write them to the server in ascending order,
particularly if the file will be extended.

Content crypto can also be prepared at the same time as the subrequests and
run asynchronously, with the prepped requests being stalled until the
crypto catches up with them.  This might also be useful for transport
crypto, but that happens at a lower layer, so probably would be harder to
pull off.

The algorithm is split into three parts:

 (1) The issuer.  This walks through the data, packaging it up, encrypting
     it and creating subrequests.  The part of this that generates
     subrequests only deals with file positions and spans and so is usable
     for DIO/unbuffered writes as well as buffered writes.

 (2) The collector. This asynchronously collects completed subrequests,
     unlocks folios, frees crypto buffers and performs any retries.  This
     runs in a work queue so that the issuer can return to the caller for
     writeback (so that the VM can have its kswapd thread back) or async
     writes.

 (3) The retryer.  This pauses the issuer, waits for all outstanding
     subrequests to complete and then goes through the failed subrequests
     to reissue them.  This may involve reprepping them (with cifs, the
     credits must be renegotiated, and a subrequest may need splitting),
     and doing RMW for content crypto if there's a conflicting change on
     the server.

[!] Note that some of the functions are prefixed with "new_" to avoid
clashes with existing functions.  These will be renamed in a later patch
that cuts over to the new algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
2024-05-01 18:07:36 +01:00
David Howells
7ba167c4c7 netfs: Switch to using unsigned long long rather than loff_t
Switch to using unsigned long long rather than loff_t in netfslib to avoid
problems with the sign flipping in the maths when we're dealing with the
byte at position 0x7fffffffffffffff.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
2024-05-01 18:07:35 +01:00
David Howells
b4ff7b178b netfs: Remove ->launder_folio() support
Remove support for ->launder_folio() from netfslib and expect filesystems
to use filemap_invalidate_inode() instead.  netfs_launder_folio() can then
be got rid of.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <sfrench@samba.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: devel@lists.orangefs.org
2024-05-01 18:07:34 +01:00
Mark Brown
9f6bdb0aa1 ASoC: doc: dapm: various improvements
Merge series from Luca Ceresoli <luca.ceresoli@bootlin.com>:

This series applies various improvements to the DAPM documentation: a
rewrite of a few sections for clarity, style improvements and typo fixes.

Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
---
Changes in v2:
- avoid wrapping in patch 3 as suggested by Alex
- Link to v1: https://lore.kernel.org/r/20240416-dapm-docs-v1-0-a818d2819bf6@bootlin.com

---
Luca Ceresoli (12):
      ASoC: doc: dapm: fix typos
      ASoC: doc: dapm: fix struct name
      ASoC: doc: dapm: minor rewording
      ASoC: doc: dapm: remove dash after colon
      ASoC: doc: dapm: clarify it's an internal API
      ASoC: doc: dapm: replace "map" with "graph"
      ASoC: doc: dapm: extend initial descrption
      ASoC: doc: dapm: describe how widgets and routes are registered
      ASoC: doc: dapm: fix and improve section "Registering DAPM controls"
      ASoC: doc: dapm: improve section "Codec/DSP Widget Interconnections"
      ASoC: doc: dapm: update section "DAPM Widget Events"
      ASoC: doc: dapm: update event types

 Documentation/sound/soc/dapm-graph.svg | 375 +++++++++++++++++++++++++++++++++
 Documentation/sound/soc/dapm.rst       | 174 ++++++++++-----
 2 files changed, 492 insertions(+), 57 deletions(-)
---
base-commit: c942a0cd36
change-id: 20240315-dapm-docs-79bd51f267db

Best regards,
--
Luca Ceresoli <luca.ceresoli@bootlin.com>
2024-05-01 00:00:17 +09:00
Jakub Kicinski
89de2db193 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support in x86 BPF JIT. This allows
   inlining per-CPU array and hashmap lookups
   and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.

2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing mark with bpf_fib_lookup helper,
   from Anton Protopopov.

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible,
   from Benjamin Tissoires.

6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
   to check when NULL is passed for non-NULLable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
   crypto subsystem, from Vadim Fedorenko.

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
    from Andrea Righi.

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
    from Jose E. Marchesi.

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled,
    from Kumar Kartikeya Dwivedi.

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
    from David Vernet.

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man pages fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regards to dumping subsequent char arrays,
    from Quentin Deslandes.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29 13:12:19 -07:00
David Howells
2ff1e97587 netfs: Replace PG_fscache by setting folio->private and marking dirty
When dirty data is being written to the cache, setting/waiting on/clearing
the fscache flag is always done in tandem with setting/waiting on/clearing
the writeback flag.  The netfslib buffered write routines wait on and set
both flags and the write request cleanup clears both flags, so the fscache
flag is almost superfluous.

The reason it isn't superfluous is because the fscache flag is also used to
indicate that data just read from the server is being written to the cache.
The flag is used to prevent a race involving overlapping direct-I/O writes
to the cache.

Change this to indicate that a page is in need of being copied to the cache
by placing a magic value in folio->private and marking the folios dirty.
Then when the writeback code sees a folio marked in this way, it only
writes it to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just
replaced and the folio will then be uplodaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and
afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking.
To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the
flags in the netfs_inode struct for those filesystems.  This reenables the
use of PG_fscache in that inode.  9p and afs use the netfslib write helpers
so get switched over; cifs, for the moment, does page-by-page manual access
to the cache, so doesn't use PG_fscache and is unaffected.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Bharath SM <bharathsm@microsoft.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-04-29 15:01:42 +01:00
Jithu Joseph
15b429f4e0 platform/x86/intel/ifs: trace: display batch num in hex
In Field Scan test image files are named in ff-mm-ss-<batch02x>.scan
format. Current trace output, prints the batch number in decimal format.

Make it easier to correlate the trace line to a test image file by
showing the batch number also in hex format.

Add 0x prefix to all fields in the trace line to make the type explicit.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20240412172349.544064-3-jithu.joseph@intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-29 10:52:02 +02:00
Linus Torvalds
e6ebf01172 Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
  post-6.8 issues or aren't considered suitable for backporting.

  All except one of these are for MM. I see no particular theme - it's
  singletons all over"

* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
  selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
  stackdepot: respect __GFP_NOLOCKDEP allocation flag
  hugetlb: check for anon_vma prior to folio allocation
  mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
  mm: turn folio_test_hugetlb into a PageType
  mm: support page_mapcount() on page_has_type() pages
  mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
  mm/hugetlb: fix missing hugetlb_lock for resv uncharge
  selftests: mm: fix unused and uninitialized variable warning
  selftests/harness: remove use of LINE_MAX
2024-04-26 13:48:03 -07:00
Jason Xing
b533fb9cf4 rstreason: make it work in trace world
At last, we should let it work by introducing this reset reason in
trace world.

One of the possible expected outputs is:
... tcp_send_reset: skbaddr=xxx skaddr=xxx src=xxx dest=xxx
state=TCP_ESTABLISHED reason=NOT_SPECIFIED

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26 15:34:01 +02:00
Matthew Wilcox (Oracle)
43849758fd khugepaged: use a folio throughout hpage_collapse_scan_file()
Replace the use of pages with folios.  Saves a few calls to
compound_head() and removes some uses of obsolete functions.

Link: https://lkml.kernel.org/r/20240403171838.1445826-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:34 -07:00
Matthew Wilcox (Oracle)
610ff817b9 khugepaged: remove hpage from collapse_file()
Use new_folio throughout where we had been using hpage.

Link: https://lkml.kernel.org/r/20240403171838.1445826-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:33 -07:00
Matthew Wilcox (Oracle)
c93012d849 dax: use huge_zero_folio
Convert from huge_zero_page to huge_zero_folio.

Link: https://lkml.kernel.org/r/20240326202833.523759-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:20 -07:00
Matthew Wilcox (Oracle)
46df8e73a4 mm: free up PG_slab
Reclaim the Slab page flag by using a spare bit in PageType.  We are
perennially short of page flags for various purposes, and now that the
original SLAB allocator has been retired, SLUB does not use the
mapcount/page_type field.  This lets us remove a number of special cases
for ignoring mapcount on Slab pages.

[willy@infradead.org: update vmcoreinfo]
  Link: https://lkml.kernel.org/r/ZgGV-O8WYQ_83kxp@casper.infradead.org
Link: https://lkml.kernel.org/r/20240321142448.1645400-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:00 -07:00
Matthew Wilcox (Oracle)
d99e3140a4 mm: turn folio_test_hugetlb into a PageType
The current folio_test_hugetlb() can be fooled by a concurrent folio split
into returning true for a folio which has never belonged to hugetlbfs. 
This can't happen if the caller holds a refcount on it, but we have a few
places (memory-failure, compaction, procfs) which do not and should not
take a speculative reference.

Since hugetlb pages do not use individual page mapcounts (they are always
fully mapped and use the entire_mapcount field to record the number of
mappings), the PageType field is available now that page_mapcount()
ignores the value in this field.

In compaction and with CONFIG_DEBUG_VM enabled, the current implementation
can result in an oops, as reported by Luis. This happens since 9c5ccf2db0
("mm: remove HUGETLB_PAGE_DTOR") effectively added some VM_BUG_ON() checks
in the PageHuge() testing path.

[willy@infradead.org: update vmcoreinfo]
  Link: https://lkml.kernel.org/r/ZgGZUvsdhaT1Va-T@casper.infradead.org
Link: https://lkml.kernel.org/r/20240321142448.1645400-6-willy@infradead.org
Fixes: 9c5ccf2db0 ("mm: remove HUGETLB_PAGE_DTOR")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-24 19:34:26 -07:00
Vincent Guittot
d4dbc99171 sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
Now that cpufreq provides a pressure value to the scheduler, rename
arch_update_thermal_pressure into HW pressure to reflect that it returns
a pressure applied by HW (i.e. with a high frequency change) and not
always related to thermal mitigation but also generated by max current
limitation as an example. Such high frequency signal needs filtering to be
smoothed and provide an value that reflects the average available capacity
into the scheduler time scale.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-5-vincent.guittot@linaro.org
2024-04-24 12:08:01 +02:00
Chao Yu
92f750d847 f2fs: convert f2fs__page tracepoint class to use folio
Convert f2fs__page tracepoint class() and its instances to use folio
and related functionality, and rename it to f2fs__folio().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:57:10 +00:00
Jakub Kicinski
41e3ddb291 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/trace/events/rpcgss.h
  386f4a7379 ("trace: events: cleanup deprecated strncpy uses")
  a4833e3aba ("SUNRPC: Fix rpcgss_context trace event acceptor field")

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_tc_lib.c
  2cca35f5dd ("ice: Fix checking for unsupported keys on non-tunnel device")
  784feaa65d ("ice: Add support for PFCP hardware offload in switchdev")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-18 13:12:24 -07:00
Jesper Dangaard Brouer
fc29e04ae1 cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
This commit enhances the ability to troubleshoot the global
cgroup_rstat_lock by introducing wrapper helper functions for the lock
along with associated tracepoints.

Although global, the cgroup_rstat_lock helper APIs and tracepoints take
arguments such as cgroup pointer and cpu_in_loop variable. This
adjustment is made because flushing occurs per cgroup despite the lock
being global. Hence, when troubleshooting, it's important to identify the
relevant cgroup. The cpu_in_loop variable is necessary because the global
lock may be released within the main flushing loop that traverses CPUs.
In the tracepoints, the cpu_in_loop value is set to -1 when acquiring the
main lock; otherwise, it denotes the CPU number processed last.

The new feature in this patchset is detecting when lock is contended. The
tracepoints are implemented with production in mind. For minimum overhead
attach to cgroup:cgroup_rstat_lock_contended, which only gets activated
when trylock detects lock is contended. A quick production check for
issues could be done via this perf commands:

 perf record -g -e cgroup:cgroup_rstat_lock_contended

Next natural question would be asking how long time do lock contenders
wait for obtaining the lock. This can be answered by measuring the time
between cgroup:cgroup_rstat_lock_contended and cgroup:cgroup_rstat_locked
when args->contended is set.  Like this bpftrace script:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
   tracepoint:cgroup:cgroup_rstat_locked {
     if (args->contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); }'

Extending with time spend holding the lock will be more expensive as this
also looks at all the non-contended cases.
Like this bpftrace script:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
   tracepoint:cgroup:cgroup_rstat_locked { @locked[tid]=nsecs;
     if (args->contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
   tracepoint:cgroup:cgroup_rstat_unlock {
       @locked_ns=hist(nsecs-@locked[tid]); delete(@locked[tid]);}
   interval:s:1 {time("%H:%M:%S ");  print(@wait_ns);print(@locked_ns); }'

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-16 12:10:42 -10:00
Steven Rostedt
58300f8d6a ASoC: tracing: Export SND_SOC_DAPM_DIR_OUT to its value
The string SND_SOC_DAPM_DIR_OUT is printed in the snd_soc_dapm_path trace
event instead of its value:

   (((REC->path_dir) == SND_SOC_DAPM_DIR_OUT) ? "->" : "<-")

User space cannot parse this, as it has no idea what SND_SOC_DAPM_DIR_OUT
is. Use TRACE_DEFINE_ENUM() to convert it to its value:

   (((REC->path_dir) == 1) ? "->" : "<-")

So that user space tools, such as perf and trace-cmd, can parse it
correctly.

Reported-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Fixes: 6e588a0d83 ("ASoC: dapm: Consolidate path trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240416000303.04670cdf@rorschach.local.home
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-16 20:00:18 +09:00
Linus Torvalds
96fca68c4f Merge tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Fix a potential tracepoint crash

 - Fix NFSv4 GETATTR on big-endian platforms

* tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix endianness issue in nfsd4_encode_fattr4
  SUNRPC: Fix rpcgss_context trace event acceptor field
2024-04-15 14:09:47 -07:00
Uladzislau Rezki (Sony)
2053937a31 rcu: Add a trace event for synchronize_rcu_normal()
Add an rcu_sr_normal() trace event. It takes three arguments
first one is the name of RCU flavour, second one is a user id
which triggeres synchronize_rcu_normal() and last one is an
event.

There are two traces in the synchronize_rcu_normal(). On entry,
when a new request is registered and on exit point when request
is completed.

Please note, CONFIG_RCU_TRACE=y is required to activate traces.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15 19:47:51 +02:00
Paolo Bonzini
f3b65bbaed KVM: delete .change_pte MMU notifier callback
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 13:18:27 -04:00
Marco Elver
c82389947d tracing: Add sched_prepare_exec tracepoint
Add "sched_prepare_exec" tracepoint, which is run right after the point
of no return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
tracepoint runs before flushing the old exec, i.e. while the task still
has the original state (such as original MM), but when the new exec
either succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

  * allowing tracing eBPF programs access to the original MM on exec,
    before current->mm is replaced;
  * counting exec in the original task (via perf event);
  * profiling flush time ("sched_prepare_exec" to "sched_process_exec").

Example of tracing output:

 $ cat /sys/kernel/debug/tracing/trace_pipe
    <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
    <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
    <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
    <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240411102158.1272267-1-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-11 09:02:21 -07:00
Steven Rostedt (Google)
a4833e3aba SUNRPC: Fix rpcgss_context trace event acceptor field
The rpcgss_context trace event acceptor field is a dynamically sized
string that records the "data" parameter. But this parameter is also
dependent on the "len" field to determine the size of the data.

It needs to use __string_len() helper macro where the length can be passed
in. It also incorrectly uses strncpy() to save it instead of
__assign_str(). As these macros can change, it is not wise to open code
them in trace events.

As of commit c759e60903 ("tracing: Remove __assign_str_len()"),
__assign_str() can be used for both __string() and __string_len() fields.
Before that commit, __assign_str_len() is required to be used. This needs
to be noted for backporting. (In actuality, commit c1fa617cae ("tracing:
Rework __assign_str() and __string() to not duplicate getting the string")
is the commit that makes __string_str_len() obsolete).

Cc: stable@vger.kernel.org
Fixes: 0c77668ddb ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 19:01:30 -04:00
Ingo Molnar
0e6ebfd163 Merge tag 'v6.9-rc3' into x86/cpu, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-09 09:28:41 +02:00
Justin Stitt
386f4a7379 trace: events: cleanup deprecated strncpy uses
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

For 2 out of 3 of these changes we can simply swap in strscpy() as it
guarantess NUL-termination which is needed for the following trace
print.

trace_rpcgss_context() should use memcpy as its format specifier %.*s
allows for the length to be specifier (__entry->len). Due to this,
acceptor does not technically need to be NUL-terminated. Moreover,
swapping in strscpy() and keeping everything else the same could result
in truncation of the source string by one byte. To remedy this, we could
use `len + 1` but I am unsure of the size of the destination buffer so a
simple memcpy should suffice.
|	TP_printk("win_size=%u expiry=%lu now=%lu timeout=%u acceptor=%.*s",
|		__entry->window_size, __entry->expiry, __entry->now,
|		__entry->timeout, __entry->len, __get_str(acceptor))

I suspect acceptor not to naturally be a NUL-terminated string due to
the presence of some stringify methods.
|	.crstringify_acceptor	= gss_stringify_acceptor,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20240401-strncpy-include-trace-events-mdio-h-v1-1-9cb5a4cda116@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05 22:10:25 -07:00
Jason Xing
19822a980e trace: tcp: fully support trace_tcp_send_reset
Prior to this patch, what we can see by enabling trace_tcp_send is
only happening under two circumstances:
1) active rst mode
2) non-active rst mode and based on the full socket

That means the inconsistency occurs if we use tcpdump and trace
simultaneously to see how rst happens.

It's necessary that we should take into other cases into considerations,
say:
1) time-wait socket
2) no socket
...

By parsing the incoming skb and reversing its 4-tuple can
we know the exact 'flow' which might not exist.

Samples after applied this patch:
1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
state=TCP_ESTABLISHED
2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
state=UNKNOWN
Note:
1) UNKNOWN means we cannot extract the right information from skb.
2) skbaddr/skaddr could be 0

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240401073605.37335-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:26:14 -07:00
Jason Xing
9807080e21 trace: adjust TP_STORE_ADDR_PORTS_SKB() parameters
Introducing entry_saddr and entry_daddr parameters in this macro
for later use can help us record the reverse 4-tuple by analyzing
the 4-tuple of the incoming skb when receiving.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240401073605.37335-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:26:14 -07:00
Avadhut Naik
186d7ef52c tracing: Add the ::microcode field to the mce_record tracepoint
Currently, the microcode field (Microcode Revision) of 'struct mce' is not
exposed to userspace through the mce_record tracepoint.

Knowing the microcode version on which the MCE was received is critical
information for debugging. If the version is not recorded, later attempts
to acquire the version might result in discrepancies since it can be
changed at runtime.

Add microcode version to the tracepoint to prevent ambiguity over
the active version on the system when the MCE was received.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240401171455.1737976-3-avadhut.naik@amd.com
2024-04-03 09:39:29 +02:00
Avadhut Naik
98430645e3 tracing: Add the ::ppin field to the mce_record tracepoint
Machine Check Error information from 'struct mce' is exposed to userspace
through the mce_record tracepoint.

Currently, however, the PPIN (Protected Processor Inventory Number) field
of 'struct mce' is not exposed.

Add a PPIN field to the tracepoint as it provides a unique identifier for
the system (or socket in case of multi-socket systems) on which the MCE
has been received.

Also, add a comment explaining the kind of information that can be and
should be added to the tracepoint.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240401171455.1737976-2-avadhut.naik@amd.com
2024-04-03 09:39:29 +02:00
Alexander Aring
1131f33908 dlm: remove lkb from callback tracepoints
Stop using lkb structs in the callback tracepoints so that lkb
references are not needed. This prepares for separating lkb
structs from callbacks.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 13:31:12 -05:00
Ingo Molnar
ac5e80e94f x86/mce: Clean up TP_printk() output line of the 'mce_record' tracepoint
- Only capitalize entries where that makes sense
 - Print separate values separately
 - Rename 'PROCESSOR' to vendor & CPUID

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Avadhut Naik <avadhut.naik@amd.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lore.kernel.org/r/ZgZpn/zbCJWYdL5y@gmail.com
2024-03-30 11:35:28 +01:00
Balazs Scheidler
e9669a00bb net: udp: add IP/port data to the tracepoint udp/udp_fail_queue_rcv_skb
The udp_fail_queue_rcv_skb() tracepoint lacks any details on the source
and destination IP/port whereas this information can be critical in case
of UDP/syslog.

Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/0c8b3e33dbf679e190be6f4c6736603a76988a20.1711475011.git.balazs.scheidler@axoflow.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29 12:18:24 -07:00