linux/kernel/trace/blktrace.c
Linus Torvalds 69c5079b49 Merge tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Extend tracing option mask to 64 bits

   The trace options were defined by a 32 bit variable, which limits
   tracing instances to a total of 32 different options. As that limit
   has been hit, and more options are being added, increase the option
   mask to a 64 bit number, doubling the number of options available.

   As this is required for the kprobe topic branches as well as the
   tracing topic branch, a separate branch was created and merged into
   both.

 - Make trace_user_fault_read() available for the rest of tracing

   The function trace_user_fault_read() is used by the trace_marker
   file read path to read user space quickly, without locking or
   allocations. Make it available so that the system call trace events
   can use it too.

 - Have system call trace events read user space values

   Now that the system call trace event callbacks are called in a
   faultable context, take advantage of this and read the user space
   buffers for various system calls. For example, show the path name of
   the openat system call instead of just showing the pointer to that
   path name in user space. Also show the contents of the buffer of the
   write system call. Several system call trace events are updated to
   make tracing into a lightweight strace tool for all applications in
   the system.
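
   As a rough sketch (not the actual event code, and the helper name
   here is made up), a callback can now copy a user string directly,
   since faulting is allowed in this context:

     /* sketch only: capture a filename argument from user space */
     static void capture_user_path(const char __user *filename,
                                   char *buf, long len)
     {
             /*
              * strncpy_from_user() may fault; that is safe now that
              * syscall trace event callbacks run in a faultable context.
              */
             if (strncpy_from_user(buf, filename, len) < 0)
                     strscpy(buf, "(fault)", len);
     }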

 - Update perf system call tracing to do the same

 - And a config option and a syscall_user_buf_size file to control the
   size of the buffer

   Limit the amount of data that can be read from user space. The
   default size is 63 bytes but that can be expanded to 165 bytes.

 - Allow the persistent ring buffer to print system calls normally

   The persistent ring buffer prints trace events by their type and
   ignores the print_fmt. This is because the print_fmt may change from
   kernel to kernel. As the system call output is fixed by the system
   call ABI itself, there's no reason to limit that. This makes reading
   the system call events in the persistent ring buffer much nicer and
   easier to understand.

 - Add options to show text offset to function profiler

   The function profiler that counts the number of times a function is
   hit currently lists all functions by name and offset. But this
   becomes ambiguous when there are several functions with the same
   name.

   Add a tracing option that changes the output to be that of
   '_text+offset' instead. Now a user space tool can use this
   information to map the '_text+offset' to the unique function it is
   counting.
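
   For illustration only (this is user space code, not part of the
   kernel change, and it assumes /proc/kallsyms shows real addresses,
   i.e. root or kptr_restrict=0), a tool could resolve such an entry by
   finding '_text', adding the offset, and taking the closest symbol at
   or below that address:

     /* sketch: resolve "_text+<offset>" back to a symbol name */
     #include <stdio.h>
     #include <string.h>

     static void resolve_text_offset(unsigned long offset)
     {
             unsigned long addr, text = 0, best = 0;
             char line[512], name[256], sym[256] = "?", type;
             FILE *f = fopen("/proc/kallsyms", "r");

             if (!f)
                     return;
             while (fgets(line, sizeof(line), f)) {
                     if (sscanf(line, "%lx %c %255s", &addr, &type, name) != 3)
                             continue;
                     if (!text && !strcmp(name, "_text"))
                             text = addr;
                     /* closest symbol at or below _text + offset */
                     if (text && addr <= text + offset && addr > best) {
                             best = addr;
                             strcpy(sym, name);
                     }
             }
             fclose(f);
             printf("_text+%#lx -> %s+%#lx\n", offset, sym,
                    text + offset - best);
     }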

 - Report bad dynamic event command

   If a bad command is passed to the dynamic_events file, report it
   properly in the error log.

 - Clean up tracer options

   Clean up the tracer option code a bit, by removing some useless code
   and also using switch statements instead of a series of if
   statements.

 - Have tracing options be instance specific

   Tracers can have their own options (function tracer, irqsoff tracer,
   function graph tracer, etc). But even though the same tracer can be
   enabled in multiple trace instances, its options are still global.
   The API is per instance, yet changing an option in one instance
   affects the other instances. This isn't even consistent, as the
   options take effect differently depending on when a tracer was
   started in an instance. Make the options for instances only affect
   the instance they are changed under.

 - Optimize pid_list lock contention

   Whenever the pid_list is read, it uses a spin lock. This happens at
   every sched switch. Taking the lock at sched switch can be avoided
   by instead using a seqlock counter.
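
   The read side then looks roughly like this (a sketch with made up
   names, not the exact pid_list code):

     /* sketch: lockless read of the pid list guarded by a seqcount */
     static seqcount_t pid_seq;              /* hypothetical */

     static bool pid_is_traced(struct trace_pid_list *list, pid_t pid)
     {
             unsigned int seq;
             bool ret;

             do {
                     seq = read_seqcount_begin(&pid_seq);
                     ret = trace_pid_list_is_set(list, pid);
             } while (read_seqcount_retry(&pid_seq, seq));

             return ret;
     }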

 - Clean up the trace trigger structures

   The trigger code uses two different structures to implement a single
   trigger. This was due to trying to reuse code for the two different
   types of triggers (always on trigger, and count limited trigger). But
   by adding a single field to one structure, the other structure could
   be absorbed into the first structure, making the code easier to
   understand.

 - Create a bulk garbage collector for trace triggers

   If user space has triggers for several hundred events and then
   removes them, it can take several seconds to complete. This is
   because each removal calls tracepoint_synchronize_unregister(), which
   can take hundreds of milliseconds to complete.

   Instead, create a helper thread that does the clean up. When a
   trigger is removed, create the kthread if it isn't already created,
   and then add the trigger to an llist. The kthread will take the items
   off the llist, call tracepoint_synchronize_unregister(), and then
   free the items it took off. It will then check if there are more
   items to free before sleeping.

   This lets user space finish removing all of these triggers in less
   than a second.
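
   The pattern is roughly the following (a sketch with made up names,
   not the exact trigger code):

     /* sketch: deferred freeing via a lock-less list and a kthread */
     struct gc_item {                        /* hypothetical */
             struct llist_node llnode;
             /* ... data to free ... */
     };
     static LLIST_HEAD(gc_list);

     static int gc_thread_fn(void *unused)
     {
             for (;;) {
                     struct llist_node *batch;
                     struct gc_item *it, *tmp;

                     set_current_state(TASK_INTERRUPTIBLE);
                     batch = llist_del_all(&gc_list);
                     if (!batch) {
                             schedule();
                             continue;
                     }
                     __set_current_state(TASK_RUNNING);

                     /* one grace period covers the whole batch */
                     tracepoint_synchronize_unregister();
                     llist_for_each_entry_safe(it, tmp, batch, llnode)
                             kfree(it);
             }
             return 0;
     }

   The removal path then only does llist_add() plus wake_up_process()
   and returns immediately, instead of waiting for the synchronization
   itself.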

 - Allow function tracing of some of the tracing infrastructure code

   Because the tracing code can cause recursion issues if it is traced
   by the function tracer, the entire tracing directory disables
   function tracing. But not all of the tracing code causes issues when
   traced, namely the event tracing code. Add a config that enables some
   of the tracing code to be traced to help in debugging it. Note, when
   this is enabled, it does add noise to general function tracing,
   especially if events are enabled as well (which is a common case).

 - Add boot-time backup instance for persistent buffer

   The persistent ring buffer is used mostly for kernel crash analysis
   in the field. One issue is that if there's a crash, the data in the
   persistent ring buffer must be read before tracing can begin using
   it. This slows down the boot process. Once tracing starts in the
   persistent ring buffer, the old data must be freed, as the addresses
   no longer match and old events can't share the buffer with new
   events.

   Add a way to create a backup buffer that copies the persistent ring
   buffer at boot up. Then, after a crash, the always-on tracer and the
   normal boot process can begin immediately while the crash analysis
   tooling uses the backup buffer. After the backup buffer is finished
   being read, it can be removed.

 - Enable function graph args and return address options at the same
   time

   Currently, when reading of arguments in the function graph tracer is
   enabled, the option to record the parent function in the entry event
   cannot be enabled. Update the code so that it can.

 - Add new struct_offset() helper macro

   Add a new macro that takes a pointer to a structure and the name of
   one of its members, and returns the offset of that member. This
   allows the ring buffer code to simplify the following:

   From:  size = struct_size(entry, buf, cnt - sizeof(entry->id));
     To:  size = struct_offset(entry, id) + cnt;

   There should be other simplifications that this macro can help out
   with as well.
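
   A plausible shape for the helper (hedging here; see the overflow.h
   change in this merge for the real definition) is an offsetof()
   wrapper that infers the struct type from the pointer:

     /* sketch: offset of @member within the struct that @ptr points to */
     #define struct_offset(ptr, member) \
             offsetof(typeof(*(ptr)), member)

   With that, the two expressions above compute the same size when
   'buf' is a byte array that immediately follows 'id'.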

* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
  overflow: Introduce struct_offset() to get offset of member
  function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
  tracing: Add boot-time backup of persistent ring buffer
  ftrace: Allow tracing of some of the tracing code
  tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
  tracing: Add bulk garbage collection of freeing event_trigger_data
  tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
  tracing: Merge struct event_trigger_ops into struct event_command
  tracing: Remove get_trigger_ops() and add count_func() from trigger ops
  tracing: Show the tracer options in boot-time created instance
  ftrace: Avoid redundant initialization in register_ftrace_direct
  tracing: Remove unused variable in tracing_trace_options_show()
  fgraph: Make fgraph_no_sleep_time signed
  tracing: Convert function graph set_flags() to use a switch() statement
  tracing: Have function graph tracer option sleep-time be per instance
  tracing: Move graph-time out of function graph options
  tracing: Have function graph tracer option funcgraph-irqs be per instance
  trace/pid_list: optimize pid_list->lock contention
  tracing: Have function graph tracer define options per instance
  tracing: Have function tracer define options per instance
  ...
2025-12-05 09:51:37 -08:00


// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
*
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/blktrace_api.h>
#include <linux/percpu.h>
#include <linux/init.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/debugfs.h>
#include <linux/export.h>
#include <linux/time.h>
#include <linux/uaccess.h>
#include <linux/list.h>
#include <linux/blk-cgroup.h>
#include "../../block/blk.h"
#include <trace/events/block.h>
#include "trace_output.h"
#ifdef CONFIG_BLK_DEV_IO_TRACE
static unsigned int blktrace_seq __read_mostly = 1;
static struct trace_array *blk_tr;
static bool blk_tracer_enabled __read_mostly;
static LIST_HEAD(running_trace_list);
static __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(running_trace_lock);
/* Select an alternative, minimalistic output than the original one */
#define TRACE_BLK_OPT_CLASSIC 0x1
#define TRACE_BLK_OPT_CGROUP 0x2
#define TRACE_BLK_OPT_CGNAME 0x4
static struct tracer_opt blk_tracer_opts[] = {
/* Default disable the minimalistic output */
{ TRACER_OPT(blk_classic, TRACE_BLK_OPT_CLASSIC) },
#ifdef CONFIG_BLK_CGROUP
{ TRACER_OPT(blk_cgroup, TRACE_BLK_OPT_CGROUP) },
{ TRACER_OPT(blk_cgname, TRACE_BLK_OPT_CGNAME) },
#endif
{ }
};
static struct tracer_flags blk_tracer_flags = {
.val = 0,
.opts = blk_tracer_opts,
};
/* Global reference count of probes */
static DEFINE_MUTEX(blk_probe_mutex);
static int blk_probes_ref;
static void blk_register_tracepoints(void);
static void blk_unregister_tracepoints(void);
static void record_blktrace_event(struct blk_io_trace *t, pid_t pid, int cpu,
sector_t sector, int bytes, u64 what,
dev_t dev, int error, u64 cgid,
ssize_t cgid_len, void *pdu_data, int pdu_len)
{
/*
* These two are not needed in ftrace as they are in the
* generic trace_entry, filled by tracing_generic_entry_update,
* but for the trace_event->bin() synthesizer benefit we do it
* here too.
*/
t->cpu = cpu;
t->pid = pid;
t->sector = sector;
t->bytes = bytes;
t->action = lower_32_bits(what);
t->device = dev;
t->error = error;
t->pdu_len = pdu_len + cgid_len;
if (cgid_len)
memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
if (pdu_len)
memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
}
static void record_blktrace_event2(struct blk_io_trace2 *t2, pid_t pid, int cpu,
sector_t sector, int bytes, u64 what,
dev_t dev, int error, u64 cgid,
ssize_t cgid_len, void *pdu_data,
int pdu_len)
{
t2->pid = pid;
t2->cpu = cpu;
t2->sector = sector;
t2->bytes = bytes;
t2->action = what;
t2->device = dev;
t2->error = error;
t2->pdu_len = pdu_len + cgid_len;
if (cgid_len)
memcpy((void *)t2 + sizeof(*t2), &cgid, cgid_len);
if (pdu_len)
memcpy((void *)t2 + sizeof(*t2) + cgid_len, pdu_data, pdu_len);
}
static void relay_blktrace_event1(struct blk_trace *bt, unsigned long sequence,
pid_t pid, int cpu, sector_t sector, int bytes,
u64 what, int error, u64 cgid,
ssize_t cgid_len, void *pdu_data, int pdu_len)
{
struct blk_io_trace *t;
size_t trace_len = sizeof(*t) + pdu_len + cgid_len;
t = relay_reserve(bt->rchan, trace_len);
if (!t)
return;
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->sequence = sequence;
t->time = ktime_to_ns(ktime_get());
record_blktrace_event(t, pid, cpu, sector, bytes, what, bt->dev, error,
cgid, cgid_len, pdu_data, pdu_len);
}
static void relay_blktrace_event2(struct blk_trace *bt, unsigned long sequence,
pid_t pid, int cpu, sector_t sector,
int bytes, u64 what, int error, u64 cgid,
ssize_t cgid_len, void *pdu_data, int pdu_len)
{
struct blk_io_trace2 *t;
size_t trace_len = sizeof(struct blk_io_trace2) + pdu_len + cgid_len;
t = relay_reserve(bt->rchan, trace_len);
if (!t)
return;
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE2_VERSION;
t->sequence = sequence;
t->time = ktime_to_ns(ktime_get());
record_blktrace_event2(t, pid, cpu, sector, bytes, what, bt->dev, error,
cgid, cgid_len, pdu_data, pdu_len);
}
static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
pid_t pid, int cpu, sector_t sector, int bytes,
u64 what, int error, u64 cgid,
ssize_t cgid_len, void *pdu_data, int pdu_len)
{
if (bt->version == 2)
return relay_blktrace_event2(bt, sequence, pid, cpu, sector,
bytes, what, error, cgid, cgid_len,
pdu_data, pdu_len);
return relay_blktrace_event1(bt, sequence, pid, cpu, sector, bytes,
what, error, cgid, cgid_len, pdu_data,
pdu_len);
}
/*
* Send out a notify message.
*/
static void trace_note(struct blk_trace *bt, pid_t pid, u64 action,
const void *data, size_t len, u64 cgid)
{
struct ring_buffer_event *event = NULL;
struct trace_buffer *buffer = NULL;
unsigned int trace_ctx = 0;
int cpu = smp_processor_id();
bool blk_tracer = blk_tracer_enabled;
ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
action = lower_32_bits(action | (cgid ? __BLK_TN_CGROUP : 0));
if (blk_tracer) {
struct blk_io_trace2 *t;
size_t trace_len = sizeof(*t) + cgid_len + len;
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
trace_len, trace_ctx);
if (!event)
return;
t = ring_buffer_event_data(event);
record_blktrace_event2(t, pid, cpu, 0, 0,
action, bt->dev, 0, cgid, cgid_len,
(void *)data, len);
trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
return;
}
if (!bt->rchan)
return;
relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
cgid_len, (void *)data, len);
}
/*
* Send out a notify for this process, if we haven't done so since a trace
* started
*/
static void trace_note_tsk(struct task_struct *tsk)
{
unsigned long flags;
struct blk_trace *bt;
tsk->btrace_seq = blktrace_seq;
raw_spin_lock_irqsave(&running_trace_lock, flags);
list_for_each_entry(bt, &running_trace_list, running_list) {
trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm,
sizeof(tsk->comm), 0);
}
raw_spin_unlock_irqrestore(&running_trace_lock, flags);
}
static void trace_note_time(struct blk_trace *bt)
{
struct timespec64 now;
unsigned long flags;
u32 words[2];
/* need to check user space to see if this breaks in y2038 or y2106 */
ktime_get_real_ts64(&now);
words[0] = (u32)now.tv_sec;
words[1] = now.tv_nsec;
local_irq_save(flags);
trace_note(bt, 0, BLK_TN_TIMESTAMP, words, sizeof(words), 0);
local_irq_restore(flags);
}
void __blk_trace_note_message(struct blk_trace *bt,
struct cgroup_subsys_state *css, const char *fmt, ...)
{
int n;
va_list args;
unsigned long flags;
char *buf;
u64 cgid = 0;
if (unlikely(bt->trace_state != Blktrace_running &&
!blk_tracer_enabled))
return;
/*
* If the BLK_TC_NOTIFY action mask isn't set, don't send any note
* message to the trace.
*/
if (!(bt->act_mask & BLK_TC_NOTIFY))
return;
local_irq_save(flags);
buf = this_cpu_ptr(bt->msg_data);
va_start(args, fmt);
n = vscnprintf(buf, BLK_TN_MAX_MSG, fmt, args);
va_end(args);
#ifdef CONFIG_BLK_CGROUP
if (css && (blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
cgid = cgroup_id(css->cgroup);
else
cgid = 1;
#endif
trace_note(bt, current->pid, BLK_TN_MESSAGE, buf, n, cgid);
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(__blk_trace_note_message);
static int act_log_check(struct blk_trace *bt, u64 what, sector_t sector,
pid_t pid)
{
if (((bt->act_mask << BLK_TC_SHIFT) & what) == 0)
return 1;
if (sector && (sector < bt->start_lba || sector > bt->end_lba))
return 1;
if (bt->pid && pid != bt->pid)
return 1;
return 0;
}
/*
* Data direction bit lookup
*/
static const u32 ddir_act[2] = { BLK_TC_ACT(BLK_TC_READ),
BLK_TC_ACT(BLK_TC_WRITE) };
#define BLK_TC_RAHEAD BLK_TC_AHEAD
#define BLK_TC_PREFLUSH BLK_TC_FLUSH
/* The ilog2() calls fall out because they're constant */
#define MASK_TC_BIT(rw, __name) ((__force u32)(rw & REQ_ ## __name) << \
(ilog2(BLK_TC_ ## __name) + BLK_TC_SHIFT - __REQ_ ## __name))
/*
* The worker for the various blk_add_trace*() types. Fills out a
* blk_io_trace structure and places it in a per-cpu subbuffer.
*/
static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
const blk_opf_t opf, u64 what, int error,
int pdu_len, void *pdu_data, u64 cgid)
{
struct task_struct *tsk = current;
struct ring_buffer_event *event = NULL;
struct trace_buffer *buffer = NULL;
unsigned long flags = 0;
unsigned long *sequence;
unsigned int trace_ctx = 0;
pid_t pid;
int cpu;
bool blk_tracer = blk_tracer_enabled;
ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
const enum req_op op = opf & REQ_OP_MASK;
size_t trace_len;
if (unlikely(bt->trace_state != Blktrace_running && !blk_tracer))
return;
what |= ddir_act[op_is_write(op) ? WRITE : READ];
what |= MASK_TC_BIT(opf, SYNC);
what |= MASK_TC_BIT(opf, RAHEAD);
what |= MASK_TC_BIT(opf, META);
what |= MASK_TC_BIT(opf, PREFLUSH);
what |= MASK_TC_BIT(opf, FUA);
switch (op) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
what |= BLK_TC_ACT(BLK_TC_DISCARD);
break;
case REQ_OP_FLUSH:
what |= BLK_TC_ACT(BLK_TC_FLUSH);
break;
case REQ_OP_ZONE_APPEND:
what |= BLK_TC_ACT(BLK_TC_ZONE_APPEND);
break;
case REQ_OP_ZONE_RESET:
what |= BLK_TC_ACT(BLK_TC_ZONE_RESET);
break;
case REQ_OP_ZONE_RESET_ALL:
what |= BLK_TC_ACT(BLK_TC_ZONE_RESET_ALL);
break;
case REQ_OP_ZONE_FINISH:
what |= BLK_TC_ACT(BLK_TC_ZONE_FINISH);
break;
case REQ_OP_ZONE_OPEN:
what |= BLK_TC_ACT(BLK_TC_ZONE_OPEN);
break;
case REQ_OP_ZONE_CLOSE:
what |= BLK_TC_ACT(BLK_TC_ZONE_CLOSE);
break;
case REQ_OP_WRITE_ZEROES:
what |= BLK_TC_ACT(BLK_TC_WRITE_ZEROES);
break;
default:
break;
}
/* Drop trace events for zone operations with blktrace v1 */
if (bt->version == 1 && (what >> BLK_TC_SHIFT) > BLK_TC_END_V1) {
pr_debug_ratelimited("blktrace v1 cannot trace zone operation 0x%llx\n",
(unsigned long long)what);
return;
}
if (cgid)
what |= __BLK_TA_CGROUP;
pid = tsk->pid;
if (act_log_check(bt, what, sector, pid))
return;
cpu = raw_smp_processor_id();
if (blk_tracer) {
tracing_record_cmdline(current);
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
switch (bt->version) {
case 1:
trace_len = sizeof(struct blk_io_trace);
break;
case 2:
default:
/*
* ftrace always uses v2 (blk_io_trace2) format.
*
* For sysfs-enabled tracing path (enabled via
* /sys/block/DEV/trace/enable), blk_trace_setup_queue()
* never initializes bt->version, leaving it 0 from
* kzalloc(). We must handle version==0 safely here.
*
* Grouping case 2 with the default ensures we never hit the
* old bug where the default left trace_len at 0, causing
* buffer underflow and memory corruption.
*
* Always use v2 format for ftrace and normalize
* bt->version to 2 when uninitialized.
*/
trace_len = sizeof(struct blk_io_trace2);
if (bt->version == 0)
bt->version = 2;
break;
}
trace_len += pdu_len + cgid_len;
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
trace_len, trace_ctx);
if (!event)
return;
switch (bt->version) {
case 1:
record_blktrace_event(ring_buffer_event_data(event),
pid, cpu, sector, bytes,
what, bt->dev, error, cgid, cgid_len,
pdu_data, pdu_len);
break;
case 2:
default:
/*
* Use v2 recording function (record_blktrace_event2)
* which writes blk_io_trace2 structure with correct
* field layout:
* - 32-bit pid at offset 28
* - 64-bit action at offset 32
*
* Grouping with the default also handles the version==0 case
* (from the sysfs path), ensuring we always use the correct
* v2 recording function to match the v2 buffer
* allocated above.
*/
record_blktrace_event2(ring_buffer_event_data(event),
pid, cpu, sector, bytes,
what, bt->dev, error, cgid, cgid_len,
pdu_data, pdu_len);
break;
}
trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
return;
}
if (unlikely(tsk->btrace_seq != blktrace_seq))
trace_note_tsk(tsk);
/*
* A word about the locking here - we disable interrupts to reserve
* some space in the relay per-cpu buffer, to prevent an irq
* from coming in and stepping on our toes.
*/
local_irq_save(flags);
sequence = per_cpu_ptr(bt->sequence, cpu);
(*sequence)++;
relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
what, error, cgid, cgid_len, pdu_data, pdu_len);
local_irq_restore(flags);
}
static void blk_trace_free(struct request_queue *q, struct blk_trace *bt)
{
relay_close(bt->rchan);
/*
* If 'bt->dir' is not set, then both 'dropped' and 'msg' are created
* under 'q->debugfs_dir', thus lookup and remove them.
*/
if (!bt->dir) {
debugfs_lookup_and_remove("dropped", q->debugfs_dir);
debugfs_lookup_and_remove("msg", q->debugfs_dir);
} else {
debugfs_remove(bt->dir);
}
free_percpu(bt->sequence);
free_percpu(bt->msg_data);
kfree(bt);
}
static void get_probe_ref(void)
{
mutex_lock(&blk_probe_mutex);
if (++blk_probes_ref == 1)
blk_register_tracepoints();
mutex_unlock(&blk_probe_mutex);
}
static void put_probe_ref(void)
{
mutex_lock(&blk_probe_mutex);
if (!--blk_probes_ref)
blk_unregister_tracepoints();
mutex_unlock(&blk_probe_mutex);
}
static int blk_trace_start(struct blk_trace *bt)
{
if (bt->trace_state != Blktrace_setup &&
bt->trace_state != Blktrace_stopped)
return -EINVAL;
blktrace_seq++;
smp_mb();
bt->trace_state = Blktrace_running;
raw_spin_lock_irq(&running_trace_lock);
list_add(&bt->running_list, &running_trace_list);
raw_spin_unlock_irq(&running_trace_lock);
trace_note_time(bt);
return 0;
}
static int blk_trace_stop(struct blk_trace *bt)
{
if (bt->trace_state != Blktrace_running)
return -EINVAL;
bt->trace_state = Blktrace_stopped;
raw_spin_lock_irq(&running_trace_lock);
list_del_init(&bt->running_list);
raw_spin_unlock_irq(&running_trace_lock);
relay_flush(bt->rchan);
return 0;
}
static void blk_trace_cleanup(struct request_queue *q, struct blk_trace *bt)
{
blk_trace_stop(bt);
synchronize_rcu();
blk_trace_free(q, bt);
put_probe_ref();
}
static int __blk_trace_remove(struct request_queue *q)
{
struct blk_trace *bt;
bt = rcu_replace_pointer(q->blk_trace, NULL,
lockdep_is_held(&q->debugfs_mutex));
if (!bt)
return -EINVAL;
blk_trace_cleanup(q, bt);
return 0;
}
int blk_trace_remove(struct request_queue *q)
{
int ret;
mutex_lock(&q->debugfs_mutex);
ret = __blk_trace_remove(q);
mutex_unlock(&q->debugfs_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(blk_trace_remove);
static ssize_t blk_dropped_read(struct file *filp, char __user *buffer,
size_t count, loff_t *ppos)
{
struct blk_trace *bt = filp->private_data;
size_t dropped = relay_stats(bt->rchan, RELAY_STATS_BUF_FULL);
char buf[16];
snprintf(buf, sizeof(buf), "%zu\n", dropped);
return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
}
static const struct file_operations blk_dropped_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.read = blk_dropped_read,
.llseek = default_llseek,
};
static ssize_t blk_msg_write(struct file *filp, const char __user *buffer,
size_t count, loff_t *ppos)
{
char *msg;
struct blk_trace *bt;
if (count >= BLK_TN_MAX_MSG)
return -EINVAL;
msg = memdup_user_nul(buffer, count);
if (IS_ERR(msg))
return PTR_ERR(msg);
bt = filp->private_data;
__blk_trace_note_message(bt, NULL, "%s", msg);
kfree(msg);
return count;
}
static const struct file_operations blk_msg_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.write = blk_msg_write,
.llseek = noop_llseek,
};
static int blk_remove_buf_file_callback(struct dentry *dentry)
{
debugfs_remove(dentry);
return 0;
}
static struct dentry *blk_create_buf_file_callback(const char *filename,
struct dentry *parent,
umode_t mode,
struct rchan_buf *buf,
int *is_global)
{
return debugfs_create_file(filename, mode, parent, buf,
&relay_file_operations);
}
static const struct rchan_callbacks blk_relay_callbacks = {
.create_buf_file = blk_create_buf_file_callback,
.remove_buf_file = blk_remove_buf_file_callback,
};
static void blk_trace_setup_lba(struct blk_trace *bt,
struct block_device *bdev)
{
if (bdev) {
bt->start_lba = bdev->bd_start_sect;
bt->end_lba = bdev->bd_start_sect + bdev_nr_sectors(bdev);
} else {
bt->start_lba = 0;
bt->end_lba = -1ULL;
}
}
/*
* Setup everything required to start tracing
*/
static struct blk_trace *blk_trace_setup_prepare(struct request_queue *q,
char *name, dev_t dev,
u32 buf_size, u32 buf_nr,
struct block_device *bdev)
{
struct blk_trace *bt = NULL;
struct dentry *dir = NULL;
int ret;
lockdep_assert_held(&q->debugfs_mutex);
/*
* bdev can be NULL, as with scsi-generic; this is as helpful as
* we can be.
*/
if (rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex))) {
pr_warn("Concurrent blktraces are not allowed on %s\n", name);
return ERR_PTR(-EBUSY);
}
bt = kzalloc(sizeof(*bt), GFP_KERNEL);
if (!bt)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
bt->sequence = alloc_percpu(unsigned long);
if (!bt->sequence)
goto err;
bt->msg_data = __alloc_percpu(BLK_TN_MAX_MSG, __alignof__(char));
if (!bt->msg_data)
goto err;
/*
* When tracing the whole disk reuse the existing debugfs directory
* created by the block layer on init. For partition block devices
* and scsi-generic block devices we create a temporary new debugfs
* directory that will be removed once the trace ends.
*/
if (bdev && !bdev_is_partition(bdev))
dir = q->debugfs_dir;
else
bt->dir = dir = debugfs_create_dir(name, blk_debugfs_root);
/*
* As blktrace relies on debugfs for its interface the debugfs directory
* is required, contrary to the usual mantra of not checking for debugfs
* files or directories.
*/
if (IS_ERR_OR_NULL(dir)) {
pr_warn("debugfs_dir not present for %s so skipping\n", name);
ret = -ENOENT;
goto err;
}
bt->dev = dev;
INIT_LIST_HEAD(&bt->running_list);
ret = -EIO;
debugfs_create_file("dropped", 0444, dir, bt, &blk_dropped_fops);
debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops);
bt->rchan = relay_open("trace", dir, buf_size, buf_nr,
&blk_relay_callbacks, bt);
if (!bt->rchan)
goto err;
blk_trace_setup_lba(bt, bdev);
return bt;
err:
blk_trace_free(q, bt);
return ERR_PTR(ret);
}
static void blk_trace_setup_finalize(struct request_queue *q,
char *name, int version,
struct blk_trace *bt,
struct blk_user_trace_setup2 *buts)
{
strscpy_pad(buts->name, name, BLKTRACE_BDEV_SIZE2);
/*
* some device names have larger paths - convert the slashes
* to underscores for this to work as expected
*/
strreplace(buts->name, '/', '_');
bt->version = version;
bt->act_mask = buts->act_mask;
if (!bt->act_mask)
bt->act_mask = (u16) -1;
/* overwrite with user settings */
if (buts->start_lba)
bt->start_lba = buts->start_lba;
if (buts->end_lba)
bt->end_lba = buts->end_lba;
bt->pid = buts->pid;
bt->trace_state = Blktrace_setup;
rcu_assign_pointer(q->blk_trace, bt);
get_probe_ref();
}
int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
struct block_device *bdev,
char __user *arg)
{
struct blk_user_trace_setup2 buts2;
struct blk_user_trace_setup buts;
struct blk_trace *bt;
int ret;
ret = copy_from_user(&buts, arg, sizeof(buts));
if (ret)
return -EFAULT;
if (!buts.buf_size || !buts.buf_nr)
return -EINVAL;
buts2 = (struct blk_user_trace_setup2) {
.act_mask = buts.act_mask,
.buf_size = buts.buf_size,
.buf_nr = buts.buf_nr,
.start_lba = buts.start_lba,
.end_lba = buts.end_lba,
.pid = buts.pid,
};
mutex_lock(&q->debugfs_mutex);
bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
bdev);
if (IS_ERR(bt)) {
mutex_unlock(&q->debugfs_mutex);
return PTR_ERR(bt);
}
blk_trace_setup_finalize(q, name, 1, bt, &buts2);
strcpy(buts.name, buts2.name);
mutex_unlock(&q->debugfs_mutex);
if (copy_to_user(arg, &buts, sizeof(buts))) {
blk_trace_remove(q);
return -EFAULT;
}
return 0;
}
EXPORT_SYMBOL_GPL(blk_trace_setup);
static int blk_trace_setup2(struct request_queue *q, char *name, dev_t dev,
struct block_device *bdev, char __user *arg)
{
struct blk_user_trace_setup2 buts2;
struct blk_trace *bt;
if (copy_from_user(&buts2, arg, sizeof(buts2)))
return -EFAULT;
if (!buts2.buf_size || !buts2.buf_nr)
return -EINVAL;
if (buts2.flags != 0)
return -EINVAL;
mutex_lock(&q->debugfs_mutex);
bt = blk_trace_setup_prepare(q, name, dev, buts2.buf_size, buts2.buf_nr,
bdev);
if (IS_ERR(bt)) {
mutex_unlock(&q->debugfs_mutex);
return PTR_ERR(bt);
}
blk_trace_setup_finalize(q, name, 2, bt, &buts2);
mutex_unlock(&q->debugfs_mutex);
if (copy_to_user(arg, &buts2, sizeof(buts2))) {
blk_trace_remove(q);
return -EFAULT;
}
return 0;
}
#if defined(CONFIG_COMPAT) && defined(CONFIG_X86_64)
static int compat_blk_trace_setup(struct request_queue *q, char *name,
dev_t dev, struct block_device *bdev,
char __user *arg)
{
struct blk_user_trace_setup2 buts2;
struct compat_blk_user_trace_setup cbuts;
struct blk_trace *bt;
if (copy_from_user(&cbuts, arg, sizeof(cbuts)))
return -EFAULT;
if (!cbuts.buf_size || !cbuts.buf_nr)
return -EINVAL;
buts2 = (struct blk_user_trace_setup2) {
.act_mask = cbuts.act_mask,
.buf_size = cbuts.buf_size,
.buf_nr = cbuts.buf_nr,
.start_lba = cbuts.start_lba,
.end_lba = cbuts.end_lba,
.pid = cbuts.pid,
};
mutex_lock(&q->debugfs_mutex);
bt = blk_trace_setup_prepare(q, name, dev, buts2.buf_size, buts2.buf_nr,
bdev);
if (IS_ERR(bt)) {
mutex_unlock(&q->debugfs_mutex);
return PTR_ERR(bt);
}
blk_trace_setup_finalize(q, name, 1, bt, &buts2);
mutex_unlock(&q->debugfs_mutex);
if (copy_to_user(arg, &buts2.name, ARRAY_SIZE(buts2.name))) {
blk_trace_remove(q);
return -EFAULT;
}
return 0;
}
#endif
static int __blk_trace_startstop(struct request_queue *q, int start)
{
struct blk_trace *bt;
bt = rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex));
if (bt == NULL)
return -EINVAL;
if (start)
return blk_trace_start(bt);
else
return blk_trace_stop(bt);
}
int blk_trace_startstop(struct request_queue *q, int start)
{
int ret;
mutex_lock(&q->debugfs_mutex);
ret = __blk_trace_startstop(q, start);
mutex_unlock(&q->debugfs_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(blk_trace_startstop);
/*
* When reading or writing the blktrace sysfs files, the references to the
* opened sysfs or device files should prevent the underlying block device
* from being removed. So no further delete protection is really needed.
*/
/**
* blk_trace_ioctl - handle the ioctls associated with tracing
* @bdev: the block device
* @cmd: the ioctl cmd
* @arg: the argument data, if any
*
**/
int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
{
struct request_queue *q = bdev_get_queue(bdev);
int ret, start = 0;
char b[BDEVNAME_SIZE];
switch (cmd) {
case BLKTRACESETUP2:
snprintf(b, sizeof(b), "%pg", bdev);
ret = blk_trace_setup2(q, b, bdev->bd_dev, bdev, arg);
break;
case BLKTRACESETUP:
snprintf(b, sizeof(b), "%pg", bdev);
ret = blk_trace_setup(q, b, bdev->bd_dev, bdev, arg);
break;
#if defined(CONFIG_COMPAT) && defined(CONFIG_X86_64)
case BLKTRACESETUP32:
snprintf(b, sizeof(b), "%pg", bdev);
ret = compat_blk_trace_setup(q, b, bdev->bd_dev, bdev, arg);
break;
#endif
case BLKTRACESTART:
start = 1;
fallthrough;
case BLKTRACESTOP:
ret = blk_trace_startstop(q, start);
break;
case BLKTRACETEARDOWN:
ret = blk_trace_remove(q);
break;
default:
ret = -ENOTTY;
break;
}
return ret;
}
/**
* blk_trace_shutdown - stop and cleanup trace structures
* @q: the request queue associated with the device
*
**/
void blk_trace_shutdown(struct request_queue *q)
{
if (rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex)))
__blk_trace_remove(q);
}
#ifdef CONFIG_BLK_CGROUP
static u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
{
struct cgroup_subsys_state *blkcg_css;
struct blk_trace *bt;
/* We don't use the 'bt' value here except as an optimization... */
bt = rcu_dereference_protected(q->blk_trace, 1);
if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
return 0;
blkcg_css = bio_blkcg_css(bio);
if (!blkcg_css)
return 0;
return cgroup_id(blkcg_css->cgroup);
}
#else
static u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
{
return 0;
}
#endif
static u64
blk_trace_request_get_cgid(struct request *rq)
{
if (!rq->bio)
return 0;
/* Use the first bio */
return blk_trace_bio_get_cgid(rq->q, rq->bio);
}
/*
* blktrace probes
*/
/**
* blk_add_trace_rq - Add a trace for a request oriented action
* @rq: the source request
* @error: return status to log
* @nr_bytes: number of completed bytes
* @what: the action
* @cgid: the cgroup info
*
* Description:
* Records an action against a request. Will log the bio offset + size.
*
**/
static void blk_add_trace_rq(struct request *rq, blk_status_t error,
unsigned int nr_bytes, u64 what, u64 cgid)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(rq->q->blk_trace);
if (likely(!bt)) {
rcu_read_unlock();
return;
}
if (blk_rq_is_passthrough(rq))
what |= BLK_TC_ACT(BLK_TC_PC);
else
what |= BLK_TC_ACT(BLK_TC_FS);
__blk_add_trace(bt, blk_rq_trace_sector(rq), nr_bytes, rq->cmd_flags,
what, blk_status_to_errno(error), 0, NULL, cgid);
rcu_read_unlock();
}
static void blk_add_trace_rq_insert(void *ignore, struct request *rq)
{
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_INSERT,
blk_trace_request_get_cgid(rq));
}
static void blk_add_trace_rq_issue(void *ignore, struct request *rq)
{
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_ISSUE,
blk_trace_request_get_cgid(rq));
}
static void blk_add_trace_rq_merge(void *ignore, struct request *rq)
{
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_BACKMERGE,
blk_trace_request_get_cgid(rq));
}
static void blk_add_trace_rq_requeue(void *ignore, struct request *rq)
{
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_REQUEUE,
blk_trace_request_get_cgid(rq));
}
static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
blk_status_t error, unsigned int nr_bytes)
{
blk_add_trace_rq(rq, error, nr_bytes, BLK_TA_COMPLETE,
blk_trace_request_get_cgid(rq));
}
static void blk_add_trace_zone_update_request(void *ignore, struct request *rq)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(rq->q->blk_trace);
if (likely(!bt) || bt->version < 2) {
rcu_read_unlock();
return;
}
rcu_read_unlock();
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_ZONE_APPEND,
blk_trace_request_get_cgid(rq));
}
/**
* blk_add_trace_bio - Add a trace for a bio oriented action
* @q: queue the io is for
* @bio: the source bio
* @what: the action
* @error: error, if any
*
* Description:
* Records an action against a bio. Will log the bio offset + size.
*
**/
static void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
u64 what, int error)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (likely(!bt)) {
rcu_read_unlock();
return;
}
__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
bio->bi_opf, what, error, 0, NULL,
blk_trace_bio_get_cgid(q, bio));
rcu_read_unlock();
}
static void blk_add_trace_bio_complete(void *ignore,
struct request_queue *q, struct bio *bio)
{
blk_add_trace_bio(q, bio, BLK_TA_COMPLETE,
blk_status_to_errno(bio->bi_status));
}
static void blk_add_trace_bio_backmerge(void *ignore, struct bio *bio)
{
blk_add_trace_bio(bio->bi_bdev->bd_disk->queue, bio, BLK_TA_BACKMERGE,
0);
}
static void blk_add_trace_bio_frontmerge(void *ignore, struct bio *bio)
{
blk_add_trace_bio(bio->bi_bdev->bd_disk->queue, bio, BLK_TA_FRONTMERGE,
0);
}
static void blk_add_trace_bio_queue(void *ignore, struct bio *bio)
{
blk_add_trace_bio(bio->bi_bdev->bd_disk->queue, bio, BLK_TA_QUEUE, 0);
}
static void blk_add_trace_getrq(void *ignore, struct bio *bio)
{
blk_add_trace_bio(bio->bi_bdev->bd_disk->queue, bio, BLK_TA_GETRQ, 0);
}
static void blk_add_trace_plug(void *ignore, struct request_queue *q)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (bt)
__blk_add_trace(bt, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, 0);
rcu_read_unlock();
}
static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
unsigned int depth, bool explicit)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (bt) {
__be64 rpdu = cpu_to_be64(depth);
u64 what;
if (explicit)
what = BLK_TA_UNPLUG_IO;
else
what = BLK_TA_UNPLUG_TIMER;
__blk_add_trace(bt, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, 0);
}
rcu_read_unlock();
}
static void blk_add_trace_zone_plug(void *ignore, struct request_queue *q,
unsigned int zno, sector_t sector,
unsigned int sectors)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (bt && bt->version >= 2)
__blk_add_trace(bt, sector, sectors << SECTOR_SHIFT, 0,
BLK_TA_ZONE_PLUG, 0, 0, NULL, 0);
rcu_read_unlock();
return;
}
static void blk_add_trace_zone_unplug(void *ignore, struct request_queue *q,
unsigned int zno, sector_t sector,
unsigned int sectors)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (bt && bt->version >= 2)
__blk_add_trace(bt, sector, sectors << SECTOR_SHIFT, 0,
BLK_TA_ZONE_UNPLUG, 0, 0, NULL, 0);
rcu_read_unlock();
return;
}
static void blk_add_trace_split(void *ignore, struct bio *bio, unsigned int pdu)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (bt) {
__be64 rpdu = cpu_to_be64(pdu);
__blk_add_trace(bt, bio->bi_iter.bi_sector,
bio->bi_iter.bi_size, bio->bi_opf, BLK_TA_SPLIT,
blk_status_to_errno(bio->bi_status),
sizeof(rpdu), &rpdu,
blk_trace_bio_get_cgid(q, bio));
}
rcu_read_unlock();
}
/**
* blk_add_trace_bio_remap - Add a trace for a bio-remap operation
* @ignore: trace callback data parameter (not used)
* @bio: the source bio
* @dev: source device
* @from: source sector
*
* Called after a bio is remapped to a different device and/or sector.
**/
static void blk_add_trace_bio_remap(void *ignore, struct bio *bio, dev_t dev,
sector_t from)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct blk_trace *bt;
struct blk_io_trace_remap r;
rcu_read_lock();
bt = rcu_dereference(q->blk_trace);
if (likely(!bt)) {
rcu_read_unlock();
return;
}
r.device_from = cpu_to_be32(dev);
r.device_to = cpu_to_be32(bio_dev(bio));
r.sector_from = cpu_to_be64(from);
__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
bio->bi_opf, BLK_TA_REMAP,
blk_status_to_errno(bio->bi_status),
sizeof(r), &r, blk_trace_bio_get_cgid(q, bio));
rcu_read_unlock();
}
/**
* blk_add_trace_rq_remap - Add a trace for a request-remap operation
* @ignore: trace callback data parameter (not used)
* @rq: the source request
* @dev: target device
* @from: source sector
*
* Description:
* Device mapper remaps request to other devices.
* Add a trace for that action.
*
**/
static void blk_add_trace_rq_remap(void *ignore, struct request *rq, dev_t dev,
sector_t from)
{
struct blk_trace *bt;
struct blk_io_trace_remap r;
rcu_read_lock();
bt = rcu_dereference(rq->q->blk_trace);
if (likely(!bt)) {
rcu_read_unlock();
return;
}
r.device_from = cpu_to_be32(dev);
r.device_to = cpu_to_be32(disk_devt(rq->q->disk));
r.sector_from = cpu_to_be64(from);
__blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq),
rq->cmd_flags, BLK_TA_REMAP, 0,
sizeof(r), &r, blk_trace_request_get_cgid(rq));
rcu_read_unlock();
}
/**
* blk_add_driver_data - Add binary message with driver-specific data
* @rq: io request
* @data: driver-specific data
* @len: length of driver-specific data
*
* Description:
* Some drivers might want to write driver-specific data per request.
*
**/
void blk_add_driver_data(struct request *rq, void *data, size_t len)
{
struct blk_trace *bt;
rcu_read_lock();
bt = rcu_dereference(rq->q->blk_trace);
if (likely(!bt)) {
rcu_read_unlock();
return;
}
__blk_add_trace(bt, blk_rq_trace_sector(rq), blk_rq_bytes(rq), 0,
BLK_TA_DRV_DATA, 0, len, data,
blk_trace_request_get_cgid(rq));
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(blk_add_driver_data);
static void blk_register_tracepoints(void)
{
int ret;
ret = register_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
WARN_ON(ret);
ret = register_trace_block_bio_complete(blk_add_trace_bio_complete, NULL);
WARN_ON(ret);
ret = register_trace_block_bio_backmerge(blk_add_trace_bio_backmerge, NULL);
WARN_ON(ret);
ret = register_trace_block_bio_frontmerge(blk_add_trace_bio_frontmerge, NULL);
WARN_ON(ret);
ret = register_trace_block_bio_queue(blk_add_trace_bio_queue, NULL);
WARN_ON(ret);
ret = register_trace_block_getrq(blk_add_trace_getrq, NULL);
WARN_ON(ret);
ret = register_trace_blk_zone_append_update_request_bio(
blk_add_trace_zone_update_request, NULL);
WARN_ON(ret);
ret = register_trace_disk_zone_wplug_add_bio(blk_add_trace_zone_plug,
NULL);
WARN_ON(ret);
ret = register_trace_blk_zone_wplug_bio(blk_add_trace_zone_unplug,
NULL);
WARN_ON(ret);
ret = register_trace_block_plug(blk_add_trace_plug, NULL);
WARN_ON(ret);
ret = register_trace_block_unplug(blk_add_trace_unplug, NULL);
WARN_ON(ret);
ret = register_trace_block_split(blk_add_trace_split, NULL);
WARN_ON(ret);
ret = register_trace_block_bio_remap(blk_add_trace_bio_remap, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_remap(blk_add_trace_rq_remap, NULL);
WARN_ON(ret);
}
static void blk_unregister_tracepoints(void)
{
unregister_trace_block_rq_remap(blk_add_trace_rq_remap, NULL);
unregister_trace_block_bio_remap(blk_add_trace_bio_remap, NULL);
unregister_trace_block_split(blk_add_trace_split, NULL);
unregister_trace_block_unplug(blk_add_trace_unplug, NULL);
unregister_trace_block_plug(blk_add_trace_plug, NULL);
unregister_trace_blk_zone_wplug_bio(blk_add_trace_zone_unplug, NULL);
unregister_trace_disk_zone_wplug_add_bio(blk_add_trace_zone_plug, NULL);
unregister_trace_blk_zone_append_update_request_bio(
blk_add_trace_zone_update_request, NULL);
unregister_trace_block_getrq(blk_add_trace_getrq, NULL);
unregister_trace_block_bio_queue(blk_add_trace_bio_queue, NULL);
unregister_trace_block_bio_frontmerge(blk_add_trace_bio_frontmerge, NULL);
unregister_trace_block_bio_backmerge(blk_add_trace_bio_backmerge, NULL);
unregister_trace_block_bio_complete(blk_add_trace_bio_complete, NULL);
unregister_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
unregister_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
unregister_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
unregister_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
unregister_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);
tracepoint_synchronize_unregister();
}
/*
* struct blk_io_tracer formatting routines
*/
static void fill_rwbs(char *rwbs, const struct blk_io_trace2 *t)
{
int i = 0;
int tc = t->action >> BLK_TC_SHIFT;
if ((t->action & ~__BLK_TN_CGROUP) == BLK_TN_MESSAGE) {
rwbs[i++] = 'N';
goto out;
}
if (tc & BLK_TC_FLUSH)
rwbs[i++] = 'F';
if (tc & BLK_TC_DISCARD)
rwbs[i++] = 'D';
else if (tc & BLK_TC_WRITE_ZEROES) {
rwbs[i++] = 'W';
rwbs[i++] = 'Z';
} else if (tc & BLK_TC_WRITE)
rwbs[i++] = 'W';
else if (t->bytes)
rwbs[i++] = 'R';
else
rwbs[i++] = 'N';
if (tc & BLK_TC_FUA)
rwbs[i++] = 'F';
if (tc & BLK_TC_AHEAD)
rwbs[i++] = 'A';
if (tc & BLK_TC_SYNC)
rwbs[i++] = 'S';
if (tc & BLK_TC_META)
rwbs[i++] = 'M';
out:
rwbs[i] = '\0';
}
static inline
const struct blk_io_trace2 *te_blk_io_trace(const struct trace_entry *ent)
{
return (const struct blk_io_trace2 *)ent;
}
static inline const void *pdu_start(const struct trace_entry *ent, bool has_cg)
{
return (void *)(te_blk_io_trace(ent) + 1) + (has_cg ? sizeof(u64) : 0);
}
static inline u64 t_cgid(const struct trace_entry *ent)
{
return *(u64 *)(te_blk_io_trace(ent) + 1);
}
static inline int pdu_real_len(const struct trace_entry *ent, bool has_cg)
{
return te_blk_io_trace(ent)->pdu_len - (has_cg ? sizeof(u64) : 0);
}
static inline u32 t_action(const struct trace_entry *ent)
{
return te_blk_io_trace(ent)->action;
}
static inline u32 t_bytes(const struct trace_entry *ent)
{
return te_blk_io_trace(ent)->bytes;
}
static inline u32 t_sec(const struct trace_entry *ent)
{
return te_blk_io_trace(ent)->bytes >> 9;
}
static inline unsigned long long t_sector(const struct trace_entry *ent)
{
return te_blk_io_trace(ent)->sector;
}
static inline __u16 t_error(const struct trace_entry *ent)
{
return te_blk_io_trace(ent)->error;
}
static __u64 get_pdu_int(const struct trace_entry *ent, bool has_cg)
{
const __be64 *val = pdu_start(ent, has_cg);
return be64_to_cpu(*val);
}
typedef void (blk_log_action_t) (struct trace_iterator *iter, const char *act,
bool has_cg);
static void blk_log_action_classic(struct trace_iterator *iter, const char *act,
bool has_cg)
{
char rwbs[RWBS_LEN];
unsigned long long ts = iter->ts;
unsigned long nsec_rem = do_div(ts, NSEC_PER_SEC);
unsigned secs = (unsigned long)ts;
const struct blk_io_trace2 *t = te_blk_io_trace(iter->ent);
fill_rwbs(rwbs, t);
trace_seq_printf(&iter->seq,
"%3d,%-3d %2d %5d.%09lu %5u %2s %3s ",
MAJOR(t->device), MINOR(t->device), iter->cpu,
secs, nsec_rem, iter->ent->pid, act, rwbs);
}
static void blk_log_action(struct trace_iterator *iter, const char *act,
bool has_cg)
{
char rwbs[RWBS_LEN];
const struct blk_io_trace2 *t = te_blk_io_trace(iter->ent);
fill_rwbs(rwbs, t);
if (has_cg) {
u64 id = t_cgid(iter->ent);
if (blk_tracer_flags.val & TRACE_BLK_OPT_CGNAME) {
char blkcg_name_buf[NAME_MAX + 1] = "<...>";
cgroup_path_from_kernfs_id(id, blkcg_name_buf,
sizeof(blkcg_name_buf));
trace_seq_printf(&iter->seq, "%3d,%-3d %s %2s %3s ",
MAJOR(t->device), MINOR(t->device),
blkcg_name_buf, act, rwbs);
} else {
/*
* The cgid portion used to be "INO,GEN". Userland
* builds a FILEID_INO32_GEN fid out of them and
* opens the cgroup using open_by_handle_at(2).
* While 32bit ino setups are still the same, 64bit
* ones now use the 64bit ino as the whole ID and
* no longer use generation.
*
* Regardless of the content, always output
* "LOW32,HIGH32" so that FILEID_INO32_GEN fid can
* be mapped back to @id on both 64 and 32bit ino
* setups. See __kernfs_fh_to_dentry().
*/
trace_seq_printf(&iter->seq,
"%3d,%-3d %llx,%-llx %2s %3s ",
MAJOR(t->device), MINOR(t->device),
id & U32_MAX, id >> 32, act, rwbs);
}
} else
trace_seq_printf(&iter->seq, "%3d,%-3d %2s %3s ",
MAJOR(t->device), MINOR(t->device), act, rwbs);
}
static void blk_log_dump_pdu(struct trace_seq *s,
const struct trace_entry *ent, bool has_cg)
{
const unsigned char *pdu_buf;
int pdu_len;
int i, end;
pdu_buf = pdu_start(ent, has_cg);
pdu_len = pdu_real_len(ent, has_cg);
if (!pdu_len)
return;
/* find the last zero that needs to be printed */
for (end = pdu_len - 1; end >= 0; end--)
if (pdu_buf[end])
break;
end++;
trace_seq_putc(s, '(');
for (i = 0; i < pdu_len; i++) {
trace_seq_printf(s, "%s%02x",
i == 0 ? "" : " ", pdu_buf[i]);
/*
* stop when the rest is just zeros and indicate so
* with a ".." appended
*/
if (i == end && end != pdu_len - 1) {
trace_seq_puts(s, " ..) ");
return;
}
}
trace_seq_puts(s, ") ");
}
static void blk_log_generic(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
{
char cmd[TASK_COMM_LEN];
trace_find_cmdline(ent->pid, cmd);
if (t_action(ent) & BLK_TC_ACT(BLK_TC_PC)) {
trace_seq_printf(s, "%u ", t_bytes(ent));
blk_log_dump_pdu(s, ent, has_cg);
trace_seq_printf(s, "[%s]\n", cmd);
} else {
if (t_sec(ent))
trace_seq_printf(s, "%llu + %u [%s]\n",
t_sector(ent), t_sec(ent), cmd);
else
trace_seq_printf(s, "[%s]\n", cmd);
}
}
static void blk_log_with_error(struct trace_seq *s,
const struct trace_entry *ent, bool has_cg)
{
if (t_action(ent) & BLK_TC_ACT(BLK_TC_PC)) {
blk_log_dump_pdu(s, ent, has_cg);
trace_seq_printf(s, "[%d]\n", t_error(ent));
} else {
if (t_sec(ent))
trace_seq_printf(s, "%llu + %u [%d]\n",
t_sector(ent),
t_sec(ent), t_error(ent));
else
trace_seq_printf(s, "%llu [%d]\n",
t_sector(ent), t_error(ent));
}
}
static void blk_log_remap(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
{
const struct blk_io_trace_remap *__r = pdu_start(ent, has_cg);
trace_seq_printf(s, "%llu + %u <- (%d,%d) %llu\n",
t_sector(ent), t_sec(ent),
MAJOR(be32_to_cpu(__r->device_from)),
MINOR(be32_to_cpu(__r->device_from)),
be64_to_cpu(__r->sector_from));
}
static void blk_log_plug(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
{
char cmd[TASK_COMM_LEN];
trace_find_cmdline(ent->pid, cmd);
trace_seq_printf(s, "[%s]\n", cmd);
}
static void blk_log_unplug(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
{
char cmd[TASK_COMM_LEN];
trace_find_cmdline(ent->pid, cmd);
trace_seq_printf(s, "[%s] %llu\n", cmd, get_pdu_int(ent, has_cg));
}
static void blk_log_split(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
{
char cmd[TASK_COMM_LEN];
trace_find_cmdline(ent->pid, cmd);
trace_seq_printf(s, "%llu / %llu [%s]\n", t_sector(ent),
get_pdu_int(ent, has_cg), cmd);
}
static void blk_log_msg(struct trace_seq *s, const struct trace_entry *ent,
bool has_cg)
{
trace_seq_putmem(s, pdu_start(ent, has_cg),
pdu_real_len(ent, has_cg));
trace_seq_putc(s, '\n');
}
/*
* struct tracer operations
*/
static void blk_tracer_print_header(struct seq_file *m)
{
if (!(blk_tracer_flags.val & TRACE_BLK_OPT_CLASSIC))
return;
seq_puts(m, "# DEV CPU TIMESTAMP PID ACT FLG\n"
"# | | | | | |\n");
}
static void blk_tracer_start(struct trace_array *tr)
{
blk_tracer_enabled = true;
}
static int blk_tracer_init(struct trace_array *tr)
{
blk_tr = tr;
blk_tracer_start(tr);
return 0;
}
static void blk_tracer_stop(struct trace_array *tr)
{
blk_tracer_enabled = false;
}
static void blk_tracer_reset(struct trace_array *tr)
{
blk_tracer_stop(tr);
}
static const struct {
const char *act[2];
void (*print)(struct trace_seq *s, const struct trace_entry *ent,
bool has_cg);
} what2act[] = {
[__BLK_TA_QUEUE] = {{ "Q", "queue" }, blk_log_generic },
[__BLK_TA_BACKMERGE] = {{ "M", "backmerge" }, blk_log_generic },
[__BLK_TA_FRONTMERGE] = {{ "F", "frontmerge" }, blk_log_generic },
[__BLK_TA_GETRQ] = {{ "G", "getrq" }, blk_log_generic },
[__BLK_TA_SLEEPRQ] = {{ "S", "sleeprq" }, blk_log_generic },
[__BLK_TA_REQUEUE] = {{ "R", "requeue" }, blk_log_with_error },
[__BLK_TA_ISSUE] = {{ "D", "issue" }, blk_log_generic },
[__BLK_TA_COMPLETE] = {{ "C", "complete" }, blk_log_with_error },
[__BLK_TA_PLUG] = {{ "P", "plug" }, blk_log_plug },
[__BLK_TA_UNPLUG_IO] = {{ "U", "unplug_io" }, blk_log_unplug },
[__BLK_TA_UNPLUG_TIMER] = {{ "UT", "unplug_timer" }, blk_log_unplug },
[__BLK_TA_INSERT] = {{ "I", "insert" }, blk_log_generic },
[__BLK_TA_SPLIT] = {{ "X", "split" }, blk_log_split },
[__BLK_TA_REMAP] = {{ "A", "remap" }, blk_log_remap },
};
static enum print_line_t print_one_line(struct trace_iterator *iter,
bool classic)
{
struct trace_array *tr = iter->tr;
struct trace_seq *s = &iter->seq;
const struct blk_io_trace2 *t;
u16 what;
bool long_act;
blk_log_action_t *log_action;
bool has_cg;
t = te_blk_io_trace(iter->ent);
what = (t->action & ((1 << BLK_TC_SHIFT) - 1)) & ~__BLK_TA_CGROUP;
long_act = !!(tr->trace_flags & TRACE_ITER(VERBOSE));
log_action = classic ? &blk_log_action_classic : &blk_log_action;
has_cg = t->action & __BLK_TA_CGROUP;
if ((t->action & ~__BLK_TN_CGROUP) == BLK_TN_MESSAGE) {
log_action(iter, long_act ? "message" : "m", has_cg);
blk_log_msg(s, iter->ent, has_cg);
return trace_handle_return(s);
}
if (unlikely(what == 0 || what >= ARRAY_SIZE(what2act)))
trace_seq_printf(s, "Unknown action %x\n", what);
else {
log_action(iter, what2act[what].act[long_act], has_cg);
what2act[what].print(s, iter->ent, has_cg);
}
return trace_handle_return(s);
}
static enum print_line_t blk_trace_event_print(struct trace_iterator *iter,
int flags, struct trace_event *event)
{
return print_one_line(iter, false);
}
static void blk_trace_synthesize_old_trace(struct trace_iterator *iter)
{
struct trace_seq *s = &iter->seq;
struct blk_io_trace2 *t = (struct blk_io_trace2 *)iter->ent;
const int offset = offsetof(struct blk_io_trace2, sector);
struct blk_io_trace old = {
.magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION,
.time = iter->ts,
};
trace_seq_putmem(s, &old, offset);
trace_seq_putmem(s, &t->sector,
sizeof(old) - offset + t->pdu_len);
}
static enum print_line_t
blk_trace_event_print_binary(struct trace_iterator *iter, int flags,
struct trace_event *event)
{
blk_trace_synthesize_old_trace(iter);
return trace_handle_return(&iter->seq);
}
static enum print_line_t blk_tracer_print_line(struct trace_iterator *iter)
{
if ((iter->ent->type != TRACE_BLK) ||
!(blk_tracer_flags.val & TRACE_BLK_OPT_CLASSIC))
return TRACE_TYPE_UNHANDLED;
return print_one_line(iter, true);
}
static int
blk_tracer_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
{
/* don't output context-info for blk_classic output */
if (bit == TRACE_BLK_OPT_CLASSIC) {
if (set)
tr->trace_flags &= ~TRACE_ITER(CONTEXT_INFO);
else
tr->trace_flags |= TRACE_ITER(CONTEXT_INFO);
}
return 0;
}
static struct tracer blk_tracer __read_mostly = {
.name = "blk",
.init = blk_tracer_init,
.reset = blk_tracer_reset,
.start = blk_tracer_start,
.stop = blk_tracer_stop,
.print_header = blk_tracer_print_header,
.print_line = blk_tracer_print_line,
.flags = &blk_tracer_flags,
.set_flag = blk_tracer_set_flag,
};
static struct trace_event_functions trace_blk_event_funcs = {
.trace = blk_trace_event_print,
.binary = blk_trace_event_print_binary,
};
static struct trace_event trace_blk_event = {
.type = TRACE_BLK,
.funcs = &trace_blk_event_funcs,
};
static int __init init_blk_tracer(void)
{
if (!register_trace_event(&trace_blk_event)) {
pr_warn("Warning: could not register block events\n");
return 1;
}
if (register_tracer(&blk_tracer) != 0) {
pr_warn("Warning: could not register the block tracer\n");
unregister_trace_event(&trace_blk_event);
return 1;
}
BUILD_BUG_ON(__alignof__(struct blk_user_trace_setup2) %
__alignof__(long));
BUILD_BUG_ON(__alignof__(struct blk_io_trace2) % __alignof__(long));
return 0;
}
device_initcall(init_blk_tracer);
static int blk_trace_remove_queue(struct request_queue *q)
{
struct blk_trace *bt;
bt = rcu_replace_pointer(q->blk_trace, NULL,
lockdep_is_held(&q->debugfs_mutex));
if (bt == NULL)
return -EINVAL;
blk_trace_stop(bt);
put_probe_ref();
synchronize_rcu();
blk_trace_free(q, bt);
return 0;
}
/*
* Setup everything required to start tracing
*/
static int blk_trace_setup_queue(struct request_queue *q,
struct block_device *bdev)
{
struct blk_trace *bt = NULL;
int ret = -ENOMEM;
bt = kzalloc(sizeof(*bt), GFP_KERNEL);
if (!bt)
return -ENOMEM;
bt->msg_data = __alloc_percpu(BLK_TN_MAX_MSG, __alignof__(char));
if (!bt->msg_data)
goto free_bt;
bt->dev = bdev->bd_dev;
bt->act_mask = (u16)-1;
blk_trace_setup_lba(bt, bdev);
rcu_assign_pointer(q->blk_trace, bt);
get_probe_ref();
return 0;
free_bt:
blk_trace_free(q, bt);
return ret;
}
/*
* sysfs interface to enable and configure tracing
*/
static ssize_t sysfs_blk_trace_attr_show(struct device *dev,
struct device_attribute *attr,
char *buf);
static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count);
#define BLK_TRACE_DEVICE_ATTR(_name) \
DEVICE_ATTR(_name, S_IRUGO | S_IWUSR, \
sysfs_blk_trace_attr_show, \
sysfs_blk_trace_attr_store)
static BLK_TRACE_DEVICE_ATTR(enable);
static BLK_TRACE_DEVICE_ATTR(act_mask);
static BLK_TRACE_DEVICE_ATTR(pid);
static BLK_TRACE_DEVICE_ATTR(start_lba);
static BLK_TRACE_DEVICE_ATTR(end_lba);
static struct attribute *blk_trace_attrs[] = {
&dev_attr_enable.attr,
&dev_attr_act_mask.attr,
&dev_attr_pid.attr,
&dev_attr_start_lba.attr,
&dev_attr_end_lba.attr,
NULL
};
struct attribute_group blk_trace_attr_group = {
.name = "trace",
.attrs = blk_trace_attrs,
};
static const struct {
int mask;
const char *str;
} mask_maps[] = {
{ BLK_TC_READ, "read" },
{ BLK_TC_WRITE, "write" },
{ BLK_TC_FLUSH, "flush" },
{ BLK_TC_SYNC, "sync" },
{ BLK_TC_QUEUE, "queue" },
{ BLK_TC_REQUEUE, "requeue" },
{ BLK_TC_ISSUE, "issue" },
{ BLK_TC_COMPLETE, "complete" },
{ BLK_TC_FS, "fs" },
{ BLK_TC_PC, "pc" },
{ BLK_TC_NOTIFY, "notify" },
{ BLK_TC_AHEAD, "ahead" },
{ BLK_TC_META, "meta" },
{ BLK_TC_DISCARD, "discard" },
{ BLK_TC_DRV_DATA, "drv_data" },
{ BLK_TC_FUA, "fua" },
{ BLK_TC_WRITE_ZEROES, "write-zeroes" },
};
static int blk_trace_str2mask(const char *str)
{
int i;
int mask = 0;
char *buf, *s, *token;
buf = kstrdup(str, GFP_KERNEL);
if (buf == NULL)
return -ENOMEM;
s = strstrip(buf);
while (1) {
token = strsep(&s, ",");
if (token == NULL)
break;
if (*token == '\0')
continue;
for (i = 0; i < ARRAY_SIZE(mask_maps); i++) {
if (strcasecmp(token, mask_maps[i].str) == 0) {
mask |= mask_maps[i].mask;
break;
}
}
if (i == ARRAY_SIZE(mask_maps)) {
mask = -EINVAL;
break;
}
}
kfree(buf);
return mask;
}
static ssize_t blk_trace_mask2str(char *buf, int mask)
{
int i;
char *p = buf;
for (i = 0; i < ARRAY_SIZE(mask_maps); i++) {
if (mask & mask_maps[i].mask) {
p += sprintf(p, "%s%s",
(p == buf) ? "" : ",", mask_maps[i].str);
}
}
*p++ = '\n';
return p - buf;
}
static ssize_t sysfs_blk_trace_attr_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct block_device *bdev = dev_to_bdev(dev);
struct request_queue *q = bdev_get_queue(bdev);
struct blk_trace *bt;
ssize_t ret = -ENXIO;
mutex_lock(&q->debugfs_mutex);
bt = rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex));
if (attr == &dev_attr_enable) {
ret = sprintf(buf, "%u\n", !!bt);
goto out_unlock_bdev;
}
if (bt == NULL)
ret = sprintf(buf, "disabled\n");
else if (attr == &dev_attr_act_mask)
ret = blk_trace_mask2str(buf, bt->act_mask);
else if (attr == &dev_attr_pid)
ret = sprintf(buf, "%u\n", bt->pid);
else if (attr == &dev_attr_start_lba)
ret = sprintf(buf, "%llu\n", bt->start_lba);
else if (attr == &dev_attr_end_lba)
ret = sprintf(buf, "%llu\n", bt->end_lba);
out_unlock_bdev:
mutex_unlock(&q->debugfs_mutex);
return ret;
}
static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct block_device *bdev = dev_to_bdev(dev);
struct request_queue *q = bdev_get_queue(bdev);
struct blk_trace *bt;
u64 value;
ssize_t ret = -EINVAL;
if (count == 0)
goto out;
if (attr == &dev_attr_act_mask) {
if (kstrtoull(buf, 0, &value)) {
/* Assume it is a list of trace category names */
ret = blk_trace_str2mask(buf);
if (ret < 0)
goto out;
value = ret;
}
} else {
if (kstrtoull(buf, 0, &value))
goto out;
}
mutex_lock(&q->debugfs_mutex);
bt = rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex));
if (attr == &dev_attr_enable) {
if (!!value == !!bt) {
ret = 0;
goto out_unlock_bdev;
}
if (value)
ret = blk_trace_setup_queue(q, bdev);
else
ret = blk_trace_remove_queue(q);
goto out_unlock_bdev;
}
ret = 0;
if (bt == NULL) {
ret = blk_trace_setup_queue(q, bdev);
bt = rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex));
}
if (ret == 0) {
if (attr == &dev_attr_act_mask)
bt->act_mask = value;
else if (attr == &dev_attr_pid)
bt->pid = value;
else if (attr == &dev_attr_start_lba)
bt->start_lba = value;
else if (attr == &dev_attr_end_lba)
bt->end_lba = value;
}
out_unlock_bdev:
mutex_unlock(&q->debugfs_mutex);
out:
return ret ? ret : count;
}
#endif /* CONFIG_BLK_DEV_IO_TRACE */
#ifdef CONFIG_EVENT_TRACING
/**
* blk_fill_rwbs - Fill the buffer rwbs by mapping op to character string.
* @rwbs: buffer to be filled
* @opf: request operation type (REQ_OP_XXX) and flags for the tracepoint
*
* Description:
* Maps each request operation and flag to a single character and fills the
* buffer provided by the caller with resulting string.
*
**/
void blk_fill_rwbs(char *rwbs, blk_opf_t opf)
{
int i = 0;
if (opf & REQ_PREFLUSH)
rwbs[i++] = 'F';
switch (opf & REQ_OP_MASK) {
case REQ_OP_WRITE:
rwbs[i++] = 'W';
break;
case REQ_OP_DISCARD:
rwbs[i++] = 'D';
break;
case REQ_OP_SECURE_ERASE:
rwbs[i++] = 'D';
rwbs[i++] = 'E';
break;
case REQ_OP_FLUSH:
rwbs[i++] = 'F';
break;
case REQ_OP_READ:
rwbs[i++] = 'R';
break;
case REQ_OP_ZONE_APPEND:
rwbs[i++] = 'Z';
rwbs[i++] = 'A';
break;
case REQ_OP_ZONE_RESET:
case REQ_OP_ZONE_RESET_ALL:
rwbs[i++] = 'Z';
rwbs[i++] = 'R';
if ((opf & REQ_OP_MASK) == REQ_OP_ZONE_RESET_ALL)
rwbs[i++] = 'A';
break;
case REQ_OP_ZONE_FINISH:
rwbs[i++] = 'Z';
rwbs[i++] = 'F';
break;
case REQ_OP_ZONE_OPEN:
rwbs[i++] = 'Z';
rwbs[i++] = 'O';
break;
case REQ_OP_ZONE_CLOSE:
rwbs[i++] = 'Z';
rwbs[i++] = 'C';
break;
case REQ_OP_WRITE_ZEROES:
rwbs[i++] = 'W';
rwbs[i++] = 'Z';
break;
default:
rwbs[i++] = 'N';
}
if (opf & REQ_FUA)
rwbs[i++] = 'F';
if (opf & REQ_RAHEAD)
rwbs[i++] = 'A';
if (opf & REQ_SYNC)
rwbs[i++] = 'S';
if (opf & REQ_META)
rwbs[i++] = 'M';
if (opf & REQ_ATOMIC)
rwbs[i++] = 'U';
WARN_ON_ONCE(i >= RWBS_LEN);
rwbs[i] = '\0';
}
EXPORT_SYMBOL_GPL(blk_fill_rwbs);
#endif /* CONFIG_EVENT_TRACING */