Compare commits

...

645 Commits

Author SHA1 Message Date
Linus Torvalds
b6d993310a Merge tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:

 - Make filter parameters configurable via Kconfig

 - Add description of kunit.enable parameter to documentation

* tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Make filter parameters configurable via Kconfig
  Documentation: kunit: add description of kunit.enable parameter
2025-12-03 15:50:11 -08:00
Linus Torvalds
2488655b2f Merge tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:

 - Add basic test for trace_marker_raw file to tracing selftest

 - Fix invalid array access in printf dma_map_benchmark selftest

 - Add tprobe enable/disable testcase to tracing selftest

 - Update fprobe selftest for ftrace based fprobe

* tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: tracing: Update fprobe selftest for ftrace based fprobe
  selftests: tracing: Add tprobe enable/disable testcase
  selftests/run_kselftest.sh: exit with error if tests fail
  selftests/dma: fix invalid array access in printf
  selftests/tracing: Add basic test for trace_marker_raw file
2025-12-03 15:08:18 -08:00
Linus Torvalds
2ddcf4962c Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:

  - Enable -fms-extensions, allowing anonymous use of tagged struct or
    union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
    conversion patch is added here, too (btrfs).

    [ Editor's note: the core of this actually came in early through a
      shared branch and a few other trees    - Linus ]

  - Introduce architecture-specific CC_CAN_LINK and flags for userprogs

  - Add new packaging target 'modules-cpio-pkg' for building a initramfs
    cpio w/ kmods

  - Handle included .c files in gen_compile_commands

  - Minor kbuild changes:
     - Use objtree for module signing key path, fixing oot kmod signing
     - Improve documentation of KBUILD_BUILD_TIMESTAMP
     - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
     - Rename scripts/Makefile.extrawarn to Makefile.warn
     - Drop obsolete types.h check from headers_check.pl
     - Remove outdated config leak ignore entries

* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: add target to build a cpio containing modules
  initramfs: add gen_init_cpio to hostprogs unconditionally
  kbuild: allow architectures to override CC_CAN_LINK
  init: deduplicate cc-can-link.sh invocations
  kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
  scripts: headers_install.sh: Remove two outdated config leak ignore entries
  scripts/clang-tools: Handle included .c files in gen_compile_commands
  kbuild: uapi: Drop types.h check from headers_check.pl
  kbuild: Rename Makefile.extrawarn to Makefile.warn
  MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
  kbuild: uapi: reuse KBUILD_USERCFLAGS
  kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
  kbuild: Use objtree for module signing key path
  btrfs: send: make use of -fms-extensions for defining struct fs_path
2025-12-03 14:42:21 -08:00
Linus Torvalds
784faa8eca Merge tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Add support for 'syn'.

     Syn is a parsing library for parsing a stream of Rust tokens into a
     syntax tree of Rust source code.

     Currently this library is geared toward use in Rust procedural
     macros, but contains some APIs that may be useful more generally.

     'syn' allows us to greatly simplify writing complex macros such as
     'pin-init' (Benno has already prepared the 'syn'-based version). We
     will use it in the 'macros' crate too.

     'syn' is the most downloaded Rust crate (according to crates.io),
     and it is also used by the Rust compiler itself. While the amount
     of code is substantial, there should not be many updates needed for
     these crates, and even if there are, they should not be too big,
     e.g. +7k -3k lines across the 3 crates in the last year.

     'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
     I only modified their code to remove a third dependency
     ('unicode-ident') and to add the SPDX identifiers. The code can be
     easily verified to exactly match upstream with the provided
     scripts.

     They are all licensed under "Apache-2.0 OR MIT", like the other
     vendored 'alloc' crate we had for a while.

     Please see the merge commit with the cover letter for more context.

   - Allow 'unreachable_pub' and 'clippy::disallowed_names' for
     doctests.

     Examples (i.e. doctests) may want to do things like show public
     items and use names such as 'foo'.

     Nevertheless, we still try to keep examples as close to real code
     as possible (this is part of why running Clippy on doctests is
     important for us, e.g. for safety comments, which userspace Rust
     does not support yet but we are stricter).

  'kernel' crate:

   - Replace our custom 'CStr' type with 'core::ffi::CStr'.

     Using the standard library type reduces our custom code footprint,
     and we retain needed custom functionality through an extension
     trait and a new 'fmt!' macro which replaces the previous 'core'
     import.

     This started in 6.17 and continued in 6.18, and we finally land the
     replacement now. This required quite some stamina from Tamir, who
     split the changes in steps to prepare for the flag day change here.

   - Replace 'kernel::c_str!' with C string literals.

     C string literals were added in Rust 1.77, which produce '&CStr's
     (the 'core' one), so now we can write:

         c"hi"

     instead of:

         c_str!("hi")

   - Add 'num' module for numerical features.

     It includes the 'Integer' trait, implemented for all primitive
     integer types.

     It also includes the 'Bounded' integer wrapping type: an integer
     value that requires only the 'N' least significant bits of the
     wrapped type to be encoded:

         // An unsigned 8-bit integer, of which only the 4 LSBs are used.
         let v = Bounded::<u8, 4>::new::<15>();
         assert_eq!(v.get(), 15);

     'Bounded' is useful to e.g. enforce guarantees when working with
     bitfields that have an arbitrary number of bits.

     Values can also be constructed from simple non-constant expressions
     or, for more complex ones, validated at runtime.

     'Bounded' also comes with comparison and arithmetic operations
     (with both their backing type and other 'Bounded's with a
     compatible backing type), casts to change the backing type,
     extending/shrinking and infallible/fallible conversions from/to
     primitives as applicable.

   - 'rbtree' module: add immutable cursor ('Cursor').

     It enables to use just an immutable tree reference where
     appropriate. The existing fully-featured mutable cursor is renamed
     to 'CursorMut'.

  kallsyms:

   - Fix wrong "big" kernel symbol type read from procfs.

  'pin-init' crate:

   - A couple minor fixes (Benno asked me to pick these patches up for
     him this cycle).

  Documentation:

   - Quick Start guide: add Debian 13 (Trixie).

     Debian Stable is now able to build Linux, since Debian 13 (released
     2025-08-09) packages Rust 1.85.0, which is recent enough.

     We are planning to propose that the minimum supported Rust version
     in Linux follows Debian Stable releases, with Debian 13 being the
     first one we upgrade to, i.e. Rust 1.85.

  MAINTAINERS:

   - Add entry for the new 'num' module.

   - Remove Alex as Rust maintainer: he hasn't had the time to
     contribute for a few years now, so it is a no-op change in
     practice.

  And a few other cleanups and improvements"

* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
  rust: macros: support `proc-macro2`, `quote` and `syn`
  rust: syn: enable support in kbuild
  rust: syn: add `README.md`
  rust: syn: remove `unicode-ident` dependency
  rust: syn: add SPDX License Identifiers
  rust: syn: import crate
  rust: quote: enable support in kbuild
  rust: quote: add `README.md`
  rust: quote: add SPDX License Identifiers
  rust: quote: import crate
  rust: proc-macro2: enable support in kbuild
  rust: proc-macro2: add `README.md`
  rust: proc-macro2: remove `unicode_ident` dependency
  rust: proc-macro2: add SPDX License Identifiers
  rust: proc-macro2: import crate
  rust: kbuild: support using libraries in `rustc_procmacro`
  rust: kbuild: support skipping flags in `rustc_test_library`
  rust: kbuild: add proc macro library support
  rust: kbuild: simplify `--cfg` handling
  rust: kbuild: introduce `core-flags` and `core-skip_flags`
  ...
2025-12-03 14:16:49 -08:00
Linus Torvalds
51ab33fc0a Merge tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:

 - Support both paths where tracefs is typically mounted in selftests

 - Make old_sympos 0 and 1 equal. They both are valid when there is only
   one symbol with the given name.

* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests: livepatch: use canonical ftrace path
  livepatch: Match old_sympos 0 and 1 in klp_find_func()
2025-12-03 13:46:48 -08:00
Linus Torvalds
02baaa67d9 Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - Improve recovery from misbehaving BPF schedulers.

   When a scheduler puts many tasks with varying affinity restrictions
   on a shared DSQ, CPUs scanning through tasks they cannot run can
   overwhelm the system, causing lockups.

   Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
   and hooks into the hardlockup detector to attempt recovery.

   Add scx_cpu0 example scheduler to demonstrate this scenario.

 - Add lockless peek operation for DSQs to reduce lock contention for
   schedulers that need to query queue state during load balancing.

 - Allow scx_bpf_reenqueue_local() to be called from anywhere in
   preparation for deprecating cpu_acquire/release() callbacks in favor
   of generic BPF hooks.

 - Prepare for hierarchical scheduler support: add
   scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
   make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
   structs for future aux__prog parameter.

 - Implement cgroup_set_idle() callback to notify BPF schedulers when a
   cgroup's idle state changes.

 - Fix migration tasks being incorrectly downgraded from
   stop_sched_class to rt_sched_class across sched_ext enable/disable.
   Applied late as the fix is low risk and the bug subtle but needs
   stable backporting.

 - Various fixes and cleanups including cgroup exit ordering,
   SCX_KICK_WAIT reliability, and backward compatibility improvements.

* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
  sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
  sched_ext: tools: Removing duplicate targets during non-cross compilation
  sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
  sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
  sched_ext: Update comments replacing breather with aborting mechanism
  sched_ext: Implement load balancer for bypass mode
  sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
  sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
  sched_ext: Add scx_cpu0 example scheduler
  sched_ext: Hook up hardlockup detector
  sched_ext: Make handle_lockup() propagate scx_verror() result
  sched_ext: Refactor lockup handlers into handle_lockup()
  sched_ext: Make scx_exit() and scx_vexit() return bool
  sched_ext: Exit dispatch and move operations immediately when aborting
  sched_ext: Simplify breather mechanism with scx_aborting flag
  sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
  sched_ext: Refactor do_enqueue_task() local and global DSQ paths
  sched_ext: Use shorter slice in bypass mode
  sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
  sched_ext: Minor cleanups to scx_task_iter
  ...
2025-12-03 13:25:39 -08:00
Linus Torvalds
8449d3252c Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Defer task cgroup unlink until after the dying task's final context
   switch so that controllers see the cgroup properly populated until
   the task is truly gone

 - cpuset cleanups and simplifications.

   Enforce that domain isolated CPUs stay in root or isolated partitions
   and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
   sched/deadline root domain handling during CPU hot-unplug and race
   for tasks in attaching cpusets

 - Misc fixes including memory reclaim protection documentation and
   selftest KTAP conformance

* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: Treat cpusets in attaching as populated
  sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
  cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
  docs: cgroup: No special handling of unpopulated memcgs
  docs: cgroup: Note about sibling relative reclaim protection
  docs: cgroup: Explain reclaim protection target
  selftests/cgroup: conform test to KTAP format output
  cpuset: remove need_rebuild_sched_domains
  cpuset: remove global remote_children list
  cpuset: simplify node setting on error
  cgroup: include missing header for struct irq_work
  cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
  cgroup/cpuset: Globally track isolated_cpus update
  cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
  cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
  cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  cgroup: Defer task cgroup unlink until after the task is done switching out
  cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
  cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
  ...
2025-12-03 13:04:07 -08:00
Linus Torvalds
2b60145734 Merge tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - Rescuer affinity management: Affinity is now updated only when
   detached using wq_unbound_cpumask consistently. DISASSOCIATED workers
   also follow unbound cpumask changes to avoid breaking CPU isolation

 - Rescuer cleanups preparing for fetching work items one by one from
   pool list: Work assignment factored out, optimized to skip pwqs no
   longer needing rescue, and shutdown logic simplified

 - Unused assert_rcu_or_wq_mutex_or_pool_mutex() removed

* tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Don't rely on wq->rescuer to stop rescuer
  workqueue: Only assign rescuer work when really needed
  workqueue: Factor out assign_rescuer_work()
  workqueue: Init rescuer's affinities as wq_unbound_cpumask
  workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
  workqueue: Update the rescuer's affinity only when it is detached
  workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
2025-12-03 12:50:10 -08:00
Linus Torvalds
4d38b88fd1 Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Allow creaing nbcon console drivers with an unsafe write_atomic()
   callback that can only be called by the final nbcon_atomic_flush_unsafe().
   Otherwise, the driver would rely on the kthread.

   It is going to be used as the-best-effort approach for an
   experimental nbcon netconsole driver, see

     https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org

   Note that a safe .write_atomic() callback is supposed to work in NMI
   context. But some networking drivers are not safe even in IRQ
   context:

     https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35

   In an ideal world, all networking drivers would be fixed first and
   the atomic flush would be blocked only in NMI context. But it brings
   the question how reliable networking drivers are when the system is
   in a bad state. They might block flushing more reliable serial
   consoles which are more suitable for serious debugging anyway.

 - Allow to use the last 4 bytes of the printk ring buffer.

 - Prevent queuing IRQ work and block printk kthreads when consoles are
   suspended. Otherwise, they create non-necessary churn or even block
   the suspend.

 - Release console_lock() between each record in the kthread used for
   legacy consoles on RT. It might significantly speed up the boot.

 - Release nbcon context between each record in the atomic flush. It
   prevents stalls of the related printk kthread after it has lost the
   ownership in the middle of a record

 - Add support for NBCON consoles into KDB

 - Add %ptsP modifier for printing struct timespec64 and use it where
   possible

 - Misc code clean up

* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
  printk: Use console_is_usable on console_unblank
  arch: um: kmsg_dump: Use console_is_usable
  drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
  lib/vsprintf: Unify FORMAT_STATE_NUM handlers
  printk: Avoid irq_work for printk_deferred() on suspend
  printk: Avoid scheduling irq_work on suspend
  printk: Allow printk_trigger_flush() to flush all types
  tracing: Switch to use %ptSp
  scsi: snic: Switch to use %ptSp
  scsi: fnic: Switch to use %ptSp
  s390/dasd: Switch to use %ptSp
  ptp: ocp: Switch to use %ptSp
  pps: Switch to use %ptSp
  PCI: epf-test: Switch to use %ptSp
  net: dsa: sja1105: Switch to use %ptSp
  mmc: mmc_test: Switch to use %ptSp
  media: av7110: Switch to use %ptSp
  ipmi: Switch to use %ptSp
  igb: Switch to use %ptSp
  e1000e: Switch to use %ptSp
  ...
2025-12-03 12:42:36 -08:00
Linus Torvalds
4a4e019937 Merge tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull lkmm documentation update from Paul McKenney:

 - Sort the memory-barriers.txt file's wait_event* and wait_on_bit* list
   alphabetically

* tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
2025-12-03 12:41:00 -08:00
Linus Torvalds
98e7dcbb82 Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
 "SRCU:

   - Properly handle SRCU readers within IRQ disabled sections in tiny
     SRCU

   - Preparation to reimplement RCU Tasks Trace on top of SRCU fast:

      - Introduce API to expedite a grace period and test it through
        rcutorture

      - Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.

        Both are still targeted toward faster readers (without full
        barriers on LOCK and UNLOCK) at the expense of heavier write
        side (using full RCU grace period ordering instead of simply
        full ordering) as compared to "traditional" non-fast SRCU. But
        those srcu-fast flavours are going to be optimized in two
        different ways:

          - SRCU-fast will become the reimplementation basis for
            RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
            be NMI safe, SRCU-fast must be as well.

          - SRCU-fast-updown will be needed for uretprobes code in order
            to get rid of the read-side memory barriers while still
            allowing entering the reader at task level while exiting it
            in a timer handler. It is considered semaphore-like in that
            it can have different owners between LOCK and UNLOCK.
            However it is not NMI-safe.

        The actual optimizations are work in progress for the next
        cycle. Only the new interfaces are added for now, along with
        related torture and scalability test code.

   - Create/document/debug/torture new proper initializers for RCU fast:
     DEFINE_SRCU_FAST() and init_srcu_struct_fast()

     This allows for using right away the proper ordering on the write
     side (either full ordering or full RCU grace period ordering)
     without waiting for the read side to tell which to use.

     This also optimizes the read side altogether with moving flavour
     debug checks under debug config and with removing a costly RmW
     operation on their first call.

   - Make some diagnostic functions tracing safe

  Refscale:

   - Add performance testing for common context synchronizations
     (Preemption, IRQ, Softirq) and per-cpu increments. Those are
     relevant comparisons against SRCU-fast read side APIs, especially
     as they are planned to synchronize further tracing fast-path code

  Miscellanous:

   - In order to prepare the layout for nohz_full work deferral to user
     exit, the context tracking state must shrink the counter of
     transitions to/from RCU not watching. The only possible hazard is
     to trigger wrap-around more easily, delaying a bit grace periods
     when that happens. This should be a rare event though. Yet add
     debugging and torture code to test that assumption

   - Fix memory leak on locktorture module

   - Annotate accesses in rculist_nulls.h to prevent from KCSAN
     warnings. On recent discussions, we also concluded that all those
     WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
     comments. Something to be expected for the next cycle

   - Provide a script to apply several configs to several commits with
     torture

   - Allow torture to reuse a build directory in order to save needless
     rebuild time

   - Various cleanups"

* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
  refscale: Add SRCU-fast-updown readers
  refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
  rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
  srcu: Create an SRCU-fast-updown API
  refscale: Do not disable interrupts for tests involving local_bh_enable()
  refscale: Add non-atomic per-CPU increment readers
  refscale: Add this_cpu_inc() readers
  refscale: Add preempt_disable() readers
  refscale: Add local_bh_disable() readers
  refscale: Add local_irq_disable() and local_irq_save() readers
  torture: Permit negative kvm.sh --kconfig numberic arguments
  srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
  rcu: Mark diagnostic functions as notrace
  rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
  rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
  rcutorture: Permit kvm-again.sh to re-use the build directory
  torture: Add kvm-series.sh to test commit/scenario combination
  rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
  locktorture: Fix memory leak in param_set_cpumask()
  doc: Update for SRCU-fast definitions and initialization
  ...
2025-12-03 12:18:07 -08:00
Linus Torvalds
b687034b1a Merge tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - mempool_alloc_bulk() support for upcoming users in the block layer
   that need to allocate multiple objects at once with the mempool's
   guaranteed progress semantics, which is not achievable with an
   allocation single objects in a loop. Along with refactoring and
   various improvements (Christoph Hellwig)

 - Preparations for the upcoming separation of struct slab from struct
   page, mostly by removing the struct folio layer, as the purpose of
   struct folio has shifted since it became used in slab code (Matthew
   Wilcox)

 - Modernisation of slab's boot param API usage, which removes some
   unexpected parsing corner cases (Petr Tesarik)

 - Refactoring of freelist_aba_t (now struct freelist_counters) and
   associated functions for double cmpxchg, enabled by -fms-extensions
   (Vlastimil Babka)

 - Cleanups and improvements related to sheaves caching layer, that were
   part of the full conversion to sheaves, which is planned for the next
   release (Vlastimil Babka)

* tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (42 commits)
  slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
  mempool: clarify behavior of mempool_alloc_preallocated()
  mempool: drop the file name in the top of file comment
  mempool: de-typedef
  mempool: remove mempool_{init,create}_kvmalloc_pool
  mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
  mempool: add mempool_{alloc,free}_bulk
  mempool: factor out a mempool_alloc_from_pool helper
  slab: Remove references to folios from virt_to_slab()
  kasan: Remove references to folio in __kasan_mempool_poison_object()
  memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
  mempool: factor out a mempool_adjust_gfp helper
  mempool: add error injection support
  mempool: improve kerneldoc comments
  mm: improve kerneldoc comments for __alloc_pages_bulk
  fault-inject: make enum fault_flags available unconditionally
  usercopy: Remove folio references from check_heap_object()
  slab: Remove folio references from kfree_nolock()
  slab: Remove folio references from kfree_rcu_sheaf()
  slab: Remove folio references from build_detached_freelist()
  ...
2025-12-03 11:53:47 -08:00
Linus Torvalds
f96163865a Merge tag 'docs-6.19' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "This has been another busy cycle for documentation, with a lot of
  build-system thrashing. That work should slow down from here on out.

   - The various scripts and tools for documentation were spread out in
     several directories; now they are (almost) all coalesced under
     tools/docs/. The holdout is the kernel-doc script, which cannot be
     easily moved without some further thought.

   - As the amount of Python code increases, we are accumulating modules
     that are imported by multiple programs. These modules have been
     pulled together under tools/lib/python/ -- at least, for
     documentation-related programs. There is other Python code in the
     tree that might eventually want to move toward this organization.

   - The Perl kernel-doc.pl script has been removed. It is no longer
     used by default, and nobody has missed it, least of all anybody who
     actually had to look at it.

   - The docs build was controlled by a complex mess of makefilese that
     few dared to touch. Mauro has moved that logic into a new program
     (tools/docs/sphinx-build-wrapper) that, with any luck at all, will
     be far easier to understand and maintain.

   - The get_feat.pl program, used to access information under
     Documentation/features/, has been rewritten in Python, bringing an
     end to the use of Perl in the docs subsystem.

   - The top-level README file has been reorganized into a more
     reader-friendly presentation.

   - A lot of Chinese translation additions

   - Typo fixes and documentation updates as usual"

* tag 'docs-6.19' of git://git.lwn.net/linux: (164 commits)
  docs: makefile: move rustdoc check to the build wrapper
  README: restructure with role-based documentation and guidelines
  docs: kdoc: various fixes for grammar, spelling, punctuation
  docs: kdoc_parser: use '@' for Excess enum value
  docs: submitting-patches: Clarify that removal of Acks needs explanation too
  docs: kdoc_parser: add data/function attributes to ignore
  docs: MAINTAINERS: update Mauro's files/paths
  docs/zh_CN: Add wd719x.rst translation
  docs/zh_CN: Add libsas.rst translation
  get_feat.pl: remove it, as it got replaced by get_feat.py
  Documentation/sphinx/kernel_feat.py: use class directly
  tools/docs/get_feat.py: convert get_feat.pl to Python
  Documentation/admin-guide: fix typo and comment in cscope example
  docs/zh_CN: Add data-integrity.rst translation
  docs/zh_CN: Add blk-mq.rst translation
  docs/zh_CN: Add block/index.rst translation
  docs/zh_CN: Update the Chinese translation of kbuild.rst
  docs: bring some order to our Python module hierarchy
  docs: Move the python libraries to tools/lib/python
  Documentation/kernel-parameters: Move the kernel build options
  ...
2025-12-03 11:34:28 -08:00
Linus Torvalds
a619fe35ab Merge tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Rewrite memcpy_sglist from scratch
   - Add on-stack AEAD request allocation
   - Fix partial block processing in ahash

  Algorithms:
   - Remove ansi_cprng
   - Remove tcrypt tests for poly1305
   - Fix EINPROGRESS processing in authenc
   - Fix double-free in zstd

  Drivers:
   - Use drbg ctr helper when reseeding xilinx-trng
   - Add support for PCI device 0x115A to ccp
   - Add support of paes in caam
   - Add support for aes-xts in dthev2

  Others:
   - Use likely in rhashtable lookup
   - Fix lockdep false-positive in padata by removing a helper"

* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
  crypto: zstd - fix double-free in per-CPU stream cleanup
  crypto: ahash - Zero positive err value in ahash_update_finish
  crypto: ahash - Fix crypto_ahash_import with partial block data
  crypto: lib/mpi - use min() instead of min_t()
  crypto: ccp - use min() instead of min_t()
  hwrng: core - use min3() instead of nested min_t()
  crypto: aesni - ctr_crypt() use min() instead of min_t()
  crypto: drbg - Delete unused ctx from struct sdesc
  crypto: testmgr - Add missing DES weak and semi-weak key tests
  Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
  crypto: scatterwalk - Fix memcpy_sglist() to always succeed
  crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
  crypto: tcrypt - Remove unused poly1305 support
  crypto: ansi_cprng - Remove unused ansi_cprng algorithm
  crypto: asymmetric_keys - fix uninitialized pointers with free attribute
  KEYS: Avoid -Wflex-array-member-not-at-end warning
  crypto: ccree - Correctly handle return of sg_nents_for_len
  crypto: starfive - Correctly handle return of sg_nents_for_len
  crypto: iaa - Fix incorrect return value in save_iaa_wq()
  crypto: zstd - Remove unnecessary size_t cast
  ...
2025-12-03 11:28:38 -08:00
Linus Torvalds
c832183148 Merge tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE udates from Fan Wu:
 "The primary change is the addition of support for the AT_EXECVE_CHECK
  flag. This allows interpreters to signal the kernel to perform IPE
  security checks on script files before execution, extending IPE
  enforcement to indirectly executed scripts.

  Update documentation for it, and also fix a comment"

* tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
  ipe: Update documentation for script enforcement
  ipe: Add AT_EXECVE_CHECK support for script enforcement
  ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
2025-12-03 11:19:34 -08:00
Linus Torvalds
777f817160 Merge tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Bug fixes:

   - defer credentials checking from the bprm_check_security hook to the
     bprm_creds_from_file security hook

   - properly ignore IMA policy rules based on undefined SELinux labels

  IMA policy rule extensions:

   - extend IMA to limit including file hashes in the audit logs
     (dont_audit action)

   - define a new filesystem subtype policy option (fs_subtype)

  Misc:

   - extend IMA to support in-kernel module decompression by deferring
     the IMA signature verification in kernel_read_file() to after the
     kernel module is decompressed"

* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: Handle error code returned by ima_filter_rule_match()
  ima: Access decompressed kernel module to verify appended signature
  ima: add fs_subtype condition for distinguishing FUSE instances
  ima: add dont_audit action to suppress audit actions
  ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
2025-12-03 11:08:03 -08:00
Linus Torvalds
204a920f28 Merge tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:

 - fix several cases where labels were treated inconsistently when
   imported from user space

 - clean up the assignment of extended attributes

 - documentation improvements

* tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next:
  Smack: function parameter 'gfp' not described
  smack: fix kernel-doc warnings for smk_import_valid_label()
  smack: fix bug: setting task label silently ignores input garbage
  smack: fix bug: unprivileged task can create labels
  smack: fix bug: invalid label of unix socket file
  smack: always "instantiate" inode in smack_inode_init_security()
  smack: deduplicate xattr setting in smack_inode_init_security()
  smack: fix bug: SMACK64TRANSMUTE set on non-directory
  smack: deduplicate "does access rule request transmutation"
2025-12-03 10:58:59 -08:00
Linus Torvalds
0eae3283c3 Merge tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:

 - Consolidate the loops in __audit_inode_child() to improve performance

   When logging a child inode in __audit_inode_child(), we first run
   through the list of recorded inodes looking for the parent and then
   we repeat the search looking for a matching child entry. This pull
   request consolidates both searches into one pass through the recorded
   inodes, resuling in approximately a 50% reduction in audit overhead.

   See the commit description for the testing details.

 - Combine kmalloc()/memset() into kzalloc() in audit_krule_to_data()

 - Comment fixes

* tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: merge loops in __audit_inode_child()
  audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
  audit: fix comment misindentation in audit.h
2025-12-03 10:52:01 -08:00
Linus Torvalds
51e3b98d73 Merge tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:

 - Improve the granularity of SELinux labeling for memfd files

   Currently when creating a memfd file, SELinux treats it the same as
   any other tmpfs, or hugetlbfs, file. While simple, the drawback is
   that it is not possible to differentiate between memfd and tmpfs
   files.

   This adds a call to the security_inode_init_security_anon() LSM hook
   and wires up SELinux to provide a set of memfd specific access
   controls, including the ability to control the execution of memfds.

   As usual, the commit message has more information.

 - Improve the SELinux AVC lookup performance

   Adopt MurmurHash3 for the SELinux AVC hash function instead of the
   custom hash function currently used. MurmurHash3 is already used for
   the SELinux access vector table so the impact to the code is minimal,
   and performance tests have shown improvements in both hash
   distribution and latency.

   See the commit message for the performance measurments.

 - Introduce a Kconfig option for the SELinux AVC bucket/slot size

   While we have the ability to grow the number of AVC hash buckets
   today, the size of the buckets (slot size) is fixed at 512. This pull
   request makes that slot size configurable at build time through a new
   Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.

* tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: improve bucket distribution uniformity of avc_hash()
  selinux: Move avtab_hash() to a shared location for future reuse
  selinux: Introduce a new config to make avc cache slot size adjustable
  memfd,selinux: call security_inode_init_security_anon()
2025-12-03 10:45:47 -08:00
Linus Torvalds
121cc35cfb Merge tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:

 - Rework the LSM initialization code

   What started as a "quick" patch to enable a notification event once
   all of the individual LSMs were initialized, snowballed a bit into a
   30+ patch patchset when everything was done. Most of the patches, and
   diffstat, is due to splitting out the initialization code into
   security/lsm_init.c and cleaning up some of the mess that was there.
   While not strictly necessary, it does cleanup the code signficantly,
   and hopefully makes the upkeep a bit easier in the future.

   Aside from the new LSM_STARTED_ALL notification, these changes also
   ensure that individual LSM initcalls are only called when the LSM is
   enabled at boot time. There should be a minor reduction in boot times
   for those who build multiple LSMs into their kernels, but only enable
   a subset at boot.

   It is worth mentioning that nothing at present makes use of the
   LSM_STARTED_ALL notification, but there is work in progress which is
   dependent upon LSM_STARTED_ALL.

 - Make better use of the seq_put*() helpers in device_cgroup

* tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
  lsm: use unrcu_pointer() for current->cred in security_init()
  device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
  lsm: add a LSM_STARTED_ALL notification event
  lsm: consolidate all of the LSM framework initcalls
  selinux: move initcalls to the LSM framework
  ima,evm: move initcalls to the LSM framework
  lockdown: move initcalls to the LSM framework
  apparmor: move initcalls to the LSM framework
  safesetid: move initcalls to the LSM framework
  tomoyo: move initcalls to the LSM framework
  smack: move initcalls to the LSM framework
  ipe: move initcalls to the LSM framework
  loadpin: move initcalls to the LSM framework
  lsm: introduce an initcall mechanism into the LSM framework
  lsm: group lsm_order_parse() with the other lsm_order_*() functions
  lsm: output available LSMs when debugging
  lsm: cleanup the debug and console output in lsm_init.c
  lsm: add/tweak function header comment blocks in lsm_init.c
  lsm: fold lsm_init_ordered() into security_init()
  lsm: cleanup initialize_lsm() and rename to lsm_init_single()
  ...
2025-12-03 09:53:48 -08:00
Linus Torvalds
7fc2cd2e4b Merge tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted key updates from Jarkko Sakkinen:

 - Remove duplicate 'tpm2_hash_map' in favor of 'tpm2_find_hash_alg()'

 - Fix a memory leak on failure paths of 'tpm2_load_cmd'

* tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  KEYS: trusted: Fix a memory leak in tpm2_load_cmd
  KEYS: trusted: Replace a redundant instance of tpm2_hash_map
2025-12-03 09:45:23 -08:00
Linus Torvalds
b082c4b060 Merge tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys update from Jarkko Sakkinen:
 "This contains only three fixes"

* tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  keys: Fix grammar and formatting in 'struct key_type' comments
  keys: Replace deprecated strncpy in ecryptfs_fill_auth_tok
  keys: Remove redundant less-than-zero checks
2025-12-03 09:41:04 -08:00
Linus Torvalds
f2310b6271 Merge tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:

 - Preparations to the use of nolibc in UML:
     - Cleanup of sparse warnings
     - Library mode without _start()
     - More consistency when disabling errno

 - Unconditional installation of all architecture support files

 - Always 64-bit wide ino_t and off_t

 - Various cleanups and bug fixes

* tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
  selftests/nolibc: error out on linker warnings
  selftests/nolibc: use lld to link loongarch binaries
  tools/nolibc: remove more __nolibc_enosys() fallbacks
  tools/nolibc: remove now superfluous overflow check in llseek
  tools/nolibc: use 64-bit off_t
  tools/nolibc: prefer the llseek syscall
  tools/nolibc: handle 64-bit off_t for llseek
  tools/nolibc: use 64-bit ino_t
  tools/nolibc: avoid using plain integer as NULL pointer
  tools/nolibc: add support for fchdir()
  tools/nolibc: clean up outdated comments in generic arch.h
  tools/nolibc: make the "headers" target install all supported archs
  tools/nolibc: add the more portable inttypes.h
  tools/nolibc: provide the portable sys/select.h
  tools/nolibc: add missing memchr() to string.h
  tools/nolibc: fix misleading help message regarding installation path
  tools/nolibc: add uio.h with readv and writev
  tools/nolibc: add option to disable runtime
  tools/nolibc: use __fallthrough__ rather than fallthrough
  tools/nolibc: implement %m if errno is not defined
  ...
2025-12-03 09:23:25 -08:00
Rob Herring (Arm)
82d7a9da6e dt-bindings: thermal: qcom-tsens: Remove invalid tab character
Commit 1ee90870ce ("dt-bindings: thermal: tsens: Add QCS8300
compatible") uses a tab character which is illegal in YAML (at the
beginning of a line).  The original patch was correct, so this got
corrupted when applied.

Fixes: 1ee90870ce ("dt-bindings: thermal: tsens: Add QCS8300 compatible")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-03 08:30:31 -08:00
Yanzhu Huang
d7ba853c0e ipe: Update documentation for script enforcement
This patch adds explanation of script enforcement mechanism in admin
guide documentation. Describes how IPE supports integrity enforcement
for indirectly executed scripts through the AT_EXECVE_CHECK flag, and
how this differs from kernel enforcement for compiled executables.

Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
2025-12-02 19:37:10 -08:00
Yanzhu Huang
67678189e4 ipe: Add AT_EXECVE_CHECK support for script enforcement
This patch adds a new ipe_bprm_creds_for_exec() hook that integrates
with the AT_EXECVE_CHECK mechanism. To enable script enforcement,
interpreters need to incorporate the AT_EXECVE_CHECK flag when
calling execveat() on script files before execution.

When a userspace interpreter calls execveat() with the AT_EXECVE_CHECK
flag, this hook triggers IPE policy evaluation on the script file. The
hook only triggers IPE when bprm->is_check is true, ensuring it's
being called from an AT_EXECVE_CHECK context. It then builds an
evaluation context for an IPE_OP_EXEC operation and invokes IPE policy.
The kernel returns the policy decision to the interpreter, which can
then decide whether to proceed with script execution.

This extends IPE enforcement to indirectly executed scripts, permitting
trusted scripts to execute while denying untrusted ones.

Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
2025-12-02 19:37:01 -08:00
Borislav Petkov (AMD)
864468ae30 ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
Looks like it got added by mistake, perhaps editor auto-completion
artifact. Drop it.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Fan Wu <wufan@kernel.org>
2025-12-02 19:29:21 -08:00
Zqiang
1dd6c84f1c sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
When loading the ebpf scheduler, the tasks in the scx_tasks list will
be traversed and invoke __setscheduler_class() to get new sched_class.
however, this would also incorrectly set the per-cpu migration
task's->sched_class to rt_sched_class, even after unload, the per-cpu
migration task's->sched_class remains sched_rt_class.

The log for this issue is as follows:

./scx_rustland --stats 1
[  199.245639][  T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[  199.269213][  T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached

bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0

sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler

bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0

This commit therefore generate a new scx_setscheduler_class() and
add check for stop_sched_class to replace __setscheduler_class().

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-01 10:58:49 -10:00
Petr Mladek
5cae92e622 Merge branch 'rework/write_atomic-unsafe' into for-linus 2025-12-01 14:17:04 +01:00
Petr Mladek
4f132d81f9 Merge branch 'rework/threaded-printk' into for-linus 2025-12-01 14:16:45 +01:00
Petr Mladek
3a9a3f5fb2 Merge branch 'rework/suspend-fixes' into for-linus 2025-12-01 14:16:28 +01:00
Petr Mladek
b1e6c41ef9 Merge branch 'rework/preempt-legacy-kthread' into for-linus 2025-12-01 14:16:08 +01:00
Petr Mladek
2d786a5b80 Merge branch 'rework/nbcon-in-kdb' into for-linus 2025-12-01 14:15:43 +01:00
Petr Mladek
475bb520c3 Merge branch 'rework/atomic-flush-hardlockup' into for-linus 2025-12-01 14:15:26 +01:00
Petr Mladek
3869e431b5 Merge branch 'for-6.19-vsprintf-timespec64' into for-linus 2025-12-01 14:14:34 +01:00
Giovanni Cabiddu
48bc9da3c9 crypto: zstd - fix double-free in per-CPU stream cleanup
The crypto/zstd module has a double-free bug that occurs when multiple
tfms are allocated and freed.

The issue happens because zstd_streams (per-CPU contexts) are freed in
zstd_exit() during every tfm destruction, rather than being managed at
the module level.  When multiple tfms exist, each tfm exit attempts to
free the same shared per-CPU streams, resulting in a double-free.

This leads to a stack trace similar to:

  BUG: Bad page state in process kworker/u16:1  pfn:106fd93
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x106fd93
  flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
  page_type: 0xffffffff()
  raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: nonzero entire_mapcount
  Modules linked in: ...
  CPU: 3 UID: 0 PID: 2506 Comm: kworker/u16:1 Kdump: loaded Tainted: G    B
  Hardware name: ...
  Workqueue: btrfs-delalloc btrfs_work_helper
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   bad_page+0x71/0xd0
   free_unref_page_prepare+0x24e/0x490
   free_unref_page+0x60/0x170
   crypto_acomp_free_streams+0x5d/0xc0
   crypto_acomp_exit_tfm+0x23/0x50
   crypto_destroy_tfm+0x60/0xc0
   ...

Change the lifecycle management of zstd_streams to free the streams only
once during module cleanup.

Fixes: f5ad93ffb5 ("crypto: zstd - convert to acomp")
Cc: stable@vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-12-01 18:05:07 +08:00
Frederic Weisbecker
9a08942f17 Merge branch 'rcu/misc' into next
- In order to prepare the layout for nohz_full work deferral to
  user exit, the context tracking state must shrink the counter
  of transitions to/from RCU not watching. The only possible hazard
  is to trigger wrap-around more easily, delaying a bit grace periods
  when that happens. This should be a rare event though. Yet add
  debugging and torture code to test that assumption.

- Fix memory leak on locktorture module

- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
  On recent discussions, we also concluded that all those WRITE_ONCE()
  and READ_ONCE() on list APIs deserve appropriate comments. Something
  to be expected for the next cycle.

- Provide a script to apply several configs to several commits with torture.

- Allow torture to reuse a build directory in order to save needless
  rebuild time.

- Various cleanups.
2025-11-30 22:20:33 +01:00
Jarkko Sakkinen
62cd5d480b KEYS: trusted: Fix a memory leak in tpm2_load_cmd
'tpm2_load_cmd' allocates a tempoary blob indirectly via 'tpm2_key_decode'
but it is not freed in the failure paths. Address this by wrapping the blob
into with a cleanup helper.

Cc: stable@vger.kernel.org # v5.13+
Fixes: f221974525 ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-11-29 22:57:30 +02:00
Jarkko Sakkinen
127fa2ae9e KEYS: trusted: Replace a redundant instance of tpm2_hash_map
'trusted_tpm2' duplicates 'tpm2_hash_map' originally part of the TPN
driver, which is suboptimal.

Implement and export `tpm2_find_hash_alg()` in the driver, and substitute
the redundant code in 'trusted_tpm2' with a call to the new function.

Reviewed-by: Jonathan McDowell <noodles@meta.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-11-29 22:57:30 +02:00
Mauro Carvalho Chehab
464257baf9 docs: makefile: move rustdoc check to the build wrapper
The makefile logic to detect if rust is enabled is not working
the way it was expected: instead of using the current setup
for CONFIG_RUST, it uses a cached version from a previous build.

The root cause is that the current logic inside docs/Makefile
uses a cached version of CONFIG_RUST, from the last time a non
documentation target was executed. That's perfectly fine for
Sphinx build, as it doesn't need to read or depend on any
CONFIG_*.

So, instead of relying at the cache, move the logic to the
wrapper script and let it check the current content of .config,
to verify if CONFIG_RUST was selected.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <c06b1834ef02099735c13ee1109fa2a2b9e47795.1763722971.git.mchehab+huawei@kernel.org>
2025-11-29 08:42:53 -07:00
Sasha Levin
b9a565b3e4 README: restructure with role-based documentation and guidelines
Reorganize README to provide targeted documentation paths for different user
roles including developers, researchers, security experts, and maintainers.

Add quick start section and essential docs links.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251121180009.2634393-1-sashal@kernel.org>
2025-11-29 08:40:33 -07:00
Randy Dunlap
5f88f44d84 docs: kdoc: various fixes for grammar, spelling, punctuation
Correct grammar, spelling, and punctuation in comments, strings,
print messages, logs.

Change two instances of two spaces between words to just one space.

codespell was used to find misspelled words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251124041011.3030571-1-rdunlap@infradead.org>
2025-11-29 08:35:23 -07:00
Randy Dunlap
18182f9758 docs: kdoc_parser: use '@' for Excess enum value
kdoc is looking for "@value" here, so use that kind of string in the
warning message. The "%value" can be confusing.

This changes:
Warning: drivers/net/wireless/mediatek/mt76/testmode.h:92 Excess enum value '%MT76_TM_ATTR_TX_PENDING' description in 'mt76_testmode_attr'

to this:
Warning: drivers/net/wireless/mediatek/mt76/testmode.h:92 Excess enum value '@MT76_TM_ATTR_TX_PENDING' description in 'mt76_testmode_attr'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251126061752.3497106-1-rdunlap@infradead.org>
2025-11-29 08:31:17 -07:00
Krzysztof Kozlowski
e36a7b1e17 docs: submitting-patches: Clarify that removal of Acks needs explanation too
The paragraph mentions only removal of Tested-by and Reviewed-by tags as
action needing mentioning in patch changelog, so some developers treat
it too literally.  Acks, as a weaker form of review/approval, should
rarely be removed, but if that happens it should be explained as well.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251126081905.7684-2-krzysztof.kozlowski@oss.qualcomm.com>
2025-11-29 08:30:54 -07:00
Randy Dunlap
2006f468bb docs: kdoc_parser: add data/function attributes to ignore
Recognize and ignore __rcu (in struct members), __private (in struct
members), and __always_unused (in function parameters) to prevent
kernel-doc warnings:

  Warning: include/linux/rethook.h:38 struct member 'void (__rcu *handler' not described in 'rethook'
  Warning: include/linux/hrtimer_types.h:47 Invalid param: enum hrtimer_restart (*__private function)(struct hrtimer *)
  Warning: security/ipe/hooks.c:81 function parameter '__always_unused' not described in 'ipe_mmap_file'
  Warning: security/ipe/hooks.c:109 function parameter '__always_unused' not described in 'ipe_file_mprotect'

There are more of these (in compiler_types.h, compiler_attributes.h)
that can be added as needed.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251127063117.150384-1-rdunlap@infradead.org>
2025-11-29 08:18:30 -07:00
Randy Dunlap
4d23db5b24 docs: MAINTAINERS: update Mauro's files/paths
Update Mauro's F: (files/paths) entry for docs scripts so that
get_maintainer.pl will report these correctly.

This is copied from Jon's entries for these paths.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251127063125.150441-1-rdunlap@infradead.org>
2025-11-29 08:17:27 -07:00
Jonathan Corbet
0dd5cf95b7 Merge tag 'Chinese-docs-6.19' of gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux into tmp
Chinese translation docs for 6.19

This is the Chinese translation subtree for 6.19. It includes
the following changes:
        - Add block part translations
        - Update kbuild.rst translations
        - Add more scsi translations and fixes

Above patches are tested by 'make htmldocs'

Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-29 08:09:58 -07:00
Frederic Weisbecker
a50413848f Merge branch 'rcu/refscale' into next
Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code.
2025-11-28 23:30:38 +01:00
Frederic Weisbecker
6fcc739a04 Merge branch 'rcu/srcu' into next
_ Properly handle SRCU readers within IRQ disabled sections in tiny SRCU

- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:

    - Introduce API to expedite a grace period and test it through
      rcutorture.

    - Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
      Both are still targeted toward faster readers (without full
      barriers on LOCK and UNLOCK) at the expense of heavier write side
      (using full RCU grace period ordering instead of simply full
      ordering) as compared to "traditional" non-fast SRCU. But those
      srcu-fast flavours are going to be optimized in two different
      ways:

         - SRCU-fast will become the reimplementation basis for
           RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
           be NMI safe, SRCU-fast must be as well.

         - SRCU-fast-updown will be needed for uretprobes code in order
           to get rid of the read-side memory barriers while still
           allowing entering the reader at task level while exiting it
           in a timer handler. It is considered semaphore-like in that
           it can have different owners between LOCK and UNLOCK.
           However it is not NMI-safe.

      The actual optimizations are work in progress for the next cycle.
      Only the new interfaces are added for now, along with related
      torture and scalability test code.

- Create/document/debug/torture new proper initializers for RCU fast:
   DEFINE_SRCU_FAST() and init_srcu_struct_fast()

   This allows for using right away the proper ordering on the write
   side (either full ordering or full RCU grace period ordering) without
   waiting for the read side to tell which to use. Also this optimizes
   the read side altogether with moving flavour debug checks to debug
   config and with removing a costly RmW operation on their first call.

- Make some diagnostic functions tracing safe.
2025-11-28 23:25:02 +01:00
Paul E. McKenney
bfad33230a refscale: Add SRCU-fast-updown readers
This commit adds refscale readers based on srcu_read_lock_fast_updown()
and srcu_read_unlock_fast_updown()
("refscale.scale_type=srcu-fast-updown"). On my x86 laptop, these are
about 2.2ns per pair.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-28 15:23:17 +01:00
Thorsten Blum
8c8e3df3d2 keys: Fix grammar and formatting in 'struct key_type' comments
s/it/if/ and s/revokation/revocation/, capitalize "clear", and add a
period after the sentence. Fix the comment formatting.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-11-27 23:56:58 +02:00
Thorsten Blum
a0a76e3f8d keys: Replace deprecated strncpy in ecryptfs_fill_auth_tok
strncpy() is deprecated for NUL-terminated destination buffers; use
strscpy_pad() instead to retain the NUL-padding behavior of strncpy().

The destination buffer is initialized using kzalloc() with a 'signature'
size of ECRYPTFS_PASSWORD_SIG_SIZE + 1. strncpy() then copies up to
ECRYPTFS_PASSWORD_SIG_SIZE bytes from 'key_desc', NUL-padding any
remaining bytes if needed, but expects the last byte to be zero.

strscpy_pad() also copies the source string to 'signature', and NUL-pads
the destination buffer if needed, but ensures it's always NUL-terminated
without relying on it being zero-initialized.

strscpy_pad() automatically determines the size of the fixed-length
destination buffer via sizeof() when the optional size argument is
omitted, making an explicit size unnecessary.

In encrypted_init(), the source string 'key_desc' is validated by
valid_ecryptfs_desc() before calling ecryptfs_fill_auth_tok(), and is
therefore NUL-terminated and satisfies the __must_be_cstr() requirement
of strscpy_pad().

Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-11-27 23:50:20 +02:00
Thorsten Blum
58b46219bf keys: Remove redundant less-than-zero checks
The local variables 'size_t datalen' are unsigned and cannot be less
than zero. Remove the redundant conditions.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-11-27 23:50:20 +02:00
Marcos Paulo de Souza
466348abb0 printk: Use console_is_usable on console_unblank
The macro for_each_console_srcu iterates over all registered consoles. It's
implied that all registered consoles have CON_ENABLED flag set, making
the check for the flag unnecessary. Call console_is_usable function to
fully verify if the given console is usable before calling the ->unblank
callback.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-3-57b8b78647f4@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-27 15:54:50 +01:00
Marcos Paulo de Souza
4c70ab110b arch: um: kmsg_dump: Use console_is_usable
All consoles found on for_each_console are registered, meaning that all
of them have the CON_ENABLED flag set. Since NBCON was introduced it's
important to check if a given console also implements the NBCON callbacks.
The function console_is_usable does exactly that.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-2-57b8b78647f4@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-27 15:54:50 +01:00
Marcos Paulo de Souza
822e2bb0d6 drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
The original code tried to find a console that has CON_BOOT _or_
CON_ENABLED flag set. The flag CON_ENABLED is set to all registered
consoles, so in this case this check is always true, even for the
CON_BOOT consoles.

The initial intent of the kgdboc_earlycon_init was to get a console
early (CON_BOOT) or later on in the process (CON_ENABLED). The
code was using for_each_console macro, meaning that all console structs
were previously registered on the printk() machinery. At this point,
any console found on for_each_console is safe for kgdboc_earlycon_init
to use.

Dropping the check makes the code cleaner, and avoids further confusion
by future readers of the code.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-1-57b8b78647f4@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-27 15:54:50 +01:00
Paul E. McKenney
81f00c462e refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
This commit updates the initialization for the "srcu-fast" scale
type to use DEFINE_STATIC_SRCU_FAST() when reader_flavor is equal to
SRCU_READ_FLAVOR_FAST.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-27 14:22:41 +01:00
Paul E. McKenney
609460a6db rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
This commit causes rcutorture's srcu_torture_init() and
srcud_torture_init() functions to announce on the console log
which variant of SRCU is being tortured, for example: "torture:
srcud_torture_init fast SRCU".

[ paulmck: Apply feedback from kernel test robot. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-27 14:22:40 +01:00
Paul E. McKenney
d3f52f53a5 srcu: Create an SRCU-fast-updown API
This commit creates an SRCU-fast-updown API, including
DEFINE_SRCU_FAST_UPDOWN(), DEFINE_STATIC_SRCU_FAST_UPDOWN(),
__init_srcu_struct_fast_updown(), init_srcu_struct_fast_updown(),
srcu_read_lock_fast_updown(), srcu_read_unlock_fast_updown(),
__srcu_read_lock_fast_updown(), and __srcu_read_unlock_fast_updown().

These are initially identical to their SRCU-fast counterparts, but both
SRCU-fast and SRCU-fast-updown will be optimized in different directions
by later commits. SRCU-fast will lack any sort of srcu_down_read() and
srcu_up_read() APIs, which will enable extremely efficient NMI safety.
For its part, SRCU-fast-updown will not be NMI safe, which will enable
reasonably efficient implementations of srcu_down_read_fast() and
srcu_up_read_fast().

This API fork happens to meet two different future use cases.

* SRCU-fast will become the reimplementation basis for RCU-TASK-TRACE
  for consolidation. Since RCU-TASK-TRACE must be NMI safe, SRCU-fast
  must be as well.

* SRCU-fast-updown will be needed for uretprobes code in order to get
  rid of the read-side memory barriers while still allowing entering the
  reader at task level while exiting it in a timer handler.

This commit also adds rcutorture tests for the new APIs.  This
(annoyingly) needs to be in the same commit for bisectability.  With this
commit, the 0x8 value tests SRCU-fast-updown.  However, most SRCU-fast
testing will be via the RCU Tasks Trace wrappers.

[ paulmck: Apply s/0x8/0x4/ missing change per Boqun Feng feedback. ]
[ paulmck: Apply Akira Yokosawa feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-27 14:22:31 +01:00
Sascha Hauer
2a9c8c0b59 kbuild: add target to build a cpio containing modules
Add a new package target to build a cpio archive containing the kernel
modules. This is particularly useful to supplement an existing initramfs
with the kernel modules so that the root filesystem can be started with
all needed kernel modules without modifying it.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Simon Glass <sjg@chromium.org>
Tested-by: Simon Glass <sjg@chromium.org>
Co-developed-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251125-cpio-modules-pkg-v2-2-aa8277d89682@pengutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-26 21:56:14 +01:00
Ahmad Fatoum
c83c9564cd initramfs: add gen_init_cpio to hostprogs unconditionally
gen_init_cpio is currently only needed when an initramfs cpio archive is
to be created out of CONFIG_INITRAMFS_SOURCE's contents. In other cases,
it's not added to hostprogs and no make target is available.

In preparation to use the host program from Makefile.package, define it
unconditionally. The program will still only be built as needed.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251125-cpio-modules-pkg-v2-1-aa8277d89682@pengutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-26 21:55:40 +01:00
Yujie Zhang
f12ae9ba4d docs/zh_CN: Add wd719x.rst translation
Translate .../scsi/wd719x.rst into Chinese.
Add wd719x into .../scsi/index.rst.

Update the translation through commit 40ee63091a
("scsi: docs: convert wd719x.txt to ReST")

Signed-off-by: Yujie Zhang <yjzhang@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-26 14:40:12 +08:00
Yujie Zhang
56a248e7bc docs/zh_CN: Add libsas.rst translation
Translate .../scsi/libsas.rst into Chinese.
Add libsas into .../scsi/index.rst.

Update the translation through commit 25882c82f8
("scsi: libsas: Delete lldd_clear_aca callback")

Signed-off-by: Yujie Zhang <yjzhang@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-26 14:40:06 +08:00
Vlastimil Babka
a8ec08bf32 Merge branch 'slab/for-6.19/mempool_alloc_bulk' into slab/for-next
Merges series "mempool_alloc_bulk and various mempool improvements v3"
from Christoph Hellwig.

From the cover letter [1]:

This series adds a bulk version of mempool_alloc that makes allocating
multiple objects deadlock safe.

The initial users is the blk-crypto-fallback code:

  https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/

with which v1 was posted, but I also have a few other users in mind.

Link: https://lore.kernel.org/all/20251113084022.1255121-1-hch@lst.de/ [1]
2025-11-25 14:38:41 +01:00
Vlastimil Babka
ed80cc758b Merge branch 'slab/for-6.19/freelist_aba_t_cleanups' into slab/for-next
Merge series "slab: cmpxchg cleanups enabled by -fms-extensions"

From the cover letter [1]:

After learning about -fms-extensions being enabled for 6.19, I realized
there is some cleanup potential in slub code by extending the definition
and usage of freelist_aba_t, as it can now become an unnamed member of
struct slab. This series performs the cleanup, with no functional
changes intended. Additionally we turn freelist_aba_t to struct
freelist_counters as it doesn't meet any criteria for being a typedef,
per Documentation/process/coding-style.rst

Based on the tag kbuild-ms-extensions-6.19 from
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linuxV

Link: https://lore.kernel.org/all/20251107-slab-fms-cleanup-v1-0-650b1491ac9e@suse.cz/#t [1]
2025-11-25 14:35:33 +01:00
Vlastimil Babka
e5d7764e13 Merge branch 'slab/for-6.19/memdesc_prep' into slab/for-next
Merge series "Prepare slab for memdescs" by Matthew Wilcox.

From the cover letter [1]:

When we separate struct folio, struct page and struct slab from each
other, converting to folios then to slabs will be nonsense.  It made
sense under the 'folio is just a head page' interpretation, but with
full separation, page_folio() will return NULL for a page which belongs
to a slab.

This patch series removes almost all mentions of folio from slab.
There are a few folio_test_slab() invocations left around the tree that
I haven't decided how to handle yet.  We're not yet quite at the point
of separately allocating struct slab, but that's what I'll be working
on next.

Link: https://lore.kernel.org/all/20251113000932.1589073-1-willy@infradead.org/ [1]
2025-11-25 14:33:14 +01:00
Vlastimil Babka
3065c20d5d Merge branch 'slab/for-6.19/sheaves_cleanups' into slab/for-next
Merge series "slab: preparatory cleanups before adding sheaves to all
caches" [1]

Cleanups that were written as part of the full sheaves conversion, which
is not fully ready yet, but they are useful on their own.

Link: https://lore.kernel.org/all/20251105-sheaves-cleanups-v1-0-b8218e1ac7ef@suse.cz/ [1]
2025-11-25 14:27:34 +01:00
Matthew Wilcox (Oracle)
b55590558f slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
Each page knows which node it belongs to, so there's no need to
convert to a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251124142329.1691780-1-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-25 14:13:34 +01:00
Miguel Ojeda
54e3eae855 Merge patch series "syn support"
This patch series introduces support for `syn` (and its dependencies):

    Syn is a parsing library for parsing a stream of Rust tokens into a
    syntax tree of Rust source code.

    Currently this library is geared toward use in Rust procedural
    macros, but contains some APIs that may be useful more generally.

It is the most downloaded Rust crate (according to crates.io), and it
is also used by the Rust compiler itself. Having such support allows to
greatly simplify writing complex macros such as `pin-init`. We will use
it in the `macros` crate too.

Benno has already prepared the `pin-init` version based on this, and on
top of that, we will be able to simplify the `macros` crate too. I think
Jesung is working on updating the `TryFrom` and `Into` upcoming derive
macros to use `syn` too.

The series starts with a few preparation commits (two fixes were already
merged in mainline that were discovered by this series), then each crate
is added. Finally, support for using the new crates from our `macros`
crate is introduced.

This has been a long time coming, e.g. even before Rust for Linux was
merged into the Linux kernel, Gary and Benno have wanted to use `syn`.
The first iterations of this, from 2022 and 2023 (with `serde` too,
another popular crate), are at:

    https://github.com/Rust-for-Linux/linux/pull/910
    https://github.com/Rust-for-Linux/linux/pull/1007

After those, we considered picking these from the distributions where
possible. However, after discussing it, it is not really worth the
complexity: vendoring makes things less complex and is less fragile.

In particular, we avoid having to support and test several versions,
we avoid having to introduce Cargo just to properly fetch the right
versions from the registry, we can easily customize the crates if needed
(e.g. dropping the `unicode_idents` dependency like it is done in this
series) and we simplify the configuration of the build for users for
which the "default" paths/registries would not have worked.

Moreover, nowadays, the ~57k lines introduced are not that much compared
to years ago (it dwarfed the actual Rust kernel code). Moreover, back
then it wasn't clear the Rust experiment would be a success, so it would
have been a bit pointless/risky to add many lines for nothing. Our macro
needs were also smaller in the early days.

So, finally, in Kangrejos 2025 we discussed going with the original,
simpler approach. Thus here it is the result.

There should not be many updates needed for these, and even if there
are, they should not be too big, e.g. +7k -3k lines across the 3 crates
in the last year.

Note that `syn` does not have all the features enabled, since we do not
need them so far, but they can easily be enabled just adding them to the
list.

Link: https://patch.msgid.link/20251124151837.2184382-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:48:16 +01:00
Miguel Ojeda
52ba807f1a rust: macros: support proc-macro2, quote and syn
One of the two main uses cases for adding `proc-macro2`, `quote` and
`syn` is the `macros` crates (and the other `pin-init`).

Thus add the support for the crates in `macros` already.

Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-21-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:50 +01:00
Miguel Ojeda
737401751a rust: syn: enable support in kbuild
With all the new files in place and ready from the new crate, enable
the support for it in the build system.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-20-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:48 +01:00
Miguel Ojeda
1112ba8655 rust: syn: add README.md
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.

Thus do the same for the `syn` crate.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-19-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:47 +01:00
Miguel Ojeda
a3ee13024c rust: syn: remove unicode-ident dependency
The `syn` crate depends on the `unicode-ident` crate to determine whether
characters have the XID_Start or XID_Continue properties according to
Unicode Standard Annex #31.

However, we only need ASCII identifiers in the kernel, thus we can
simplify the check and remove completely that dependency.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-18-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:46 +01:00
Miguel Ojeda
69942c0a89 rust: syn: add SPDX License Identifiers
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), the SPDX License Identifiers were added to every file so that
the license on those was clear.

Thus do the same for the `syn` crate.

This makes `scripts/spdxcheck.py` pass.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-17-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:45 +01:00
Miguel Ojeda
808c999fc9 rust: syn: import crate
This is a subset of the Rust `syn` crate, version 2.0.106 (released
2025-08-16), licensed under "Apache-2.0 OR MIT", from:

    https://github.com/dtolnay/syn/raw/2.0.106/src

The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).

For copyright details, please see:

    https://github.com/dtolnay/syn/blob/2.0.106/README.md#license
    https://github.com/dtolnay/syn/blob/2.0.106/LICENSE-APACHE
    https://github.com/dtolnay/syn/blob/2.0.106/LICENSE-MIT

The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.

The following script may be used to verify the contents:

    for path in $(cd rust/syn/ && find . -type f -name '*.rs'); do
        curl --silent --show-error --location \
            https://github.com/dtolnay/syn/raw/2.0.106/src/$path \
            | diff --unified rust/syn/$path - && echo $path: OK
    done

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-16-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:44 +01:00
Miguel Ojeda
88de91cc1c rust: quote: enable support in kbuild
With all the new files in place and ready from the new crate, enable
the support for it in the build system.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-15-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:43 +01:00
Miguel Ojeda
51177f023c rust: quote: add README.md
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.

Thus do the same for the `quote` crate.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-14-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:43 +01:00
Miguel Ojeda
ddfa1b279d rust: quote: add SPDX License Identifiers
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), the SPDX License Identifiers were added to every file so that
the license on those was clear.

Thus do the same for the `quote` crate.

This makes `scripts/spdxcheck.py` pass.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-13-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:42 +01:00
Miguel Ojeda
a4851eeef3 rust: quote: import crate
This is a subset of the Rust `quote` crate, version 1.0.40 (released
2025-03-12), licensed under "Apache-2.0 OR MIT", from:

    https://github.com/dtolnay/quote/raw/1.0.40/src

The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).

For copyright details, please see:

    https://github.com/dtolnay/quote/blob/1.0.40/README.md#license
    https://github.com/dtolnay/quote/blob/1.0.40/LICENSE-APACHE
    https://github.com/dtolnay/quote/blob/1.0.40/LICENSE-MIT

The next patch modifies these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.

The following script may be used to verify the contents:

    for path in $(cd rust/quote/ && find . -type f -name '*.rs'); do
        curl --silent --show-error --location \
            https://github.com/dtolnay/quote/raw/1.0.40/src/$path \
            | diff --unified rust/quote/$path - && echo $path: OK
    done

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-12-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:41 +01:00
Miguel Ojeda
158a3b7211 rust: proc-macro2: enable support in kbuild
With all the new files in place and ready from the new crate, enable
the support for it in the build system.

`proc_macro_byte_character` and `proc_macro_c_str_literals` were
stabilized in Rust 1.79.0 [1] and were implemented earlier than our
minimum Rust version (1.78) [2][3]. Thus just enable them instead of using
the `cfg` that `proc-macro2` uses to emulate them in older compilers.

In addition, skip formatting for this vendored crate and take the chance
to add a comment mentioning this.

Link: https://github.com/rust-lang/rust/pull/123431 [1]
Link: https://github.com/rust-lang/rust/pull/112711 [2]
Link: https://github.com/rust-lang/rust/pull/119651 [3]
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-11-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:41 +01:00
Miguel Ojeda
bc1565efc3 rust: proc-macro2: add README.md
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.

Thus do the same for the `proc-macro2` crate.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-10-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:40 +01:00
Miguel Ojeda
c2af0e5f02 rust: proc-macro2: remove unicode_ident dependency
The `proc-macro2` crate depends on the `unicode-ident` crate to determine
whether characters have the XID_Start or XID_Continue properties according
to Unicode Standard Annex #31.

However, we only need ASCII identifiers in the kernel, thus we can
simplify the check and remove completely that dependency.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-9-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:39 +01:00
Miguel Ojeda
a9acfceb96 rust: proc-macro2: add SPDX License Identifiers
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d2571 ("rust: adapt `alloc` crate to the
kernel"), the SPDX License Identifiers were added to every file so that
the license on those was clear.

Thus do the same for the `proc-macro2` crate.

This makes `scripts/spdxcheck.py` pass.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-8-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:39 +01:00
Miguel Ojeda
3a8b546a27 rust: proc-macro2: import crate
This is a subset of the Rust `proc-macro2` crate, version 1.0.101
(released 2025-08-16), licensed under "Apache-2.0 OR MIT", from:

    https://github.com/dtolnay/proc-macro2/raw/1.0.101/src

The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).

For copyright details, please see:

    https://github.com/dtolnay/proc-macro2/blob/1.0.101/README.md#license
    https://github.com/dtolnay/proc-macro2/blob/1.0.101/LICENSE-APACHE
    https://github.com/dtolnay/proc-macro2/blob/1.0.101/LICENSE-MIT

The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.

The following script may be used to verify the contents:

    for path in $(cd rust/proc-macro2/ && find . -type f -name '*.rs'); do
        curl --silent --show-error --location \
            https://github.com/dtolnay/proc-macro2/raw/1.0.101/src/$path \
            | diff --unified rust/proc-macro2/$path - && echo $path: OK
    done

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-7-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:38 +01:00
Miguel Ojeda
c46b34f1d4 rust: kbuild: support using libraries in rustc_procmacro
Proc macros such as `macros` and `pin-init` will need the ability to use
libraries such as `syn` (added later) in the `rustc_procmacro` command.

Thus add the support for it.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-6-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:37 +01:00
Miguel Ojeda
d4e7307b1f rust: kbuild: support skipping flags in rustc_test_library
Crates like `quote` (added later) will need the ability to skip flags
in the `rustc_test_library` command.

Thus add the support for it.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-5-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:36 +01:00
Miguel Ojeda
7dbe46c0b1 rust: kbuild: add proc macro library support
Add the proc macro library rule that produces `.rlib` files to be used
by proc macros such as the `macros` crate.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-4-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:36 +01:00
Miguel Ojeda
1181c97442 rust: kbuild: simplify --cfg handling
We need to handle `cfg`s in both `rustc` and `rust-analyzer`, and in
future commits some of those contain double quotes, which complicates
things further.

Thus, instead of removing the `--cfg ` part in the rust-analyzer
generation script, have the `*-cfgs` variables contain just the actual
`cfg`, and use that to generate the actual flags in `*-flags`.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-3-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:35 +01:00
Miguel Ojeda
46e58a9637 rust: kbuild: introduce core-flags and core-skip_flags
In the next commits we are introducing `*-{cfgs,skip_flags,flags}`
variables for other crates.

Thus do so here for `core`, which simplifies a bit the `Makefile`
(including the next commit) and makes it more consistent.

This means we stop passing `-Wrustdoc::unescaped_backticks` to `rustc`
and `-Wunreachable_pub` to `rustdoc`, i.e. we skip more, which is fine
since it shouldn't have an effect.

In addition, use `:=` for `core-cfgs` to make it consistent with the
upcoming additions.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Tested-by: Jesung Yang <y.j3ms.n@gmail.com>
Link: https://patch.msgid.link/20251124151837.2184382-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 17:15:34 +01:00
Alex Gaynor
7a0eae4d43 MAINTAINERS: Remove Alex Gaynor as Rust maintainer
I've long since stopped having the time to contribute code or reviews,
this acknowledges that.

Geoffrey Thomas and I created the "linux-kernel-module-rust" project at
PyCon in 2018, as an experiment to see if we could make it possible to
write kernel modules in Rust. The Rust for Linux effort has far exceeded
anything we could have expected at the time.

I want to thank all the Rust for Linux contributors, past and present,
who have helped make this a reality -- and in particularly Miguel, who
really transformed this project from an interesting demo to something
that could really land in mainline.

Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Link: https://patch.msgid.link/CAFRnB2U1uvg1vyZe1kDi7L3P4kTFowfOo6Hfo9WJED4qve4ZZw@mail.gmail.com
[ Reflowed. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 16:56:40 +01:00
Zheng Yejian
f3f9f42232 kallsyms: Fix wrong "big" kernel symbol type read from procfs
Currently when the length of a symbol is longer than 0x7f characters,
its type shown in /proc/kallsyms can be incorrect.

I found this issue when reading the code, but it can be reproduced by
following steps:

  1. Define a function which symbol length is 130 characters:

    #define X13(x) x##x##x##x##x##x##x##x##x##x##x##x##x
    static noinline void X13(x123456789)(void)
    {
        printk("hello world\n");
    }

  2. The type in vmlinux is 't':

    $ nm vmlinux | grep x123456
    ffffffff816290f0 t x123456789x123456789x123456789x12[...]

  3. Then boot the kernel, the type shown in /proc/kallsyms becomes 'g'
     instead of the expected 't':

    # cat /proc/kallsyms | grep x123456
    ffffffff816290f0 g x123456789x123456789x123456789x12[...]

The root cause is that, after commit 73bbb94466 ("kallsyms: support
"big" kernel symbols"), ULEB128 was used to encode symbol name length.
That is, for "big" kernel symbols of which name length is longer than
0x7f characters, the length info is encoded into 2 bytes.

kallsyms_get_symbol_type() expects to read the first char of the
symbol name which indicates the symbol type. However, due to the
"big" symbol case not being handled, the symbol type read from
/proc/kallsyms may be wrong, so handle it properly.

Cc: stable@vger.kernel.org
Fixes: 73bbb94466 ("kallsyms: support "big" kernel symbols")
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20241011143853.3022643-1-zhengyejian@huaweicloud.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 16:56:24 +01:00
Alexandre Courbot
841f31d298 rust: num: bounded: rename try_into_bitint to try_into_bounded
This is a remnant from when `Bounded` was called `BitInt` which I didn't
rename. Fix this.

Fixes: 01e345e82e ("rust: num: add Bounded integer wrapping type")
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20251124-bounded_fix-v1-1-d8e34e1c727f@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-24 16:38:23 +01:00
Andy Shevchenko
a9f349e3c0 lib/vsprintf: Unify FORMAT_STATE_NUM handlers
We have two almost identical pieces that handle FORMAT_STATE_NUM case.
The differences are:
- redundant {} for one-line if-else conditional
- missing blank line after variable definitions
- inverted conditional

Unify the style of two.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251120083140.3478507-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-24 16:10:11 +01:00
John Ogness
66e7c1e0ee printk: Avoid irq_work for printk_deferred() on suspend
With commit ("printk: Avoid scheduling irq_work on suspend") the
implementation of printk_get_console_flush_type() was modified to
avoid offloading when irq_work should be blocked during suspend.
Since printk uses the returned flush type to determine what
flushing methods are used, this was thought to be sufficient for
avoiding irq_work usage during the suspend phase.

However, vprintk_emit() implements a hack to support
printk_deferred(). In this hack, the returned flush type is
adjusted to make sure no legacy direct printing occurs when
printk_deferred() was used.

Because of this hack, the legacy offloading flushing method can
still be used, causing irq_work to be queued when it should not
be.

Adjust the vprintk_emit() hack to also consider
@console_irqwork_blocked so that legacy offloading will not be
chosen when irq_work should be blocked.

Link: https://lore.kernel.org/lkml/87fra90xv4.fsf@jogness.linutronix.de
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Fixes: 26873e3e7f ("printk: Avoid scheduling irq_work on suspend")
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-24 15:44:48 +01:00
Herbert Xu
ebbdf6466b crypto: ahash - Zero positive err value in ahash_update_finish
The partial block length returned by a block-only driver should
not be passed up to the caller since ahash itself deals with the
partial block data.

Set err to zero in ahash_update_finish if it was positive.

Reported-by: T Pratham <t-pratham@ti.com>
Tested-by: T Pratham <t-pratham@ti.com>
Fixes: 9d7a0ab1c7 ("crypto: ahash - Handle partial blocks in API")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:53:02 +08:00
Herbert Xu
b0356b75f4 crypto: ahash - Fix crypto_ahash_import with partial block data
Restore the partial block buffer in crypto_ahash_import by copying
it.  Check whether the partial block buffer exceeds the maximum
size and return -EOVERFLOW if it does.

Zero the partial block buffer in crypto_ahash_import_core.

Reported-by: T Pratham <t-pratham@ti.com>
Tested-by: T Pratham <t-pratham@ti.com>
Fixes: 9d7a0ab1c7 ("crypto: ahash - Handle partial blocks in API")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:52:30 +08:00
David Laight
80b61046b6 crypto: lib/mpi - use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:44:14 +08:00
David Laight
14ca8ce1fc crypto: ccp - use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:44:14 +08:00
David Laight
0f8ead58b6 hwrng: core - use min3() instead of nested min_t()
min_t(u16, a, b) is likely to discard significant bits.
Replace:
	min_t(u16, min_t(u16, default_quality, 1024), rng->quality ?: 1024);
with:
	min3(default_quality, 1024, rng->quality ?: 1024);

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:44:14 +08:00
David Laight
6c5d5b6dc5 crypto: aesni - ctr_crypt() use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:43:40 +08:00
Herbert Xu
680cd3e28c crypto: drbg - Delete unused ctx from struct sdesc
The ctx array in struct sdesc is never used.  Delete it as it's
bogus since the previous member ends with a flexible array.

Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:43:40 +08:00
Thorsten Blum
c637f3e4a5 crypto: testmgr - Add missing DES weak and semi-weak key tests
Ever since commit da7f033ddc ("crypto: cryptomgr - Add test
infrastructure"), the DES test suite has tested only one of the four
weak keys and none of the twelve semi-weak keys.

DES has four weak keys and twelve semi-weak keys, and the kernel's DES
implementation correctly detects and rejects all of these keys when the
CRYPTO_TFM_REQ_FORBID_WEAK_KEYS flag is set. However, only a single weak
key was being tested. Add tests for all 16 weak and semi-weak keys.

While DES is deprecated, it is still used in some legacy protocols, and
weak/semi-weak key detection should be tested accordingly.

Tested on arm64 with cryptographic self-tests.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:43:40 +08:00
Alexandre Courbot
bc197e24a3 rust: num: bounded: Always inline fits_within and from_expr
`from_expr` relies on `build_assert` to infer that the passed expression
fits the type's boundaries at build time. That inference can only be
successful its code (and that of `fits_within`, which performs the
check) is inlined, as a dedicated function would need to work with a
variable and cannot verify that property.

While inlining happens as expected in most cases, it is not guaranteed.
In particular, kernel options that optimize for size like
`CONFIG_CC_OPTIMIZE_FOR_SIZE` can result in `from_expr` not being
inlined.

Add `#[inline(always)]` attributes to both `fits_within` and `from_expr`
to make the compiler inline these functions more aggressively, as it
does not make sense to use them non-inlined anyway.

[ For reference, the errors look like:

    ld.lld: error: undefined symbol: rust_build_error
    >>> referenced by build_assert.rs:83 (rust/kernel/build_assert.rs:83)
    >>>               rust/doctests_kernel_generated.o:(<kernel::num::bounded::Bounded<u8, 1>>::from_expr) in archive vmlinux.a

      - Miguel ]

Fixes: 01e345e82e ("rust: num: add Bounded integer wrapping type")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511210055.RUsFNku1-lkp@intel.com/
Suggested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20251122-bounded_ints_fix-v1-1-1e07589d4955@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-23 22:49:55 +01:00
Thomas Weißschuh
4823329146 mempool: clarify behavior of mempool_alloc_preallocated()
The documentation of that function promises to never sleep.  However on
PREEMPT_RT a spinlock_t might in fact sleep.

Reword the documentation so users can predict its behavior better.

mempool could also replace spinlock_t with raw_spinlock_t which doesn't
sleep even on PREEMPT_RT but that would take away the improved
preemptibility of sleeping locks.

Link: https://lkml.kernel.org/r/20251014-mempool-doc-v1-1-bc9ebf169700@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:40 +01:00
Christoph Hellwig
07723a41ee mempool: drop the file name in the top of file comment
Mentioning the name of the file is redundant, so drop it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-12-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:40 +01:00
Christoph Hellwig
0cab6873b7 mempool: de-typedef
Switch all uses of the deprecated mempool_t typedef in the core mempool
code to use struct mempool instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-11-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:40 +01:00
Christoph Hellwig
8b41fb80a2 mempool: remove mempool_{init,create}_kvmalloc_pool
This was added for bcachefs and is unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-10-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:40 +01:00
Christoph Hellwig
9c4391767f mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
The timeout here is and old workaround with a Fixme comment.  But
thinking about it, it makes sense to keep it, so reword the comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-9-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:40 +01:00
Christoph Hellwig
ac529d86ad mempool: add mempool_{alloc,free}_bulk
Add a version of the mempool allocator that works for batch allocations
of multiple objects.  Calling mempool_alloc in a loop is not safe because
it could deadlock if multiple threads are performing such an allocation
at the same time.

As an extra benefit the interface is build so that the same array can be
used for alloc_pages_bulk / release_pages so that at least for page
backed mempools the fast path can use a nice batch optimization.

Note that mempool_alloc_bulk does not take a gfp_mask argument as it
must always be able to sleep and doesn't support any non-trivial
modifiers.  NOFO or NOIO constrainst must be set through the scoped API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-8-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:30:36 +01:00
Christoph Hellwig
1742d97df6 mempool: factor out a mempool_alloc_from_pool helper
Add a helper for the mempool_alloc slowpath to better separate it from the
fast path, and also use it to implement mempool_alloc_preallocated which
shares the same logic.

[hughd@google.com: fix lack of retrying with __GFP_DIRECT_RECLAIM]
[vbabka@suse.cz: really use limited flags for first mempool attempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-7-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23 12:28:16 +01:00
Thomas Weißschuh
1d57346474 selftests/nolibc: error out on linker warnings
If the linker emits warnings these should abort the build.
Otherwise they will be swallowed by run-tests.sh and not shown.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-22 12:35:12 +01:00
Thomas Weißschuh
682bf67529 selftests/nolibc: use lld to link loongarch binaries
LLVM 21 switched to -mcmodel=medium for LoongArch64 compilations.
This code model uses R_LARCH_ECALL36 relocations which might not be
supported by GNU ld which to nolibc testsuite uses by default.
ld will not resolve the relocation and all function calls will end up
as busy loops.

Use lld instead.

We can not switch to lld for all LLVM builds, as it does not support all
necessary architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-22 12:35:02 +01:00
Eric Biggers
20d868a77f Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
This reverts commit 0f8d42bf12, with the
memcpy_sglist() part dropped.

Now that memcpy_sglist() no longer uses the skcipher_walk code, the
skcipher_walk code can be moved back to where it belongs.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Eric Biggers
4dffc9bbff crypto: scatterwalk - Fix memcpy_sglist() to always succeed
The original implementation of memcpy_sglist() was broken because it
didn't handle scatterlists that describe exactly the same memory, which
is a case that many callers rely on.  The current implementation is
broken too because it calls the skcipher_walk functions which can fail.
It ignores any errors from those functions.

Fix it by replacing it with a new implementation written from scratch.
It always succeeds.  It's also a bit faster, since it avoids the
overhead of skcipher_walk.  skcipher_walk includes a lot of
functionality (such as alignmask handling) that's irrelevant here.

Reported-by: Colin Ian King <coking@nvidia.com>
Closes: https://lore.kernel.org/r/20251114122620.111623-1-coking@nvidia.com
Fixes: 131bdceca1 ("crypto: scatterwalk - Add memcpy_sglist")
Fixes: 0f8d42bf12 ("crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Kanchana P Sridhar
5727a844a3 crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
As suggested by Herbert, I would like to request to be added as a
Maintainer for the iaa_crypto driver.

Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Eric Biggers
bfc11a84e0 crypto: tcrypt - Remove unused poly1305 support
Since the crypto_shash support for poly1305 was removed, the tcrypt
support for it is now unused as well.  Support for benchmarking the
kernel's Poly1305 code is now provided by the poly1305 kunit test.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Eric Biggers
c7dcb041ce crypto: ansi_cprng - Remove unused ansi_cprng algorithm
Remove ansi_cprng, since it's obsolete and unused, as confirmed at
https://lore.kernel.org/r/aQxpnckYMgAAOLpZ@gondor.apana.org.au/

This was originally added in 2008, apparently as a FIPS approved random
number generator.  Whether this has ever belonged upstream is
questionable.  Either way, ansi_cprng is no longer usable for this
purpose, since it's been superseded by the more modern algorithms in
crypto/drbg.c, and FIPS itself no longer allows it.  (NIST SP 800-131A
Rev 1 (2015) says that RNGs based on ANSI X9.31 will be disallowed after
2015.  NIST SP 800-131A Rev 2 (2019) confirms they are now disallowed.)

Therefore, there is no reason to keep it around.

Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Haotian Zhang <vulab@iscas.ac.cn>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Ally Heev
79492d5adf crypto: asymmetric_keys - fix uninitialized pointers with free attribute
Uninitialized pointers with `__free` attribute can cause undefined
behavior as the memory assigned randomly to the pointer is freed
automatically when the pointer goes out of scope.

crypto/asymmetric_keys doesn't have any bugs related to this as of now,
but, it is better to initialize and assign pointers with `__free`
attribute in one statement to ensure proper scope-based cleanup

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aPiG_F5EBQUjZqsl@stanley.mountain/
Signed-off-by: Ally Heev <allyheev@gmail.com>
Reviewed-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Gustavo A. R. Silva
a26c23e0d6 KEYS: Avoid -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Use the new TRAILING_OVERLAP() helper to fix the following warning:

crypto/asymmetric_keys/restrict.c:20:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

This helper creates a union between a flexible-array member (FAM) and a
set of MEMBERS that would otherwise follow it.

This overlays the trailing MEMBER unsigned char data[10]; onto the FAM
struct asymmetric_key_id::data[], while keeping the FAM and the start
of MEMBER aligned.

The static_assert() ensures this alignment remains, and it's
intentionally placed inmediately after the corresponding structures --no
blank line in between.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Haotian Zhang
8700ce07c5 crypto: ccree - Correctly handle return of sg_nents_for_len
Fix error handling in cc_map_hash_request_update where sg_nents_for_len
return value was assigned to u32, converting negative errors to large
positive values before passing to sg_copy_to_buffer.

Check sg_nents_for_len return value and propagate errors before
assigning to areq_ctx->in_nents.

Fixes: b7ec853068 ("crypto: ccree - use std api when possible")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:50 +08:00
Haotian Zhang
e9eb52037a crypto: starfive - Correctly handle return of sg_nents_for_len
The return value of sg_nents_for_len was assigned to an unsigned long
in starfive_hash_digest, causing negative error codes to be converted
to large positive integers.

Add error checking for sg_nents_for_len and return immediately on
failure to prevent potential buffer overflows.

Fixes: 7883d1b28a ("crypto: starfive - Add hash and HMAC support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-22 10:04:49 +08:00
Lai Jiangshan
6d90215dc0 workqueue: Don't rely on wq->rescuer to stop rescuer
The commit1 def98c84b6 ("workqueue: Fix spurious sanity check failures
in destroy_workqueue()") tries to fix spurious sanity check failures by
stopping send_mayday() via setting wq->rescuer to NULL.

But it fails to stop the pwq->mayday_node requeuing in the rescuer, and
the commit2 e66b39af00 ("workqueue: Fix pwq ref leak in
rescuer_thread()") fixes it by checking wq->rescuer which is the result
of commit1.

Both commits together really fix spurious sanity check failures caused
by the rescuer, but they both use a convoluted method by relying on
wq->rescuer state rather than the real count of work items.

Actually __WQ_DESTROYING and drain_workqueue() together already stop
send_mayday() by draining all the work items and ensuring no new work
item requeuing.

And the more proper fix to stop the pwq->mayday_node requeuing in the
rescuer is from commit3 4f3f4cf388 ("workqueue: avoid unneeded
requeuing the pwq in rescuer thread") and renders the checking of
wq->rescuer in commit2 unnecessary.

So __WQ_DESTROYING, drain_workqueue() and commit3 together fix spurious
sanity check failures introduced by the rescuer.

Just remove the convoluted code of using wq->rescuer.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-21 09:45:36 -10:00
Lai Jiangshan
7b05c90b33 workqueue: Only assign rescuer work when really needed
If the pwq does not need rescue (normal workers have been created or
become available), the rescuer can immediately move on to other stalled
pwqs.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-21 09:45:36 -10:00
Lai Jiangshan
99ed6f62a4 workqueue: Factor out assign_rescuer_work()
Move the code to assign work to rescuer and assign_rescuer_work().

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-21 09:45:36 -10:00
Jonathan Corbet
d879c2e019 Merge branch 'mauro' into docs-mw
Mauro says:

That's the final series to complete the migration of documentation
build: it converts get_feat from Perl to Python.

V2 is technically identical to v1: the only difference is that it
now uses tools/lib/python/feat to store the library logic.

With that, no Sphinx in-kernel extensions use fork anymore to call
ancillary scripts: everything is now importing Python methods
directly from the libraries.

There's nothing special on this conversion: it is a direct translation,
almost bug-compatible with the original version (*).

(*) I did solve two or three caveats on patch 1.

Most of the complexity of the script relies at the logic to produce
ReST tables. I do have here on my internal scripts a (somewhat) generic
formatter for ReST tables in Python. I was tempted to convert the logic
to use it, but, as this could cause regressions, I opted to not do it
right now, mainly because the matrix table logic is complex. Also,
I'm tempted to modify a little bit the output there, but extra tests
are required to see if PDF output would work with complex tables (I
remember I had a problem with that in the past). So, I'm postponing
such extra cleanup.
2025-11-21 10:45:32 -07:00
Mauro Carvalho Chehab
e6bfd693bd get_feat.pl: remove it, as it got replaced by get_feat.py
Now that this was rewritten in Python, we can remove the old
tool.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <1f53e0fa48616af189ce98b45a65cc0c245e7aaf.1763492868.git.mchehab+huawei@kernel.org>
2025-11-21 10:32:30 -07:00
Mauro Carvalho Chehab
b713807eab Documentation/sphinx/kernel_feat.py: use class directly
Now that get_feat is in Python, we don't need to use subprocess
to fork an executable file: we can use the feature classes
directly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <c59d2542d7cc914fd5f8c84df966e63adc924cdc.1763492868.git.mchehab+huawei@kernel.org>
2025-11-21 10:32:30 -07:00
Mauro Carvalho Chehab
caa642bf3b tools/docs/get_feat.py: convert get_feat.pl to Python
As we want to call Python code directly at the Sphinx extension,
convert get_feat.pl to Python.

The code was made to be (almost) bug-compatible with the Perl
version, with two exceptions:

1. Currently, Perl script outputs a wrong table if arch is set
   to a non-existing value;

2. the ReST table output when --feat is used without --arch
   has an invalid format, as the number of characters for the
   table delimiters are wrong.

Those two bugs were fixed while testing the conversion.

Additionally, another caveat was solved:
the output when --feat is used without arch and the feature
doesn't exist doesn't contain an empty table anymore.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <03c26cee1ec567804735a33047e625ef5ab7bfa8.1763492868.git.mchehab+huawei@kernel.org>
2025-11-21 10:32:30 -07:00
Jiakai Xu
55fb2d5726 Documentation/admin-guide: fix typo and comment in cscope example
This patch updates the Linux documentation for cscope, fixing two issues:
1. Corrects the typo in the command line:
       c"scope -d -p10  ->  cscope -d -p10
2. Fixes the related documentation comment for clarity and correctness:
       cscope by default cscope.out database.
       ->
       cscope by default uses the cscope.out database.

Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251119065727.3500015-1-jiakaiPeanut@gmail.com>
2025-11-21 10:29:49 -07:00
Zhao Yipeng
738c9738e6 ima: Handle error code returned by ima_filter_rule_match()
In ima_match_rules(), if ima_filter_rule_match() returns -ENOENT due to
the rule being NULL, the function incorrectly skips the 'if (!rc)' check
and sets 'result = true'. The LSM rule is considered a match, causing
extra files to be measured by IMA.

This issue can be reproduced in the following scenario:
After unloading the SELinux policy module via 'semodule -d', if an IMA
measurement is triggered before ima_lsm_rules is updated,
in ima_match_rules(), the first call to ima_filter_rule_match() returns
-ESTALE. This causes the code to enter the 'if (rc == -ESTALE &&
!rule_reinitialized)' block, perform ima_lsm_copy_rule() and retry. In
ima_lsm_copy_rule(), since the SELinux module has been removed, the rule
becomes NULL, and the second call to ima_filter_rule_match() returns
-ENOENT. This bypasses the 'if (!rc)' check and results in a false match.

Call trace:
  selinux_audit_rule_match+0x310/0x3b8
  security_audit_rule_match+0x60/0xa0
  ima_match_rules+0x2e4/0x4a0
  ima_match_policy+0x9c/0x1e8
  ima_get_action+0x48/0x60
  process_measurement+0xf8/0xa98
  ima_bprm_check+0x98/0xd8
  security_bprm_check+0x5c/0x78
  search_binary_handler+0x6c/0x318
  exec_binprm+0x58/0x1b8
  bprm_execve+0xb8/0x130
  do_execveat_common.isra.0+0x1a8/0x258
  __arm64_sys_execve+0x48/0x68
  invoke_syscall+0x50/0x128
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x44/0x200
  el0t_64_sync_handler+0x100/0x130
  el0t_64_sync+0x3c8/0x3d0

Fix this by changing 'if (!rc)' to 'if (rc <= 0)' to ensure that error
codes like -ENOENT do not bypass the check and accidentally result in a
successful match.

Fixes: 4af4662fa4 ("integrity: IMA policy")
Signed-off-by: Zhao Yipeng <zhaoyipeng5@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-11-21 07:24:01 -05:00
Chen Ridong
b1bcaed1e3 cpuset: Treat cpusets in attaching as populated
Currently, the check for whether a partition is populated does not
account for tasks in the cpuset of attaching. This is a corner case
that can leave a task stuck in a partition with no effective CPUs.

The race condition occurs as follows:

cpu0				cpu1
				//cpuset A  with cpu N
migrate task p to A
cpuset_can_attach
// with effective cpus
// check ok

// cpuset_mutex is not held	// clear cpuset.cpus.exclusive
				// making effective cpus empty
				update_exclusive_cpumask
				// tasks_nocpu_error check ok
				// empty effective cpus, partition valid
cpuset_attach
...
// task p stays in A, with non-effective cpus.

To fix this issue, this patch introduces cs_is_populated, which considers
tasks in the attaching cpuset. This new helper is used in validate_change
and partition_is_populated.

Fixes: e2d59900d9 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 16:25:26 -10:00
ke zijie
862f670205 docs/zh_CN: Add data-integrity.rst translation
Translate .../block/data-integrity.rst into Chinese.
Add data-integrity into .../block/index.rst.

Update the translation through commit c6e56cf6b2
("block: move integrity information into queue_limits")

Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: ke zijie <kezijie@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-21 10:09:42 +08:00
ke zijie
dcb7fefe52 docs/zh_CN: Add blk-mq.rst translation
Translate .../block/blk-mq.rst into Chinese.
Add blk-mq into .../block/index.rst.

Update the translation through commit 41bd33df4e
("docs: block: blk-mq.rst: correct places -> place")

Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: ke zijie <kezijie@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-21 10:09:37 +08:00
ke zijie
a41b1f1521 docs/zh_CN: Add block/index.rst translation
Translate .../block/index.rst into Chinese and update subsystem-apis.rst
translation.

Update the translation through commit 56cdea92ed
("Documentation/block: drop the request.rst file")

Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: ke zijie <kezijie@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-21 10:09:00 +08:00
Chenguang Zhao
6132026df0 docs/zh_CN: Update the Chinese translation of kbuild.rst
Finish the translation of kbuild/kbuild.rst.

Update to commit 5cbfb4da7e06 ("kbuild: doc: improve
KBUILD_BUILD_TIMESTAMP documentation")

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: WangYuli <wanyl5933@chinaunicom.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-21 10:08:43 +08:00
Lai Jiangshan
c9c19e8bbc workqueue: Init rescuer's affinities as wq_unbound_cpumask
The affinity to set to the rescuers should be consistent in all paths
when a rescuer is in detached state. The affinity could be either
wq_unbound_cpumask or unbound_effective_cpumask(wq).

Related paths:
       rescuer's worker_detach_from_pool()
       update wq_unbound_cpumask
       update wq's cpumask
       init_rescuer()

Both affinities are Ok as long as they are consistent in all paths.

In the commit 449b31ad29 ("workqueue: Init rescuer's affinities as
the wq's effective cpumask") makes init_rescuer use
unbound_effective_cpumask(wq) which is consistent with then
apply_wqattrs_commit().

But using unbound_effective_cpumask(wq) requres much more code to
maintain the consistency, and it doesn't make much sense since the
affinity is only effective when the rescuer is not processing works.
wq_unbound_cpumask is more favorable.

So apply_wqattrs_commit() and the path of "updating wq's cpumask" had
been changed to not update the rescuer's affinity, and both the paths
of "updating wq_unbound_cpumask" and "rescuer's
worker_detach_from_pool()" had been changed to use wq_unbound_cpumask.

Now, make init_rescuer() use wq_unbound_cpumask for rescuer's affinity
and make all the paths consistent.

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 10:27:55 -10:00
Lai Jiangshan
8ac4dbe7dd workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
When workqueue cpumask changes are committed, the DISASSOCIATED workers
affinity is not touched and this might be a problem down the line for
isolated setups when the DISASSOCIATED pools still have works to run
after the cpu is offline.

Make sure the workers' affinity is updated every time a workqueue cpumask
changes, so these workers can't break isolation.

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 10:27:55 -10:00
Lai Jiangshan
e36bce4466 workqueue: Update the rescuer's affinity only when it is detached
When a rescuer is attached to a pool, its affinity should be only
managed by the pool.

But updating the detached rescuer's affinity is still meaningful so
that it will not disrupt isolated CPUs when it is to be waken up.

But the commit d64f2fa064 ("kernel/workqueue: Let rescuers follow
unbound wq cpumask changes") updates the affinity unconditionally, and
causes some issues

1) it also changes the affinity when the rescuer is already attached to
   a pool, which violates the affinity management.

2) the said commit tries to update the affinity of the rescuers, but it
   misses the rescuers of the PERCPU workqueues, and isolated CPUs can
   be possibly disrupted by these rescuers when they are summoned.

3) The affinity to set to the rescuers should be consistent in all paths
   when a rescuer is in detached state. The affinity could be either
   wq_unbound_cpumask or unbound_effective_cpumask(wq). Related paths:
       rescuer's worker_detach_from_pool()
       update wq_unbound_cpumask
       update wq's cpumask
       init_rescuer()
   Both affinities are Ok as long as they are consistent in all paths.
   But using unbound_effective_cpumask(wq) requres much more code to
   maintain the consistency, and it doesn't make much sense since the
   affinity is only effective when the rescuer is not processing works.
   wq_unbound_cpumask is more favorable.

Fix the 1) issue by testing rescuer->pool before updating with
wq_pool_attach_mutex held.

Fix the 2) issue by moving the rescuer's affinity updating code to
the place updating wq_unbound_cpumask and make it also update for
PERCPU workqueues.

Partially cleanup the 3) consistency issue by using wq_unbound_cpumask.
So that the path of "updating wq's cpumask" doesn't need to maintain it.
and both the paths of "updating wq_unbound_cpumask" and "rescuer's
worker_detach_from_pool()" use wq_unbound_cpumask.

Cleanup for init_rescuer()'s consistency for affinity can be done in
future.

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 10:27:55 -10:00
Thomas Weißschuh
31b4d3af63 tools/nolibc: remove more __nolibc_enosys() fallbacks
Commit e6366101ce ("tools/nolibc: remove __nolibc_enosys() fallback
from time64-related functions") removed many of these fallbacks but
forgot a few.

Finish the job.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-20 19:47:11 +01:00
Thomas Weißschuh
3e1da545db tools/nolibc: remove now superfluous overflow check in llseek
As off_t is now always 64-bit wide this overflow can not happen anymore,
remove the check.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2025-11-20 19:47:04 +01:00
Thomas Weißschuh
e800e94468 tools/nolibc: use 64-bit off_t
The kernel uses 64-bit values for file offsets.
Currently these might be truncated to 32-bit when assigned to
nolibc's off_t values.

Switch to 64-bit off_t consistently.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/lkml/cec27d94-c99d-4c57-9a12-275ea663dda8@app.fastmail.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-20 19:46:57 +01:00
Thomas Weißschuh
19c5a681b2 tools/nolibc: prefer the llseek syscall
Make sure to always use the 64-bit safe system call
in preparation for 64-bit off_t on 32 bit architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-20 19:46:52 +01:00
Thomas Weißschuh
d93d0593dd tools/nolibc: handle 64-bit off_t for llseek
Correctly handle 64-bit off_t values in preparation for 64-bit off_t on
32-bit architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-20 19:46:45 +01:00
Thomas Weißschuh
87506e44cb tools/nolibc: use 64-bit ino_t
The kernel uses 64-bit values for inode numbers.
Currently these might be truncated to 32-bit when assigned to
nolibc's ino_t values.

Switch to 64-bit ino_t consistently.

As ino_t is never used directly in kernel ABIs, no systemcall wrappers
need to be adapted.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/lkml/cec27d94-c99d-4c57-9a12-275ea663dda8@app.fastmail.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-20 19:46:33 +01:00
Rong Tao
06a7415cf2 sched_ext: tools: Removing duplicate targets during non-cross compilation
When cross-compilation is not used, BPFOBJ and HOST_BPFOBJ are identical
files, libbpf.a, and duplicate libbpf.a files should be removed.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 07:00:27 -10:00
Pingfan Liu
318e18ed22 sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
*** Bug description ***
When testing kexec-reboot on a 144 cpus machine with
isolcpus=managed_irq,domain,1-71,73-143 in kernel command line, I
encounter the following bug:

[   97.114759] psci: CPU142 killed (polled 0 ms)
[   97.333236] Failed to offline CPU143 - error=-16
[   97.333246] ------------[ cut here ]------------
[   97.342682] kernel BUG at kernel/cpu.c:1569!
[   97.347049] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[...]

In essence, the issue originates from the CPU hot-removal process, not
limited to kexec. It can be reproduced by writing a SCHED_DEADLINE
program that waits indefinitely on a semaphore, spawning multiple
instances to ensure some run on CPU 72, and then offlining CPUs 1–143
one by one. When attempting this, CPU 143 failed to go offline.
  bash -c 'taskset -cp 0 $$ && for i in {1..143}; do echo 0 > /sys/devices/system/cpu/cpu$i/online 2>/dev/null; done'

Tracking down this issue, I found that dl_bw_deactivate() returned
-EBUSY, which caused sched_cpu_deactivate() to fail on the last CPU.
But that is not the fact, and contributed by the following factors:
When a CPU is inactive, cpu_rq()->rd is set to def_root_domain. For an
blocked-state deadline task (in this case, "cppc_fie"), it was not
migrated to CPU0, and its task_rq() information is stale. So its rq->rd
points to def_root_domain instead of the one shared with CPU0.  As a
result, its bandwidth is wrongly accounted into a wrong root domain
during domain rebuild.

*** Issue ***
The key point is that root_domain is only tracked through active rq->rd.
To avoid using a global data structure to track all root_domains in the
system, there should be a method to locate an active CPU within the
corresponding root_domain.

*** Solution ***
To locate the active cpu, the following rules for deadline
sub-system is useful
  -1.any cpu belongs to a unique root domain at a given time
  -2.DL bandwidth checker ensures that the root domain has active cpus.

Now, let's examine the blocked-state task P.
If P is attached to a cpuset that is a partition root, it is
straightforward to find an active CPU.
If P is attached to a cpuset that has changed from 'root' to 'member',
the active CPUs are grouped into the parent root domain. Naturally, the
CPUs' capacity and reserved DL bandwidth are taken into account in the
ancestor root domain. (In practice, it may be unsafe to attach P to an
arbitrary root domain, since that domain may lack sufficient DL
bandwidth for P.) Again, it is straightforward to find an active CPU in
the ancestor root domain.

This patch groups CPUs into isolated and housekeeping sets. For the
housekeeping group, it walks up the cpuset hierarchy to find active CPUs
in P's root domain and retrieves the valid rd from cpu_rq(cpu)->rd.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Chen Ridong <chenridong@huaweicloud.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 06:57:58 -10:00
Pingfan Liu
1f38221511 cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
cpuset_cpus_allowed() uses a reader lock that is sleepable under RT,
which means it cannot be called inside raw_spin_lock_t context.

Introduce a new cpuset_cpus_allowed_locked() helper that performs the
same function as cpuset_cpus_allowed() except that the caller must have
acquired the cpuset_mutex so that no further locking will be needed.

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: linux-kernel@vger.kernel.org
To: cgroups@vger.kernel.org
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20 06:57:58 -10:00
Masami Hiramatsu (Google)
a2f7990d33 selftests: tracing: Update fprobe selftest for ftrace based fprobe
Since the ftrace fprobe is both fgraph and ftrace based implemented,
the selftest needs to be updated. This does not count the actual
number of lines, but just check the differences.

Link: https://lore.kernel.org/r/176295318112.431538.11780280333728368327.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:55:14 -07:00
Masami Hiramatsu (Google)
a1ca238936 selftests: tracing: Add tprobe enable/disable testcase
Commit 2867495dea ("tracing: tprobe-events: Register tracepoint when
enable tprobe event") caused regression bug and tprobe did not work.
To prevent similar problems, add a testcase which enables/disables a
tprobe and check the results.

Link: https://lore.kernel.org/r/176252610176.214996.3978515319000806265.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:55:08 -07:00
Brendan Jackman
d9e6269e33 selftests/run_kselftest.sh: exit with error if tests fail
Parsing KTAP is quite an inconvenience, but most of the time the thing
you really want to know is "did anything fail"?

Let's give the user the his information without them needing
to parse anything.

Because of the use of subshells and namespaces, this needs to be
communicated via a file. Just write arbitrary data into the file and
treat non-empty content as a signal that something failed.

In case any user depends on the current behaviour, such as running this
from a script with `set -e` and parsing the result for failures
afterwards, add a flag they can set to get the old behaviour, namely
--no-error-on-fail.

Link: https://lore.kernel.org/r/20251111-b4-ksft-error-on-fail-v3-1-0951a51135f6@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:00:22 -07:00
Zhang Chujun
26347f8443 selftests/dma: fix invalid array access in printf
The printf statement attempts to print the DMA direction string using
the syntax 'dir[directions]', which is an invalid array access. The
variable 'dir' is an integer, and 'directions' is a char pointer array.
This incorrect syntax should be 'directions[dir]', using 'dir' as the
index into the 'directions' array. Fix this by correcting the array
access from 'dir[directions]' to 'directions[dir]'.

Link: https://lore.kernel.org/r/20251104025234.2363-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:00:14 -07:00
Tamir Duberstein
494de8f67b rust: sync: replace kernel::c_str! with C-Strings
C-String literals were added in Rust 1.77. Replace instances of
`kernel::c_str!` with C-String literals where possible.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251117-core-cstr-cstrings-v4-1-924886ad9f75@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-19 22:41:12 +01:00
Brian Harring
26866b6bb1 rust: pin-init: fix typo in docs
Signed-off-by: Brian Harring <ferringb@gmail.com>
Signed-off-by: Benno Lossin <lossin@kernel.org>
Link: https://patch.msgid.link/20251016211740.653599-2-lossin@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-19 22:41:12 +01:00
Benno Lossin
53870c679e rust: pin-init: fix broken rust doc link
Rust 1.92.0 warns when building the documentation that [`PinnedDrop`] is
an invalid reference. This is correct and it's weird that it didn't warn
before, so fix the link.

[ The reason is that it is hidden -- I had asked about that in the
  upstream PR that changed the behavior because I wasn't sure it was
  intentional (and thus whether we needed to fix this and other cases):

      https://github.com/rust-lang/rust/pull/147153#issuecomment-3395484636

  It turns out it was not, and it has been fixed for 1.92.0's upcoming
  release thanks to Guillaume and León. So we do not strictly need
  this patch and the other changes anymore:

      https://github.com/rust-lang/rust/pull/147809

  However, checking hidden/private items or, even better, a runtime
  toggle to be able to see those on the fly, is something that I think
  would be quite nice so I have had it in our usual lists for a while.
  Guillaume is open to the idea and perhaps experimenting with an
  implementation on our side first -- he asked me to open issues
  upstream:

      https://github.com/rust-lang/rust/issues/149105
      https://github.com/rust-lang/rust/issues/149106

    - Miguel ]

Signed-off-by: Benno Lossin <lossin@kernel.org>
Link: https://patch.msgid.link/20251016211740.653599-1-lossin@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-19 22:18:33 +01:00
Paul Moore
9a948eefad lsm: use unrcu_pointer() for current->cred in security_init()
We need to directly allocate the cred's LSM state for the initial task
when we initialize the LSM framework.  Unfortunately, this results in a
RCU related type mismatch, use the unrcu_pointer() macro to handle this
a bit more elegantly.

The explicit type casting still remains as we need to work around the
constification of current->cred in this particular case.

Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-19 10:32:06 -05:00
John Ogness
26873e3e7f printk: Avoid scheduling irq_work on suspend
Allowing irq_work to be scheduled while trying to suspend has shown
to cause problems as some architectures interpret the pending
interrupts as a reason to not suspend. This became a problem for
printk() with the introduction of NBCON consoles. With every
printk() call, NBCON console printing kthreads are woken by queueing
irq_work. This means that irq_work continues to be queued due to
printk() calls late in the suspend procedure.

Avoid this problem by preventing printk() from queueing irq_work
once console suspending has begun. This applies to triggering NBCON
and legacy deferred printing as well as klogd waiters.

Since triggering of NBCON threaded printing relies on irq_work, the
pr_flush() within console_suspend_all() is used to perform the final
flushing before suspending consoles and blocking irq_work queueing.
NBCON consoles that are not suspended (due to the usage of the
"no_console_suspend" boot argument) transition to atomic flushing.

Introduce a new global variable @console_irqwork_blocked to flag
when irq_work queueing is to be avoided. The flag is used by
printk_get_console_flush_type() to avoid allowing deferred printing
and switch NBCON consoles to atomic flushing. It is also used by
vprintk_emit() to avoid klogd waking.

Add WARN_ON_ONCE(console_irqwork_blocked) to the irq_work queuing
functions to catch any code that attempts to queue printk irq_work
during the suspending/resuming procedure.

Cc: stable@vger.kernel.org # 6.13.x because no drivers in 6.12.x
Fixes: 6b93bb41f6 ("printk: Add non-BKL (nbcon) console basic infrastructure")
Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://patch.msgid.link/20251113160351.113031-3-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 16:01:31 +01:00
John Ogness
d01ff281bd printk: Allow printk_trigger_flush() to flush all types
Currently printk_trigger_flush() only triggers legacy offloaded
flushing, even if that may not be the appropriate method to flush
for currently registered consoles. (The function predates the
NBCON consoles.)

Since commit 6690d6b527 ("printk: Add helper for flush type
logic") there is printk_get_console_flush_type(), which also
considers NBCON consoles and reports all the methods of flushing
appropriate based on the system state and consoles available.

Update printk_trigger_flush() to use
printk_get_console_flush_type() to appropriately flush registered
consoles.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/stable/20251113160351.113031-2-john.ogness%40linutronix.de
Tested-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://patch.msgid.link/20251113160351.113031-2-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 16:01:31 +01:00
Coiby Xu
c200892b46 ima: Access decompressed kernel module to verify appended signature
Currently, when in-kernel module decompression (CONFIG_MODULE_DECOMPRESS)
is enabled, IMA has no way to verify the appended module signature as it
can't decompress the module.

Define a new kernel_read_file_id enumerate READING_MODULE_COMPRESSED so
IMA can calculate the compressed kernel module data hash on
READING_MODULE_COMPRESSED and defer appraising/measuring it until on
READING_MODULE when the module has been decompressed.

Before enabling in-kernel module decompression, a kernel module in
initramfs can still be loaded with ima_policy=secure_boot. So adjust the
kernel module rule in secure_boot policy to allow either an IMA
signature OR an appended signature i.e. to use
"appraise func=MODULE_CHECK appraise_type=imasig|modsig".

Reported-by: Karel Srot <ksrot@redhat.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-11-19 09:19:42 -05:00
Andy Shevchenko
ace3852170 tracing: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-22-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:30:11 +01:00
Andy Shevchenko
7b040d4571 scsi: snic: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251113150217.3030010-21-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:30:11 +01:00
Andy Shevchenko
d710741f83 scsi: fnic: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251113150217.3030010-20-andriy.shevchenko@linux.intel.com
[pmladek@suse.com: Fixed output ordering and last_read_time update.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:28:03 +01:00
Andy Shevchenko
ed40532ccd s390/dasd: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://patch.msgid.link/20251113150217.3030010-19-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:07 +01:00
Andy Shevchenko
4e7c8ab42e ptp: ocp: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

While at it, fix wrong use of %ptT against struct timespec64.
It's kinda lucky that it worked just because the first member
there 64-bit and it's of time64_t type. Now with %ptS it may
be used correctly.

Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-18-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:07 +01:00
Andy Shevchenko
b1e7286eee pps: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-17-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:07 +01:00
Andy Shevchenko
3bc02fe0b8 PCI: epf-test: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-16-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:07 +01:00
Andy Shevchenko
b8edf4fbb2 net: dsa: sja1105: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20251113150217.3030010-15-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:07 +01:00
Andy Shevchenko
12158d6747 mmc: mmc_test: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251113150217.3030010-14-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
5a1df7219d media: av7110: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-13-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
0cfc283d18 ipmi: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-12-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
64acc20ec9 igb: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-11-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
81e3db7ead e1000e: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-10-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
51d3654916 drm/xe: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-9-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:26:06 +01:00
Andy Shevchenko
083364667d drm/vblank: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-8-andriy.shevchenko@linux.intel.com
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:25:47 +01:00
Andy Shevchenko
9d2a48c3a7 drm/msm: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20251113150217.3030010-7-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
fbd3aad6e0 drm/amdgpu: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
c6e049b621 dma-buf: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
98e41fb0ec libceph: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
46ac6f51e5 ceph: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
bccd593744 lib/vsprintf: Add specifier for printing struct timespec64
A handful drivers want to print a content of the struct timespec64
in a format of %lld:%09ld. In order to make their lives easier, add
the respecting specifier directly to the printf() implementation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251113150217.3030010-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:24:13 +01:00
Andy Shevchenko
376c18f30e lib/vsprintf: Deduplicate special hex number specifier data
Two functions use the same specifier data for the special hex number.
Almost the same as the field width is calculated on the size of the
given type. Due to that, make a compound literal macro in order to
deduplicate the rest.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251113150313.3030700-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 10:07:52 +01:00
Alexandre Courbot
1aaa5cfbd4 MAINTAINERS: add entry for the Rust num module
This new module provides numerical features useful for the kernel, such
as the `Integer` trait and the `Bounded` integer wrapper.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251108-bounded_ints-v4-3-c9342ac7ebd1@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-19 00:22:30 +01:00
Alexandre Courbot
01e345e82e rust: num: add Bounded integer wrapping type
Add the `Bounded` integer wrapper type, which restricts the number of
bits allowed to represent of value.

This is useful to e.g. enforce guarantees when working with bitfields
that have an arbitrary number of bits.

Alongside this type, provide many `From` and `TryFrom` implementations
are to reduce friction when using with regular integer types. Proxy
implementations of common integer operations are also provided.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251108-bounded_ints-v4-2-c9342ac7ebd1@nvidia.com
[ Added intra-doc link. Fixed a few other nits. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-19 00:22:24 +01:00
Jonathan Corbet
34a28245b6 Merge branch 'python-modules' into docs-mw
scripts/lib was always a bit of an awkward place for Python libraries; give
them a proper home under tools/lib/python.  Put the modules from
tools/docs/lib there for good measure.

The second patch ties them into a single package namespace.  It would be
more aesthetically pleasing to add a kernel layer, so we could say:

  from kernel.kdoc import kdoc_parser

...and have the kernel-specific stuff clearly marked, but that means adding
an empty directory in the hierarchy, which isn't as pleasing.

There are some other "Python library" directories hidden in the kernel
tree; we may eventually want to encourage them to move as well.
2025-11-18 09:26:11 -07:00
Jonathan Corbet
992a9df41a docs: bring some order to our Python module hierarchy
Now that we have tools/lib/python for our Python modules, turn them into
proper packages with a single namespace so that everything can just use
tools/lib/python in sys.path.  No functional change.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251110220430.726665-3-corbet@lwn.net>
2025-11-18 09:22:40 -07:00
Jonathan Corbet
778b8ebe51 docs: Move the python libraries to tools/lib/python
"scripts/lib" was always a bit of an awkward place for Python modules.  We
already have tools/lib; create a tools/lib/python, move the libraries
there, and update the users accordingly.

While at it, move the contents of tools/docs/lib.  Rather than make another
directory, just put these documentation-oriented modules under "kdoc".

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251110220430.726665-2-corbet@lwn.net>
2025-11-18 09:22:40 -07:00
Borislav Petkov (AMD)
f690e07859 Documentation/kernel-parameters: Move the kernel build options
Move the kernel build options abbreviations to the .txt file so that
they are together instead of one having to go hunt them in the .rst
file.

Tweak the formatting so that the inclusion of kernel-parameters.txt
still keeps the whole thing somewhat presentable in the html output too.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251112114641.8230-1-bp@kernel.org>
2025-11-18 09:18:51 -07:00
Ankit Khushwaha
6ae0f20727 docs: parse-headers.rst: Fix a typo
Replace "vantage" with "advantage" in the description of userspace API
cross-references.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251115114233.32239-1-ankitkhushwaha.linux@gmail.com>
2025-11-18 09:13:40 -07:00
Xie Yuanbin
46a47693e1 Documentation/kernel-parameters: fix typo in retbleed= kernel parameter description
Fixes a typo in the retbleed= parameter description, changing
"migitation" to "mitigation".

Fixes: 7fbf47c7ce ("x86/bugs: Add AMD retbleed= boot parameter")

Signed-off-by: Xie Yuanbin <qq570070308@gmail.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251116145302.3681-1-qq570070308@gmail.com>
2025-11-18 09:12:17 -07:00
Alexandre Courbot
90f3df4fdf rust: add num module and Integer trait
Introduce the `num` module, which will provide numerical extensions and
utilities for the kernel.

For now, introduce the `Integer` trait, which is implemented for all
primitive integer types to provides their core properties to generic
code.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251108-bounded_ints-v4-1-c9342ac7ebd1@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 22:56:23 +01:00
Miguel Ojeda
e5d330e13f rust: allow clippy::disallowed_names for doctests
Examples (i.e. doctests) may want to use names such as `foo`, thus the
`clippy::disallowed_names` lint [1] gets in the way.

Thus allow it for all doctests.

In addition, remove it from the existing `expect`s we have in a few
doctests.

This does not mean that we should stop trying to find good names for
our examples, though.

Link: https://rust-lang.github.io/rust-clippy/stable/index.html#disallowed_names [1]
Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/rust-for-linux/aRHSLChi5HYXW4-9@google.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Benno Lossin <lossin@kernel.org>
Link: https://patch.msgid.link/20251117080714.876978-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 22:53:27 +01:00
Zqiang
348d3c587a sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
The free_kick_syncs_rcu() rcu-callback only invoke kvfree() to
release per-cpu ksyncs object, this can use kvfree_rcu() replace
call_rcu() to release per-cpu ksyncs object in the free_kick_syncs().

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-17 05:16:10 -10:00
Miguel Ojeda
ab844cf320 rust: allow unreachable_pub for doctests
Examples (i.e. doctests) may want to show public items such as structs,
thus the `unreachable_pub` warning is not very helpful.

Thus allow it for all doctests.

In addition, remove it from the existing `expect`s we have in a couple
doctests.

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/rust-for-linux/aRG9VjsaCjsvAwUn@google.com/
Reviewed-by: David Gow <davidgow@google.com>
Acked-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251110113528.1658238-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 08:41:31 +01:00
ke zijie
bd4b5902e7 docs: zh_CN: scsi: fix broken references in scsi-parameters.rst
0day CI reported several broken references under
Documentation/translations/zh_CN/scsi/scsi-parameters.rst.
These files do not exist under the translations directory.
The correct references are the original English documents under
Documentation/scsi/.

This patch updates all broken paths accordingly.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511130315.WOiKJQTu-lkp@intel.com/
Signed-off-by: ke zijie <kezijie@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-17 13:24:03 +08:00
Tamir Duberstein
bf724be7f0 rust: macros: replace kernel::c_str! with C-Strings
C-String literals were added in Rust 1.77. Replace instances of
`kernel::c_str!` with C-String literals where possible.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251113-core-cstr-cstrings-v3-6-411b34002774@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 00:59:21 +01:00
Tamir Duberstein
ab8a6c7b34 rust: str: replace kernel::c_str! with C-Strings
C-String literals were added in Rust 1.77. Replace instances of
`kernel::c_str!` with C-String literals where possible.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251113-core-cstr-cstrings-v3-3-411b34002774@gmail.com
[ Removed unused `c_str` import in doctest. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 00:58:55 +01:00
Tamir Duberstein
305b795730 rust: firmware: replace kernel::c_str! with C-Strings
C-String literals were added in Rust 1.77. Replace instances of
`kernel::c_str!` with C-String literals where possible.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251113-core-cstr-cstrings-v3-1-411b34002774@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-17 00:53:47 +01:00
Vitaly Wool
f56b131723 rust: rbtree: add immutable cursor
Sometimes we may need to iterate over, or find an element in a read
only (or read mostly) red-black tree, and in that case we don't need a
mutable reference to the tree, which we'll however have to take to be
able to use the current (mutable) cursor implementation.

This patch adds a simple immutable cursor implementation to RBTree,
which enables us to use an immutable tree reference. The existing
(fully featured) cursor implementation is renamed to CursorMut,
while retaining its functionality.

The only existing user of the [mutable] cursor for RBTrees (binder) is
updated to match the changes.

Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251014123339.2492210-1-vitaly.wool@konsulko.se
[ Applied `rustfmt`. Added intra-doc link. Fixed unclosed example.
  Fixed docs description. Fixed typo and other formatting nits.
    - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-16 21:56:57 +01:00
Michal Koutný
a0131c3927 docs: cgroup: No special handling of unpopulated memcgs
The current kernel doesn't handle unpopulated cgroups any special
regarding reclaim protection. Furthermore, this wasn't a case even when
this was introduced in
    bf8d5d52ff ("memcg: introduce memory.min")
Drop the incorrect documentation. (Implementation taking into account
the inner-node constraint may be added later.)

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 11:21:05 -10:00
Michal Koutný
3755c11679 docs: cgroup: Note about sibling relative reclaim protection
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 11:21:05 -10:00
Michal Koutný
e27179958c docs: cgroup: Explain reclaim protection target
The protection target is necessary to understand how effective reclaim
protection applies in the hierarchy.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 11:21:05 -10:00
Tejun Heo
1dcb98bbb7 sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
With the buddy lockup detector, smp_processor_id() returns the detecting CPU,
not the locked CPU, making scx_hardlockup()'s printouts confusing. Pass the
locked CPU number from watchdog_hardlockup_check() as a parameter instead.

Also add kerneldoc comments to handle_lockup(), scx_hardlockup(), and
scx_rcu_cpu_stall() documenting their return value semantics.

Suggested-by: Doug Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 11:11:08 -10:00
Thomas Weißschuh
deab487e0f kbuild: allow architectures to override CC_CAN_LINK
The generic test for CC_CAN_LINK assumes that all architectures use -m32
and -m64 to switch between 32-bit and 64-bit compilation. This is overly
simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may
also require byte order handling (-mlittle-endian, -EL). Expressing all
of the different possibilities will be very complicated and brittle.
Instead allow architectures to supply their own logic which will be
easy to understand and evolve.

Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need
to be implemented as kconfig does not allow the reuse of string options.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:35 +01:00
Thomas Weißschuh
80623f2c83 init: deduplicate cc-can-link.sh invocations
The command to invoke scripts/cc-can-link.sh is very long and new usages
are about to be added.

Add a helper variable to make the code easier to read and maintain.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:34 +01:00
Thomas Weißschuh
d81d9d389b kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
It is possible that the kernel toolchain generates warnings when used
together with the system toolchain. This happens for example when the
older kernel toolchain does not handle new versions of sframe debug
information. While these warnings where ignored during the evaluation
of CC_CAN_LINK, together with CONFIG_WERROR the actual userprog build
will later fail.

Example warning:

.../x86_64-linux/13.2.0/../../../../x86_64-linux/bin/ld:
error in /lib/../lib64/crt1.o(.sframe); no .sframe will be created
collect2: error: ld returned 1 exit status

Make sure that the very simple example program does not generate
warnings already to avoid breaking the userprog compilations.

Fixes: ec4a3992bc ("kbuild: respect CONFIG_WERROR for linker and assembler")
Fixes: 3f0ff4cc6f ("kbuild: respect CONFIG_WERROR for userprogs")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-1-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:34 +01:00
Guopeng Zhang
1dc830ee4c selftests/cgroup: conform test to KTAP format output
Conform the layout, informational and status messages to KTAP.  No
functional change is intended other than the layout of output messages.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Sebastian Chlad <sebastian.chlad@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 08:12:57 -10:00
Thomas Weißschuh
7bc16e72dd kunit: Make filter parameters configurable via Kconfig
Enable the preset of filter parameters from kconfig options, similar to
how other KUnit configuration parameters are handled already.
This is useful to run a subset of tests even if the cmdline is not
readily modifyable.

Link: https://lore.kernel.org/r/20251106-kunit-filter-kconfig-v1-1-d723fb7ac221@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-14 11:02:34 -07:00
Zilin Guan
76ce17f6f7 crypto: iaa - Fix incorrect return value in save_iaa_wq()
The save_iaa_wq() function unconditionally returns 0, even when an error
is encountered. This prevents the error code from being propagated to the
caller.

Fix this by returning the 'ret' variable, which holds the actual status
of the operations within the function.

Fixes: ea7a5cbb43 ("crypto: iaa - Add Intel IAA Compression Accelerator crypto driver core")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Thorsten Blum
2236fc007a crypto: zstd - Remove unnecessary size_t cast
Use max() instead of max_t() since zstd_cstream_workspace_bound() and
zstd_dstream_workspace_bound() already return size_t and casting the
values is unnecessary.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Thorsten Blum
6cf3260755 crypto: zstd - Annotate struct zstd_ctx with __counted_by
Add the __counted_by() compiler attribute to the flexible array member
'wksp' to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Use struct_size(), which provides additional compile-time checks for
structures with flexible array members (e.g., __must_be_array()), for
the allocation size for a new 'zstd_ctx' while we're at it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Tetsuo Handa
af3852cda3 padata: remove __padata_list_init()
syzbot is reporting possibility of deadlock due to sharing lock_class_key
between padata_init_squeues() and padata_init_reorder_list(). This is a
false positive, for these callers initialize different object. Unshare
lock_class_key by embedding __padata_list_init() into these callers.

Reported-by: syzbot+bd936ccd4339cea66e6b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bd936ccd4339cea66e6b
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Marco Crivellari
06c489ce5b crypto: qat - add WQ_PERCPU to alloc_workqueue users
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue()
to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Krzysztof Kozlowski
054c7f7ad3 crypto: cesa - Simplify with of_device_get_match_data()
Driver's probe function matches against driver's of_device_id table,
where each entry has non-NULL match data, so of_match_node() can be
simplified with of_device_get_match_data().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Krzysztof Kozlowski
4ae946a45d crypto: ccp - Simplify with of_device_get_match_data()
Driver's probe function matches against driver's of_device_id table,
where each entry has non-NULL match data, so of_match_node() can be
simplified with of_device_get_match_data().

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:49 +08:00
Krzysztof Kozlowski
ec2054c124 crypto: ccp - Constify 'dev_vdata' member
sp_device->dev_vdata points to only const data (see 'static const struct
sp_dev_vdata dev_vdata'), so can be made pointer to const for code
safety.

Update also sp_get_acpi_version() function which returns this pointer to
'pointer to const' for code readability, even though it is not needed.

On the other hand, do not touch similar function sp_get_of_version()
because it will be immediately removed in next patches.

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Krzysztof Kozlowski
c6c247ae33 crypto: artpec6 - Simplify with of_device_get_match_data()
Driver's probe function matches against driver's of_device_id table, so
of_match_node() can be simplified with of_device_get_match_data().

This requires changing the enum used in the driver match data entries to
non-zero, to be able to recognize error case of
of_device_get_match_data().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Krzysztof Kozlowski
6b94eb68ad hwrng: bcm2835 - Simplify with of_device_get_match_data()
Driver's probe function matches against driver's of_device_id table,
where each entry has non-NULL match data, so of_match_node() can be
simplified with of_device_get_match_data().

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Krzysztof Kozlowski
cdd7bbce7b hwrng: bcm2835 - Move MODULE_DEVICE_TABLE() to table definition
Convention is to place MODULE_DEVICE_TABLE() immediately after
definition of the affected table, so one can easily spot missing such.
There is on the other hand no benefits of putting MODULE_DEVICE_TABLE()
far away.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Marco Crivellari
7e8f232ae8 crypto: cavium/nitrox - add WQ_PERCPU to alloc_workqueue users
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue()
to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Marco Crivellari
5f8c6c9318 crypto: atmel-i2c - add WQ_PERCPU to alloc_workqueue users
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Karina Yankevich
d52e9b8843 crypto: rockchip - drop redundant crypto_skcipher_ivsize() calls
The function already initialized the ivsize variable
at the point of declaration, let's use it instead of
calling crypto_skcipher_ivsize() extra couple times.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 57d67c6e82 ("crypto: rockchip - rework by using crypto_engine")
Signed-off-by: Karina Yankevich <k.yankevich@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-14 18:15:48 +08:00
Matthew Wilcox (Oracle)
76ade24433 slab: Remove references to folios from virt_to_slab()
Use page_slab() instead of virt_to_folio() which will work
perfectly when struct slab is separated from struct folio.

This was the last user of folio_slab(), so delete it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-17-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 20:23:58 +01:00
Matthew Wilcox (Oracle)
bbe7117305 kasan: Remove references to folio in __kasan_mempool_poison_object()
In preparation for splitting struct slab from struct page and struct
folio, remove mentions of struct folio from this function.  There is a
mild improvement for large kmalloc objects as we will avoid calling
compound_head() for them.  We can discard the comment as using
PageLargeKmalloc() rather than !folio_test_slab() makes it obvious.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Link: https://patch.msgid.link/20251113000932.1589073-16-willy@infradead.org
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 20:23:58 +01:00
Matthew Wilcox (Oracle)
b8557d109e memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
In preparation for splitting struct slab from struct page and struct
folio, convert the pointer to a slab rather than a folio.  This means
we can end up passing a NULL slab pointer to mem_cgroup_from_obj_slab()
if the pointer is not to a page allocated to slab, and we handle that
appropriately by returning NULL.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: cgroups@vger.kernel.org
Link: https://patch.msgid.link/20251113000932.1589073-15-willy@infradead.org
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 20:23:09 +01:00
Mauro Carvalho Chehab
68f3d40ea0 docs: parse-headers.rst: remove uneeded parenthesis
As pointed by Randy, the parenthesis there is not needed and it
violates the document coding style.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-doc/9d709020-03fe-467c-be7f-d5ee251bb79a@infradead.org/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <e5de9f7b1f6a963b2912574a65495c47cfbb13ba.1762878176.git.mchehab+huawei@kernel.org>
2025-11-13 09:34:40 -07:00
Mauro Carvalho Chehab
62d785159c docs: Makefile: update SPHINXDIRS documentation
Since the beginning, SPHINXDIRS was meant to be used by any
subdirectory inside Documentation/ that contains a file named
index.rst on it. The typical usecase for SPHINXDIRS is help
building subsystem-specific documentation, without needing to
wait for the entire building (with can take 3 minutes with
Sphinx 8.x and above, and a lot more with older versions).

Yet, the documentation for such feature was written back in
2016, where almost all index.rst files were at the first
level (Documentation/*/index.rst).

Update the documentation to reflect the way it works.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <683469813350214da122c258063dd71803ff700b.1763031632.git.mchehab+huawei@kernel.org>
2025-11-13 09:22:30 -07:00
Mauro Carvalho Chehab
f64c7e113d scripts: docs: kdoc_files.py: don't consider symlinks as directories
As reported by Randy, currently kdoc_files can go into endless
looks when symlinks are used:

	$ ln -s . Documentation/peci/foo
	$ ./scripts/kernel-doc Documentation/peci/
	...
	  File "/new_devel/docs/scripts/lib/kdoc/kdoc_files.py", line 52, in _parse_dir
	    if entry.is_dir():
	       ~~~~~~~~~~~~^^
	OSError: [Errno 40] Too many levels of symbolic links: 'Documentation/peci/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo'

Prevent that by not considering symlinks as directories.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-doc/80701524-09fd-4d68-8715-331f47c969f2@infradead.org/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <73c3450f34e2a4b42ef2ef279d7487c47d22e3bd.1763027622.git.mchehab+huawei@kernel.org>
2025-11-13 09:16:15 -07:00
Christoph Hellwig
3d2492401d mempool: factor out a mempool_adjust_gfp helper
Add a helper to better isolate and document the gfp flags adjustments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-6-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 17:10:38 +01:00
Christoph Hellwig
b77fc08e39 mempool: add error injection support
Add a call to should_fail_ex that forces mempool to actually allocate
from the pool to stress the mempool implementation when enabled through
debugfs.  By default should_fail{,_ex} prints a very verbose stack trace
that clutters the kernel log, slows down execution and triggers the
kernel bug detection in xfstests.  Pass FAULT_NOWARN and print a
single-line message notating the caller instead so that full tests
can be run with fault injection.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20251113084022.1255121-5-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 17:10:38 +01:00
Christoph Hellwig
5c829783e5 mempool: improve kerneldoc comments
Use proper formatting, use full sentences and reduce some verbosity in
function parameter descriptions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-4-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 17:10:38 +01:00
Christoph Hellwig
e9939cebc0 mm: improve kerneldoc comments for __alloc_pages_bulk
Describe the semantincs in more detail, as the filling empty slots in
an array scheme is not quite obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-3-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 17:10:38 +01:00
Christoph Hellwig
0f2620ffc4 fault-inject: make enum fault_flags available unconditionally
This will allow using should_fail_ex from code without having to
make it conditional on CONFIG_FAULT_INJECTION.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-2-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 17:10:38 +01:00
Matthew Wilcox (Oracle)
5934b1be8d usercopy: Remove folio references from check_heap_object()
Use page_slab() instead of virt_to_folio() followed by folio_slab().
We do end up calling compound_head() twice for non-slab copies, but that
will not be a problem once we allocate memdescs separately.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-hardening@vger.kernel.org
Link: https://patch.msgid.link/20251113000932.1589073-14-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
025f5b870b slab: Remove folio references from kfree_nolock()
In preparation for splitting struct slab from struct page and struct
folio, remove mentions of struct folio from this function.  Since large
kmalloc objects are not supported here, we can just use virt_to_slab().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-13-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
7d26842fd4 slab: Remove folio references from kfree_rcu_sheaf()
In preparation for splitting struct slab from struct page and struct
folio, remove mentions of struct folio from this function.  Since
we don't need to handle large kmalloc objects specially here, we
can just use virt_to_slab().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-12-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
4a2c2110a3 slab: Remove folio references from build_detached_freelist()
Use pages and slabs directly instead of converting to folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-11-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
5db009dc10 slab: Remove folio references from __do_krealloc()
One slight tweak I made is to calculate 'ks' earlier, which means we
can reuse it in the warning rather than calculating the object size twice.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-10-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
e409021685 slab: Remove folio references from kfree()
This should generate identical code to the previous version, but
without any dependency on how folios work.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-9-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
f262cfd75d slab: Remove folio references from kvfree_rcu_cb()
Remove conversions from folio to page and folio to slab.  This is
preparation for separately allocated struct slab from struct page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-8-willy@infradead.org
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
0bdfdd6a05 slab: Remove folio references from free_large_kmalloc()
There's no need to use folio APIs here; just use a page directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251113000932.1589073-7-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
f9395bf5db slab: Remove folio references from ___kmalloc_large_node()
There's no need to use folio APIs here; just use a page directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251113000932.1589073-6-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
09fa19e2f3 slab: Remove folio references in slab alloc/free
Use pages directly to further the split between slab and folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251113000932.1589073-5-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
ea4702b170 slab: Remove folio references in memcg_slab_post_charge()
This allows us to skip the compound_head() call for large kmalloc
objects as the virt_to_page() call will always give us the head page
for the large kmalloc case.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251113000932.1589073-4-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
ee1ee8abc4 slab: Remove folio references from __ksize()
In the future, we will separate slab, folio and page from each other
and calling virt_to_folio() on an address allocated from slab will
return NULL.  Delay the conversion from struct page to struct slab
until we know we're not dealing with a large kmalloc allocation.
There's a minor win for large kmalloc allocations as we avoid the
compound_head() hidden in virt_to_folio().

This deprecates calling ksize() on memory allocated by alloc_pages().
Today it becomes a warning and support will be removed entirely in
the future.

Introduce large_kmalloc_size() to abstract how we represent the size
of a large kmalloc allocation.  For now, this is the same as
page_size(), but it will change with separately allocated memdescs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251113000932.1589073-3-willy@infradead.org
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Matthew Wilcox (Oracle)
2bcd3800f2 slab: Reimplement page_slab()
In order to separate slabs from folios, we need to convert from any page
in a slab to the slab directly without going through a page to folio
conversion first.

Up to this point, page_slab() has followed the example of other memdesc
converters (page_folio(), page_ptdesc() etc) and just cast the pointer
to the requested type, regardless of whether the pointer is actually a
pointer to the correct type or not.

That changes with this commit; we check that the page actually belongs
to a slab and return NULL if it does not.  Other memdesc converters will
adopt this convention in future.

kfence was the only user of page_slab(), so adjust it to the new way
of working.  It will need to be touched again when we separate slab
from page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: kasan-dev@googlegroups.com
Link: https://patch.msgid.link/20251113000932.1589073-2-willy@infradead.org
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Tested-by: Marco Elver <elver@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 11:01:08 +01:00
Baolin Liu
6adf4b11fa mm: simplify list initialization in barn_shrink()
In barn_shrink(), use LIST_HEAD() to declare and initialize the
list_head in one step instead of using INIT_LIST_HEAD() separately.

No functional change.

Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 10:32:19 +01:00
Vlastimil Babka
c33196c942 slab: use struct freelist_counters as parameters in relevant functions
In functions such as [__]slab_update_freelist() and
__slab_update_freelist_fast/slow() we pass old and new freelist and
counters as 4 separate parameters. The underlying
__update_freelist_fast() then constructs struct freelist_counters
variables for passing the full freelist+counter combinations to cmpxchg
double.

In most cases we actually start with struct freelist_counters variables,
but then pass the individual fields, only to construct new struct
freelist_counters variables. While it's all inlined and thus should be
efficient, we can simplify this code.

Thus replace the 4 parameters for individual fields with two pointers to
struct freelist_counters wherever applicable. __update_freelist_fast()
can then pass them directly to try_cmpxchg_freelist().

The code is also more obvious as the pattern becomes unified such that
we set up "old" and "new" struct freelist_counters variables upfront as
we fully need them to be, and simply call [__]slab_update_freelist() on
them.  Previously some of the "new" values would be hidden among the
many parameters and thus make it harder to figure out what the code
does.

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 10:13:57 +01:00
Andrea Righi
67932f6918 sched_ext: Update comments replacing breather with aborting mechanism
Commit 5ebec443fb ("sched_ext: Exit dispatch and move operations
immediately when aborting") replaced the breather mechanism with the
scx_aborting flag.

Update comments removing references to the breather mechanism to avoid
confusion.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 08:54:44 -10:00
Tejun Heo
95d1df610c sched_ext: Implement load balancer for bypass mode
In bypass mode, tasks are queued on per-CPU bypass DSQs. While this works well
in most cases, there is a failure mode where a BPF scheduler can skew task
placement severely before triggering bypass in highly over-saturated systems.
If most tasks end up concentrated on a few CPUs, those CPUs can accumulate
queues that are too long to drain in a reasonable time, leading to RCU stalls
and hung tasks.

Implement a simple timer-based load balancer that redistributes tasks across
CPUs within each NUMA node. The balancer runs periodically (default 500ms,
tunable via bypass_lb_intv_us module parameter) and moves tasks from overloaded
CPUs to underloaded ones.

When moving tasks between bypass DSQs, the load balancer holds nested DSQ locks
to avoid dropping and reacquiring the donor DSQ lock on each iteration, as
donor DSQs can be very long and highly contended. Add the SCX_ENQ_NESTED flag
and use raw_spin_lock_nested() in dispatch_enqueue() to support this. The load
balancer timer function reads scx_bypass_depth locklessly to check whether
bypass mode is active. Use WRITE_ONCE() when updating scx_bypass_depth to pair
with the READ_ONCE() in the timer function.

This has been tested on a 192 CPU dual socket AMD EPYC machine with ~20k
runnable tasks running scx_cpu0. As scx_cpu0 queues all tasks to CPU0, almost
all tasks end up on CPU0 creating severe imbalance. Without the load balancer,
disabling the scheduler can lead to RCU stalls and hung tasks, taking a very
long time to complete. With the load balancer, disable completes in about a
second.

The load balancing operation can be monitored using the sched_ext_bypass_lb
tracepoint and disabled by setting bypass_lb_intv_us to 0.

v2: Lock both rq and DSQ in bypass_lb_cpu() and use dispatch_dequeue_locked()
    to prevent races with dispatch_dequeue() (Andrea Righi).

Cc: Andrea Righi <arighi@nvidia.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Reviewed_by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
d18b96ce12 sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
move_task_between_dsqs() contains open-coded abbreviated dequeue logic when
moving tasks between non-local DSQs. Factor this out into
dispatch_dequeue_locked() which can be used when both the task's rq and dsq
locks are already held. Add lockdep assertions to both dispatch_dequeue() and
the new helper to verify locking requirements.

This prepares for the load balancer which will need the same abbreviated
dequeue pattern.

Cc: Andrea Righi <arighi@nvidia.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
d2974cc79f sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
macro in preparation for additional users.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
c948d9f80c sched_ext: Add scx_cpu0 example scheduler
Add scx_cpu0, a simple scheduler that queues all tasks to a single DSQ and
only dispatches them from CPU0 in FIFO order. This is useful for testing bypass
behavior when many tasks are concentrated on a single CPU. If the load balancer
doesn't work, bypass mode can trigger task hangs or RCU stalls as the queue is
long and there's only one CPU working on it.

v2: Check whether task is on CPU0 at enqueue using scx_bpf_task_cpu() instead
    of nr_cpus_allowed (Andrea Righi).

Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
582f700e1b sched_ext: Hook up hardlockup detector
A poorly behaving BPF scheduler can trigger hard lockup. For example, on a
large system with many tasks pinned to different subsets of CPUs, if the BPF
scheduler puts all tasks in a single DSQ and lets all CPUs at it, the DSQ lock
can be contended to the point where hardlockup triggers. Unfortunately,
hardlockup can be the first signal out of such situations, thus requiring
hardlockup handling.

Hook scx_hardlockup() into the hardlockup detector to try kicking out the
current scheduler in an attempt to recover the system to a good state. The
handling strategy can delay watchdog taking its own action by one polling
period; however, given that the only remediation for hardlockup is crash, this
is likely an acceptable trade-off.

v2: Add missing dummy scx_hardlockup() definition for
    !CONFIG_SCHED_CLASS_EXT (kernel test bot).

Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
7ed8df0d15 sched_ext: Make handle_lockup() propagate scx_verror() result
handle_lockup() currently calls scx_verror() but ignores its return value,
always returning true when the scheduler is enabled. Make it capture and return
the result from scx_verror(). This prepares for hardlockup handling.

Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
4ba54a6cbd sched_ext: Refactor lockup handlers into handle_lockup()
scx_rcu_cpu_stall() and scx_softlockup() share the same pattern: check if the
scheduler is enabled under RCU read lock and trigger an error if so. Extract
the common pattern into handle_lockup() helper. Add scx_verror() macro and use
guard(rcu)().

This simplifies both handlers, reduces code duplication, and prepares for
hardlockup handling.

Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
f2fe382e1f sched_ext: Make scx_exit() and scx_vexit() return bool
Make scx_exit() and scx_vexit() return bool indicating whether the calling
thread successfully claimed the exit. This will be used by the abort mechanism
added in a later patch.

Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
5ebec443fb sched_ext: Exit dispatch and move operations immediately when aborting
62dcbab8b0 ("sched_ext: Avoid live-locking bypass mode switching") introduced
the breather mechanism to inject delays during bypass mode switching. It
maintains operation semantics unchanged while reducing lock contention to avoid
live-locks on large NUMA systems.

However, the breather only activates when exiting the scheduler, so there's no
need to maintain operation semantics. Simplify by exiting dispatch and move
operations immediately when scx_aborting is set. In consume_dispatch_q(), break
out of the task iteration loop. In scx_dsq_move(), return early before
acquiring locks.

This also fixes cases the breather mechanism cannot handle. When a large system
has many runnable threads affinitized to different CPU subsets and the BPF
scheduler places them all into a single DSQ, many CPUs can scan the DSQ
concurrently for tasks they can run. This can cause DSQ and RQ locks to be held
for extended periods, leading to various failure modes. The breather cannot
solve this because once in the consume loop, there's no exit. The new mechanism
fixes this by exiting the loop immediately.

The bypass DSQ is exempted to ensure the bypass mechanism itself can make
progress.

v2: Use READ_ONCE() when reading scx_aborting (Andrea Righi).

Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
a69040ed57 sched_ext: Simplify breather mechanism with scx_aborting flag
The breather mechanism was introduced in 62dcbab8b0 ("sched_ext: Avoid
live-locking bypass mode switching") and e32c260195 ("sched_ext: Enable the
ops breather and eject BPF scheduler on softlockup") to prevent live-locks by
injecting delays when CPUs are trapped in dispatch paths.

Currently, it uses scx_breather_depth (atomic_t) and scx_in_softlockup
(unsigned long) with separate increment/decrement and cleanup operations. The
breather is only activated when aborting, so tie it directly to the exit
mechanism. Replace both variables with scx_aborting flag set when exit is
claimed and cleared after bypass is enabled. Introduce scx_claim_exit() to
consolidate exit_kind claiming and breather enablement. This eliminates
scx_clear_softlockup() and simplifies scx_softlockup() and scx_bypass().

The breather mechanism will be replaced by a different abort mechanism in a
future patch. This simplification prepares for that change.

Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
61debc251c sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
Bypass mode routes tasks through fallback dispatch queues. Originally a single
global DSQ, b7b3b2dbae ("sched_ext: Split the global DSQ per NUMA node")
changed this to per-node DSQs to resolve NUMA-related livelocks.

Dan Schatzberg found per-node DSQs can still livelock when many threads are
pinned to different small CPU subsets: each CPU must scan many incompatible
tasks to find runnable ones, causing severe contention with high CPU counts.

Switch to per-CPU bypass DSQs. Each task queues on its current CPU. Default
idle CPU selection and direct dispatch handle most cases well.

This introduces a failure mode when tasks concentrate on one CPU in
over-saturated systems. If the BPF scheduler severely skews placement before
triggering bypass, that CPU's queue may be too long to drain, causing RCU
stalls. A load balancer in a future patch will address this. The bypass DSQ is
separate from local DSQ to enable load balancing: local DSQs use rq locks,
preventing efficient scanning and transfer across CPUs, especially problematic
when systems are already contended.

v2: Clarified why bypass DSQ is separate from local DSQ (Andrea Righi).

Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
3546119f18 sched_ext: Refactor do_enqueue_task() local and global DSQ paths
The local and global DSQ enqueue paths in do_enqueue_task() share the same
slice refill logic. Factor out the common code into a shared enqueue label.
This makes adding new enqueue cases easier. No functional changes.

Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:44 -10:00
Tejun Heo
bfd3749d48 sched_ext: Use shorter slice in bypass mode
There have been reported cases of bypass mode not making forward progress fast
enough. The 20ms default slice is unnecessarily long for bypass mode where the
primary goal is ensuring all tasks can make forward progress.

Introduce SCX_SLICE_BYPASS set to 5ms and make the scheduler automatically
switch to it when entering bypass mode. Also make the bypass slice value
tunable through the slice_bypass_us module parameter (adjustable between 100us
and 100ms) to make it easier to test whether slice durations are a factor in
problem cases.

v3: Use READ_ONCE/WRITE_ONCE for scx_slice_dfl access (Dan).

v2: Removed slice_dfl_us module parameter. Fixed typos (Andrea).

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 06:43:43 -10:00
Andy Shevchenko
372a12bd5d lib/vsprintf: Check pointer before dereferencing in time_and_date()
The pointer may be invalid when gets to the printf(). In particular
the time_and_date() dereferencing it in some cases without checking.

Move the check from rtc_str() to time_and_date() to cover all cases.

Fixes: 7daac5b2fd ("lib/vsprintf: Print time64_t in human readable format")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251110132118.4113976-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-12 11:36:20 +01:00
Thorsten Blum
0e6ebf8778 device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
Replace set_access(), set_majmin(), and type_to_char() with new helpers
seq_putaccess(), seq_puttype(), and seq_putversion() that write directly
to 'seq_file'.

Simplify devcgroup_seq_show() by hard-coding "a *:* rwm", and use the
new seq_put* helper functions to list the exceptions otherwise.

This allows us to remove the intermediate string buffers while
maintaining the same functionality, including wildcard handling.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-11 19:47:24 -05:00
Chen Ridong
f23cb0ced8 cpuset: remove need_rebuild_sched_domains
Previously, update_cpumasks_hier() used need_rebuild_sched_domains to
decide whether to invoke rebuild_sched_domains_locked(). Now that
rebuild_sched_domains_locked() only sets force_rebuild, the flag is
redundant. Hence, remove it.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-11 11:47:08 -10:00
Chen Ridong
648d43da64 cpuset: remove global remote_children list
The remote_children list is used to track all remote partitions attached
to a cpuset. However, it serves no other purpose. Using a boolean flag to
indicate whether a cpuset is a remote partition is a more direct approach,
making remote_children unnecessary.

This patch replaces the list with a remote_partition flag in the cpuset
structure and removes remote_children entirely.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-11 11:47:08 -10:00
Chen Ridong
0241e9e2bd cpuset: simplify node setting on error
There is no need to jump to the 'done' label upon failure, as no cleanup
is required. Return the error code directly instead.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-11 11:47:08 -10:00
Casey Schaufler
29c701f90b Smack: function parameter 'gfp' not described
Add a descrition of the gfp parameter to smk_import_allocated_label().

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511061746.dPegBnNf-lkp@intel.com/
2025-11-11 12:00:18 -08:00
Bert Karwatzki
01a743550b cgroup: include missing header for struct irq_work
To compile cgroup.c with PREEMPT_RT=y include header which declares
struct irq_work.

Fixes: 9311e6c29b ("cgroup: Fix sleeping from invalid context warning on PREEMPT_RT")

Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-11 08:52:42 -10:00
Nicolas Schier
7bade3f7e9 scripts: headers_install.sh: Remove two outdated config leak ignore entries
Remove config leak ignore entries for arch/arc/include/uapi/asm/page.h
as they have been removed in commit d3e5bab923 ("arch: simplify
architecture specific page size configuration").

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251105-update-headers-install-config-leak-ignore-list-v1-1-40be3eed68cb@kernel.org
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-11 16:01:13 +01:00
Bagas Sanjaya
77cbf5fbe5 Documentation: tps6594-pfsm: Fix macro cross-reference syntax
C macro references are erroneously written using :c:macro:: (note the
double colon). This causes the references to be outputted as combination
of verbatim roles and italicized names instead.

Correct the syntax.

Fixes: dce5488896 ("Documentation: Add TI TPS6594 PFSM")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104041812.31402-4-bagasdotme@gmail.com>
2025-11-10 12:54:49 -07:00
Bagas Sanjaya
3ba679d443 Documentation: mrvl-cn10k-dpi: Fix macro cross-reference syntax
C macro references are erroneously written using :c:macro:: (note the
double colon). This causes the references to be outputted as combination
of verbatim roles and italicized names instead.

Correct the syntax.

Fixes: 5f67eef6df ("misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104041812.31402-3-bagasdotme@gmail.com>
2025-11-10 12:54:49 -07:00
Bagas Sanjaya
c1be952f1e Documentation: amd-sbi: Wrap miscdevice listing snippet in literal code block
Device file listing (ls output) is shown as long-running paragraph
instead. Wrap it in literal code block.

Fixes: 4d95514d14 ("misc: amd-sbi: Add document for AMD SB IOCTL description")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104041812.31402-2-bagasdotme@gmail.com>
2025-11-10 12:54:49 -07:00
Bagas Sanjaya
c6804c6af9 Documentation: taskstats: Reindent payload kinds list
Payload kinds list text is indented at the first text column, rather
than aligned to the list number. As an effect, the third item becomes
sublist of second item's third sublist item (TASKTYPE_TYPE_STATS).

Reindent the list text.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104130751.22755-1-bagasdotme@gmail.com>
2025-11-10 12:52:12 -07:00
Gou Hao
1f37cae5d1 xfs-doc: Fix typo error
Online fsck may take longer than offline fsck...

Signed-off-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251105013506.358-1-gouhao@uniontech.com>
2025-11-10 12:49:45 -07:00
Bagas Sanjaya
285f79bebf Documentation: parport-lowlevel: Separate function listing code blocks
Commit be9d0411f1 ("parport-lowlevel.txt: standardize document
format") reSTify parport interface documentation but forgets to properly
mark function listing code blocks up. As such, these are rendered as
long-running normal paragraph instead.

Fix them by adding missing separator between the code block marker and
the listing.

Fixes: be9d0411f1 ("parport-lowlevel.txt: standardize document format")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251105124947.45048-1-bagasdotme@gmail.com>
2025-11-10 12:47:36 -07:00
Mauro Carvalho Chehab
fc9e9a39cc tools/docs/get_abi.py: fix get_abi library directory
changeset a5dd93016f ("docs: move get_abi.py to tools/docs") moved
the script location, but didn't update library location, causing it
to fail.

Fixes: a5dd93016f ("docs: move get_abi.py to tools/docs")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <3839bc4db2d5c7e14dd2686876a2c7b5d72a46cd.1762523688.git.mchehab+huawei@kernel.org>
2025-11-10 12:45:46 -07:00
Mauro Carvalho Chehab
d69a03a97a docs: doc-guide: parse-headers.rst update its documentation
Besides converting from Perl to Python, parse-headers has gained
some new functionality and was moved to tools/docs.

Update its documentation accordingly.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-doc/9391a0f0-7c92-42aa-8190-28255b22e131@infradead.org/
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <1f9025dc98dc58da3cc31f3343d5027f351be338.1762594622.git.mchehab+huawei@kernel.org>
2025-11-10 12:40:54 -07:00
pierwill
e1cf4aac38 docs: Fix missing word in spectre.rst
Corrects a missing word in the hardware vulnerability docs.

Signed-off-by: Will Pierce <pierwill@protonmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <Ru-d3ltJIyY4Oc6tzHwpSiDeFwSLHEzd7Utcr6iobgIy1B8wLRI4f6JiCb0a9n-0-r19d0dyLL3yS8KWVcyHfpkyDErWXYTkI3AJfUPTNCc=@protonmail.com>
2025-11-10 12:39:02 -07:00
zhang jiao
4457265c61 workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
assert_rcu_or_wq_mutex_or_pool_mutex is never referenced in the code.
Just remove it.

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-10 06:10:18 -10:00
Vlastimil Babka
32cf9f2182 slab: use struct freelist_counters for local variables instead of struct slab
In several functions we declare local struct slab variables so we can
work with the freelist and counters fields (including the sub-counters
that are in the union) comfortably.

With struct freelist_counters containing the full counters definition,
we can now reduce the local variables to that type as we don't need the
other fields in struct slab.

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-10 15:35:21 +01:00
Vlastimil Babka
3993ca9d64 slab: turn freelist_aba_t to a struct and fully define counters there
In struct slab we currently have freelist and counters pair, where
counters itself is a union of unsigned long with a sub-struct of
several smaller fields. Then for the usage with double cmpxchg we have
freelist_aba_t that duplicates the definition of the freelist+counters
with implicitly the same layout as the full definition in struct slab.

Thanks to -fms-extension we can now move the full counters definition to
freelist_aba_t (while changing it to struct freelist_counters as a
typedef is unnecessary and discouraged) and replace the relevant part in
struct slab to an unnamed reference to it.

The immediate benefit is the removal of duplication and no longer
relying on the same layout implicitly. It also allows further cleanups
thanks to having the full definition of counters in struct
freelist_counters.

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-10 15:35:21 +01:00
Vlastimil Babka
b244358e9a slab: separate struct freelist_tid from kmem_cache_cpu
In kmem_cache_cpu we currently have a union of the freelist+tid pair
with freelist_aba_t, relying implicitly on the type compatibility with the
freelist+counters pair used in freelist_aba_t.

To allow further changes to freelist_aba_t, we can instead define a
separate struct freelist_tid (instead of a typedef, per the coding
style) for kmem_cache_cpu, as that affects only a single helper
__update_cpu_freelist_fast().

We can add the resulting struct freelist_tid to kmem_cache_cpu as
unnamed field thanks to -fms-extensions, so that freelist and tid fields
can still be accessed directly.

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-10 15:35:21 +01:00
Vlastimil Babka
c99f969d9a Merge tag 'kbuild-ms-extensions-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux into slab/for-6.19/freelist_aba_t_cleanups
Shared branch between Kbuild and other trees for enabling '-fms-extensions' for 6.19

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-10 15:34:28 +01:00
Petr Mladek
394aa576c0 printk_ringbuffer: Create a helper function to decide whether more space is needed
The decision whether some more space is needed is tricky in the printk
ring buffer code:

  1. The given lpos values might overflow. A subtraction must be used
     instead of a simple "lower than" check.

  2. Another CPU might reuse the space in the mean time. It can be
     detected when the subtraction is bigger than DATA_SIZE(data_ring).

  3. There is exactly enough space when the result of the subtraction
     is zero. But more space is needed when the result is exactly
     DATA_SIZE(data_ring).

Add a helper function to make sure that the check is done correctly
in all situations. Also it helps to make the code consistent and
better documented.

Suggested-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/87tsz7iea2.fsf@jogness.linutronix.de
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251107194720.1231457-3-pmladek@suse.com
[pmladek@suse.com: Updated wording as suggested by John]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-10 13:09:43 +01:00
Petr Mladek
cc3bad11de printk_ringbuffer: Fix check of valid data size when blk_lpos overflows
The commit 67e1b0052f ("printk_ringbuffer: don't needlessly wrap
data blocks around") allows to use the last 4 bytes of the ring buffer.

But the check for the @data_size was not properly updated in get_data().
It fails when "blk_lpos->next" overflows to "0". In this case:

  + is_blk_wrapped(data_ring, blk_lpos->begin, blk_lpos->next)
    returns "false" because it checks "blk_lpos->next - 1".

  + "blk_lpos->begin < blk_lpos->next" fails because "blk_lpos->next"
    is already 0.

  + is_blk_wrapped(data_ring, blk_lpos->begin + DATA_SIZE(data_ring),
    blk_lpos->next) returns "false" because "begin_lpos" is from
    the next wrap but "next_lpos - 1" is from the previous one.

As a result, get_data() triggers the WARN_ON_ONCE() for "Illegal
block description", for example:

[  216.317316][ T7652] loop0: detected capacity change from 0 to 16
** 1 printk messages dropped **
[  216.327750][ T7652] ------------[ cut here ]------------
[  216.327789][ T7652] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0x48a/0x840, CPU#1: syz.0.585/7652
[  216.327848][ T7652] Modules linked in:
[  216.327907][ T7652] CPU: 1 UID: 0 PID: 7652 Comm: syz.0.585 Not tainted syzkaller #0 PREEMPT(full)
[  216.327933][ T7652] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
[  216.327953][ T7652] RIP: 0010:get_data+0x48a/0x840
[  216.327986][ T7652] Code: 83 c4 f8 48 b8 00 00 00 00 00 fc ff df 41 0f b6 04 07 84 c0 0f 85 ee 01 00 00 44 89 65 00 49 83 c5 08 eb 13 e8 a7 19 1f 00 90 <0f> 0b 90 eb 05 e8 9c 19 1f 00 45 31 ed 4c 89 e8 48 83 c4 28 5b 41
[  216.328007][ T7652] RSP: 0018:ffffc900035170e0 EFLAGS: 00010293
[  216.328029][ T7652] RAX: ffffffff81a1eee9 RBX: 00003fffffffffff RCX: ffff888033255b80
[  216.328048][ T7652] RDX: 0000000000000000 RSI: 00003fffffffffff RDI: 0000000000000000
[  216.328063][ T7652] RBP: 0000000000000012 R08: 0000000000000e55 R09: 000000325e213cc7
[  216.328079][ T7652] R10: 000000325e213cc7 R11: 00001de4c2000037 R12: 0000000000000012
[  216.328095][ T7652] R13: 0000000000000000 R14: ffffc90003517228 R15: 1ffffffff1bca646
[  216.328111][ T7652] FS:  00007f44eb8da6c0(0000) GS:ffff888125fda000(0000) knlGS:0000000000000000
[  216.328131][ T7652] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  216.328147][ T7652] CR2: 00007f44ea9722e0 CR3: 0000000066344000 CR4: 00000000003526f0
[  216.328168][ T7652] Call Trace:
[  216.328178][ T7652]  <TASK>
[  216.328199][ T7652]  _prb_read_valid+0x672/0xa90
[  216.328328][ T7652]  ? desc_read+0x1b8/0x3f0
[  216.328381][ T7652]  ? __pfx__prb_read_valid+0x10/0x10
[  216.328422][ T7652]  ? panic_on_this_cpu+0x32/0x40
[  216.328450][ T7652]  prb_read_valid+0x3c/0x60
[  216.328482][ T7652]  printk_get_next_message+0x15c/0x7b0
[  216.328526][ T7652]  ? __pfx_printk_get_next_message+0x10/0x10
[  216.328561][ T7652]  ? __lock_acquire+0xab9/0xd20
[  216.328595][ T7652]  ? console_flush_all+0x131/0xb10
[  216.328621][ T7652]  ? console_flush_all+0x478/0xb10
[  216.328648][ T7652]  console_flush_all+0x4cc/0xb10
[  216.328673][ T7652]  ? console_flush_all+0x131/0xb10
[  216.328704][ T7652]  ? __pfx_console_flush_all+0x10/0x10
[  216.328748][ T7652]  ? is_printk_cpu_sync_owner+0x32/0x40
[  216.328781][ T7652]  console_unlock+0xbb/0x190
[  216.328815][ T7652]  ? __pfx___down_trylock_console_sem+0x10/0x10
[  216.328853][ T7652]  ? __pfx_console_unlock+0x10/0x10
[  216.328899][ T7652]  vprintk_emit+0x4c5/0x590
[  216.328935][ T7652]  ? __pfx_vprintk_emit+0x10/0x10
[  216.328993][ T7652]  _printk+0xcf/0x120
[  216.329028][ T7652]  ? __pfx__printk+0x10/0x10
[  216.329051][ T7652]  ? kernfs_get+0x5a/0x90
[  216.329090][ T7652]  _erofs_printk+0x349/0x410
[  216.329130][ T7652]  ? __pfx__erofs_printk+0x10/0x10
[  216.329161][ T7652]  ? __raw_spin_lock_init+0x45/0x100
[  216.329186][ T7652]  ? __init_swait_queue_head+0xa9/0x150
[  216.329231][ T7652]  erofs_fc_fill_super+0x1591/0x1b20
[  216.329285][ T7652]  ? __pfx_erofs_fc_fill_super+0x10/0x10
[  216.329324][ T7652]  ? sb_set_blocksize+0x104/0x180
[  216.329356][ T7652]  ? setup_bdev_super+0x4c1/0x5b0
[  216.329385][ T7652]  get_tree_bdev_flags+0x40e/0x4d0
[  216.329410][ T7652]  ? __pfx_erofs_fc_fill_super+0x10/0x10
[  216.329444][ T7652]  ? __pfx_get_tree_bdev_flags+0x10/0x10
[  216.329483][ T7652]  vfs_get_tree+0x92/0x2b0
[  216.329512][ T7652]  do_new_mount+0x302/0xa10
[  216.329537][ T7652]  ? apparmor_capable+0x137/0x1b0
[  216.329576][ T7652]  ? __pfx_do_new_mount+0x10/0x10
[  216.329605][ T7652]  ? ns_capable+0x8a/0xf0
[  216.329637][ T7652]  ? kmem_cache_free+0x19b/0x690
[  216.329682][ T7652]  __se_sys_mount+0x313/0x410
[  216.329717][ T7652]  ? __pfx___se_sys_mount+0x10/0x10
[  216.329836][ T7652]  ? do_syscall_64+0xbe/0xfa0
[  216.329869][ T7652]  ? __x64_sys_mount+0x20/0xc0
[  216.329901][ T7652]  do_syscall_64+0xfa/0xfa0
[  216.329932][ T7652]  ? lockdep_hardirqs_on+0x9c/0x150
[  216.329964][ T7652]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  216.329988][ T7652]  ? clear_bhb_loop+0x60/0xb0
[  216.330017][ T7652]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  216.330040][ T7652] RIP: 0033:0x7f44ea99076a
[  216.330080][ T7652] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
[  216.330100][ T7652] RSP: 002b:00007f44eb8d9e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  216.330128][ T7652] RAX: ffffffffffffffda RBX: 00007f44eb8d9ef0 RCX: 00007f44ea99076a
[  216.330146][ T7652] RDX: 0000200000000180 RSI: 00002000000001c0 RDI: 00007f44eb8d9eb0
[  216.330164][ T7652] RBP: 0000200000000180 R08: 00007f44eb8d9ef0 R09: 0000000000000000
[  216.330181][ T7652] R10: 0000000000000000 R11: 0000000000000246 R12: 00002000000001c0
[  216.330196][ T7652] R13: 00007f44eb8d9eb0 R14: 00000000000001a1 R15: 0000200000000080
[  216.330233][ T7652]  </TASK>

Solve the problem by moving and fixing the sanity check. The problematic
if-else-if-else code will just distinguish three basic scenarios:
"regular" vs. "wrapped" vs. "too many times wrapped" block.

The new sanity check is more precise. A valid "data_size" must be
lower than half of the data buffer size. Also it must not be zero at
this stage. It allows to catch problematic "data_size" even for wrapped
blocks.

Closes: https://lore.kernel.org/all/69096836.a70a0220.88fb8.0006.GAE@google.com/
Closes: https://lore.kernel.org/all/69078fb6.050a0220.29fc44.0029.GAE@google.com/
Fixes: 67e1b0052f ("printk_ringbuffer: don't needlessly wrap data blocks around")
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251107194720.1231457-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-10 12:58:28 +01:00
Thomas Weißschuh
2d8482959e tools/nolibc: avoid using plain integer as NULL pointer
While an integer zero is a valid NULL pointer as per the C standard,
sparse will complain about it.

Use explicit NULL pointers instead.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202509261452.g5peaXCc-lkp@intel.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-09 21:29:57 +01:00
Akira Yokosawa
7f8fcc6f09 memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
Commit 8817270042 ("docs/memory-barriers.txt: Add wait_event_cmd()
and wait_event_exclusive_cmd()") added two APIs without taking care
of the list order.  Sort the list for readability.

While there, make it clear that this is incomplete by saying
"for example".

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2025-11-08 19:02:26 -08:00
Thomas Weißschuh
107eb8336e tools/nolibc: add support for fchdir()
Add support for the file descriptor based variant of chdir().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-08 14:54:25 +01:00
Pat Somaru
9362d34acf scripts/clang-tools: Handle included .c files in gen_compile_commands
The gen_compile_commands.py script currently only creates entries for the
primary source files found in .cmd files, but some kernel source files
text-include others (i.e. kernel/sched/build_policy.c).

This prevents tools like clangd from working properly on text-included c
files, such as kernel/sched/ext.c because the generated compile_commands.json
does not have entries for them.

Extend process_line() to detect when a source file includes .c files, and
generate additional compile_commands.json entries for them. For included c
files, use the same compile flags as their parent and add their parents headers.

This enables lsp tools like clangd to work properly on files like
kernel/sched/ext.c

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20251008004615.2690081-1-patso@likewhatevs.io
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:18:00 +01:00
Nathan Chancellor
2d7eda1db5 kbuild: uapi: Drop types.h check from headers_check.pl
Since commit d6fc9fcbaa ("kbuild: compile-test exported headers to
ensure they are self-contained"), the UAPI headers are compile tested
standalone with the compiler. This compile testing will catch a missing
include of types.h, making the types.h checking in headers_check.pl
originally added in commit 483b41218f ("kbuild: add checks for include
of linux/types in userspace headers") obsolete.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251023-headers-pl-drop-check-sizes-v1-1-18bd21cf8f5e@kernel.org
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:18:00 +01:00
Nathan Chancellor
7319256dda kbuild: Rename Makefile.extrawarn to Makefile.warn
Since commit e88ca24319 ("kbuild: consolidate warning flags in
scripts/Makefile.extrawarn"), scripts/Makefile.extrawarn contains all
warnings for the main kernel build, not just warnings enabled by the
values for W=. Rename it to scripts/Makefile.warn to make it clearer
that this Makefile is where all Kbuild warning handling should exist.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251023-rename-scripts-makefile-extrawarn-v1-1-8f7531542169@kernel.org
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:59 +01:00
Nicolas Schier
d569067318 MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
Update kbuild entry by my kernel.org address, and change my mailmap
entry appropriately.

Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:59 +01:00
Thomas Weißschuh
c4cb34aee1 kbuild: uapi: reuse KBUILD_USERCFLAGS
The toplevel Makefile already provides the compiler flags necessary to
build userspace applications for the target.

Make use of them instead of duplicating the logic.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251014-kbuild-uapi-usercflags-v1-1-c162f9059c47@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:59 +01:00
Gang Yan
bfb046f67a kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
This patch adds an example of how to set KBUILD_BUILD_TIMESTAMP to a
specific date. Also, note that the provided timestamp is used for
initramfs mtime fields, which are 32-bit and thus limited to dates
between the Unix epoch and 2106-02-07 06:28:15 UTC. Dates outside this
range will cause errors.

Suggested-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251017021209.6586-1-gang.yan@linux.dev
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:58 +01:00
Mikhail Malyshev
af61da281f kbuild: Use objtree for module signing key path
When building out-of-tree modules with CONFIG_MODULE_SIG_FORCE=y,
module signing fails because the private key path uses $(srctree)
while the public key path uses $(objtree). Since signing keys are
generated in the build directory during kernel compilation, both
paths should use $(objtree) for consistency.

This causes SSL errors like:
  SSL error:02001002:system library:fopen:No such file or directory
  sign-file: /kernel-src/certs/signing_key.pem

The issue occurs because:
- sig-key uses: $(srctree)/certs/signing_key.pem (source tree)
- cmd_sign uses: $(objtree)/certs/signing_key.x509 (build tree)

But both keys are generated in $(objtree) during the build.

This complements commit 25ff08aa43 ("kbuild: Fix signing issue for
external modules") which fixed the scripts path and public key path,
but missed the private key path inconsistency.

Fixes out-of-tree module signing for configurations with separate
source and build directories (e.g., O=/kernel-out).

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251015163452.3754286-1-mike.malyshev@gmail.com
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:58 +01:00
Rasmus Villemoes
d599f571b3 btrfs: send: make use of -fms-extensions for defining struct fs_path
The newly introduced -fms-extensions compiler flag allows defining
struct fs_path in such a way that inline_buf becomes a proper array
with a size known to the compiler.

This also makes the problem fixed by commit 8aec9dbf2d ("btrfs: send:
fix -Wflex-array-member-not-at-end warning in struct send_ctx") go away.
Whether cur_inode_path should be put back to its original place in
struct send_ctx I don't know, but at least the comment no longer
applies.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://patch.msgid.link/20251020142228.1819871-3-linux@rasmusvillemoes.dk
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:17:58 +01:00
Nicolas Schier
9716818d61 Merge tag 'kbuild-ms-extensions-6.19' into kbuild-next
Shared branch between Kbuild and other trees for enabling '-fms-extensions' for 6.19

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-08 12:16:41 +01:00
Ricardo Robaina
c8a3dfe731 audit: merge loops in __audit_inode_child()
Whenever there's audit context, __audit_inode_child() gets called
numerous times, which can lead to high latency in scenarios that
create too many sysfs/debugfs entries at once, for instance, upon
device_add_disk() invocation.

   # uname -r
   6.18.0-rc2+

   # auditctl -a always,exit -F path=/tmp -k foo
   # time insmod loop max_loop=1000
   real 0m46.676s
   user 0m0.000s
   sys 0m46.405s

   # perf record -a insmod loop max_loop=1000
   # perf report --stdio |grep __audit_inode_child
   32.73%  insmod [kernel.kallsyms] [k] __audit_inode_child

__audit_inode_child() searches for both the parent and the child
in two different loops that iterate over the same list. This
process can be optimized by merging these into a single loop,
without changing the function behavior or affecting the code's
readability.

This patch merges the two loops that walk through the list
context->names_list into a single loop. This optimization resulted
in around 51% performance enhancement for the benchmark.

   # uname -r
   6.18.0-rc2-enhancedv3+

   # auditctl -a always,exit -F path=/tmp -k foo
   # time insmod loop max_loop=1000
   real 0m22.899s
   user 0m0.001s
   sys 0m22.652s

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-07 16:50:42 -05:00
Gongwei Li
77563f3d47 audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
Replace kmalloc+memset by kzalloc for better readability and simplicity.

This addresses the warning below:
WARNING: kzalloc should be used for data, instead of kmalloc/memset

Signed-off-by: Gongwei Li <ligongwei@kylinos.cn>
[PM: subject and description tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-07 16:38:34 -05:00
Paul E. McKenney
204ab51445 refscale: Do not disable interrupts for tests involving local_bh_enable()
Some kernel configurations prohibit invoking local_bh_enable() while
interrupts are disabled.  However, refscale disables interrupts to reduce
OS noise during the tests, which results in splats.  This commit therefore
adds an ->enable_irqs flag to the ref_scale_ops structure, and refrains
from disabling interrupts when that flag is set.  This flag is set for
the "bh" and "incpercpubh" scale_type module-parameter values.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
Paul E. McKenney
448b66a7aa refscale: Add non-atomic per-CPU increment readers
This commit adds refscale readers based on READ_ONCE() and WRITE_ONCE()
that are unprotected (can lose counts, "refscale.scale_type=incpercpu"),
preempt-disabled ("refscale.scale_type=incpercpupreempt"),
bh-disabled ("refscale.scale_type=incpercpubh"), and irq-disabled
("refscale.scale_type=incpercpuirqsave").  On my x86 laptop, these are
about 4.3ns, 3.8ns, and 7.3ns per pair, respectively.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
Paul E. McKenney
bdba8330ad refscale: Add this_cpu_inc() readers
This commit adds refscale readers based on this_cpu_inc() and
this_cpu_inc() ("refscale.scale_type=percpuinc").  On my x86 laptop,
these are about 4.5ns per pair.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
Paul E. McKenney
057df3eaca refscale: Add preempt_disable() readers
This commit adds refscale readers based on preempt_disable() and
preempt_enable() ("refscale.scale_type=preempt").  On my x86 laptop, these
are about 2.8ns.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
Paul E. McKenney
78a731cefc refscale: Add local_bh_disable() readers
This commit adds refscale readers based on local_bh_disable() and
local_bh_enable() ("refscale.scale_type=bh").  On my x86 laptop, these
are about 4.9ns.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
Paul E. McKenney
edd6f78b75 refscale: Add local_irq_disable() and local_irq_save() readers
This commit adds refscale readers based on local_irq_disable() and
local_irq_enable() ("refscale.scale_type=irq") and on local_irq_save()
and local_irq_restore ("refscale.scale_type=irqsave").  On my x86 laptop,
these are about 2.8ns and 7.5ns per enable/disable pair, respectively.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 14:37:17 +01:00
John Ogness
187de7c212 printk: nbcon: Allow unsafe write_atomic() for panic
There may be console drivers that have not yet figured out a way
to implement safe atomic printing (->write_atomic() callback).
These drivers could choose to only implement threaded printing
(->write_thread() callback), but then it is guaranteed that _no_
output will be printed during panic. Not even attempted.

As a result, developers may be tempted to implement unsafe
->write_atomic() callbacks and/or implement some sort of custom
deferred printing trickery to try to make it work. This goes
against the principle intention of the nbcon API as well as
endangers other nbcon drivers that are doing things correctly
(safely).

As a compromise, allow nbcon drivers to implement unsafe
->write_atomic() callbacks by providing a new console flag
CON_NBCON_ATOMIC_UNSAFE. When specified, the ->write_atomic()
callback for that console will _only_ be called during the
final "hope and pray" flush attempt at the end of a panic:
nbcon_atomic_flush_unsafe().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/all/swdpckuwwlv3uiessmtnf2jwlx3jusw6u7fpk5iggqo4t2vdws@7rpjso4gr7qp/ [1]
Link: https://lore.kernel.org/all/20251103-fix_netpoll_aa-v4-1-4cfecdf6da7c@debian.org/ [2]
Link: https://patch.msgid.link/20251027161212.334219-2-john.ogness@linutronix.de
[pmladek@suse.com: Fix build with rework/nbcon-in-kdb branch.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-07 14:30:52 +01:00
Paul E. McKenney
f2b7d6252c torture: Permit negative kvm.sh --kconfig numberic arguments
This commit loosens the kvm.sh script's regular expressions to permit
negative-valued Kconfig options, for example:

	--kconfig CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN=-1

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 13:57:38 +01:00
Paul E. McKenney
37827223f8 srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
This commit adds the SRCU_READ_FLAVOR_FAST_UPDOWN=0x8 macro
and adjusts rcutorture to make use of it.  In this commit, both
SRCU_READ_FLAVOR_FAST=0x4 and the new SRCU_READ_FLAVOR_FAST_UPDOWN
test SRCU-fast.  When the SRCU-fast-updown is added, the new
SRCU_READ_FLAVOR_FAST_UPDOWN macro will test it when passed to the
rcutorture.reader_flavor module parameter.

The old SRCU_READ_FLAVOR_FAST macro's value changed from 0x8 to 0x4.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 13:57:38 +01:00
Paul E. McKenney
3ed04e3f03 rcu: Mark diagnostic functions as notrace
The rcu_lockdep_current_cpu_online(), rcu_read_lock_sched_held(),
rcu_read_lock_held(), rcu_read_lock_bh_held(), rcu_read_lock_any_held()
are used by tracing-related code paths, so putting traces on them is
unlikely to make anyone happy.  This commit therefore marks them all
"notrace".

Reported-by: Leon Hwang <leon.hwang@linux.dev>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 13:57:37 +01:00
Vlastimil Babka
4c0a17e283 slab: prevent recursive kmalloc() in alloc_empty_sheaf()
We want to expand usage of sheaves to all non-boot caches, including
kmalloc caches. Since sheaves themselves are also allocated by
kmalloc(), we need to prevent excessive or infinite recursion -
depending on sheaf size, the sheaf can be allocated from smaller, same
or larger kmalloc size bucket, there's no particular constraint.

This is similar to allocating the objext arrays so let's just reuse the
existing mechanisms for those. __GFP_NO_OBJ_EXT in alloc_empty_sheaf()
will prevent a nested kmalloc() from allocating a sheaf itself - it will
either have sheaves already, or fallback to a non-sheaf-cached
allocation (so bootstrap of sheaves in a kmalloc cache that allocates
sheaves from its own size bucket is possible). Additionally, reuse
OBJCGS_CLEAR_MASK to clear unwanted gfp flags from the nested
allocation.

Link: https://patch.msgid.link/20251105-sheaves-cleanups-v1-5-b8218e1ac7ef@suse.cz
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-07 09:59:15 +01:00
Vlastimil Babka
f6087b926a slab: make __slab_free() more clear
The function is tricky and many of its tests are hard to understand. Try
to improve that by using more descriptively named variables and added
comments.

- rename 'prior' to 'old_head' to match the head and tail parameters
- introduce a 'bool was_full' to make it more obvious what we are
  testing instead of the !prior and prior tests
- add or improve comments in various places to explain what we're doing

Also replace kmem_cache_has_cpu_partial() tests with
IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) which are compile-time constants.
We can do that because the kmem_cache_debug(s) case is handled upfront
via free_to_partial_list().

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251105-sheaves-cleanups-v1-1-b8218e1ac7ef@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-07 09:59:15 +01:00
Vlastimil Babka
31e0886fd5 slub: remove CONFIG_SLUB_TINY specific code paths
CONFIG_SLUB_TINY minimizes the SLUB's memory overhead in multiple ways,
mainly by avoiding percpu caching of slabs and objects. It also reduces
code size by replacing some code paths with simplified ones through
ifdefs, but the benefits of that are smaller and would complicate the
upcoming changes.

Thus remove these code paths and associated ifdefs and simplify the code
base.

Link: https://patch.msgid.link/20251105-sheaves-cleanups-v1-4-b8218e1ac7ef@suse.cz
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-07 09:59:15 +01:00
Vlastimil Babka
1ce20c28ea slab: handle pfmemalloc slabs properly with sheaves
When a pfmemalloc allocation actually dips into reserves, the slab is
marked accordingly and non-pfmemalloc allocations should not be allowed
to allocate from it. The sheaves percpu caching currently doesn't follow
this rule, so implement it before we expand sheaves usage to all caches.

Make sure objects from pfmemalloc slabs don't end up in percpu sheaves.
When freeing, skip sheaves when freeing an object from pfmemalloc slab.
When refilling sheaves, use __GFP_NOMEMALLOC to override any pfmemalloc
context - the allocation will fallback to regular slab allocations when
sheaves are depleted and can't be refilled because of the override.

For kfree_rcu(), detect pfmemalloc slabs after processing the rcu_sheaf
after the grace period in __rcu_free_sheaf_prepare() and simply flush
it if any object is from pfmemalloc slabs.

For prefilled sheaves, try to refill them first with __GFP_NOMEMALLOC
and if it fails, retry without __GFP_NOMEMALLOC but then mark the sheaf
pfmemalloc, which makes it flushed back to slabs when returned.

Link: https://patch.msgid.link/20251105-sheaves-cleanups-v1-3-b8218e1ac7ef@suse.cz
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-07 09:59:15 +01:00
Vlastimil Babka
ea6b5e5778 slab: move kfence_alloc() out of internal bulk alloc
SLUB's internal bulk allocation __kmem_cache_alloc_bulk() can currently
allocate some objects from KFENCE, i.e. when refilling a sheaf. It works
but it's conceptually the wrong layer, as KFENCE allocations should only
happen when objects are actually handed out from slab to its users.

Currently for sheaf-enabled caches, slab_alloc_node() can return KFENCE
object via kfence_alloc(), but also via alloc_from_pcs() when a sheaf
was refilled with KFENCE objects. Continuing like this would also
complicate the upcoming sheaf refill changes.

Thus remove KFENCE allocation from __kmem_cache_alloc_bulk() and move it
to the places that return slab objects to users. slab_alloc_node() is
already covered (see above). Add kfence_alloc() to
kmem_cache_alloc_from_sheaf() to handle KFENCE allocations from
prefilled sheafs, with a comment that the caller should not expect the
sheaf size to decrease after every allocation because of this
possibility.

For kmem_cache_alloc_bulk() implement a different strategy to handle
KFENCE upfront and rely on internal batched operations afterwards.
Assume there will be at most once KFENCE allocation per bulk allocation
and then assign its index in the array of objects randomly.

Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20251105-sheaves-cleanups-v1-2-b8218e1ac7ef@suse.cz
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-07 09:59:15 +01:00
Tejun Heo
9311e6c29b cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup_task_dead() is called from finish_task_switch() which runs with
preemption disabled and doesn't allow scheduling even on PREEMPT_RT. The
function needs to acquire css_set_lock which is a regular spinlock that can
sleep on RT kernels, leading to "sleeping function called from invalid
context" warnings.

css_set_lock is too large in scope to convert to a raw_spinlock. However,
the unlinking operations don't need to run synchronously - they just need
to complete after the task is done running.

On PREEMPT_RT, defer the work through irq_work. While the work doesn't need
to happen immediately, it can't be delayed indefinitely either as the dead
task pins the cgroup and task_struct can be pinned indefinitely. Use the
lazy version of irq_work to allow batching and lower impact while ensuring
timely completion.

v2: Use IRQ_WORK_INIT_LAZY instead of immediate irq_work and add explanation
    for why the work can't be delayed indefinitely (Sebastian Andrzej Siewior).

Fixes: d245698d72 ("cgroup: Defer task cgroup unlink until after the task is done switching out")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Link: https://lore.kernel.org/r/20251104181114.489391-1-calvin@wbinvd.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-06 12:52:26 -10:00
Steven Rostedt
37f4660138 selftests/tracing: Add basic test for trace_marker_raw file
Commit 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read
user space") made an update that fixed both trace_marker and
trace_marker_raw. But the small difference made to trace_marker_raw had a
blatant bug in it that any basic testing would have uncovered.
Unfortunately, the self tests have tests for trace_marker but nothing for
trace_marker_raw which allowed the bug to get upstream.

Add basic selftests to test trace_marker_raw so that this doesn't happen
again.

Link: https://lore.kernel.org/r/20251014145149.3e3c1033@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-06 15:23:50 -07:00
Thorsten Blum
d633730bb3 crypto: octeontx2 - Replace deprecated strcpy in cpt_ucode_load_fw
strcpy() is deprecated; use the safer strscpy() instead.

The destination buffer is only zero-initialized for the first iteration
and since strscpy() guarantees its NUL termination anyway, remove
zero-initializing 'eng_type'.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Thorsten Blum
b6410c1e50 crypto: deflate - Use struct_size to improve deflate_alloc_stream
Use struct_size(), which provides additional compile-time checks for
structures with flexible array members (e.g., __must_be_array()), to
calculate the allocation size for a new 'deflate_stream'.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Mario Limonciello (AMD)
9fc6290117 crypto: ccp - Add support for PCI device 0x115A
PCI device 0x115A is similar to pspv5, except it doesn't have platform
access mailbox support.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Gaurav Kashyap
457be301fc crypto: qce - fix version check
The previous version check made it difficult to support newer major
versions (e.g., v6.0) without adding extra checks/macros. Update the
logic to only reject v5.0 and allow future versions without additional
changes.

Signed-off-by: Gaurav Kashyap <gaurav.kashyap@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Gaurav Kashyap
5a331d1cd5 dt-bindings: crypto: qcom-qce: Document the kaanapli crypto engine
Document the crypto engine on the kaanapali platform.

Signed-off-by: Gaurav Kashyap <gaurav.kashyap@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Gaurav Kashyap
e74a03e519 dt-bindings: crypto: qcom,prng: Document kaanapali RNG
Document kaanapali compatible for the True Random Number Generator.

Signed-off-by: Gaurav Kashyap <gaurav.kashyap@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Harsh Jain
426b1a1bdf crypto: xilinx - Use %pe to print PTR_ERR
Fix cocci warnings to use %pe to print PTR_ERR().

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202510231229.Z6TduqZy-lkp@intel.com/
Signed-off-by: Harsh Jain <h.jain@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-06 14:31:08 +08:00
Valentin Schneider
82a2244980 rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
We now have an RCU_EXPERT config for testing small-sized RCU dynticks
counter:  CONFIG_RCU_DYNTICKS_TORTURE.

Modify scenario TREE04 to exercise to use this config in order to test a
ridiculously small counter (2 bits).

Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:09:16 +01:00
Paul E. McKenney
d4500d68bc rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
This commit removes a harmless but potentially confusing invocation of
rcutorture_one_extend() within rcu_torture_one_read().  The immediately
preceding call to rcu_torture_one_read_start() already does this cleanup,
and the other call to rcu_torture_one_read_start() already relies on this.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:15 +01:00
Paul E. McKenney
f121fbbdaf rcutorture: Permit kvm-again.sh to re-use the build directory
This commit adds "inplace" and "inplace-force" values to the kvm-again.sh
"--link" argument, which causes the run's output to be placed into the
build directory.  This could be used to save build time if the machine
went down partway into a run, but it can also be used to do a large
number of builds, and run the resulting kernels concurrently even if the
builds are based on different commits.  A later commit will add this
latter capability to kvm-series.sh in order to produce large speedups
for branch-checking operations.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:12 +01:00
Paul E. McKenney
515a48fedc torture: Add kvm-series.sh to test commit/scenario combination
This commit adds a kvm-series.sh script that takes a list of scenarios and
a list of commits, and then runs each scenario on all of the commits.
A given scenario is run on all the commits before advancing to the
next scenario to minimize build times.  The successes and failures are
summarized at the end of the run.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:10 +01:00
Xuanqiang Luo
34e82569d5 rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
In rculist_nulls.h we can still see ordinary assignments to ->pprev and
->next of hlist_nulls.

As noted in the two patches below:
commit efd04f8a8b ("rcu: Use WRITE_ONCE() for assignments to ->next for
rculist_nulls")
commit 860c8802ac ("rcu: Use WRITE_ONCE() for assignments to ->pprev for
hlist_nulls")

We should use WRITE_ONCE().

Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:06 +01:00
Wang Liang
e52b43883d locktorture: Fix memory leak in param_set_cpumask()
With CONFIG_CPUMASK_OFFSTACK=y, the 'bind_writers' buffer is allocated via
alloc_cpumask_var() in param_set_cpumask(). But it is not freed, when
setting the module parameter multiple times by sysfs interface or removing
module.

Below kmemleak trace is seen for this issue:

unreferenced object 0xffff888100aabff8 (size 8):
  comm "bash", pid 323, jiffies 4295059233
  hex dump (first 8 bytes):
    07 00 00 00 00 00 00 00                          ........
  backtrace (crc ac50919):
    __kmalloc_node_noprof+0x2e5/0x420
    alloc_cpumask_var_node+0x1f/0x30
    param_set_cpumask+0x26/0xb0 [locktorture]
    param_attr_store+0x93/0x100
    module_attr_store+0x1b/0x30
    kernfs_fop_write_iter+0x114/0x1b0
    vfs_write+0x300/0x410
    ksys_write+0x60/0xd0
    do_syscall_64+0xa4/0x260
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

This issue can be reproduced by:
  insmod locktorture.ko bind_writers=1
  rmmod locktorture

or:
  insmod locktorture.ko bind_writers=1
  echo 2 > /sys/module/locktorture/parameters/bind_writers

Considering that setting the module parameter 'bind_writers' or
'bind_readers' by sysfs interface has no real effect, set the parameter
permissions to 0444. To fix the memory leak when removing module, free
'bind_writers' and 'bind_readers' memory in lock_torture_cleanup().

Fixes: 73e3412424 ("locktorture: Add readers_bind and writers_bind module parameters")
Suggested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:04 +01:00
Paul E. McKenney
8c8250ee3b doc: Update for SRCU-fast definitions and initialization
This commit documents the DEFINE_SRCU_FAST(), DEFINE_STATIC_SRCU_FAST(),
and init_srcu_struct_fast() API members.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:30 +01:00
Paul E. McKenney
ac51c40c2c srcu: Make SRCU-fast readers enforce use of SRCU-fast definition/init
This commit makes CONFIG_PROVE_RCU=y kernels enforce the new rule
that srcu_struct structures that are passed to srcu_read_lock_fast()
and other SRCU-fast read-side markers be either initialized with
init_srcu_struct_fast() on the one hand or defined with DEFINE_SRCU_FAST()
or DEFINE_STATIC_SRCU_FAST() on the other.

This eliminates the read-side test that was formerly included in
srcu_read_lock_fast() and friends, speeding these primitives up by
about 25% (admittedly only about half of a nanosecond, but when tracing
on fastpaths...)

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:28 +01:00
Paul E. McKenney
8235bcfd39 srcu: Require special srcu_struct define/init for SRCU-fast readers
This commit adds CONFIG_PROVE_RCU=y checking to enforce the new rule that
srcu_struct structures passed to srcu_read_lock_fast() and other SRCU-fast
read-side markers be either initialized with init_srcu_struct_fast()
on the one hand or defined using either DEFINE_SRCU_FAST() or
DEFINE_STATIC_SRCU_FAST().  This will enable removal of the non-debug
read-side checks from srcu_read_lock_fast() and friends, which on my
laptop provides a 25% speedup (which admittedly amounts to about half
a nanosecond, but when tracing fastpaths...)

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:26 +01:00
Paul E. McKenney
e4ed20c160 rcutorture: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
This commit updates the initialization for the "srcu" and "srcud" torture
types to use DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast(),
respectively, when reader_flavor is equal to SRCU_READ_FLAVOR_FAST.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:24 +01:00
Paul E. McKenney
c5fee33f88 srcu: Make grace-period determination use ssp->srcu_reader_flavor
This commit causes the srcu_readers_unlock_idx() function to take the
srcu_struct structure's ->srcu_reader_flavor field into account.  This
ensures that structures defined via DEFINE_SRCU_FAST( or initialized via
init_srcu_struct_fast() have their grace periods use synchronize_srcu()
or synchronize_srcu_expedited() instead of smp_mb(), even before the
first SRCU reader has been entered.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:22 +01:00
Paul E. McKenney
ee90848499 srcu: Create a DEFINE_SRCU_FAST()
This commit creates DEFINE_SRCU_FAST() and DEFINE_STATIC_SRCU_FAST()
macros that are similar to DEFINE_SRCU() and DEFINE_STATIC_SRCU(),
but which create srcu_struct structures that are usable only by readers
initiated by srcu_read_lock_fast() and friends.

This commit does make DEFINE_SRCU_FAST() available to modules, in which
case the per-CPU srcu_data structures are not created at compile time, but
rather at module-load time.  This means that the >srcu_reader_flavor field
of the srcu_data structure is not available.  Therefore,
this commit instead creates an ->srcu_reader_flavor field in the
srcu_struct structure, adds arguments to the DEFINE_SRCU()-related
macros to initialize this new field, and extends the checks in the
__srcu_check_read_flavor() function to include this new field.

This commit also allows dynamically allocated srcu_struct structure
to be marked for SRCU-fast readers.  It does so by defining a new
init_srcu_struct_fast() function that marks the specified srcu_struct
structure for use by srcu_read_lock_fast() and friends.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:20 +01:00
Paul E. McKenney
950063c6e8 rcutorture: Test srcu_expedite_current()
This commit adds a ->exp_current member to the rcu_torture_ops structure
to test the srcu_expedite_current() function.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:16 +01:00
Paul E. McKenney
34dc27f02c srcu: Create an srcu_expedite_current() function
This commit creates an srcu_expedite_current() function that expedites
the current (and possibly the next) SRCU grace period for the specified
srcu_struct structure.  This functionality will be inherited by RCU
Tasks Trace courtesy of its mapping to SRCU fast.

If the current SRCU grace period is already waiting, that wait will
complete before the expediting takes effect.  If there is no SRCU grace
period in flight, this function might well create one.

[ paulmck: Apply Zqiang feedback for PREEMPT_RT use. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:14 +01:00
Paul E. McKenney
58ac42f278 srcu: Permit Tiny SRCU srcu_read_unlock() with interrupts disabled
The current Tiny SRCU implementation of srcu_read_unlock() awakens
the grace-period processing when exiting the outermost SRCU read-side
critical section.  However, not all Linux-kernel configurations and
contexts permit swake_up_one() to be invoked while interrupts are
disabled, and this can result in indefinitely extended SRCU grace periods.
This commit therefore only invokes swake_up_one() when interrupts are
enabled, and introduces polling to the grace-period workqueue handler.

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Zqiang <qiang.zhang@linux.dev>
Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-05 23:58:11 +01:00
Tejun Heo
5a629ecbcd sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
The warned bitfields in struct scx_sched are updated racily from concurrent
CPUs causing RMW races, which is fine for these boolean warning flags. Add a
comment marking this area to prevent future fields that can't tolerate racy
updates from being added here.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 12:07:09 -10:00
Randy Dunlap
8710524f3f docs: ABI: sysfs-module: update modules taint flags
Add missing taint flags for loadable modules, as pointed out by
Petr Pavlu [1].

[1] https://lore.kernel.org/all/c58152f1-0fbe-4f50-bb61-e2f4c0584025@suse.com/

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251102060458.517911-1-rdunlap@infradead.org>
2025-11-05 11:32:13 -07:00
Bagas Sanjaya
f4c6e50568 Documentation: uacce: Add explicit title
Uacce docs' sections are listed in misc-devices toctree instead due to
lack of explicit docs title. Add it to clean up the toctree.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251103093817.52764-2-bagasdotme@gmail.com>
2025-11-05 11:31:05 -07:00
Bagas Sanjaya
0629278ecb Documentation: pldmfw: Demote library overview section
pldmfw library overview section is formatted as title heading (the
second title of index.rst), making it listed in driver-api toctree.

Demote the section.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251103030228.23851-1-bagasdotme@gmail.com>
2025-11-05 11:29:30 -07:00
Randy Dunlap
dd3e817e87 doc-guide: kernel-doc: add %CONST examples
Add examples of using '%' for formatting constant values to
facilitate more usage of "%CONST" in kernel-doc.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104050930.720711-1-rdunlap@infradead.org>
2025-11-05 11:27:12 -07:00
Bhanu Seshu Kumar Valluri
6894ea0b9a docs: Makefile: Sort Documentation targets case-insensitively in make help
Avoid case-sensitive sorting when listing Documentation targets in make help.
Previously, targets like PCI and RCU appeared ahead of others due to uppercase
names.

Normalize casing during _SPHINXDIRS generation to ensure consistent and
intuitive ordering.

Fixes: 965fc39f73 ("Documentation: sort _SPHINXDIRS for 'make help'")
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104061723.16629-1-bhanuseshukumar@gmail.com>
2025-11-05 11:26:24 -07:00
Randy Dunlap
0c6636d826 docs: w1: fix w1-netlink invalid URL
The URL in w1-netlink.rst for an example userspace application
has changed. The former URL is no longer valid. Update it to the
github URL.

Fixes: e4e056aa35 ("w1: documentation update")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104194403.945611-1-rdunlap@infradead.org>
2025-11-05 11:23:39 -07:00
Tomas Glozar
21d5c65d95 Documentation/rtla: Include defaults for tracer options
Commit 0122938a7a ("rtla: Always set all tracer options") changed the
behavior of RTLA to always set all osnoise and timerlat tracer options
to default values taken from the tracers whenever an RTLA measurement
is started. The change was done to make RTLA results consistent on
subsequent runs of the same command.

Include the default values for tracer options also in documentation
where appropriate.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-10-tglozar@redhat.com>
2025-11-05 11:19:24 -07:00
Tomas Glozar
b9f6a40dc3 Documentation/trace: Specify exact priority for timerlat
The timerlat tracer documentation mentions that threads are created with
real-time priority, but does not mention which priority and scheduling
class is used.

Add the information so that users do not have to look it up in
trace_osnoise.c.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-9-tglozar@redhat.com>
2025-11-05 11:19:20 -07:00
Tomas Glozar
122a552b5b Documentation/rtla: Mention default cgroup state
The RTLA option -C/--cgroup is used to set a cgroup for workload
threads. This is either a specific cgroup, if passed an argument, or
rtla's cgroup, if no argument is given.

Expand the documentation of the -C option to also include the
information about the cgroup settings when the option is not specified.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-8-tglozar@redhat.com>
2025-11-05 11:19:17 -07:00
Tomas Glozar
198fcc7cb8 Documentation/rtla: Mention default priority
RTLA allows the priority of workload threads to be set using the -P
option. This is covered in docs, but the default state for RTLA's own
user workload (implemented in timerlat_u.c) is not mentioned.

Add mention of the default user workload priority as well as a reference
to osnoise and timerlat tracers for kernel workload priority.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-7-tglozar@redhat.com>
2025-11-05 11:19:13 -07:00
Tomas Glozar
3e30aee838 Documentation/rtla: Correct tracer name for common options
Several options in common_options.rst say "osnoise tracer" for both
osnoise and timerlat.

Use |tool| variable so that the correct tool name is used.

Fixes: b1be48307d ("rtla: Add rtla osnoise top documentation")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-6-tglozar@redhat.com>
2025-11-05 11:19:09 -07:00
Tomas Glozar
5e954a379f Documentation/rtla: Fix typo in common_timerlat_options.txt
Fix spelling error "equilavent" in place of "equivalent".

Fixes: 173a3b0148 ("rtla/timerlat: Add the automatic trace option")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-5-tglozar@redhat.com>
2025-11-05 11:19:06 -07:00
Tomas Glozar
5bad56b4a2 Documentation/rtla: Fix typo in rtla-timerlat-top.rst
Fix "seem" in place of intended "seen" in rtla-timerlat-top
documentation.

Fixes: df337d014b ("rtla: Add rtla timerlat top documentation")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-4-tglozar@redhat.com>
2025-11-05 11:19:03 -07:00
Tomas Glozar
6524d31e15 Documentation/rtla: Fix typo in common_timerlat_options.txt
Fix "awakes" being used in place of "awakened" in --users-threads option
documentation.

Fixes: 6127383217 ("Documentation: Add tools/rtla timerlat -u option documentation")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-3-tglozar@redhat.com>
2025-11-05 11:18:59 -07:00
Tomas Glozar
aad1530ff6 Documentation/rtla: Fix typo in common_options.txt
Fix "unlike" being spelled "nlike" in --on-threshold documentation.

Fixes: 70165c78e3 ("Documentation/rtla: Add actions feature")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251010083338.478961-2-tglozar@redhat.com>
2025-11-05 11:18:50 -07:00
Andy Shevchenko
469c1c9eb6 kernel-doc: Issue warnings that were silently discarded
When kernel-doc parses the sections for the documentation some errors
may occur. In many cases the warning is simply stored to the current
"entry" object. However, in the most of such cases this object gets
discarded and there is no way for the output engine to even know about
that. To avoid that, check if the "entry" is going to be discarded and
if there warnings have been collected, issue them to the current logger
as is and then flush the "entry". This fixes the problem that original
Perl implementation doesn't have.

As of Linux kernel v6.18-rc4 the reproducer can be:

$ scripts/kernel-doc -v -none -Wall include/linux/util_macros.h
...
Info: include/linux/util_macros.h:138 Scanning doc for function to_user_ptr
...

while with the proposed change applied it gives one more line:

$ scripts/kernel-doc -v -none -Wall include/linux/util_macros.h
...
Info: include/linux/util_macros.h:138 Scanning doc for function to_user_ptr
Warning: include/linux/util_macros.h:144 expecting prototype for to_user_ptr(). Prototype was for u64_to_user_ptr() instead
...

And with the original Perl script:

$ scripts/kernel-doc.pl -v -none -Wall include/linux/util_macros.h
...
include/linux/util_macros.h:139: info: Scanning doc for function to_user_ptr
include/linux/util_macros.h:149: warning: expecting prototype for to_user_ptr(). Prototype was for u64_to_user_ptr() instead
...

Fixes: 9cbc2d3b13 ("scripts/kernel-doc.py: postpone warnings to the output plugin")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251104215502.1049817-1-andriy.shevchenko@linux.intel.com>
2025-11-05 11:05:37 -07:00
Waiman Long
be04e96ba9 cgroup/cpuset: Globally track isolated_cpus update
The current cpuset code passes a local isolcpus_updated flag around in a
number of functions to determine if external isolation related cpumasks
like wq_unbound_cpumask should be updated. It is a bit cumbersome and
makes the code more complex. Simplify the code by using a global boolean
flag "isolated_cpus_updating" to track this. This flag will be set in
isolated_cpus_update() and cleared in update_isolation_cpumasks().

No functional change is expected.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 07:00:33 -10:00
Waiman Long
b1034a6901 cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
Commit 4a74e41888 ("cgroup/cpuset: Check partition conflict with
housekeeping setup") is supposed to ensure that domain isolated CPUs
designated by the "isolcpus" boot command line option stay either in
root partition or in isolated partitions. However, the required check
wasn't implemented when a remote partition was created or when an
existing partition changed type from "root" to "isolated".

Even though this is a relatively minor issue, we still need to add the
required prstate_housekeeping_conflict() call in the right places to
ensure that the rule is strictly followed.

The following steps can be used to reproduce the problem before this
fix.

  # fmt -1 /proc/cmdline | grep isolcpus
  isolcpus=9
  # cd /sys/fs/cgroup/
  # echo +cpuset > cgroup.subtree_control
  # mkdir test
  # echo 9 > test/cpuset.cpus
  # echo isolated > test/cpuset.cpus.partition
  # cat test/cpuset.cpus.partition
  isolated
  # cat test/cpuset.cpus.effective
  9
  # echo root > test/cpuset.cpus.partition
  # cat test/cpuset.cpus.effective
  9
  # cat test/cpuset.cpus.partition
  root

With this fix, the last few steps will become:

  # echo root > test/cpuset.cpus.partition
  # cat test/cpuset.cpus.effective
  0-8,10-95
  # cat test/cpuset.cpus.partition
  root invalid (partition config conflicts with housekeeping setup)

Reported-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 07:00:33 -10:00
Waiman Long
6cfeddbf4a cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
Move up the prstate_housekeeping_conflict() helper so that it can be
used in remote partition code.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 07:00:33 -10:00
Waiman Long
103b08709e cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel imgration).

Prevent this configuration by blocking any assignation that would cause
the union of domain isolated cpus and nohz_full to covers all CPUs.

[longman: Remove isolated_cpus_should_update() and rewrite the checking
 in update_prstate() and update_parent_effective_cpumask()]

Originally-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 07:00:33 -10:00
Gabriele Monaco
55939cf28a cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
update_unbound_workqueue_cpumask() updates unbound workqueues settings
when there's a change in isolated CPUs, but it can be used for other
subsystems requiring updated when isolated CPUs change.

Generalise the name to update_isolation_cpumasks() to prepare for other
functions unrelated to workqueues to be called in that spot.

[longman: Change the function name to update_isolation_cpumasks()]

Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-05 07:00:33 -10:00
Lukas Bulwahn
af205a9b68 MAINTAINERS: add printk core-api doc file to PRINTK
The files in Documentation/core-api/ are by virtue of their top-level
directory part of the Documentation section in MAINTAINERS. Each file in
Documentation/core-api/ should however also have a further section in
MAINTAINERS it belongs to, which fits to the technical area of the
documented API in that file.

The printk.rst provides some explanation to the printk API defined in
include/linux/printk.h, which itself is part of the PRINTK section.

Add this core-api document to PRINTK.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Link: https://patch.msgid.link/20251105102832.155823-1-lukas.bulwahn@redhat.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-05 16:38:24 +01:00
Tejun Heo
d723f36e01 sched_ext: Minor cleanups to scx_task_iter
- Use memset() in scx_task_iter_start() instead of zeroing fields individually.

- In scx_task_iter_next(), move __scx_task_iter_maybe_relock() after the batch
  check which is simpler.

- Update comment to reflect that tasks are removed from scx_tasks when dead
  (commit 7900aa699c ("sched_ext: Fix cgroup exit ordering by moving
  sched_ext_free() to finish_task_switch()")).

No functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-04 11:46:25 -10:00
Tejun Heo
023af03cae sched_ext: Move __SCX_DSQ_ITER_ALL_FLAGS BUILD_BUG_ON to the right place
The BUILD_BUG_ON() which checks that __SCX_DSQ_ITER_ALL_FLAGS doesn't
overlap with the private lnode bits was in scx_task_iter_start() which has
nothing to do with DSQ iteration. Move it to bpf_iter_scx_dsq_new() where it
belongs.

No functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-04 11:46:24 -10:00
Miguel Ojeda
5935461b45 docs: rust: quick-start: add Debian 13 (Trixie)
Debian 13 (released 2025-08-09) packages Rust 1.85.0 [1], which is recent
enough to build Linux.

Thus document it.

In fact, we are planning to propose that the minimum supported Rust
version in Linux follows Debian Stable releases, with Debian 13 being
the first one we upgrade to, i.e. Rust 1.85.

Link: https://www.debian.org/News/2025/20250809 [1]
Link: https://patch.msgid.link/20251012224645.1148411-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-04 20:21:15 +01:00
Lukas Bulwahn
27600b51fb MAINTAINERS: extend DOCUMENTATION SCRIPTS to the full directories
Due to commit abd61d1ff8 ("scripts: sphinx-pre-install: move it to
tools/docs"), checkpatch.pl --self-test=patterns reported a non-matching
file entry in DOCUMENTATION SCRIPTS. Clearly, there are now multiple
documentation scripts, all located in Documentation/sphinx/ and tools/docs/
and Mauro is the maintainer of those.

Update the DOCUMENTATION SCRIPTS section to cover these directories. While
at it, also make the DOCUMENTATION section cover the subdirectories of
tools/docs/.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251103075948.26026-1-lukas.bulwahn@redhat.com>
2025-11-03 16:43:21 -07:00
Jonathan Corbet
77a22121fe Merge branch 'tools-final2' into docs-mw
Our documentation-related tools are spread out over various directories;
several are buried in the scripts/ dumping ground.  That makes them harder
to discover and harder to maintain.

Recent work has started accumulating our documentation-related tools in
/tools/docs.  This series nearly completes that task, moving most of the
rest of our various utilities there, hopefully fixing up all of the
relevant references in the process.

The one exception is scripts/kernel-doc; that move turned up some other
problems, so I have dropped it until those are ironed out.

At the end, rather than move the old, Perl kernel-doc, I simply removed it.
2025-11-03 16:25:22 -07:00
Bagas Sanjaya
e849217cf3 Documentation: treewide: Replace marc.info links with lore
In the past, people would link to third-party mailing list archives
(like marc.info) for any kernel-related discussions. Now that there
is lore archive under kernel.org infrastructure, replace these marc
links

Note that the only remaining marc link is "Re: Memory mapping on Cirrus
EP9315" [1] since that thread is not available at lore [2].

[1]: https://marc.info/?l=linux-arm-kernel&m=110061245502000&w=2
[2]: https://lore.kernel.org/linux-arm-kernel/?q=b%3A%22Re%3A+Memory+mapping+on+Cirrus+EP9315%22

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251031043358.23709-1-bagasdotme@gmail.com>
2025-11-03 16:21:31 -07:00
Jonathan Corbet
5d75e45518 Merge tag 'Chinese-docs-6.18' of gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux into alex
Chinese translation docs for 6.18

This is the Chinese translation subtree for 6.18. It includes
the following changes:
        - docs/zh_CN: Add rust Chinese translations
        - docs/zh_CN: Add scsi Chinese translations
        - docs/zh_CN: Add gfs2 Chinese translations
        - Add some other Chinese translations and fixes

Above patches are tested by 'make htmldocs'
2025-11-03 16:19:12 -07:00
Gabriele Ricciardi
2c62e2e874 coding-style: fix verb typo
In the Identation section there is a list of instructions in
second-person. The offending line uses third-person singular.

Signed-off-by: Gabriele Ricciardi <gricciardi-coding@pm.me>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251101223027.171874-1-gricciardi-coding@pm.me>
2025-11-03 16:11:24 -07:00
Tejun Heo
7900aa699c sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()
sched_ext_free() was called from __put_task_struct() when the last reference
to the task is dropped, which could be long after the task has finished
running. This causes cgroup-related problems:

- ops.init_task() can be called on a cgroup which didn't get ops.cgroup_init()'d
  during scheduler load, because the cgroup might be destroyed/unlinked
  while the zombie or dead task is still lingering on the scx_tasks list.

- ops.cgroup_exit() could be called before ops.exit_task() is called on all
  member tasks, leading to incorrect exit ordering.

Fix by moving it to finish_task_switch() to be called right after the final
context switch away from the dying task, matching when sched_class->task_dead()
is called. Rename it to sched_ext_dead() to match the new calling context.

By calling sched_ext_dead() before cgroup_task_dead(), we ensure that:

- Tasks visible on scx_tasks list have valid cgroups during scheduler load,
  as cgroup_mutex prevents cgroup destruction while the task is still linked.

- All member tasks have ops.exit_task() called and are removed from scx_tasks
  before the cgroup can be destroyed and trigger ops.cgroup_exit().

This fix is made possible by the cgroup_task_dead() split in the previous patch.

This also makes more sense resource-wise as there's no point in keeping
scheduler side resources around for dead tasks.

Reported-by: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:57:30 -10:00
Tejun Heo
587eb08a5f sched_ext: Merge branch 'for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-6.19
Pull cgroup/for-6.19 to receive:

 16dad7801a ("cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()")
 260fbcb92b ("cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()")
 d245698d72 ("cgroup: Defer task cgroup unlink until after the task is done switching out")

These are needed for the sched_ext cgroup exit ordering fix.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:57:26 -10:00
Tejun Heo
d245698d72 cgroup: Defer task cgroup unlink until after the task is done switching out
When a task exits, css_set_move_task(tsk, cset, NULL, false) unlinks the task
from its cgroup. From the cgroup's perspective, the task is now gone. If this
makes the cgroup empty, it can be removed, triggering ->css_offline() callbacks
that notify controllers the cgroup is going offline resource-wise.

However, the exiting task can still run, perform memory operations, and schedule
until the final context switch in finish_task_switch(). This creates a confusing
situation where controllers are told a cgroup is offline while resource
activities are still happening in it. While this hasn't broken existing
controllers, it has caused direct confusion for sched_ext schedulers.

Split cgroup_task_exit() into two functions. cgroup_task_exit() now only calls
the subsystem exit callbacks and continues to be called from do_exit(). The
css_set cleanup is moved to the new cgroup_task_dead() which is called from
finish_task_switch() after the final context switch, so that the cgroup only
appears empty after the task is truly done running.

This also reorders operations so that subsys->exit() is now called before
unlinking from the cgroup, which shouldn't break anything.

Cc: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:46:18 -10:00
Tejun Heo
260fbcb92b cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
Currently, cgroup_task_exit() adds thread group leaders with live member
threads to their css_set's dying_tasks list (so cgroup.procs iteration can
still see the leader), and cgroup_task_release() later removes them with
list_del_init(&task->cg_list).

An upcoming patch will defer the dying_tasks list addition, moving it from
cgroup_task_exit() (called from do_exit()) to a new function called from
finish_task_switch(). However, release_task() (which calls
cgroup_task_release()) can run either before or after finish_task_switch(),
creating a race where cgroup_task_release() might try to remove the task from
dying_tasks before or while it's being added.

Move the list_del_init() from cgroup_task_release() to cgroup_task_free() to
fix this race. cgroup_task_free() runs from __put_task_struct(), which is
always after both paths, making the cleanup safe.

Cc: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:46:18 -10:00
Tejun Heo
16dad7801a cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
The current names cgroup_exit(), cgroup_release(), and cgroup_free() are
confusing because they look like they're operating on cgroups themselves when
they're actually task lifecycle hooks. For example, cgroup_init() initializes
the cgroup subsystem while cgroup_exit() is a task exit notification to
cgroup. Rename them to cgroup_task_exit(), cgroup_task_release(), and
cgroup_task_free() to make it clear that these operate on tasks.

Cc: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:46:18 -10:00
Thorsten Blum
2e44874814 lib/vsprintf: Improve vsprintf + sprintf function comments
Clarify that the return values of vsprintf() and sprintf() exclude the
trailing NUL character.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251103090913.2066-2-thorsten.blum@linux.dev
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-03 18:39:29 +01:00
Willy Tarreau
7534b9bfe6 tools/nolibc: clean up outdated comments in generic arch.h
Along the code reorganizations, the file has been keeping the original
comments about argv and envp which are no longer relevant to this file
anymore. Let's just drop them.

Acked-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2025-11-02 22:02:43 +01:00
Willy Tarreau
1868c027b6 tools/nolibc: make the "headers" target install all supported archs
The efforts we go through by installing a single arch are counter
productive when the base directory already supports them all, and
the arch-specific files are really small. Let's make the "headers"
target simply install headers for all supported archs and stop
trying to build a hybrid "arch.h" file on the fly, to instead keep
the generic one. Now the same nolibc headers installation will be
usable with any arch-specific uapi installation.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 15:38:13 +01:00
Willy Tarreau
44bf8bbe29 tools/nolibc: add the more portable inttypes.h
It's often recommended to only use inttypes.h instead of stdint.h for
portability reasons since the former is always present when the latter
is present, but not conversely, and the former includes the latter. Due
to this some simple programs fail to build when including inttypes.h.
Let's add one that simply includes nolibc.h to better support these
programs.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 14:28:20 +01:00
Willy Tarreau
10f407c660 tools/nolibc: provide the portable sys/select.h
Modern programs tend to include sys/select.h to get FD_SET() and
FD_CLR() definitions as well as struct fd_set, but in our case it
didn't exist.

The definitions were moved from types.h to sys/select.h, which is
now included from nolibc.h, and the sys_select() definition moved
there as well from sys.h.

Signed-off-by: Willy Tarreau <w@1wt.eu>
[Thomas: adapt to current -next branch]
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 13:36:19 +01:00
Willy Tarreau
db75042e93 tools/nolibc: add missing memchr() to string.h
Surprisingly we forgot to add this common one. It was added with a
per-arch guard allowing to later implement it in arch-specific asm
code like was done for a few other ones.

The test verifies that we don't search past the indicated length.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 12:11:48 +01:00
Willy Tarreau
09c873c91f tools/nolibc: fix misleading help message regarding installation path
The help message says the headers are going to be installed into
tools/include/nolibc but this is only the default if $OUTPUT is not set,
so better clarify this (the current value of $OUTPUT is already shown).

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 12:04:54 +01:00
Thorsten Blum
d5b59ec33c crypto: qat - use simple_strtoull to improve qat_uclo_parse_num
Replace the manual string copying and parsing logic with a call to
simple_strtoull() to simplify and improve qat_uclo_parse_num().

Ensure that the parsed number does not exceed UINT_MAX, and add an
approximate upper-bound check (no more than 19 digits) to guard against
overflow.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:43 +08:00
nieweiqiang
d912494747 crypto: hisilicon/qm - add missing default in switch in qm_vft_data_cfg
Add default case to avoid warnings and meet code style requirements.

Signed-off-by: nieweiqiang <nieweiqiang@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:43 +08:00
nieweiqiang
51996331f7 crypto: hisilicon/sgl - remove unnecessary checks for curr_hw_sgl error
Before calling acc_get_sgl(), the mem_block has already been
created. acc_get_sgl() will not return NULL or any other error.
so the return value check can be removed.

Signed-off-by: nieweiqiang <nieweiqiang@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:43 +08:00
nieweiqiang
1b5645e454 crypto: hisilicon/qm - add concurrency protection for variable err_threshold
The isolate_strategy_store function is not protected
by a lock. If sysfs operations and functions that depend
on the err_threshold variable,such as qm_hw_err_isolate(),
execute concurrently, the outcome will be unpredictable.
Therefore, concurrency protection should be added for
the err_threshold variable.

Signed-off-by: nieweiqiang <nieweiqiang@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:42 +08:00
nieweiqiang
f5a332980a crypto: hisilicon/qm - add the save operation of eqe and aeqe
The eqe and aeqe are device updated values that include the
valid bit and queue number. In the current process, there is no
memory barrier added, so it cannot be guaranteed that the valid
bit is read before other processes are executed. Since eqe and aeqe
are only 4 bytes and the device writes them to memory in a single
operation, saving the values of eqe and aeqe ensures that the valid
bit and queue number read by the CPU were written by the device
simultaneously.

Signed-off-by: nieweiqiang <nieweiqiang@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:42 +08:00
Bjorn Andersson
f5e297a112 crypto: qce - Provide dev_err_probe() status on DMA failure
On multiple occasions the qce device have shown up in devices_deferred,
without the explanation that this came from the failure to acquire the
DMA channels from the associated BAM.

Use dev_err_probe() to associate this context with the failure to faster
pinpoint the culprit when this happens in the future.

Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:50:42 +08:00
Thorsten Blum
12ad5b2346 keys: Annotate struct asymmetric_key_id with __counted_by
Add the __counted_by() compiler attribute to the flexible array member
'data' to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:43:56 +08:00
Rob Herring (Arm)
841940df6f dt-bindings: crypto: amd,ccp-seattle-v1a: Allow 'iommus' property
The AMD Seattle CCP is behind an IOMMU and has 4 entries, so add
the 'iommus' property.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:43:56 +08:00
T Pratham
4fbfd7b206 crypto: ti - Add support for AES-XTS in DTHEv2 driver
Add support for XTS mode of operation for AES algorithm in the AES
Engine of the DTHEv2 hardware cryptographic engine.

Signed-off-by: T Pratham <t-pratham@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:43:56 +08:00
T Pratham
85e1a7ec61 crypto: aead - Add support for on-stack AEAD req allocation
This patch introduces infrastructure for allocating req objects on the
stack for AEADs. The additions mirror the existing sync skcipher APIs.
This can be used in cases where simple sync AEAD operations are being
done. So allocating the request on stack avoides possible out-of-memory
errors.

The struct crypto_sync_aead is a wrapper around crypto_aead and should
be used in its place when sync only requests will be done on the stack.
Correspondingly, the request should be allocated with
SYNC_AEAD_REQUEST_ON_STACK().

Similar to sync_skcipher APIs, the new sync_aead APIs are wrappers
around the regular aead APIs to facilitate sync only operations. The
following crypto APIs are added:
 - struct crypto_sync_aead
 - crypto_alloc_sync_aead()
 - crypto_free_sync_aead()
 - crypto_aync_aead_tfm()
 - crypto_sync_aead_setkey()
 - crypto_sync_aead_setauthsize()
 - crypto_sync_aead_authsize()
 - crypto_sync_aead_maxauthsize()
 - crypto_sync_aead_ivsize()
 - crypto_sync_aead_blocksize()
 - crypto_sync_aead_get_flags()
 - crypto_sync_aead_set_flags()
 - crypto_sync_aead_clear_flags()
 - crypto_sync_aead_reqtfm()
 - aead_request_set_sync_tfm()
 - SYNC_AEAD_REQUEST_ON_STACK()

Signed-off-by: T Pratham <t-pratham@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-31 17:43:56 +08:00
Alex Shi
6fc05a144c Revert "Docs/zh_CN: Translate skbuff.rst to Simplified Chinese"
This reverts commit d3e7609c6e5ec92587ed1043a985749d22cc78d1.
The commit cause a warning:
  Documentation/networking/skbuff.rst:34: WARNING: duplicate label crc, other instance in Documentation/translations/zh_CN/networking/skbuff.rst

And there's no simple way to keep the meaningful doc context and avoid the
warning, so, let's remove the doc.

Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-31 14:01:35 +08:00
Jacob Keller
e5e7ca66a7 docs: kdoc: fix duplicate section warning message
The python version of the kernel-doc parser emits some strange warnings
with just a line number in certain cases:

$ ./scripts/kernel-doc -Wall -none 'include/linux/virtio_config.h'
Warning: 174
Warning: 184
Warning: 190
Warning: include/linux/virtio_config.h:226 No description found for return value of '__virtio_test_bit'
Warning: include/linux/virtio_config.h:259 No description found for return value of 'virtio_has_feature'
Warning: include/linux/virtio_config.h:283 No description found for return value of 'virtio_has_dma_quirk'
Warning: include/linux/virtio_config.h:392 No description found for return value of 'virtqueue_set_affinity'

I eventually tracked this down to the lone call of emit_msg() in the
KernelEntry class, which looks like:

  self.emit_msg(self.new_start_line, f"duplicate section name '{name}'\n")

This looks like all the other emit_msg calls. Unfortunately, the definition
within the KernelEntry class takes only a message parameter and not a line
number. The intended message is passed as the warning!

Pass the filename to the KernelEntry class, and use this to build the log
message in the same way as the KernelDoc class does.

To avoid future errors, mark the warning parameter for both emit_msg
definitions as a keyword-only argument. This will prevent accidentally
passing a string as the warning parameter in the future.

Also fix the call in dump_section to avoid an unnecessary additional
newline.

Fixes: e3b42e94cf ("scripts/lib/kdoc/kdoc_parser.py: move kernel entry to a class")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251030-jk-fix-kernel-doc-duplicate-return-warning-v2-1-ec4b5c662881@intel.com>
2025-10-30 15:13:18 -06:00
Petr Tesarik
8ad018dbd3 slab: use new API for remaining command line parameters
Use core_param() and __core_param_cb() instead of __setup() or
__setup_param() to improve syntax checking and error messages.

Replace get_option() with kstrtouint(), because:
* the latter accepts a pointer to const char,
* these parameters should not accept ranges,
* error value can be passed directly to parser.

There is one more change apart from the parsing of numeric parameters:
slab_strict_numa parameter name must match exactly. Before this patch the
kernel would silently accept any option that starts with the name as an
undocumented alias.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Link: https://patch.msgid.link/6ae7e0ddc72b7619203c07dd5103a598e12f713b.1761324765.git.ptesarik@suse.com
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-30 14:35:20 +01:00
Petr Mladek
d5d399efff printk/nbcon: Release nbcon consoles ownership in atomic flush after each emitted record
printk() tries to flush messages with NBCON_PRIO_EMERGENCY on
nbcon consoles immediately. It might take seconds to flush all
pending lines on slow serial consoles. Note that there might be
hundreds of messages, for example:

[    3.771531][    T1] pci 0000:3e:08.1: [8086:324
** replaying previous printk message **
[    3.771531][    T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
[ ... more than 2000 lines, about 200kB messages ... ]
[    3.837752][    T1] pci 0000:20:01.0: Adding to iommu group 18
[    3.837851][    T
** replaying previous printk message **
[    3.837851][    T1] pci 0000:20:03.0: Adding to iommu group 19
[    3.837946][    T1] pci 0000:20:05.0: Adding to iommu group 20
[ ... more than 500 messages for iommu groups 21-590 ...]
[    3.912932][    T1] pci 0000:f6:00.1: Adding to iommu group 591
[    3.913070][    T1] pci 0000:f6:00.2: Adding to iommu group 592
[    3.913243][    T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    3.913245][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    3.913245][    T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
[    3.913324][    T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[    3.913325][    T1] RAPL PMU: hw unit of domain package 2^-14 Joules
[    3.913326][    T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
[    3.913327][    T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
[    3.933486][    T1] ------------[ cut here ]------------
[    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
[    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
[    3.930291][    C0] Kernel panic - not syncing: Hard LOCKUP
[...]
[    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted...
[...]
[    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0] Call Trace:
[...]
[    3.930291][    C0]  <TASK>
[    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
[    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
[    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
[    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0

The are visible two takeovers of the console ownership:

  - The 1st one is triggered by the "WARNING: CPU: 2 PID: 1 at
    arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.

  - The 2nd one is triggered by the "Kernel panic - not syncing:
    Hard LOCKUP" line printed with NBCON_PRIO_PANIC.

There are more than 2500 lines, at about 240kB, emitted between
the takeover and the 1st "WARNING" line in the emergency context.
This amount of pending messages had to be flushed by
nbcon_atomic_flush_pending() when WARN() printed its first line.

The atomic flush was holding the nbcon console context for too long so
that it triggered hard lockup on the CPU running the printk kthread
"pr/ttyS0". The kthread needed to reacquire the console ownership
for restoring the original serial port state in serial8250_console_write().

Prevent the hardlockup by releasing the nbcon console ownership after
each emitted record.

Note that __nbcon_atomic_flush_pending_con() used to hold the console
ownership all the time because it blocked the printk kthread. Otherwise
the kthread tried to flush the messages in parallel which caused repeated
takeovers and more replayed messages.

It is not longer a problem because the repeated takeovers are blocked
by the counter of emergency contexts, see nbcon_cpu_emergency_cnt.

Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-30 12:11:33 +01:00
Petr Mladek
4c3ba0d592 printk/nbcon/panic: Allow printk kthread to sleep when the system is in panic
The printk kthread might be running when there is a panic in progress.
But it is not able to acquire the console ownership any longer.

Prevent the desperate attempts to acquire the ownership and allow sleeping
in panic. It would make it behave the same as when there is any CPU
in an emergency context.

Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-3-pmladek@suse.com
[pmladek@suse.com: Rebased on top of 6.18-rc1 (panic_in_progress() moved to panic.c)]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-30 12:11:33 +01:00
Petr Mladek
c41c0ebfa1 printk/nbcon: Block printk kthreads when any CPU is in an emergency context
In emergency contexts, printk() tries to flush messages directly even
on nbcon consoles. And it is allowed to takeover the console ownership
and interrupt the printk kthread in the middle of a message.

Only one takeover and one repeated message should be enough in most
situations. The first emergency message flushes the backlog and printk
kthreads get to sleep. Next emergency messages are flushed directly
and printk() does not wake up the kthreads.

However, the one takeover is not guaranteed. Any printk() in normal
context on another CPU could wake up the kthreads. Or a new emergency
message might be added before the kthreads get to sleep. Note that
the interrupted .write_thread() callbacks usually have to call
nbcon_reacquire_nobuf() and restore the original device setting
before checking for pending messages.

The risk of the repeated takeovers will be even bigger because
__nbcon_atomic_flush_pending_con is going to release the console
ownership after each emitted record. It will be needed to prevent
hardlockup reports on other CPUs which are busy waiting for
the context ownership, for example, by nbcon_reacquire_nobuf() or
__uart_port_nbcon_acquire().

The repeated takeovers break the output, for example:

    [ 5042.650211][ T2220] Call Trace:
    [ 5042.6511
    ** replaying previous printk message **
    [ 5042.651192][ T2220]  <TASK>
    [ 5042.652160][ T2220]  kunit_run_
    ** replaying previous printk message **
    [ 5042.652160][ T2220]  kunit_run_tests+0x72/0x90
    [ 5042.653340][ T22
    ** replaying previous printk message **
    [ 5042.653340][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
    [ 5042.654628][ T2220]  ? stack_trace_save+0x4d/0x70
    [ 5042.6553
    ** replaying previous printk message **
    [ 5042.655394][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
    [ 5042.656713][ T2220]  ? save_trace+0x5b/0x180

A more robust solution is to block the printk kthread entirely whenever
*any* CPU enters an emergency context. This ensures that critical messages
can be flushed without contention from the normal, non-atomic printing
path.

Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-2-pmladek@suse.com
[pmladek@suse.com: Added changes proposed by John Ogness]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-30 12:10:16 +01:00
Brendan Jackman
b4ff1f611b Documentation: fix reference to PR_SPEC_L1D_FLUSH
PR_SET_L1D_FLUSH does not exist.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251015-l1d-flush-doc-v1-2-f8cefea3f2f2@google.com>
2025-10-29 16:19:27 -06:00
Brendan Jackman
aab703b3c6 Documentation: clarify PR_SPEC_L1D_FLUSH
For PR_SPEC_STORE_BYPASS and PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE
means "disable the speculation bug" i.e. "enable the mitigation".

For PR_SPEC_L1D_FLUSH, PR_SPEC_DISABLE means "disable the mitigation".
This is not obvious, so document it.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251015-l1d-flush-doc-v1-1-f8cefea3f2f2@google.com>
2025-10-29 16:19:27 -06:00
Bagas Sanjaya
ba2457109d Documentation: process: Also mention Sasha Levin as stable tree maintainer
Sasha has also maintaining stable branch in conjunction with Greg
since cb5d21946d ("MAINTAINERS: Add Sasha as a stable branch
maintainer"). Mention him in 2.Process.rst.

Cc: stable@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251022034336.22839-1-bagasdotme@gmail.com>
2025-10-29 16:19:27 -06:00
Nadav Tasher
9de608a26f docs: replace broken links in ramfs-rootfs-initramfs docs
http://www.uwsg.iu.edu/ doesn't seem to exist anymore.
I managed to find backups on archive.org, which helped me find
the right links on https://lore.kernel.org/.

http://freecode.com/projects/afio was also down, so I figured
it could be replaced with https://linux.die.net/man/1/afio.

Replace broken links to mailing list and aifo tool.

Signed-off-by: Nadav Tasher <tashernadav@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251025171625.33197-1-tashernadav@gmail.com>
2025-10-29 16:19:26 -06:00
Tejun Heo
3442345644 sched_ext/tools: Restore backward compat with v6.12 kernels
Commit 111a79800a ("tools/sched_ext: Strip compatibility macros for
cgroup and dispatch APIs") removed the compat layer for v6.12-v6.13 kfunc
renaming, but v6.12 is the current LTS kernel and will remain supported
through 2026. Restore backward compatibility so schedulers built with v6.19+
headers can run on v6.12 kernels.

The restored compat differs from the original in two ways:

1. Uses ___new/___old suffixes instead of ___compat for clarity. The new
   macros check for v6.13+ names (scx_bpf_dsq_move*), fall back to v6.12
   names (scx_bpf_dispatch_from_dsq*, scx_bpf_consume), then return safe
   no-ops for missing symbols.

2. Integrates with the args-struct-packing changes added in c0d630ba34
   ("sched_ext: Wrap kfunc args in struct to prepare for aux__prog").
   scx_bpf_dsq_insert_vtime() now tries __scx_bpf_dsq_insert_vtime (args
   struct), then scx_bpf_dsq_insert_vtime___compat (v6.13-v6.18), then
   scx_bpf_dispatch_vtime___compat (pre-v6.13).

Forward compatibility is not restored - binaries built against v6.13 or
earlier headers won't run on v6.19+ kernels, as the old kfunc names are not
exported. This is acceptable since the priority is new binaries running on
older kernels.

Also add missing compat checks for ops.cgroup_set_bandwidth() (added v6.17)
and ops.cgroup_set_idle() (added v6.19). These need to be NULLed out in
userspace on older kernels.

Reported-by: Andrea Righi <arighi@nvidia.com>
Acked-and-tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-29 10:41:21 -10:00
Akira Yokosawa
1f6e3f2139 tools/docs/sphinx-build-wrapper: Emit $SPHINXOPTS later in args list
The option list to sphinx-build via SPHINXOPTS should have higher
priority than those the wrapper comes up with.
sphinx-build will choose the latest one if there are duplicates.

To restore the behavior of Makefile era, when the documentation builds
at https://www.kernel.org/doc/html/next/ had been depending on it,
reorder the flag list.

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Closes: https://lore.kernel.org/20251007-awesome-guan-of-greatness-e6ec75@lemur/
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Closes: https://lore.kernel.org/c35e690f-0579-49cb-850c-07af98e5253a@gmail.com/
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <eaf4bfd8-fb80-45d0-b3ec-4047692ebbed@gmail.com>
2025-10-29 09:48:21 -06:00
Benjamin Berg
4bb30188c7 tools/nolibc: add uio.h with readv and writev
This is generally useful and struct iovec is also needed for other
purposes such as ptrace.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:19 +01:00
Benjamin Berg
3d66c4e14f tools/nolibc: add option to disable runtime
In principle, it is possible to use nolibc for only some object files in
a program. In that case, the startup code in _start and _start_c is not
going to be used. Add the NOLIBC_NO_RUNTIME compile time option to
disable it entirely and also remove anything that depends on it.

Doing this avoids warnings from modpost for UML as the _start_c code
references the main function from the .init.text section while it is not
inside .init itself.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:16 +01:00
Benjamin Berg
2cb6cc8361 tools/nolibc: use __fallthrough__ rather than fallthrough
Use the version of the attribute with underscores to avoid issues if
fallthrough has been defined by another header file already.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:15 +01:00
Benjamin Berg
fbd1b7f6b3 tools/nolibc: implement %m if errno is not defined
For improved compatibility, print %m as "unknown error" when nolibc is
compiled using NOLIBC_IGNORE_ERRNO.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:15 +01:00
Benjamin Berg
4ada5679f1 tools/nolibc/dirent: avoid errno in readdir_r
Using errno is not possible when NOLIBC_IGNORE_ERRNO is set. Use
sys_lseek instead of lseek as that avoids using errno.

Fixes: 665fa8dea9 ("tools/nolibc: add support for directory access")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:14 +01:00
Benjamin Berg
c485ca3aff tools/nolibc/stdio: let perror work when NOLIBC_IGNORE_ERRNO is set
There is no errno variable when NOLIBC_IGNORE_ERRNO is defined. As such,
simply print the message with "unknown error" rather than the integer
value of errno.

Fixes: acab7bcdb1 ("tools/nolibc/stdio: add perror() to report the errno value")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:13 +01:00
Thomas Weißschuh
089c0a9853 tools/nolibc: remove outdated comment about __sysret() in mmap()
Since commit fb01ff635e ("tools/nolibc: keep brk(), sbrk(), mmap()
away from __sysret()") the implementation of mmap() does not use the
__sysret() macro anymore.

Remove the outdated comment.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:12 +01:00
Tejun Heo
a3f5d48222 sched_ext: Allow scx_bpf_reenqueue_local() to be called from anywhere
The ops.cpu_acquire/release() callbacks miss events under multiple conditions.
There are two distinct task dispatch gaps that can cause cpu_released flag
desynchronization:

1. balance-to-pick_task gap: This is what was originally reported. balance_scx()
   can enqueue a task, but during consume_remote_task() when the rq lock is
   released, a higher priority task can be enqueued and ultimately picked while
   cpu_released remains false. This gap is closeable via RETRY_TASK handling.

2. ttwu-to-pick_task gap: ttwu() can directly dispatch a task to a CPU's local
   DSQ. By the time the sched path runs on the target CPU, higher class tasks may
   already be queued. In such cases, nothing on sched_ext side will be invoked,
   and the only solution would be a hook invoked regardless of sched class, which
   isn't desirable.

Rather than adding invasive core hooks, BPF schedulers can use generic BPF
mechanisms like tracepoints. From SCX scheduler's perspective, this is congruent
with other mechanisms it already uses and doesn't add further friction.

The main use case for cpu_release() was calling scx_bpf_reenqueue_local() when
a CPU gets preempted by a higher priority scheduling class. However, the old
scx_bpf_reenqueue_local() could only be called from cpu_release() context.

Add a new version of scx_bpf_reenqueue_local() that can be called from any
context by deferring the actual re-enqueue operation. This eliminates the need
for cpu_acquire/release() ops entirely. Schedulers can now use standard BPF
mechanisms like the sched_switch tracepoint to detect and handle CPU preemption.

Update scx_qmap to demonstrate the new approach using sched_switch instead of
cpu_release, with compat support for older kernels. Mark cpu_acquire/release()
as deprecated. The old scx_bpf_reenqueue_local() variant will be removed in
v6.23.

Reported-by: Wen-Fang Liu <liuwenfang@honor.com>
Link: https://lore.kernel.org/all/8d64c74118c6440f81bcf5a4ac6b9f00@honor.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-29 05:29:04 -10:00
Tejun Heo
8803e6a7fb sched_ext: Factor out reenq_local() from scx_bpf_reenqueue_local()
Factor out the core re-enqueue logic from scx_bpf_reenqueue_local() into a
new reenq_local() helper function. scx_bpf_reenqueue_local() now handles the
BPF kfunc checks and calls reenq_local() to perform the actual work.

This is a prep patch to allow reenq_local() to be called from other contexts.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-29 05:29:04 -10:00
Tejun Heo
180b4ac342 sched_ext: Split schedule_deferred() into locked and unlocked variants
schedule_deferred() currently requires the rq lock to be held so that it can
use scheduler hooks for efficiency when available. However, there are cases
where deferred actions need to be scheduled from contexts that don't hold the
rq lock.

Split into schedule_deferred() which can be called from any context and just
queues irq_work, and schedule_deferred_locked() which requires the rq lock and
can optimize by using scheduler hooks when available. Update the existing call
site to use the _locked variant.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-29 05:29:04 -10:00
Tejun Heo
2d697e5f5a Merge branch 'for-6.18-fixes' into for-6.19
Pull to receive f4fa7c25f6 ("sched_ext: Fix use of uninitialized variable
in scx_bpf_cpuperf_set()") which conflicts with changes for planned
sub-sched changes.
2025-10-29 05:18:13 -10:00
Oleg Nesterov
2079395583 printk_legacy_map: use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
printk_legacy_map is used to hide lock nesting violations caused by
legacy drivers and is using the wrong override type. LD_WAIT_SLEEP is
for always sleeping lock types such as mutex_t. LD_WAIT_CONFIG is for
lock type which are sleeping while spinning on PREEMPT_RT such as
spinlock_t.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251026150726.GA23223@redhat.com
[pmladek@suse.com: Fixed indentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-29 14:20:56 +01:00
Jonathan Corbet
683e8cbaba docs: remove kernel-doc.pl
We've been using the Python version and nobody has missed this one.  All
credit goes to Mauro Carvalho Chehab for creating the replacement.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:31 -06:00
Jonathan Corbet
184414c6a6 docs: move find-unused-docs.sh to tools/docs
...and update references accordingly.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:27 -06:00
Jonathan Corbet
f1c2db1f14 docs: move test_doc_build.py to tools/docs
Add this tool to tools/docs.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:24 -06:00
Jonathan Corbet
a5dd93016f docs: move get_abi.py to tools/docs
Move this tool out of scripts/ to join the other documentation tools; fix
up a couple of erroneous references in the process.

It's worth noting that this script will fail badly unless one has a
PYTHONPATH referencing scripts/lib/abi.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:20 -06:00
Jonathan Corbet
eaae0ad972 docs: move scripts/documentation-file-ref-check to tools/docs
Add this script to the growing collection of documentation tools.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:17 -06:00
Jonathan Corbet
d37366cac4 docs: move checktransupdate.py to tools/docs
The checktranslate.py tool currently languishes in scripts/; move it to
tools/docs and update references accordingly.

Cc: Alex Shi <alexs@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Cc: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:13 -06:00
Jonathan Corbet
909597fa01 docs: Move the "features" tools to tools/docs
The scripts for managing the features docs are found in three different
directories; unite them all under tools/docs and update references as
needed.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-10-28 16:01:00 -06:00
Tejun Heo
b7d4b28db7 sched_ext: Use SCX_TASK_READY test instead of tryget_task_struct() during class switch
ddf7233fca ("sched/ext: Fix invalid task state transitions on class
switch") added tryget_task_struct() test during scx_enable()'s class
switching loop. The reason for the addition was to avoid enabling tasks which
skipped prep in the previous loop due to being dead.

While tryget_task_struct() does work for this purpose as tasks that fail
tryget always will fail it, it's a bit roundabout. A more direct way is
testing whether the task is in READY state. Switch to testing SCX_TASK_READY
directly.

Cc: Andrea Righi <arighi@nvidia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-28 11:38:34 -10:00
Valentin Schneider
d1e6d27738 rcu: Add a small-width RCU watching counter debug option
A later commit will reduce the size of the RCU watching counter to free up
some bits for another purpose. Paul suggested adding a config option to
test the extreme case where the counter is reduced to its minimum usable
width for rcutorture to poke at, so do that.

Make it only configurable under RCU_EXPERT. While at it, add a comment to
explain the layout of context_tracking->state.

Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-10-28 17:32:56 +01:00
Yuya Ishikawa
9de5f847ef Documentation: kunit: add description of kunit.enable parameter
The current KUnit documentation does not mention the kunit.enable
kernel parameter, making it unclear how to troubleshoot cases where
KUnit tests do not run as expected.
Add a note explaining kunit.enable parmaeter. Disabling this parameter
prevents all KUnit tests from running even if CONFIG_KUNIT is enabled.

Link: https://lore.kernel.org/r/20251021030605.41610-1-ishikawa.yuy-00@jp.fujitsu.com
Signed-off-by: Yuya Ishikawa <ishikawa.yuy-00@jp.fujitsu.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-10-27 11:16:09 -06:00
Willy Tarreau
2602949b22 tools/nolibc: x86: fix section mismatch caused by asm "mem*" functions
I recently got occasional build failures at -Os or -Oz that would always
involve waitpid(), where the assembler would complain about this:

   init.s: Error: .size expression for waitpid.constprop.0 does not evaluate to a constant

And without -fno-asynchronous-unwind-tables it could also spit such
errors:

  init.s:836: Error: CFI instruction used without previous .cfi_startproc
  init.s:838: Error: .cfi_endproc without corresponding .cfi_startproc
  init.s: Error: open CFI at the end of file; missing .cfi_endproc directive

A trimmed down reproducer is as simple as this:

  int main(int argc, char **argv)
  {
        int ret, status;

        if (argc == 0)
                ret = waitpid(-1, &status, 0);
        else
                ret = waitpid(-1, &status, 0);

        return status;
  }

It produces the following asm code on x86_64:

        .text
  .section .text.nolibc_memmove_memcpy
  .weak memmove
  .weak memcpy
  memmove:
  memcpy:
        movq %rdx, %rcx
	(...)
        retq
  .section .text.nolibc_memset
  .weak memset
  memset:
        xchgl %eax, %esi
        movq  %rdx, %rcx
        pushq %rdi
        rep stosb
        popq  %rax
        retq

        .type	waitpid.constprop.0.isra.0, @function
  waitpid.constprop.0.isra.0:
        subq	$8, %rsp
        (...)
        jmp	*.L5(,%rax,8)
        .section	.rodata
        .align 8
        .align 4
  .L5:
        .quad	.L10
        (...)
        .quad	.L4
        .text
  .L10:
        (...)
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
  .LFE273:
        .size	waitpid.constprop.0.isra.0, .-waitpid.constprop.0.isra.0

It's a bit dense, but here's the explanation: the compiler has emitted a
".text" statement because it knows it's working in the .text section.

Then, our hand-written asm code for the mem* functions forced the section
to .text.something without the compiler knowing about it, so it thinks
the code is still being emitted for .text. As such, without any .section
statement, the waitpid.constprop.0.isra.0 label is in fact placed in the
previously created section, here .text.nolibc_memset.

The waitpid() function involves a switch/case statement that can be
turned to a jump table, which is what the compiler does with the .rodata
section, and after that it restores .text, which is no longer the
previous .text.nolibc_memset section. Then the CFI statements cross a
section, so does the .size calculation, which explains the error.

While a first approach consisting in placing an explicit ".text" at the
end of these functions was verified to work, it's still unreliable as
it depends on what the compiler remembers having emitted previously. A
better approach is to replace the ".section" with ".pushsection", and
place a ".popsection" at the end, so that these code blocks are agnostic
to where they're placed relative to other blocks.

Fixes: 553845eebd ("tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`")
Fixes: 12108aa8c1 ("tools/nolibc: x86-64: Use `rep stosb` for `memset()`")
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-27 16:20:08 +01:00
Petr Tesarik
aed760df8e slab: convert setup_slub_debug() to use __core_param_cb()
Use __core_param_cb() to parse the "slab_debug" kernel parameter instead of
the obsolescent __setup(). For now, the parameter is not exposed in sysfs,
and no get ops is provided.

There is a slight change in behavior. Before this patch, the following
parameter would silently turn on full debugging for all slabs:

  slub_debug_yada_yada_gotta_love_this=hail_satan!

This syntax is now rejected, and the parameter will be passed to user
space, making the kernel a holier place.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Link: https://patch.msgid.link/9674b34861394088c7853edf8e9d2b439fd4b42f.1761324765.git.ptesarik@suse.com
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-27 15:11:19 +01:00
Petr Tesarik
d3722ff57e slab: constify slab debug strings
Since the string passed to slab_debug is never modified, use pointers to
const char in all places where it is processed.

No functional changes intended.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Christoph Lameter <cl@gentwo.org>
Link: https://patch.msgid.link/819095b921f6ae03bb54fd69ee4020e2a3aef675.1761324765.git.ptesarik@suse.com
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-27 15:11:19 +01:00
Tejun Heo
dcb938c453 sched_ext: Add ___compat suffix to scx_bpf_dsq_insert___v2 in compat.bpf.h
2dbbdeda77 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary
compatibility") renamed the new bool-returning variant to scx_bpf_dsq_insert___v2
in the kernel. However, libbpf currently only strips ___SUFFIX on the BPF side,
not on kernel symbols, so the compat wrapper couldn't match the kernel kfunc and
would always fall back to the old variant even when the new one was available.

Add an extra ___compat suffix as a workaround - libbpf strips one suffix on the
BPF side leaving ___v2, which then matches the kernel kfunc directly. In the
future when libbpf strips all suffixes on both sides, all suffixes can be
dropped.

Fixes: 2dbbdeda77 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility")
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-24 13:37:37 -10:00
Andrea Righi
71d7847cad sched_ext: Fix scx_bpf_dsq_peek() with FIFO DSQs
When removing a task from a FIFO DSQ, we must delete it from the list
before updating dsq->first_task, otherwise the following lookup will
just re-read the same task, leaving first_task pointing to removed
entry.

This issue only affects DSQs operating in FIFO mode, as priority DSQs
correctly update the rbtree before re-evaluating the new first task.

Remove the item from the list before refreshing the first task to
guarantee the correct behavior in FIFO DSQs.

Fixes: 44f5c8ec5b ("sched_ext: Add lockless peek operation for DSQs")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-24 12:20:24 -10:00
Marcos Paulo de Souza
62627bf0ca kdb: Adapt kdb_msg_write to work with NBCON consoles
Function kdb_msg_write was calling con->write for any found console,
but it won't work on NBCON consoles. In this case we should acquire the
ownership of the console using NBCON_PRIO_EMERGENCY, since printing
kdb messages should only be interrupted by a panic.

At this point, the console is required to use the atomic callback. The
console is skipped if the write_atomic callback is not set or if the
context could not be acquired. The validation of NBCON is done by the
console_is_usable helper. The context is released right after
write_atomic finishes.

The oops_in_progress handling is only needed in the legacy consoles,
so it was moved around the con->write callback.

Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-5-866aac60a80e@suse.com
[pmladek@suse.com: Fixed compilation with !CONFIG_PRINTK.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-24 12:56:20 +02:00
Marcos Paulo de Souza
4349cf0df3 printk: nbcon: Export nbcon_write_context_set_buf
This function will be used in the next patch to allow a driver to set
both the message and message length of a nbcon_write_context. This is
necessary because the function also initializes the ->unsafe_takeover
struct member. By using this helper we ensure that the struct is
initialized correctly.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-4-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-24 12:56:19 +02:00
Marcos Paulo de Souza
286b113d70 printk: nbcon: Allow KDB to acquire the NBCON context
KDB can interrupt any console to execute the "mirrored printing" at any
time, so add an exception to nbcon_context_try_acquire_direct to allow
to get the context if the current CPU is the same as kdb_printf_cpu.

This change will be necessary for the next patch, which fixes
kdb_msg_write to work with NBCON consoles by calling ->write_atomic on
such consoles. But to print it first needs to acquire the ownership of
the console, so nbcon_context_try_acquire_direct is fixed here.

Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-3-866aac60a80e@suse.com
[pmladek@suse.com: Fix compilation with !CONFIG_KGDB_KDB.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-24 12:55:10 +02:00
Marcos Paulo de Souza
49f7d3054e printk: nbcon: Introduce KDB helpers
These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementation
as nbcon_device_try_acquire, but using NBCON_PRIO_EMERGENCY when
acquiring the context.

If the acquire succeeds, the message and message length are assigned to
nbcon_write_context so ->write_atomic can print the message.

After release try to flush the console since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-2-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-24 12:19:23 +02:00
Marcos Paulo de Souza
4da42aaa82 printk: nbcon: Export console_is_usable
The helper will be used on KDB code in the next commits.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-1-866aac60a80e@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-24 12:19:23 +02:00
Hongru Zhang
20d387d7ce selinux: improve bucket distribution uniformity of avc_hash()
Reuse the already implemented MurmurHash3 algorithm. Under heavy stress
testing (on an 8-core system sustaining over 50,000 authentication events
per second), sample once per second and take the mean of 1800 samples:

1. Bucket utilization rate and length of longest chain
+--------------------------+-----------------------------------------+
|                          | bucket utilization rate / longest chain |
|                          +--------------------+--------------------+
|                          |      no-patch      |     with-patch     |
+--------------------------+--------------------+--------------------+
|  512 nodes,  512 buckets |      52.5%/7.5     |     60.2%/5.7      |
+--------------------------+--------------------+--------------------+
| 1024 nodes,  512 buckets |      68.9%/12.1    |     80.2%/9.7      |
+--------------------------+--------------------+--------------------+
| 2048 nodes,  512 buckets |      83.7%/19.4    |     93.4%/16.3     |
+--------------------------+--------------------+--------------------+
| 8192 nodes, 8192 buckets |      49.5%/11.4    |     60.3%/7.4      |
+--------------------------+--------------------+--------------------+

2. avc_search_node latency (total latency of hash operation and table
lookup)
+--------------------------+-----------------------------------------+
|                          |   latency of function avc_search_node   |
|                          +--------------------+--------------------+
|                          |      no-patch      |     with-patch     |
+--------------------------+--------------------+--------------------+
|  512 nodes,  512 buckets |        87ns        |        84ns        |
+--------------------------+--------------------+--------------------+
| 1024 nodes,  512 buckets |        97ns        |        96ns        |
+--------------------------+--------------------+--------------------+
| 2048 nodes,  512 buckets |       118ns        |       113ns        |
+--------------------------+--------------------+--------------------+
| 8192 nodes, 8192 buckets |       106ns        |        99ns        |
+--------------------------+--------------------+--------------------+

Although MurmurHash3 has higher overhead than the bitwise operations in
the original algorithm, the data shows that the MurmurHash3 achieves
better distribution, reducing average lookup time. Consequently, the
total latency of hashing and table lookup is lower than before.

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
[PM: whitespace fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Hongru Zhang
929126ef4a selinux: Move avtab_hash() to a shared location for future reuse
This is a preparation patch, no functional change.

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Hongru Zhang
641e021758 selinux: Introduce a new config to make avc cache slot size adjustable
On mobile device high-load situations, permission check can happen
more than 90,000/s (8 core system). With default 512 cache nodes
configuration, avc cache miss happens more often and occasionally
leads to long time (>2ms) irqs off on both big and little cores,
which decreases system real-time capability.

An actual call stack is as follows:
 => avc_compute_av
 => avc_perm_nonode
 => avc_has_perm_noaudit
 => selinux_capable
 => security_capable
 => capable
 => __sched_setscheduler
 => do_sched_setscheduler
 => __arm64_sys_sched_setscheduler
 => invoke_syscall
 => el0_svc_common
 => do_el0_svc
 => el0_svc
 => el0t_64_sync_handler
 => el0t_64_sync

Although we can expand avc nodes through /sys/fs/selinux/cache_threshold
to mitigate long time irqs off, hash conflicts make the bucket average
length longer because of the fixed size of cache slots, leading to
avc_search_node() latency increase.

So introduce a new config to make avc cache slot size also configurable,
and with fine tuning, we can mitigate long time irqs off with slightly
avc_search_node() performance regression.

Theoretically, the main overhead is memory consumption.

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Andrew Murray
1bc9a28f07 printk: Use console_flush_one_record for legacy printer kthread
The legacy printer kthread uses console_lock and
__console_flush_and_unlock to flush records to the console. This
approach results in the console_lock being held for the entire
duration of a flush. This can result in large waiting times for
those waiting for console_lock especially where there is a large
volume of records or where the console is slow (e.g. serial). This
contention is observed during boot, as the call to filp_open in
console_on_rootfs will delay progression to userspace until any
in-flight flush is completed.

Let's instead use console_flush_one_record and release/reacquire
the console_lock between records.

On a PocketBeagle 2, with the following boot args:
"console=ttyS2,9600 initcall_debug=1 loglevel=10"

Without this patch:

[    5.613166] +console_on_rootfs/filp_open
[    5.643473] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    5.643823] probe of fa00000.mmc returned 0 after 258244 usecs
[    5.710520] mmc1: new UHS-I speed SDR104 SDHC card at address 5048
[    5.721976] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[    5.747258]  mmcblk1: p1 p2
[    5.753324] probe of mmc1:5048 returned 0 after 40002 usecs
[   15.595240] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss
[   15.595282] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog
[   15.595297] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog
[   15.595437] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc
[  146.275961] -console_on_rootfs/filp_open ...

and with:

[    5.477122] +console_on_rootfs/filp_open
[    5.595814] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    5.596181] probe of fa00000.mmc returned 0 after 312757 usecs
[    5.662813] mmc1: new UHS-I speed SDR104 SDHC card at address 5048
[    5.674367] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[    5.699320]  mmcblk1: p1 p2
[    5.705494] probe of mmc1:5048 returned 0 after 39987 usecs
[    6.418682] -console_on_rootfs/filp_open ...
...
[   15.593509] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss
[   15.593551] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog
[   15.593566] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog
[   15.593704] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc

Where I've added a printk surrounding the call in console_on_rootfs
to filp_open.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-3-00f1f0ac055a@thegoodpenguin.co.uk
[pmladek@suse.com: Fixed ordering of variable definition suggested by John.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-23 17:15:44 +02:00
Petr Mladek
ba00f7c4d0 printk: console_flush_one_record() code cleanup
console_flush_one_record() and console_flush_all() duplicate several
checks. They both want to tell the caller that consoles are not
longer usable in this context because it has lost the lock or
the lock has to be reserved for the panic CPU.

Remove the duplication by changing the semantic of the function
console_flush_one_record() return value and parameters.

The function will return true when it is able to do the job. It means
that there is at least one usable console. And the flushing was
not interrupted by a takeover or panic_on_other_cpu().

Also replace the @any_usable parameter with @try_again. The @try_again
parameter will be set to true when the function could do the job
and at least one console made a progress.

Motivation:

The callers need to know when

  + they should continue flushing => @try_again
  + when the console is flushed => can_do_the_job(return) && !@try_again
  + when @next_seq is valid => same as flushed
  + when lost console_lock => @takeover

The proposed change makes it clear when the function can do
the job. It simplifies the answer for the other questions.

Also the return value from console_flush_one_record() can
be used as return value from console_flush_all().

Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-2-00f1f0ac055a@thegoodpenguin.co.uk
[pmladek@suse.com: Fixed type of any_usable variable reported by John]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-23 17:11:13 +02:00
Andrew Murray
741ea7aa95 printk: Introduce console_flush_one_record
console_flush_all prints all remaining records to all usable consoles
whilst its caller holds console_lock. This can result in large waiting
times for those waiting for console_lock especially where there is a
large volume of records or where the console is slow (e.g. serial).

Let's extract the parts of this function which print a single record
into a new function named console_flush_one_record. This can later
be used for functions that will release and reacquire console_lock
between records.

This commit should not change existing functionality.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-1-00f1f0ac055a@thegoodpenguin.co.uk
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-23 17:08:41 +02:00
Herbert Xu
275a9a3f9b KEYS: trusted: Pass argument by pointer in dump_options
Instead of passing pkey_info into dump_options by value, using a
pointer instead.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-23 12:55:43 +08:00
nieweiqiang
e7066160f5 crypto: hisilicon/qm - restore original qos values
When the new qos valus setting fails, restore to
the original qos values.

Fixes: 72b010dc33 ("crypto: hisilicon/qm - supports writing QoS int the host")
Signed-off-by: nieweiqiang <nieweiqiang@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-23 12:55:43 +08:00
Herbert Xu
7fc25ed2bc crypto: sun8i-ss - Move j init earlier in sun8i_ss_hash_run
With gcc-14 I get

../drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c: In function ‘sun8i_ss_hash_run’:
../drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c:631:21: warning: ‘j’ may be used uninitialized [-Wmaybe-uninitialized]
  631 |                 j = hash_pad(bf, 4096, j, byte_count, true, bs);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c:493:13: note: ‘j’ was declared here
  493 |         int j, i, k, todo;
      |             ^

Fix this false positive by moving the initialisation of j earlier.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-23 12:55:43 +08:00
Thorsten Blum
df0845cf44 crypto: asymmetric_keys - prevent overflow in asymmetric_key_generate_id
Use check_add_overflow() to guard against potential integer overflows
when adding the binary blob lengths and the size of an asymmetric_key_id
structure and return ERR_PTR(-EOVERFLOW) accordingly. This prevents a
possible buffer overflow when copying data from potentially malicious
X.509 certificate fields that can be arbitrarily large, such as ASN.1
INTEGER serial numbers, issuer names, etc.

Fixes: 7901c1a8ef ("KEYS: Implement binary asymmetric key ID handling")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-23 12:55:42 +08:00
Thiébaud Weksteen
094e94d13b memfd,selinux: call security_inode_init_security_anon()
Prior to this change, no security hooks were called at the creation of a
memfd file. It means that, for SELinux as an example, it will receive
the default type of the filesystem that backs the in-memory inode. In
most cases, that would be tmpfs, but if MFD_HUGETLB is passed, it will
be hugetlbfs. Both can be considered implementation details of memfd.

It also means that it is not possible to differentiate between a file
coming from memfd_create and a file coming from a standard tmpfs mount
point.

Additionally, no permission is validated at creation, which differs from
the similar memfd_secret syscall.

Call security_inode_init_security_anon during creation. This ensures
that the file is setup similarly to other anonymous inodes. On SELinux,
it means that the file will receive the security context of its task.

The ability to limit fexecve on memfd has been of interest to avoid
potential pitfalls where /proc/self/exe or similar would be executed
[1][2]. Reuse the "execute_no_trans" and "entrypoint" access vectors,
similarly to the file class. These access vectors may not make sense for
the existing "anon_inode" class. Therefore, define and assign a new
class "memfd_file" to support such access vectors.

Guard these changes behind a new policy capability named "memfd_class".

[1] https://crbug.com/1305267
[2] https://lore.kernel.org/lkml/20221215001205.51969-1-jeffxu@google.com/

Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
[PM: subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:28:27 -04:00
Ricardo Robaina
4f7b54e17e audit: fix comment misindentation in audit.h
Minor comment misindentation adjustment.

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:28:06 -04:00
Paul Moore
dfa024bc3f lsm: add a LSM_STARTED_ALL notification event
Add a new LSM notifier event, LSM_STARTED_ALL, which is fired once at
boot when all of the LSMs have been started.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:29 -04:00
Paul Moore
4ab5efcc28 lsm: consolidate all of the LSM framework initcalls
The LSM framework itself registers a small number of initcalls, this
patch converts these initcalls into the new initcall mechanism.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:28 -04:00
Paul Moore
3156bc814f selinux: move initcalls to the LSM framework
SELinux currently has a number of initcalls so we've created a new
function, selinux_initcall(), which wraps all of these initcalls so
that we have a single initcall function that can be registered with the
LSM framework.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:28 -04:00
Roberto Sassu
82fe7932e8 ima,evm: move initcalls to the LSM framework
This patch converts IMA and EVM to use the LSM frameworks's initcall
mechanism. It moved the integrity_fs_init() call to ima_fs_init() and
evm_init_secfs(), to work around the fact that there is no "integrity" LSM,
and introduced integrity_fs_fini() to remove the integrity directory, if
empty. Both integrity_fs_init() and integrity_fs_fini() support the
scenario of being called by both the IMA and EVM LSMs.

This patch does not touch any of the platform certificate code that
lives under the security/integrity/platform_certs directory as the
IMA/EVM developers would prefer to address that in a future patchset.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
[PM: adjust description as discussed over email]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:27 -04:00
Paul Moore
77ebff0607 lockdown: move initcalls to the LSM framework
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:27 -04:00
Paul Moore
7cbe113537 apparmor: move initcalls to the LSM framework
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:27 -04:00
Paul Moore
d3ba8f8089 safesetid: move initcalls to the LSM framework
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:26 -04:00
Paul Moore
9484ae1295 tomoyo: move initcalls to the LSM framework
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Acked-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:26 -04:00
Paul Moore
06643d5584 smack: move initcalls to the LSM framework
As the LSM framework only supports one LSM initcall callback for each
initcall type, the init_smk_fs() and smack_nf_ip_init() functions were
wrapped with a new function, smack_initcall() that is registered with
the LSM framework.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:25 -04:00
Paul Moore
d934f97db8 ipe: move initcalls to the LSM framework
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Fan Wu <wufan@kernel.org>
Acked-by: Fan Wu <wufan@kernel.org>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:25 -04:00
Paul Moore
b0374e79a8 loadpin: move initcalls to the LSM framework
Acked-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:25 -04:00
Paul Moore
cdc028812f lsm: introduce an initcall mechanism into the LSM framework
Currently the individual LSMs register their own initcalls, and while
this should be harmless, it can be wasteful in the case where a LSM
is disabled at boot as the initcall will still be executed.  This
patch introduces support for managing the initcalls in the LSM
framework, and future patches will convert the existing LSMs over to
this new mechanism.

Only initcall types which are used by the current in-tree LSMs are
supported, additional initcall types can easily be added in the future
if needed.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:24 -04:00
Paul Moore
3423c6397c lsm: group lsm_order_parse() with the other lsm_order_*() functions
Move the lsm_order_parse() function near the other lsm_order_*()
functions to improve readability.

No code changes.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:24 -04:00
Paul Moore
ac3c47cece lsm: output available LSMs when debugging
This will display all of the LSMs built into the kernel, regardless
of if they are enabled or not.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:23 -04:00
Paul Moore
5137e583ba lsm: cleanup the debug and console output in lsm_init.c
Move away from an init specific init_debug() macro to a more general
lsm_pr()/lsm_pr_cont()/lsm_pr_dbg() set of macros that are available
both before and after init.  In the process we do a number of minor
changes to improve the LSM initialization output and cleanup the code
somewhat.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:23 -04:00
Paul Moore
450705334f lsm: add/tweak function header comment blocks in lsm_init.c
Add function header comments for lsm_static_call_init() and
early_security_init(), tweak the existing comment block for
security_add_hooks().

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:23 -04:00
Paul Moore
45a41d1394 lsm: fold lsm_init_ordered() into security_init()
With only security_init() calling lsm_init_ordered, it makes little
sense to keep lsm_init_ordered() as a standalone function.  Fold
lsm_init_ordered() into security_init().

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:22 -04:00
Paul Moore
27be5600fe lsm: cleanup initialize_lsm() and rename to lsm_init_single()
Rename initialize_lsm() to be more consistent with the rest of the LSM
initialization changes and rework the function itself to better fit
with the "exit on fail" coding pattern.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:22 -04:00
Paul Moore
291271e691 lsm: cleanup the LSM blob size code
Convert the lsm_blob_size fields to unsigned integers as there is no
current need for them to be negative, change "lsm_set_blob_size()" to
"lsm_blob_size_update()" to better reflect reality, and perform some
other minor cleanups to the associated code.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:21 -04:00
Paul Moore
752db06571 lsm: rename/rework ordered_lsm_parse() to lsm_order_parse()
Rename ordered_lsm_parse() to lsm_order_parse() for the sake of
consistency with the other LSM initialization routines, and also
do some minor rework of the function.  Aside from some minor style
decisions, the majority of the rework involved shuffling the order
of the LSM_FLAG_LEGACY and LSM_ORDER_FIRST code so that the
LSM_FLAG_LEGACY checks are handled first; it is important to note
that this doesn't affect the order in which the LSMs are registered.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:21 -04:00
Paul Moore
24a9c58978 lsm: rename/rework append_ordered_lsm() into lsm_order_append()
Rename append_ordered_lsm() to lsm_order_append() to better match
convention and do some rework.  The rework includes moving the
LSM_FLAG_EXCLUSIVE logic from lsm_prepare() to lsm_order_append()
in order to consolidate the individual LSM append/activation code,
and adding logic to skip appending explicitly disabled LSMs to the
active LSM list.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:21 -04:00
Paul Moore
a748372a28 lsm: rename exists_ordered_lsm() to lsm_order_exists()
Also add a header comment block to the function.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:20 -04:00
Paul Moore
2d67172612 lsm: rework the LSM enable/disable setter/getter functions
In addition to style changes, rename set_enabled() to lsm_enabled_set()
and is_enabled() to lsm_is_enabled() to better fit within the LSM
initialization code.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:20 -04:00
Paul Moore
935d508d4d lsm: get rid of the lsm_names list and do some cleanup
The LSM currently has a lot of code to maintain a list of the currently
active LSMs in a human readable string, with the only user being the
"/sys/kernel/security/lsm" code.  Let's drop all of that code and
generate the string on first use and then cache it for subsequent use.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:19 -04:00
Paul Moore
250898ca33 lsm: rework lsm_active_cnt and lsm_idlist[]
Move the LSM active count and lsm_id list declarations out of a header
that is visible across the kernel and into a header that is limited to
the LSM framework.  This not only helps keep the include/linux headers
smaller and cleaner, it helps prevent misuse of these variables.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:19 -04:00
Paul Moore
592b104f9b lsm: rename the lsm order variables for consistency
Rename the builtin_lsm_order variable to lsm_order_builtin,
chosen_lsm_order to lsm_order_cmdline, chosen_major_lsm to
lsm_order_legacy, ordered_lsms[] to lsm_order[], and exclusive
to lsm_exclusive.

This patch also renames the associated kernel command line parsing
functions and adds some basic function comment blocks.  The parsing
function choose_major_lsm() was renamed to lsm_choose_security(),
choose_lsm_order() to lsm_choose_lsm(), and enable_debug() to
lsm_debug_enable().

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:19 -04:00
Paul Moore
9f9dc69e06 lsm: replace the name field with a pointer to the lsm_id struct
Reduce the duplication between the lsm_id struct and the DEFINE_LSM()
definition by linking the lsm_id struct directly into the individual
LSM's DEFINE_LSM() instance.

Linking the lsm_id into the LSM definition also allows us to simplify
the security_add_hooks() function by removing the code which populates
the lsm_idlist[] array and moving it into the normal LSM startup code
where the LSM list is parsed and the individual LSMs are enabled,
making for a cleaner implementation with less overhead at boot.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:18 -04:00
Paul Moore
faabedcd6e lsm: rename ordered_lsm_init() to lsm_init_ordered()
The new name more closely fits the rest of the naming scheme in
security/lsm_init.c.  This patch also adds a trivial comment block to
the top of the function.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:18 -04:00
Paul Moore
92ed3500c9 lsm: integrate lsm_early_cred() and lsm_early_task() into caller
With only one caller of lsm_early_cred() and lsm_early_task(), insert
the functions' code directly into the caller and ger rid of the two
functions.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:17 -04:00
Paul Moore
cb1513db7a lsm: integrate report_lsm_order() code into caller
With only one caller of report_lsm_order(), insert the function's code
directly into the caller and ger rid of report_lsm_order().

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:17 -04:00
Paul Moore
37f788f655 lsm: introduce looping macros for the initialization code
There are three common for loop patterns in the LSM initialization code
to loop through the ordered LSM list and the registered "early" LSMs.
This patch implements these loop patterns as macros to help simplify the
code and reduce the chance for errors.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:17 -04:00
Paul Moore
e02578561d lsm: consolidate lsm_allowed() and prepare_lsm() into lsm_prepare()
Simplify and consolidate the lsm_allowed() and prepare_lsm() functions
into a new function, lsm_prepare().

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johhansen@canonical.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:16 -04:00
Paul Moore
67a4b6a89b lsm: split the init code out into lsm_init.c
Continue to pull code out of security/security.c to help improve
readability by pulling all of the LSM framework initialization
code out into a new file.

No code changes.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:16 -04:00
Paul Moore
a5e7c17c81 lsm: split the notifier code out into lsm_notifier.c
In an effort to decompose security/security.c somewhat to make it less
twisted and unwieldy, pull out the LSM notifier code into a new file
as it is fairly well self-contained.

No code changes.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:15 -04:00
Tejun Heo
e67708823d sched_ext: Use rhashtable_lookup() instead of rhashtable_lookup_fast()
The find_user_dsq() function is called from contexts that are already
under RCU read lock protection. Switch from rhashtable_lookup_fast() to
rhashtable_lookup() to avoid redundant RCU locking.

Requires: bee8a520eb ("rhashtable: Use rcu_dereference_all and rcu_dereference_all_check")
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-22 11:55:06 -10:00
Tejun Heo
987e00035c sched_ext: Rename pnt_seq to kick_sync
The pnt_seq field and related infrastructure were originally named for
"pick next task sequence", reflecting their original implementation in
scx_next_task_picked(). However, the sequence counter is now incremented in
both put_prev_task_scx() and pick_task_scx() and its purpose is to
synchronize kick operations via SCX_KICK_WAIT, not specifically to track
pick_next_task events.

Rename to better reflect the actual semantics:
- pnt_seq -> kick_sync
- scx_kick_pseqs -> scx_kick_syncs
- pseqs variables -> ksyncs
- Update comments to refer to "kick_sync sequence" instead of "pick_task
  sequence"

This is a pure renaming with no functional changes.

Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-22 11:42:14 -10:00
Tejun Heo
a379fa1e2c sched_ext: Fix SCX_KICK_WAIT to work reliably
SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
a reschedule and can be used to implement operations like core scheduling.

This used to be implemented by scx_next_task_picked() incrementing pnt_seq,
which was always called when a CPU picks the next task to run, allowing
SCX_KICK_WAIT to reliably wait for the target CPU to enter the scheduler and
pick the next task.

However, commit b999e365c2 ("sched_ext: Replace scx_next_task_picked()
with switch_class()") replaced scx_next_task_picked() with the
switch_class() callback, which is only called when switching between sched
classes. This broke SCX_KICK_WAIT because pnt_seq would no longer be
reliably incremented unless the previous task was SCX and the next task was
not.

This fix leverages commit 4c95380701 ("sched/ext: Fold balance_scx() into
pick_task_scx()") which refactored the pick path making put_prev_task_scx()
the natural place to track task switches for SCX_KICK_WAIT. The fix moves
pnt_seq increment to put_prev_task_scx() and also increments it in
pick_task_scx() to handle cases where the same task is re-selected, whether
by BPF scheduler decision or slice refill. The semantics: If the current
task on the target CPU is SCX, SCX_KICK_WAIT waits until the CPU enters the
scheduling path. This provides sufficient guarantee for use cases like core
scheduling while keeping the operation self-contained within SCX.

v2: - Also increment pnt_seq in pick_task_scx() to handle same-task
      re-selection (Andrea Righi).
    - Use smp_cond_load_acquire() for the busy-wait loop for better
      architecture optimization (Peter Zijlstra).

Reported-by: Wen-Fang Liu <liuwenfang@honor.com>
Link: http://lkml.kernel.org/r/228ebd9e6ed3437996dffe15735a9caa@honor.com
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-22 11:42:14 -10:00
Tejun Heo
a9c1fbbd6d sched_ext: Don't kick CPUs running higher classes
When a sched_ext scheduler tries to kick a CPU, the CPU may be running a
higher class task. sched_ext has no control over such CPUs. A sched_ext
scheduler couldn't have expected to get access to the CPU after kicking it
anyway. Skip kicking when the target CPU is running a higher class.

Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-22 11:42:14 -10:00
Daniil Tatianin
67e1b0052f printk_ringbuffer: don't needlessly wrap data blocks around
Previously, data blocks that perfectly fit the data ring buffer would
get wrapped around to the beginning for no reason since the calculated
offset of the next data block would belong to the next wrap. Since this
offset is not actually part of the data block, but rather the offset of
where the next data block is going to start, there is no reason to
include it when deciding whether the current block fits the buffer.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250905144152.9137-2-d-tatianin@yandex-team.ru
[pmladek@suse.com: Updated indentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-22 14:03:36 +02:00
Tamir Duberstein
3b83f5d5e7 rust: replace CStr with core::ffi::CStr
`kernel::ffi::CStr` was introduced in commit d126d23801 ("rust: str:
add `CStr` type") in November 2022 as an upstreaming of earlier work
that was done in May 2021[0]. That earlier work, having predated the
inclusion of `CStr` in `core`, largely duplicated the implementation of
`std::ffi::CStr`.

`std::ffi::CStr` was moved to `core::ffi::CStr` in Rust 1.64 in
September 2022. Hence replace `kernel::str::CStr` with `core::ffi::CStr`
to reduce our custom code footprint, and retain needed custom
functionality through an extension trait.

Add `CStr` to `ffi` and the kernel prelude.

Link: faa3cbcca0 [0]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-16-9378a54385f8@gmail.com
[ Removed assert that would now depend on the Rust version. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:47:27 +02:00
Tamir Duberstein
c5cf01ba8d rust: support formatting of foreign types
Introduce a `fmt!` macro which wraps all arguments in
`kernel::fmt::Adapter` and a `kernel::fmt::Display` trait. This enables
formatting of foreign types (like `core::ffi::CStr`) that do not
implement `core::fmt::Display` due to concerns around lossy conversions
which do not apply in the kernel.

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Custom.20formatting/with/516476467
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-15-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:15:31 +02:00
Tamir Duberstein
1dcd763ba1 rust: clk: use CStr::as_char_ptr
Replace the use of `as_ptr` which works through `<CStr as
Deref<Target=&[u8]>::deref()` in preparation for replacing
`kernel::str::CStr` with `core::ffi::CStr` as the latter does not
implement `Deref<Target=&[u8]>`.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20251018-cstr-core-v18-14-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:15:19 +02:00
Tamir Duberstein
9ce084e579 rust: regulator: use CStr::as_char_ptr
Replace the use of `as_ptr` which works through `<CStr as
Deref<Target=&[u8]>::deref()` in preparation for replacing
`kernel::str::CStr` with `core::ffi::CStr` as the latter does not
implement `Deref<Target=&[u8]>`.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-13-9378a54385f8@gmail.com
[ Move safety comment below to support older Clippy. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:14:57 +02:00
Tamir Duberstein
3b46f65355 rust: configfs: use CStr::as_char_ptr
Replace the use of `as_ptr` which works through `<CStr as
Deref<Target=&[u8]>::deref()` in preparation for replacing
`kernel::str::CStr` with `core::ffi::CStr` as the latter does not
implement `Deref<Target=&[u8]>`.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Acked-by: Andreas Hindborg <a.hindborg@kernel.org>
Link: https://patch.msgid.link/20251018-cstr-core-v18-12-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:14:57 +02:00
Tamir Duberstein
965a39a962 rust: opp: use CStr::as_char_ptr
Replace the use of `as_ptr` which works through `<CStr as
Deref<Target=&[u8]>::deref()` in preparation for replacing
`kernel::str::CStr` with `core::ffi::CStr` as the latter does not
implement `Deref<Target=&[u8]>`.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20251018-cstr-core-v18-10-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:14:47 +02:00
Tejun Heo
2dbbdeda77 sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility
cded46d971 ("sched_ext: Make scx_bpf_dsq_insert*() return bool")
introduced a new bool-returning scx_bpf_dsq_insert() and renamed the old
void-returning version to scx_bpf_dsq_insert___compat, with the expectation
that libbpf would match old binaries to the ___compat variant, maintaining
backward binary compatibility. However, while libbpf ignores ___suffix on
the BPF side when matching symbols, it doesn't do so for kernel-side symbols.
Old binaries compiled with the original scx_bpf_dsq_insert() could no longer
resolve the symbol.

Fix by reversing the naming: Keep scx_bpf_dsq_insert() as the old
void-returning interface and add ___v2 to the new bool-returning version.
This allows old binaries to continue working while new code can use the
___v2 variant. Once libbpf is updated to ignore kernel-side ___SUFFIX, the
___v2 suffix can be dropped when the compat interface is removed.

v2: Use ___v2 instead of ___new.

Fixes: cded46d971 ("sched_ext: Make scx_bpf_dsq_insert*() return bool")
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-21 10:40:15 -10:00
Brian Norris
0aa760051f docs: checkpatch: Drop networking comment style
Networking no longer has their own comment style, and checkpatch no
longer checks for this.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251017203719.1554224-2-briannorris@chromium.org>
2025-10-21 14:36:33 -06:00
Brian Norris
7159cf9fad docs: checkpatch: Align block comment style
Ironically, the block style comments in the checkpatch documentation are
not aligned properly. Correct that.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251017203719.1554224-1-briannorris@chromium.org>
2025-10-21 14:36:33 -06:00
Ally Heev
d0ef999061 Documentation: fix dev-tools broken links in translations
gdb and kgdb debugging documentation were moved to
Documentation/process/debugging/ as a part of
Commit d5af79c05e ("Documentation: move
dev-tools debugging files to process/debugging/"), but translations/
were not updated. Fix them

Signed-off-by: Ally Heev <allyheev@gmail.com>
Fixes: d5af79c05e ("Documentation: move dev-tools debugging files to process/debugging/")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251020-aheev-fix-docs-dev-tools-broken-links-v2-1-7db64bf0405a@gmail.com>
2025-10-21 14:14:38 -06:00
Gopi Krishna Menon
77cd921027 docs: trusted-encrypted: fix htmldocs build error
Running "make htmldocs" generates the following build error and
warning in trusted-encrypted.rst:

Documentation/security/keys/trusted-encrypted.rst:18: ERROR: Unexpected indentation.
Documentation/security/keys/trusted-encrypted.rst:19: WARNING: Block quote ends without a blank line; unexpected unindent.

Add a blank line before bullet list and fix the indentation of text to
fix the build error and resolve the warning.

Fixes: 38f6880759 ("docs: trusted-encrypted: trusted-keys as protected keys")
Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-21 11:58:29 +08:00
Waiman Long
d5cf4d34a3 cgroup/cpuset: Don't track # of local child partitions
The cpuset structure has a nr_subparts field which tracks the number
of child local partitions underneath a particular cpuset. Right now,
nr_subparts is only used in partition_is_populated() to avoid iteration
of child cpusets if the condition is right. So by always performing the
child iteration, we can avoid tracking the number of child partitions
and simplify the code a bit.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-20 07:01:48 -10:00
Menglong Dong
aa653654ee rhashtable: use likely for rhashtable lookup
Sometimes, the result of the rhashtable_lookup() is expected to be found.
Therefore, we can use likely() for such cases.

Following new functions are introduced, which will use likely or unlikely
during the lookup:

 rhashtable_lookup_likely
 rhltable_lookup_likely

A micro-benchmark is made for these new functions: lookup a existed entry
repeatedly for 100000000 times, and rhashtable_lookup_likely() gets ~30%
speedup.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-20 12:10:28 +08:00
T Pratham
9b04d8f005 crypto: aead - Fix reqsize handling
Commit afddce13ce ("crypto: api - Add reqsize to crypto_alg")
introduced cra_reqsize field in crypto_alg struct to replace type
specific reqsize fields. It looks like this was introduced specifically
for ahash and acomp from the commit description as subsequent commits
add necessary changes in these alg frameworks.

However, this is being recommended for use in all crypto algs
instead of setting reqsize using crypto_*_set_reqsize(). Using
cra_reqsize in aead algorithms, hence, causes memory corruptions and
crashes as the underlying functions in the algorithm framework have not
been updated to set the reqsize properly from cra_reqsize. [1]

Add proper set_reqsize calls in the aead init function to properly
initialize reqsize for these algorithms in the framework.

[1]: https://gist.github.com/Pratham-T/24247446f1faf4b7843e4014d5089f6b

Fixes: afddce13ce ("crypto: api - Add reqsize to crypto_alg")
Signed-off-by: T Pratham <t-pratham@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-20 12:10:28 +08:00
Meenakshi Aggarwal
66b9a095f7 crypto: caam - Add support of paes algorithm
PAES algorithm uses protected key for encryption/decryption operations.

Signed-off-by: Gaurav Jain <gaurav.jain@nxp.com>
Signed-off-by: Meenakshi Aggarwal <meenakshi.aggarwal@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-20 12:10:28 +08:00
Meenakshi Aggarwal
a703a4c2a3 KEYS: trusted: caam based protected key
- CAAM supports two types of protected keys:
  -- Plain key encrypted with ECB
  -- Plain key encrypted with CCM
  Due to robustness, default encryption used for protected key is CCM.

- Generate protected key blob and add it to trusted key payload.
  This is done as part of sealing operation, which is triggered
  when below two operations are requested:
  -- new key generation
  -- load key,

Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Signed-off-by: Meenakshi Aggarwal <meenakshi.aggarwal@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-20 12:10:28 +08:00
Meenakshi Aggarwal
38f6880759 docs: trusted-encrypted: trusted-keys as protected keys
Add a section in trusted key document describing the protected-keys.
- Detailing need for protected keys.
- Detailing the usage for protected keys.

Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Signed-off-by: Meenakshi Aggarwal <meenakshi.aggarwal@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-20 12:10:28 +08:00
Tamir Duberstein
5b60cde74b rust: remove spurious use core::fmt::Debug
We want folks to use `kernel::fmt` but this is only used for `derive` so
can be removed entirely.

This backslid in commit ea60cea07d ("rust: add `Alignment` type").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-9-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:24 +02:00
Tamir Duberstein
0e947bc22b rust: pci: use kernel::fmt
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit ed78a01887 ("rust: pci: provide access to PCI
Class and Class-related items").

Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-8-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Miguel Ojeda
7b0c32cbed samples: rust: debugfs: use core::ffi::CStr method names
Prepare for `core::ffi::CStr` taking the place of `kernel::str::CStr` by
avoiding methods that only exist on the latter.

This backslid in commit d4a5d397c7 ("samples: rust: Add scoped debugfs
sample driver").

Link: https://patch.msgid.link/20251019213049.2060970-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
3f0dd5fad9 rust: debugfs: use kernel::fmt
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit 40ecc49466 ("rust: debugfs: Add support for
callback-based files") and commit 5e40b591cb ("rust: debugfs: Add
support for read-only files").

Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-7-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
b0af4f9142 rust: alloc: use kernel::fmt
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit 9def0d0a2a ("rust: alloc: add
Vec::push_within_capacity").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-6-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
5cc5d805e3 rnull: use kernel::fmt
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit d969d504bc ("rnull: enable configuration via
`configfs`") and commit 34585dc649 ("rnull: add soft-irq completion
support").

Acked-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-5-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
0dac8cf44b rust_binder: use core::ffi::CStr method names
Prepare for `core::ffi::CStr` taking the place of `kernel::str::CStr` by
avoiding methods that only exist on the latter.

This backslid in commit eafedbc7c0 ("rust_binder: add Rust Binder
driver").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-4-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
5aed9677e5 rust_binder: use kernel::fmt
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit eafedbc7c0 ("rust_binder: add Rust Binder
driver").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-3-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
d9252f1be2 rust_binder: remove trailing comma
This prepares for a later commit in which we introduce a custom
formatting macro; that macro doesn't handle trailing commas so just
remove this one.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-2-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein
061a8ac6de samples: rust: platform: remove trailing commas
This prepares for a later commit in which we introduce a custom
formatting macro; that macro doesn't handle trailing commas so just
remove them.

Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-1-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Andrea Righi
67fa319f5f sched_ext: Allow forcibly picking an scx task
Refactor pick_task_scx() adding a new argument to forcibly pick a
SCHED_EXT task, ignoring any higher-priority sched class activity.

This refactoring prepares the code for future scenarios, e.g., allowing
the ext dl_server to force a SCHED_EXT task selection.

No functional changes.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-18 13:01:35 -10:00
Ben Guo
9b31970e1a docs/zh_CN: Add translation of rust/testing.rst
Complete the translation of rust/testing.rst and add the testing TOC entry
to rust/index.rst.

Add the translation based on commit a3b2347343
("Documentation: rust: testing: add docs on the new KUnit `#[test]` tests").

Signed-off-by: Ben Guo <benx.guo@gmail.com>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cm>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-18 14:37:25 +08:00
Shuo Zhao
7b8a943944 docs/zh_CN: Add secrets coco Chinese translation
Translate .../security/secrets/coco.rst into Chinese.

Update the translation through commit d56b699d76
("Documentation: Fix typos").

Signed-off-by: Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-18 14:32:54 +08:00
Gopi Krishna Menon
96b546c241 Documentation/rtla: rename common_xxx.rst files to common_xxx.txt
Sphinx reports htmldocs errors:

Documentation/tools/rtla/common_options.rst:58: ERROR: Undefined substitution referenced: "threshold".
Documentation/tools/rtla/common_options.rst:88: ERROR: Undefined substitution referenced: "tool".
Documentation/tools/rtla/common_options.rst:88: ERROR: Undefined substitution referenced: "thresharg".
Documentation/tools/rtla/common_options.rst:88: ERROR: Undefined substitution referenced: "tracer".
Documentation/tools/rtla/common_options.rst:92: ERROR: Undefined substitution referenced: "tracer".
Documentation/tools/rtla/common_options.rst:98: ERROR: Undefined substitution referenced: "actionsperf".
Documentation/tools/rtla/common_options.rst:113: ERROR: Undefined substitution referenced: "tool".

common_*.rst files are snippets that are intended to be included by rtla
docs (rtla*.rst). common_options.rst in particular contains
substitutions which depend on other common_* includes, so building it
independently as reST source results in above errors.

Rename all common_*.rst files to common_*.txt to prevent Sphinx from
building these snippets as standalone reST source and update all include
references accordingly.

Link: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#substitutions
Suggested-by: Tomas Glozar <tglozar@redhat.com>
Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20251008184522.13201-1-krishnagopi487@gmail.com
[Bagas: massage commit message and apply trailers]
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251013092719.30780-2-bagasdotme@gmail.com>
2025-10-17 14:25:12 -06:00
Bagas Sanjaya
22605d257b Documentation: assoc_array: Format internal tree layout tables
Format tables in "Basic internal tree layout" as reST tables.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251013095630.34235-4-bagasdotme@gmail.com>
2025-10-17 14:23:04 -06:00
Bagas Sanjaya
54ff675c2b Documentation: assoc_array: Indent function explanation text
Paragraphs of function explanation are currently not indented following
their appropriate numbered list item, which causes only the first
paragraph and function prototype code blocks to be indented in the
numbered list in htmldocs output.

Indent the explanation.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251013095630.34235-3-bagasdotme@gmail.com>
2025-10-17 14:23:04 -06:00
Yohei Kojima
04623798aa docs: admin-guide: Fix a typo in kernel-parameters.txt
Fix a typo in the stacktrace parameter description in kernel-parameters.txt

Signed-off-by: Yohei Kojima <Yohei.Kojima@sony.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <edda15e3fcae13265278d3c3bd93ab077345d78f.1760498951.git.Yohei.Kojima@sony.com>
2025-10-17 14:16:10 -06:00
Jonathan Corbet
3df5affb4b Merge branch 'build-script' into docs-mw
Quoth Mauro:

This series should probably be called:

    "Move the trick-or-treat build hacks accumulated over time
     into a single place and document them."

as this reflects its main goal. As such:

- it places the jobserver logic on a library;
- it removes sphinx/parallel-wrapper.sh;
- the code now properly implements a jobserver-aware logic
  to do the parallelism when called via GNU make, failing back to
  "-j" when there's  no jobserver;
- converts check-variable-fonts.sh to Python and uses it via
  function call;
- drops an extra script to generate man pages, adding a makefile
  target for it;
- ensures that return code is 0 when PDF successfully builds;
- about half of the script is comments and documentation.

I tried to do my best to document all tricks that are inside the
script. This way, the docs build steps is now documented.

It should be noticed that it is out of the scope of this series
to change the implementation. Surely the process can be improved,
but first let's consolidate and document everything on a single
place.

Such script was written in a way that it can be called either
directly or via a Makefile. Running outside Makefile is
interesting specially when debug is needed. The command line
interface replaces the need of having lots of env vars before
calling sphinx-build:

    $ ./tools/docs/sphinx-build-wrapper --help
    usage: sphinx-build-wrapper [-h]
	   [--sphinxdirs SPHINXDIRS [SPHINXDIRS ...]] [--conf CONF]
	   [--builddir BUILDDIR] [--theme THEME] [--css CSS] [--paper {,a4,letter}] [-v]
	   [-j JOBS] [-i] [-V [VENV]]
	   {cleandocs,linkcheckdocs,htmldocs,epubdocs,texinfodocs,infodocs,mandocs,latexdocs,pdfdocs,xmldocs}

    Kernel documentation builder

    positional arguments:
      {cleandocs,linkcheckdocs,htmldocs,epubdocs,texinfodocs,infodocs,mandocs,latexdocs,pdfdocs,xmldocs}
			    Documentation target to build

    options:
      -h, --help            show this help message and exit
      --sphinxdirs SPHINXDIRS [SPHINXDIRS ...]
			    Specific directories to build
      --conf CONF           Sphinx configuration file
      --builddir BUILDDIR   Sphinx configuration file
      --theme THEME         Sphinx theme to use
      --css CSS             Custom CSS file for HTML/EPUB
      --paper {,a4,letter}  Paper size for LaTeX/PDF output
      -v, --verbose         place build in verbose mode
      -j, --jobs JOBS       Sets number of jobs to use with sphinx-build
      -i, --interactive     Change latex default to run in interactive mode
      -V, --venv [VENV]     If used, run Sphinx from a venv dir (default dir: sphinx_latest)

the only mandatory argument is the target, which is identical with
"make" targets.

The call inside Makefile doesn't use the last four arguments. They're
there to help identifying problems at the build:

    -v makes the output verbose;
    -j helps to test parallelism;
    -i runs latexmk in interactive mode, allowing to debug PDF
       build issues;
    -V is useful when testing it with different venvs.

When used with GNU make (or some other make which implements jobserver),
a call like:

    make -j <targets> htmldocs

will make the wrapper to automatically use POSIX jobserver to claim
the number of available job slots, calling sphinx-build with a
"-j" parameter reflecting it. ON such case, the default can be
overriden via SPHINXDIRS argument.

Visiable changes when compared with the old behavior:

When V=0, the only visible difference is that:
- pdfdocs target now returns 0 on success, 1 on failures.
  This addresses an issue over the current process where we
  it always return success even on failures;
- it will now print the name of PDF files that failed to build,
  if any.

In verbose mode, sphinx-build-wrapper and sphinx-build command lines
are now displayed.
2025-10-17 14:11:30 -06:00
Jonathan Corbet
d0841b8761 Merge branch 'media-uapi' into docs-mw
Mauro says:

In the past, media used Docbook to generate documentation, together
with some logic to ensure that cross-references would match the
actual defined uAPI.

The rationale is that we wanted to automatically check for uAPI
documentation gaps.

The same logic was migrated to Sphinx. Back then, broken links
were reported. However, recent versions of it and/or changes at
conf.py disabled such checks.

The result is that several symbols are now not cross-referenced,
and we don't get warnings anymore when something breaks.

This series consist on 2 parts:

Part 1: extra patches to parse_data_structs.py and kernel_include.py;
Part 2: media documentation fixes.
2025-10-17 14:07:08 -06:00
Mauro Carvalho Chehab
be63b06be5 docs: media: dvb: fix dmx.h.rst.exceptions
There are lots of broken links on dmx. Those are mostly linked
to namespace handling.

Yet, some symbols were pointed to the wrong locations, and there
are some definitions that aren't needed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <b2acf243771529daa925afddd2b68d07d7bbb164.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:57:00 -06:00
Mauro Carvalho Chehab
c7d830d26b docs: media: dvb: headers: warn about broken cross references
Enable cross-reference warnings for demux header.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <8f98dde399df8b937dadf09168194bacce682c7d.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:57:00 -06:00
Mauro Carvalho Chehab
6393c3780e docs: media: dmx_types: place kerneldoc at the right namespace
The DVB documentation is using DTV.dmx for all demux symbols.

Use such domain for kernel-doc documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <27fcc036fb5c80bda8116029e1964ad229208095.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:57:00 -06:00
Mauro Carvalho Chehab
95a0bd5d79 docs: cec: show broken xrefs and show TOC instead of cec.h content
Enable xref broken warnings. While here, change the output to
only show cross-references, as there's no need to show the entire cec.h
header at the docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <c587872ca3685213d9f8e88277404c9e253633df.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:57:00 -06:00
Mauro Carvalho Chehab
becd89fd86 docs: cec: cec.h.rst.exceptions: fix broken references from cec.h
All references there belong to CEC namespace.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <dd1270dd5d91538cdfb0b087127c53a9f4ba7885.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
7ef84239ed media: docs: add some C domain missing references
Some enum/struct fields don't contain C domain references.
Add them to avoid broken xrefs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <c9b036938197f1dd5bc93f5c5be0245bd9e5d1ee.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
753b113b77 media: docs: videodev2.h.rst.exceptions: ignore struct __kernel_v4l2_timeval
This is an ancillary struct used for year-2038 compat logic.
It is not meant to be used directly on userspace.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <a6a0dc7366b1a5d7184b8f7d4ba27689051a1f6a.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
fec3d4c376 media: docs: add a missing reference for VIDIOC_QUERY_CTRL
This one is missing its c:macro definition.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <46f86be6ace28abe83ea9ce6fa6138e40185a23a.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
145b1d5c2e docs: media: videodev2.h.rst.exceptions: fix namespace on refs
Media uses V4L domain, but the replace rules are not considering
it.

Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <927f9c19d90b62ffda950cdac9bba23c2ca09f53.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
ce062cdc2e docs: media: add missing c namespace to V4L headers
Media references belong to V4L namespace. Fix a lot of broken
links when including videodev2.h and probably fixing several
other broken cross-references between different files inside
media docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <b69fc5bd43da7d326b9f4720df59388088c64065.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
11578a2ecb docs: kernel_include.py: use get_close_matches() to propose alternatives
Improve the suggestions algorithm by using get_close_matches() if
no suggestions with the same name are found. As we're now building
a dict, when the name is identical, but on a different domain,
the search is O(1), making it a lot faster.

The get_close_matches is also fast, as there is just one loop,
instead of 3.

This can be useful to detect typos on references, with could
be the base of a futuere extension that will handle ref unmatches
for the entire build, allowing someone to find typos and fix them.

As difflib and get_close_matches are there since the early
Python 3.x days, we don't need to handle any extra dependencies
to use it.

We're keeping the default values for the search, e.g. n=3, cutoff=0.6.
With that, we now have things like:

  $ make SPHINXDIRS="userspace-api/media" htmldocs
  ...
  include/uapi/linux/videodev2.h:199: WARNING: Invalid xref: c:type:`v4l2_memory`. Possible alternatives:
        c:type:`v4l2_meta_format` (from v4l/dev-meta)
        c:type:`v4l2_rect` (from v4l/dev-overlay)
        c:type:`v4l2_area` (from v4l/ext-ctrls-image-source) [ref.missing]
  ...
  include/uapi/linux/videodev2.h:1985: WARNING: Invalid xref: c:type:`V4L.v4l2_queryctrl`. Possible alternatives:
        std🏷️`v4l2-queryctrl` (from v4l/vidioc-queryctrl)
        std🏷️`v4l2-query-ext-ctrl` (from v4l/vidioc-queryctrl)

At the first example, it was not a typo, but a symbol that doesn't
seem to be properly documented. The second example points to
v4l2-queryctrl, which is a close match for the symbol.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <7365feb74cbdd6b982c87baf5863360ab98cf727.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
2792fc7307 docs: media: mediactl: use TOC instead of file contents
All we wanted were to have a way to link the comprehensive
documentation with the actual symbols parsed from the header
file, as this helps to identify any broken references.

Use the new :toc: flag for media controller and enable warnings.

Here, we need to adjust the exceptions file to setup the C
namespace accordingly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <5c8a87be712397563fc8ca914c3d92fe675e4071.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
94d95887ea docs: media: rc: use TOC instead of file contents for LIRC header
All we wanted were to have a way to link the comprehensive
documentation with the actual symbols parsed from the header
file, as this helps to identify any broken references.

Use the new :toc: flag for LIRC and enable warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <94dc601b4777ca544489ffc6cef9a2da5fe28e0e.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
a75968226c docs: media: dvb: enable warnings for most headers
Except for two exception rules and dmx.h, the other files
are already handling properly cross references.

Fix the two exception rules for frontend.h, as those are
false-positives:

	include/uapi/linux/dvb/frontend.h:959: WARNING: can't link to: c:type:: FE_GET_PROPERTY
	include/uapi/linux/dvb/frontend.h:933: WARNING: can't link to: c:func:: FE_SET_FRONTEND_TUNE_MODE

The dmx.h are actual issues that will require an extra
patch to fill gaps.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <9ae6556ebd47de4f066a149ab0bbe7ce27acf2c4.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
f0eb1b4ce7 docs: media: dvb: use TOC instead of file contents at headers
All we wanted were to have a way to link the comprehensive
documentation with the actual symbols parsed from the header
file, as this helps to identify any broken references.

Use the new :toc: flag for DVB.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <dbe95d83ae2117ed532fda423fd1c1d9906b7a19.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
d2a72e1f27 tools: docs: parse_data_structs.py: accept more reftypes
Current regex is limited to only some c-domain reftypes.
There are several others.

Change the code to pick the name specified there.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <5c146923d1e3183893f290216fb1378954e2e540.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
7f809e6a6f tools: docs: parse_data_structs.py: add namespace support
C domain supports a ".. c:namespace::" tag that allows setting a
symbol namespace. This is used within the kernel to avoid warnings
about duplicated symbols. This is specially important for syscalls,
as each subsystem may have their own documentation for them.
This is specially true for ioctl.

When such tag is used, all C domain symbols have c++ style,
e.g. they'll become "{namespace}.{reference}".

Allow specifying C namespace at the exception files, avoiding
the need of override rules for every symbol.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <cc27ec60ceb3bdac4197fb7266d2df8155edacda.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
2cdd27a708 tools: docs: parse_data_structs.py: get rid of process_exceptions()
Add an extra parameter to parse_file to make it handle exceptions
internally, cleaning up the API.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <8575bbc94ff706aa7e7cc3a188399ca17a3169e6.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
9e4173432e tools: docs: parse_data_structs: make process_exceptions two stages
Split the logic which parses exceptions on two stages, preparing
the exceptions file to have rules that will affect xref generation.

For now, preserve the original API.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <da9ca5f2ce1ffcfb355e32e676ff013607c227e0.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
641a4a13f3 docs: kernel_include.py: propose alternatives
Specially when using c::namespace, it is not hard to break
a reference by forgetting to add a domain. Also, different
cases and using "-"/"_" the wrong way are typical cases that
people often gets wrong.

We might use a more complex logic here to also check for typos,
but let's keep it plain, simple.

This is enough to get thos exeptions from media controller:

  .../include/uapi/linux/media.h:26: WARNING: Invalid xref: c:type:`media_device_info`. Possible alternatives:
	c:type:`MC.media_device_info` (from mediactl/media-ioc-device-info)
  .../include/uapi/linux/media.h:149: WARNING: Invalid xref: c:type:`media_entity_desc`. Possible alternatives:
	c:type:`MC.media_entity_desc` (from mediactl/media-ioc-enum-entities)
  .../include/uapi/linux/media.h:228: WARNING: Invalid xref: c:type:`media_link_desc`. Possible alternatives:
	c:type:`MC.media_link_desc` (from mediactl/media-ioc-enum-links)
  .../include/uapi/linux/media.h:235: WARNING: Invalid xref: c:type:`media_links_enum`. Possible alternatives:
	c:type:`MC.media_links_enum` (from mediactl/media-ioc-enum-links)
  .../include/uapi/linux/media.h:212: WARNING: Invalid xref: c:type:`media_pad_desc`. Possible alternatives:
	c:type:`MC.media_pad_desc` (from mediactl/media-ioc-enum-links)
  .../include/uapi/linux/media.h:298: WARNING: Invalid xref: c:type:`media_v2_entity`. Possible alternatives:
	c:type:`MC.media_v2_entity` (from mediactl/media-ioc-g-topology)
  .../include/uapi/linux/media.h:312: WARNING: Invalid xref: c:type:`media_v2_interface`. Possible alternatives:
	c:type:`MC.media_v2_interface` (from mediactl/media-ioc-g-topology)
  .../include/uapi/linux/media.h:307: WARNING: Invalid xref: c:type:`media_v2_intf_devnode`. Possible alternatives:
	c:type:`MC.media_v2_intf_devnode` (from mediactl/media-ioc-g-topology)
  .../include/uapi/linux/media.h:341: WARNING: Invalid xref: c:type:`media_v2_link`. Possible alternatives:
	c:type:`MC.media_v2_link` (from mediactl/media-ioc-g-topology)
  .../include/uapi/linux/media.h:333: WARNING: Invalid xref: c:type:`media_v2_pad`. Possible alternatives:
	c:type:`MC.media_v2_pad` (from mediactl/media-ioc-g-topology)
  .../include/uapi/linux/media.h:349: WARNING: Invalid xref: c:type:`media_v2_topology`. Possible alternatives:
	c:type:`MC.media_v2_topology` (from mediactl/media-ioc-g-topology)

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <4c75d277e950e619ea00ba2dea336853a4aac976.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
3ed9521772 docs: kernel_include.py: fix line numbers for TOC
On TOC output, we need to embeed line numbers with ViewList.

Change the parse class to produce a line-number parsed result,
and adjust the output accordingly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <74eed96e32f79eaaef7a99ffe7c3224fed369c27.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
ba9fbb3d9a tools: docs: parse_data_structs.py: output a line number
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <dcffa6844dede00052f5fb851a857991468f22b5.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Mauro Carvalho Chehab
ba99100687 tools: docs: parse_data_structs.py: drop contents header
When used in practice, one may want to have multiple header
files on a single rst file, like:

	***********************
	Digital TV uAPI symbols
	***********************

	.. contents:: Table of Contents
	   :depth: 2
	   :local:

	Frontend
	========

	.. kernel-include:: include/uapi/linux/dvb/frontend.h
	    :generate-cross-refs:
	    :toc:

	Demux
	=====

	.. kernel-include:: include/uapi/linux/dvb/dmx.h
	    :generate-cross-refs:
	    :toc:

	...

So, don't add ..contents:: here.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <4bf353e5248133a3b0abd82519a38453402fe7c6.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17 13:56:59 -06:00
Gaurav Kashyap
4f3b5f9edc dt-bindings: crypto: qcom,inline-crypto-engine: Document the kaanapali ICE
Document the Inline Crypto Engine (ICE) on the kaanapali platform.

Signed-off-by: Gaurav Kashyap <gaurav.kashyap@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Eugen Hristev <eugen.hristev@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:10:27 +08:00
Karina Yankevich
1617d93c12 crypto: drbg - make drbg_{ctr_bcc,kcapi_sym}() return *void*
drgb_kcapi_sym() always returns 0, so make it return void instead.
Consequently, make drbg_ctr_bcc() return void too.

Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.

[Sergey: fixed the subject, refreshed the patch]

Signed-off-by: Karina Yankevich <k.yankevich@omp.ru>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:10:27 +08:00
Herbert Xu
96feb73def crypto: authenc - Correctly pass EINPROGRESS back up to the caller
When authenc is invoked with MAY_BACKLOG, it needs to pass EINPROGRESS
notifications back up to the caller when the underlying algorithm
returns EBUSY synchronously.

However, if the EBUSY comes from the second part of an authenc call,
i.e., it is asynchronous, both the EBUSY and the subsequent EINPROGRESS
notification must not be passed to the caller.

Implement this by passing a mask to the function that starts the
second half of authenc and using it to determine whether EBUSY
and EINPROGRESS should be passed to the caller.

This was a deficiency in the original implementation of authenc
because it was not expected to be used with MAY_BACKLOG.

Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 180ce7e810 ("crypto: authenc - Add EINPROGRESS check")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:58 +08:00
Shivani Agarwal
6f6e309328 crypto: af_alg - zero initialize memory allocated via sock_kmalloc
Several crypto user API contexts and requests allocated with
sock_kmalloc() were left uninitialized, relying on callers to
set fields explicitly. This resulted in the use of uninitialized
data in certain error paths or when new fields are added in the
future.

The ACVP patches also contain two user-space interface files:
algif_kpp.c and algif_akcipher.c. These too rely on proper
initialization of their context structures.

A particular issue has been observed with the newly added
'inflight' variable introduced in af_alg_ctx by commit:

  67b164a871 ("crypto: af_alg - Disallow multiple in-flight AIO requests")

Because the context is not memset to zero after allocation,
the inflight variable has contained garbage values. As a result,
af_alg_alloc_areq() has incorrectly returned -EBUSY randomly when
the garbage value was interpreted as true:

  https://github.com/gregkh/linux/blame/master/crypto/af_alg.c#L1209

The check directly tests ctx->inflight without explicitly
comparing against true/false. Since inflight is only ever set to
true or false later, an uninitialized value has triggered
-EBUSY failures. Zero-initializing memory allocated with
sock_kmalloc() ensures inflight and other fields start in a known
state, removing random issues caused by uninitialized data.

Fixes: fe869cdb89 ("crypto: algif_hash - User-space interface for hash operations")
Fixes: 5afdfd22e6 ("crypto: algif_rng - add random number generator support")
Fixes: 2d97591ef4 ("crypto: af_alg - consolidation of duplicate code")
Fixes: 67b164a871 ("crypto: af_alg - Disallow multiple in-flight AIO requests")
Cc: stable@vger.kernel.org
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Jonathan McDowell
e74b96d77d hwrng: core - Allow runtime disabling of the HW RNG
The HW RNG core allows for manual selection of which RNG device to use,
but does not allow for no device to be enabled. It may be desirable to
do this on systems with only a single suitable hardware RNG, where we
need exclusive access to other functionality on this device. In
particular when performing TPM firmware upgrades this lets us ensure the
kernel does not try to access the device.

Before:

root@debian-qemu-efi:~# grep "" /sys/devices/virtual/misc/hw_random/rng_*
/sys/devices/virtual/misc/hw_random/rng_available:tpm-rng-0
/sys/devices/virtual/misc/hw_random/rng_current:tpm-rng-0
/sys/devices/virtual/misc/hw_random/rng_quality:1024
/sys/devices/virtual/misc/hw_random/rng_selected:0

After:

root@debian-qemu-efi:~# grep "" /sys/devices/virtual/misc/hw_random/rng_*
/sys/devices/virtual/misc/hw_random/rng_available:tpm-rng-0 none
/sys/devices/virtual/misc/hw_random/rng_current:tpm-rng-0
/sys/devices/virtual/misc/hw_random/rng_quality:1024
/sys/devices/virtual/misc/hw_random/rng_selected:0

root@debian-qemu-efi:~# echo none > /sys/devices/virtual/misc/hw_random/rng_current
root@debian-qemu-efi:~# grep "" /sys/devices/virtual/misc/hw_random/rng_*
/sys/devices/virtual/misc/hw_random/rng_available:tpm-rng-0 none
/sys/devices/virtual/misc/hw_random/rng_current:none
grep: /sys/devices/virtual/misc/hw_random/rng_quality: No such device
/sys/devices/virtual/misc/hw_random/rng_selected:1

(Observe using bpftrace no calls to TPM being made)

root@debian-qemu-efi:~# echo "" > /sys/devices/virtual/misc/hw_random/rng_current
root@debian-qemu-efi:~# grep "" /sys/devices/virtual/misc/hw_random/rng_*
/sys/devices/virtual/misc/hw_random/rng_available:tpm-rng-0 none
/sys/devices/virtual/misc/hw_random/rng_current:tpm-rng-0
/sys/devices/virtual/misc/hw_random/rng_quality:1024
/sys/devices/virtual/misc/hw_random/rng_selected:0

(Observe using bpftrace that calls to the TPM resume)

Signed-off-by: Jonathan McDowell <noodles@meta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Guangshuo Li
7cf6e0b69b crypto: caam - Add check for kcalloc() in test_len()
As kcalloc() may fail, check its return value to avoid a NULL pointer
dereference when passing the buffer to rng->read(). On allocation
failure, log the error and return since test_len() returns void.

Fixes: 2be0d806e2 ("crypto: caam - add a test for the RNG")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Kael D'Alcamo
59682835e1 dt-bindings: rng: microchip,pic32-rng: convert to DT schema
Convert the Devicetree binding documentation for microchip,pic32mzda-rng
from plain text to YAML.

Signed-off-by: Kael D'Alcamo <dev@kael-k.io>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Thorsten Blum
6af9914f7b crypto: hifn_795x - replace simple_strtoul with kstrtouint
Replace simple_strtoul() with the recommended kstrtouint() for parsing
the 'hifn_pll_ref=' module parameter. Unlike simple_strtoul(), which
returns an unsigned long, kstrtouint() converts the string directly to
an unsigned integer and avoids implicit casting.

Check the return value of kstrtouint() and fall back to 66 MHz if
parsing fails. This adds error handling while preserving existing
behavior for valid values, and removes use of the deprecated
simple_strtoul() helper.

Add a space in the log message to correctly format "66 MHz" while we're
at it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Thorsten Blum
33eea63ff9 crypto: fips - replace simple_strtol with kstrtoint to improve fips_enable
Replace simple_strtol() with the recommended kstrtoint() for parsing the
'fips=' boot parameter. Unlike simple_strtol(), which returns a long,
kstrtoint() converts the string directly to an integer and avoids
implicit casting.

Check the return value of kstrtoint() and reject invalid values. This
adds error handling while preserving existing behavior for valid values,
and removes use of the deprecated simple_strtol() helper.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Harsh Jain
1a098379f2 crypto: xilinx-trng - Add CTR_DRBG DF processing of seed
Versal TRNG IP does not support Derivation Function (DF) of seed.
Add DF processing for CTR_DRBG mode.

Signed-off-by: Harsh Jain <h.jain@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Harsh Jain
ba0570bdf1 crypto: drbg - Replace AES cipher calls with library calls
Replace aes used in drbg with library calls.

Signed-off-by: Harsh Jain <h.jain@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Harsh Jain
6c4fed5fee crypto: drbg - Export CTR DRBG DF functions
Export drbg_ctr_df() derivative function to new module df_sp80090.

Signed-off-by: Harsh Jain <h.jain@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-10-17 16:03:57 +08:00
Tejun Heo
70d837c3e0 sched_ext: Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-6.19
Pull in tip/sched/core to receive:

 50653216e4 ("sched: Add support to pick functions to take rf")
 4c95380701 ("sched/ext: Fold balance_scx() into pick_task_scx()")

which will enable clean integration of DL server support among other things.

This conflicts with the following from sched_ext/for-6.18-fixes:

 a8ad873113 ("sched_ext: defer queue_balance_callback() until after ops.dispatch")

which adds maybe_queue_balance_callback() to balance_scx() which is removed
by 50653216e4. Resolve by moving the invocation to pick_task_scx() in the
equivalent location.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-16 08:45:38 -10:00
Tejun Heo
075e3f7206 sched_ext: Merge branch 'for-6.18-fixes' into for-6.19
Pull sched_ext/for-6.18-fixes to sync trees to receive:

 05e63305c8 ("sched_ext: Fix scx_kick_pseqs corruption on concurrent scheduler loads")

to avoid conflicts with planned cgroup sub-sched support.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-16 08:34:12 -10:00
Jann Horn
4336927351 ima: add fs_subtype condition for distinguishing FUSE instances
Linux systems often use FUSE for several different purposes, where the
contents of some FUSE instances can be of more interest for auditing
than others.

Allow distinguishing between them based on the filesystem subtype
(s_subtype) using the new condition "fs_subtype".

The subtype string is supplied by userspace FUSE daemons
when a FUSE connection is initialized, so policy authors who want to
filter based on subtype need to ensure that FUSE mount operations are
sufficiently audited or restricted.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-10-16 11:12:20 -04:00
Jann Horn
345123d650 ima: add dont_audit action to suppress audit actions
"measure", "appraise" and "hash" actions all have corresponding "dont_*"
actions, but "audit" currently lacks that. This means it is not
currently possible to have a policy that audits everything by default,
but excludes specific cases.

This seems to have been an oversight back when the "audit" action was
added.

Add a corresponding "dont_audit" action to enable such uses.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-10-16 11:12:20 -04:00
Ryan Newton
5aff3b3199 sched_ext: Add a selftest for scx_bpf_dsq_peek
This commit adds two tests. The first is the most basic unit test:
make sure an empty queue peeks as empty, and when we put one element
in the queue, make sure peek returns that element.

However, even this simple test is a little complicated by the different
behavior of scx_bpf_dsq_insert in different calling contexts:
 - insert is for direct dispatch in enqueue
 - insert is delayed when called from select_cpu

In this case we split the insert and the peek that verifies the
result between enqueue/dispatch.

Note: An alternative would be to call `scx_bpf_dsq_move_to_local` on an
empty queue, which in turn calls `flush_dispatch_buf`, in order to flush
the buffered insert. Unfortunately, this is not viable within the
enqueue path, as it attempts a voluntary context switch within an RCU
read-side critical section.

The second test is a stress test that performs many peeks on all DSQs
and records the observed tasks.

Signed-off-by: Ryan Newton <newton@meta.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 06:46:36 -10:00
Ryan Newton
44f5c8ec5b sched_ext: Add lockless peek operation for DSQs
The builtin DSQ queue data structures are meant to be used by a wide
range of different sched_ext schedulers with different demands on these
data structures. They might be per-cpu with low-contention, or
high-contention shared queues. Unfortunately, DSQs have a coarse-grained
lock around the whole data structure. Without going all the way to a
lock-free, more scalable implementation, a small step we can take to
reduce lock contention is to allow a lockless, small-fixed-cost peek at
the head of the queue.

This change allows certain custom SCX schedulers to cheaply peek at
queues, e.g. during load balancing, before locking them. But it
represents a few extra memory operations to update the pointer each
time the DSQ is modified, including a memory barrier on ARM so the write
appears correctly ordered.

This commit adds a first_task pointer field which is updated
atomically when the DSQ is modified, and allows any thread to peek at
the head of the queue without holding the lock.

Signed-off-by: Ryan Newton <newton@meta.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 06:46:25 -10:00
Fushuai Wang
5cb5575308 selftests: livepatch: use canonical ftrace path
Since v4.1 kernel, a new interface for ftrace called "tracefs" was
introduced, which is usually mounted in /sys/kernel/tracing. Therefore,
tracing files can now be accessed via either the legacy path
/sys/kernel/debug/tracing or the newer path /sys/kernel/tracing.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-15 14:50:46 +02:00
Song Liu
139560e8b9 livepatch: Match old_sympos 0 and 1 in klp_find_func()
When there is only one function of the same name, old_sympos of 0 and 1
are logically identical. Match them in klp_find_func().

This is to avoid a corner case with different toolchain behavior.

In this specific issue, two versions of kpatch-build were used to
build livepatch for the same kernel. One assigns old_sympos == 0 for
unique local functions, the other assigns old_sympos == 1 for unique
local functions. Both versions work fine by themselves. (PS: This
behavior change was introduced in a downstream version of kpatch-build.
This change does not exist in upstream kpatch-build.)

However, during livepatch upgrade (with the replace flag set) from a
patch built with one version of kpatch-build to the same fix built with
the other version of kpatch-build, livepatching fails with errors like:

[   14.218706] sysfs: cannot create duplicate filename 'xxx/somefunc,1'
...
[   14.219466] Call Trace:
[   14.219468]  <TASK>
[   14.219469]  dump_stack_lvl+0x47/0x60
[   14.219474]  sysfs_warn_dup.cold+0x17/0x27
[   14.219476]  sysfs_create_dir_ns+0x95/0xb0
[   14.219479]  kobject_add_internal+0x9e/0x260
[   14.219483]  kobject_add+0x68/0x80
[   14.219485]  ? kstrdup+0x3c/0xa0
[   14.219486]  klp_enable_patch+0x320/0x830
[   14.219488]  patch_init+0x443/0x1000 [ccc_0_6]
[   14.219491]  ? 0xffffffffa05eb000
[   14.219492]  do_one_initcall+0x2e/0x190
[   14.219494]  do_init_module+0x67/0x270
[   14.219496]  init_module_from_file+0x75/0xa0
[   14.219499]  idempotent_init_module+0x15a/0x240
[   14.219501]  __x64_sys_finit_module+0x61/0xc0
[   14.219503]  do_syscall_64+0x5b/0x160
[   14.219505]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[   14.219507] RIP: 0033:0x7f545a4bd96d
...
[   14.219516] kobject: kobject_add_internal failed for somefunc,1 with
    -EEXIST, don't try to register things with the same name ...

This happens because klp_find_func() thinks somefunc with old_sympos==0
is not the same as somefunc with old_sympos==1, and klp_add_object_nops
adds another xxx/func,1 to the list of functions to patch.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
[pmladek@suse.com: Fixed some typos.]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-15 14:47:19 +02:00
zhidao su
347ed2d566 sched/ext: Implement cgroup_set_idle() callback
Implement the missing cgroup_set_idle() callback that was marked as a
TODO. This allows BPF schedulers to be notified when a cgroup's idle
state changes, enabling them to adjust their scheduling behavior
accordingly.

The implementation follows the same pattern as other cgroup callbacks
like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the
BPF scheduler has implemented the callback and invokes it with the
appropriate parameters.

Fixes a spelling error in the cgroup_set_bandwidth() documentation.

tj: s/scx_cgroup_rwsem/scx_cgroup_ops_rwsem/ to fix build breakage.

Signed-off-by: zhidao su <soolaugust@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-14 10:17:33 -10:00
Bagas Sanjaya
44abc8fcbf Documentation: process: Arbitrarily bump kernel major version number
The big picture section of 2.Process.rst currently hardcodes major
version number to 5 since fb0e0ffe7f ("Documentation: bring process
docs up to date"). As it can get outdated when it is actually
incremented (the recent is 6 and will be 7 in the near future),
arbitrarily bump it to 9, giving a headroom for a decade.

Note that the version number examples are kept to illustrate the
numbering scheme.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
[jc: tweaked the initial 9.x mention slightly]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20250922074219.26241-1-bagasdotme@gmail.com>
2025-10-14 09:14:03 -06:00
Akiyoshi Kurita
3a2ddc5fb1 docs: ja_JP: SubmittingPatches: describe the 'Fixes:' tag
Sync the ja_JP translation with the following upstream commits:

commit 8401aa1f59 ("Documentation/SubmittingPatches: describe the Fixes: tag")
commit 19c3fe285c ("docs: Explicitly state that the 'Fixes:' tag shouldn't split lines")
commit 5b5bbb8cc5 ("docs: process: Add an example for creating a fixes tag")
commit 6356f18f09 ("Align git commit ID abbreviation guidelines and checks")

The mix of plain text and reST markup for ``git bisect`` is intentional to
align with the eventual reST conversion.

Signed-off-by: Akiyoshi Kurita <weibu@redadmin.org>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20250924192426.2743495-1-weibu@redadmin.org>
2025-10-14 09:00:56 -06:00
Akiyoshi Kurita
f9e51009b0 Documentation: admin-guide: Correct spelling of "userspace"
The term "userspace" should be a single word. Fix the typo
"userpace" accordingly.

Signed-off-by: Akiyoshi Kurita <weibu@redadmin.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20250926190019.41788-1-weibu@redadmin.org>
2025-10-14 08:57:34 -06:00
Hugo Osvaldo Barrera
8e3b02d260 Documentation/x86: explain LINUX_EFI_INITRD_MEDIA_GUID
Since the Handover Protocol was deprecated, the recommended approach is
to provide an initrd using a UEFI boot service with the
LINUX_EFI_INITRD_MEDIA_GUID device path. Documentation for the new
approach has been no more than an admonition with a link to an existing
implementation.

Provide a short explanation of this functionality, to ease future
implementations without having to reverse engineer existing ones.

[Bagas: Don't use :ref: link to EFI stub documentation and refer to
OVMF/edk2 implementation]

Signed-off-by: Hugo Osvaldo Barrera <hugo@whynothugo.nl>
Link: https://lore.kernel.org/r/20250428131206.8656-2-hugo@whynothugo.nl
Co-developed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20251013085718.27085-1-bagasdotme@gmail.com>
2025-10-14 07:53:40 -06:00
Tejun Heo
bd7143e74e sched_ext/tools: Add compat wrapper for scx_bpf_task_set_slice/dsq_vtime()
for sub-scheduler authority checks. Add compat wrappers which fall back to
direct p->scx field writes on older kernels.

Suggested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:49:29 -10:00
Tejun Heo
cded46d971 sched_ext: Make scx_bpf_dsq_insert*() return bool
In preparation for hierarchical schedulers, change scx_bpf_dsq_insert() and
scx_bpf_dsq_insert_vtime() to return bool instead of void. With
sub-schedulers, there will be no reliable way to guarantee a task is still
owned by the sub-scheduler at insertion time (e.g., the task may have been
migrated to another scheduler). The bool return value will enable
sub-schedulers to detect and gracefully handle insertion failures.

For the root scheduler, insertion failures will continue to trigger scheduler
abort via scx_error(), so existing code doesn't need to check the return
value. Backward compatibility is maintained through compat wrappers.

Also update scx_bpf_dsq_move() documentation to clarify that it can return
false for sub-schedulers when @dsq_id points to a disallowed local DSQ.

Reviewed-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:49:29 -10:00
Tejun Heo
c0d630ba34 sched_ext: Wrap kfunc args in struct to prepare for aux__prog
scx_bpf_dsq_insert_vtime() and scx_bpf_select_cpu_and() currently have 5
parameters. An upcoming change will add aux__prog parameter which will exceed
BPF's 5 argument limit.

Prepare by adding new kfuncs __scx_bpf_dsq_insert_vtime() and
__scx_bpf_select_cpu_and() that take args structs. The existing kfuncs are
kept as compatibility wrappers. BPF programs use inline wrappers that detect
kernel API version via bpf_core_type_exists() and use the new struct-based
kfuncs when available, falling back to compat kfuncs otherwise. This allows
BPF programs to work with both old and new kernels.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:49:29 -10:00
Tejun Heo
3035addfaf sched_ext: Add scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime()
With the planned hierarchical scheduler support, sub-schedulers will need to
be verified for authority before being allowed to modify task->scx.slice and
task->scx.dsq_vtime. Add scx_bpf_task_set_slice() and
scx_bpf_task_set_dsq_vtime() which will perform the necessary permission
checks.

Root schedulers can still directly write to these fields, so this doesn't
affect existing schedulers.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:49:29 -10:00
Tejun Heo
111a79800a tools/sched_ext: Strip compatibility macros for cgroup and dispatch APIs
Enough time has passed since the introduction of scx_bpf_task_cgroup() and
the scx_bpf_dispatch* -> scx_bpf_dsq* kfunc renaming. Strip the compatibility
macros.

Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:49:29 -10:00
Andrea Righi
0128c85051 sched_ext: Exit early on hotplug events during attach
There is no need to complete the entire scx initialization if a
scheduler is failing to be attached due to a hotplug event.

Exit early to avoid unnecessary work and simplify the attach flow.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-13 08:45:01 -10:00
Roberto Sassu
8f3fc4f3f8 ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
Since commit 56305aa9b6 ("exec: Compute file based creds only once"), the
credentials to be applied to the process after execution are not calculated
anymore for each step of finding intermediate interpreters (including the
final binary), but only after the final binary to be executed without
interpreter has been found.

In particular, that means that the bprm_check_security LSM hook will not
see the updated cred->e[ug]id for the intermediate and for the final binary
to be executed, since the function doing this task has been moved from
prepare_binprm(), which calls the bprm_check_security hook, to
bprm_creds_from_file().

This breaks the IMA expectation for the CREDS_CHECK hook, introduced with
commit d906c10d8a ("IMA: Support using new creds in appraisal policy"),
which expects to evaluate "the credentials that will be committed when the
new process is started". This is clearly not the case for the CREDS_CHECK
IMA hook, which is attached to bprm_check_security.

This issue does not affect systems which load a policy with the BPRM_CHECK
hook with no other criteria, as is the case with the built-in "tcb" and/or
"appraise_tcb" IMA policies. The "tcb" built-in policy measures all
executions regardless of the new credentials, and the "appraise_tcb" policy
is written in terms of the file owner, rather than IMA hooks.

However, it does affect systems without a BPRM_CHECK policy rule or with a
BPRM_CHECK policy rule that does not include what CREDS_CHECK evaluates. As
an extreme example, taking a standalone rule like:

measure func=CREDS_CHECK euid=0

This will not measure for example sudo (because CREDS_CHECK still sees the
bprm->cred->euid set to the regular user UID), but only the subsequent
commands after the euid was applied to the children.

Make set[ug]id programs measured/appraised again by splitting
ima_bprm_check() in two separate hook implementations (CREDS_CHECK now
being implemented by ima_creds_check()), and by attaching CREDS_CHECK to
the bprm_creds_from_file LSM hook.

The limitation of this approach is that CREDS_CHECK will not be invoked
anymore for the intermediate interpreters, like it was before, but only for
the final binary. This limitation can be removed only by reverting commit
56305aa9b6 ("exec: Compute file based creds only once").

Link: https://github.com/linux-integrity/linux/issues/3
Fixes: 56305aa9b6 ("exec: Compute file based creds only once")
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-10-13 09:27:02 -04:00
Thomas Weißschuh
812f223fe9 tools/nolibc: handle NULL wstatus argument to waitpid()
wstatus is allowed to be NULL. Avoid a segmentation fault in this case.

Fixes: 0c89abf5ab ("tools/nolibc: implement waitpid() in terms of waitid()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-12 22:48:07 +02:00
Mauro Carvalho Chehab
e123e00a58 tools/docs: sphinx-build-wrapper: -q is a boolean, not an integer
As reported by Konstantin, sphinx-build -q is a boolean, not an integer.

Fix the code.

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Closes: https://lore.kernel.org/all/871pnepxfy.fsf@trenco.lwn.net/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <cafa10cddce3e5342a66c73f3f51a17fb6c7f5d3.1759851791.git.mchehab+huawei@kernel.org>
2025-10-08 01:19:35 -06:00
doubled
89ac14006f docs/zh_CN: Add sd-parameters.rst translation
Translate .../scsi/sd-parameters.rst into Chinese.
Add sd-parameters into .../scsi/index.rst.

Update the translation through commit d835971b2b
("scsi: docs: convert sd-parameters.txt to ReST")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:23 +08:00
doubled
dca85d5916 docs/zh_CN: Add link_power_management_policy.rst translation
Translate .../scsi/link_power_management_policy.rst into Chinese.
Add link_power_management_policy into .../scsi/index.rst.

Update the translation through commit cbbc70a8cd
("scsi: docs: convert link_power_management_policy.txt to ReST")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
doubled
f7c2e7108e docs/zh_CN: Add scsi-parameters.rst translation
Translate .../scsi/scsi-parameters.rst into Chinese.
Add scsi-parameters into .../scsi/index.rst.

Update the translation through commit bdb39c9509
("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
doubled
fdca4c262a docs/zh_CN: Add scsi_eh.rst translation
Translate .../scsi/scsi_eh.rst into Chinese.
Add scsi_eh into .../scsi/index.rst.

Update the translation through commit a9dcee18a2
("scsi: documentation: scsi_eh: updates for EH changes")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
doubled
4e841f7e41 docs/zh_CN: Add scsi_mid_low_api.rst translation
Translate .../scsi/scsi_mid_low_api.rst into Chinese.
Add scsi_mid_low_api into .../scsi/index.rst.

Update the translation through commit 73349697fd
("scsi: docs: Clean up some style in scsi_mid_low_api")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
doubled
301e7b86d6 docs/zh_CN: Add scsi.rst translation
Translate .../scsi/scsi.rst into Chinese.
Add scsi into .../scsi/index.rst.

Update the translation through commit c4e672ac8c
("scsi: docs: introduction: Multiple cleanups")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
doubled
791ca5860b docs/zh_CN: Add scsi/index.rst translation
Translate .../scsi/index.rst into Chinese and
update subsystem-apis.rst translation

Update the translation through commit 682b07d2ff
("scsi: docs: Organize the SCSI documentation")

Signed-off-by: doubled <doubled@leap-io-kernel.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:36:22 +08:00
Ben Guo
11441336dc docs/zh_CN: Update Rust index translation and add reference label
Update the translated rust/index.rst with new contents,
and add a reference label in rust/general-information.rst so
that index.rst can link to it properly.

Fixes in rust/index.rst:
- Fixed broken quick-start.rst cross-reference

Update the translation through commit d0b343605f
("kernel-docs: Add new section for Rust learning materials")

Signed-off-by: Ben Guo <benx.guo@gmail.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:26:35 +08:00
Shuo Zhao
23713aa5b4 docs/zh_CN: Add security SCTP Chinese translation
Translate .../security/SCTP.rst into Chinese.

Update the translation through commit da51bbcdba
("Docs: typos/spelling")

Signed-off-by: Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-10-02 21:21:22 +08:00
Mauro Carvalho Chehab
2bd22194b2 kernel-doc: output source file name at SEE ALSO
for man pages, it is helpful to know from where the man page
were generated.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <ac25496a27a0c90494a634d342207ef1ff6216e9.1759327966.git.mchehab+huawei@kernel.org>
2025-10-01 09:06:06 -06:00
Mauro Carvalho Chehab
0a4cd1c65e docs: Makefile: use PYTHONPYCACHEPREFIX
Previous cleanup patches ended dropping it when sphinx-build-wrapper
were added. Also, sphinx-pre-install can also generate caches.

So, re-add it for both.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <8c37576342994ea0e3466eec2602a8d989d9a5f0.1759328070.git.mchehab+huawei@kernel.org>
2025-10-01 08:51:10 -06:00
Mauro Carvalho Chehab
5401f971f5 tools/docs: sphinx-build-wrapper: pdflatex is needed only for pdf
Fix the logic which checks for pdflatex. Currently, it complains
for both pdfdocs and latexdocs, but for the latter, all it is
needed is Sphinx.

Reported-by: Akira Yokosawa <akiyks@gmail.com>
Closes: https://lore.kernel.org/linux-doc/cover.1758881658.git.mchehab+huawei@kernel.org/T/#ma81ff2e11b8579e5edc23e4381e464081ae668b7
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <313df7b4aab653e8fc40c30120c0dbebf8a0bcb1.1759328070.git.mchehab+huawei@kernel.org>
2025-10-01 08:51:10 -06:00
Mauro Carvalho Chehab
ff1354edb3 docs: Makefile: avoid a warning when using without texlive
As reported by Randy, running make htmldocs on a machine
without textlive now produce warnings:

    $ make O=DOCS htmldocs
    ../Documentation/Makefile:70: warning: overriding recipe for target 'pdfdocs'
    ../Documentation/Makefile:61: warning: ignoring old recipe for target 'pdfdocs'

That's because the code has now two definitions for pdfdocs in
case $PDFLATEX command is not found. With the new script, such
special case is not needed anymore, as the script checks it.

Drop the special case. Even after dropping it, on a machine
without LaTeX, it will still produce an error as expected,
as running:

    $ ./tools/docs/sphinx-build-wrapper pdfdocs
    Error: pdflatex or latexmk required for PDF generation

does the check. After applying the patch we have:

    $ make SPHINXDIRS=peci htmldocs
    Using alabaster theme
    Using Python kernel-doc

    $ make SPHINXDIRS=peci pdfdocs
    Error: pdflatex or latexmk required for PDF generation
    make[2]: *** [Documentation/Makefile:64: pdfdocs] Error 1
    make[1]: *** [/root/Makefile:1808: pdfdocs] Error 2
    make: *** [Makefile:248: __sub-make] Error 2

Which is the expected behavior.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-doc/e7c29532-71de-496b-a89f-743cef28736e@infradead.org/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <cd16a7436a510116ef87cd4abbb1f3cfe358012f.1759328070.git.mchehab+huawei@kernel.org>
2025-10-01 08:51:10 -06:00
Mauro Carvalho Chehab
4c6ece9180 tools/docs/sphinx-build-wrapper: allow skipping sphinx-build step
Most targets have two steps:
- step 1: run sphinx-build;
- step 2: run a post-build logic.

The second step can be as simple as copying static files like CSS,
but may may also envolve running make. allowing to skip the first
step helps debugging what's broken, and also allows using make
command line arguments like --ignore-errors.

Add an option to skip step 1.

Requested-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/linux-doc/5031e0c4-f17e-41b8-8955-959989e797f2@gmail.com/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <30dae295606ae1735f3bbeef4ca041a76dcd4540.1758539031.git.mchehab+huawei@kernel.org>
2025-09-25 11:07:18 -06:00
Mauro Carvalho Chehab
683dd3f79a docs: Makefile: fix rustdoc detection
During cleanups, the logic checking if .config exists were dropped,
but removing it causes false-positives. So, re-add it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <f414ff9b21eeb770ca1f57eb1496cc064c9bab15.1758539031.git.mchehab+huawei@kernel.org>
2025-09-25 11:07:18 -06:00
Mauro Carvalho Chehab
35b9d338e4 tools/docs: sphinx-build-wrapper: fix compat with recent Tumbleweed
On recent versions of openSUSE Tumbleweed, sphinx-buildis is no longer
a Python script, but something else. Such change is due to how
it now handles alternatives:

    https://en.opensuse.org/openSUSE:Migrating_to_libalternatives_with_alts

The most common approach that distros use for alternatives is via
symlinks:

    lrwxrwxrwx 1 root root 22 out 31  2024 /usr/bin/java -> /etc/alternatives/java
    lrwxrwxrwx 1 root root 37 mar  5  2025 /etc/alternatives/java -> /usr/lib/jvm/java-21-openjdk/bin/java

With such approach, one can sun the script with either:

    <sphinx>
    python3 <script>

However, openSUSE's implementation uses an ELF binary (/usr/bin/alts),
which breaks the latter format.

It is needed to allow users to specify the Python version to be
used while building docs, as some distros like Leap 15.x are
shipped with:

- older, unsupported python3/python3-sphinx packages;
- more modern python3xx/python3xx-sphinx packages that work
  properly.

On such distros, building docs require running make with:

    make PYTHON3=python3.11 htmldocs

Heh, even on more moderen distros where python3-sphinx
is supported, one may still want to use a newer package,
for instance, due to performance issues, as:

    - with Python < 3.11, kernel-doc is 3 times slower;
    - while building htmldocs with Python 3.13/Sphinx 8.x
      takes about 3 minutes on a modern machine, using
      Sphinx < 8.0 can take up to 16 minutes to build docs
      (7.x are the worse ones and require lots of RAM).

So, even with not too old distros, one still may want to use
for instance PYTHON3=python3.11.

To acommodate using PYTHON3 without breaking on Tumbleweed,
add a workaround that will only use:

    $(PYTHON3) sphinx-build

if PYTHON3 env var is not default.

While here, drop the special check for venv, as, with venv,
we can just call sphinx-build directly without any extra
checks.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/all/883df949-0281-4a39-8745-bcdcce3a5594@infradead.org/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <8917f862e0b8484c68408c274129c9f37a7aefb4.1758539031.git.mchehab+huawei@kernel.org>
2025-09-25 11:07:18 -06:00
Mauro Carvalho Chehab
72603d73fa docs: conf.py: get rid of load_config.py
The code here was meant to handle 3 functions:
1. allow having a separate conf.py file, per subdir;
2. generate a list of latex documents.
3. set "subproject" tag if SPHINXDIRS points to a subdir.

We don't have (1) anymore, and (3) is now properly handled
entirely inside conf.py.

So, only (3) is still needed, and this is a single-line change
at conf.py.

So, drop it, moving the remaining code to conf.py.

While here, drop a duplicated $(RUSTDOC) command-line argument.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <ec998f9f268a401ca6aa36e3221d39c97efeccaa.1758361087.git.mchehab+huawei@kernel.org>
2025-09-21 16:40:37 -06:00
Mauro Carvalho Chehab
c2381e8a61 scripts: remove sphinx-build-wrapper from scripts/
Commit 8a298579cd ("scripts: sphinx-build-wrapper: get rid of uapi/media Makefile")
accidentally added scripts/sphinx-build-wrapper, probably due
to some rebase issues.

The file was added on a separate patch series, at tools/docs,
and has other patches on the top of it, so drop this extra
version.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <d26137d908dc7813fafcded2c728ec837e4df073.1758361087.git.mchehab+huawei@kernel.org>
2025-09-21 16:40:37 -06:00
Mauro Carvalho Chehab
0aa9c0395e tools/docs: sphinx-build-wrapper: handle sphinx-build errors
If sphinx-build returns an error, exit the script.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <b7e152291fadd91694cbb6b086caefa4b6470fdd.1758361087.git.mchehab+huawei@kernel.org>
2025-09-21 16:40:37 -06:00
Wang Yaxin
33d80c67d3 Docs/zh_CN: Translate timestamping.rst to Simplified Chinese
translate the "timestamping.rst" into Simplified Chinese.

Update the translation through commit d5c17e3654
("docs: networking: timestamping: improve stacked PHC sentence")

Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Sun yuxi <sun.yuxi@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-20 15:44:48 +08:00
Shuo Zhao
7404c6b78a docs/zh_CN: Add security lsm-development Chinese translation
Translate .../security/lsm-development.rst into Chinese.

Update the translation through commit 6d2ed65318
("lsm: move hook comments docs to security/security.c").

Signed-off-by: Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-20 15:40:06 +08:00
shaomingyin
679e5d29b5 Docs/zh_CN: fix the format of proofreader
fix the format of proofreader for
Documentation/translations/zh_CN/filesystems/gfs2-uevents.rst
Documentation/translations/zh_CN/filesystems/gfs2.rst
Documentation/translations/zh_CN/filesystems/ubifs-authentication.rst
Documentation/translations/zh_CN/filesystems/ubifs.rst

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-20 15:37:11 +08:00
shaomingyin
46b194beee Docs/zh_CN: align title underline for ubifs.rst
align title underline for ubifs.rst

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-20 15:36:44 +08:00
shaomingyin
b8c1494b32 Docs/zh_CN: add fixed format for the header of gfs2-glocks.rst
add fixed format for the header of gfs2-glocks.rst

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-20 15:36:13 +08:00
Mauro Carvalho Chehab
42180ada39 tools/docs: sphinx-build-wrapper: add support to run inside venv
Sometimes, it is desired to run Sphinx from a virtual environment.
Add a command line parameter to automatically build Sphinx from
such environment.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <e34fa63a61e75a0ec86b37c9b5fafa6677f44c6c.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:20:48 -06:00
Mauro Carvalho Chehab
62ea383b44 tools/docs: sphinx-* break documentation bulds on openSUSE
Before this patch, building htmldocs on opensuseLEAP works
fine:

    # make htmldocs
    Available Python versions:
      /usr/bin/python3.11

    Python 3.6.15 not supported. Changing to /usr/bin/python3.11
    Python 3.6.15 not supported. Changing to /usr/bin/python3.11
    Using alabaster theme
    Using Python kernel-doc

    ...

As the logic detects that Python 3.6 is too old and recommends
intalling python311-Sphinx. If installed, documentation builds
work like a charm.

Yet, some develpers complained that running python3.11 instead
of python3 should not happen. So, let's break the build to make
them happier:

    $ make htmldocs
    Python 3.6.15 not supported. Bailing out
    You could run, instead:
      /usr/bin/python3.11 tools/docs/sphinx-build-wrapper htmldocs \
        --sphinxdirs=. --conf=conf.py --builddir=Documentation/output --theme= --css= \
        --paper=

    Python 3.6.15 not supported. Bailing out
    make[2]: *** [Documentation/Makefile:76: htmldocs] Error 1
    make[1]: *** [Makefile:1806: htmldocs] Error 2
    make: *** [Makefile:248: __sub-make] Error 2

It should be noticed that:

1. after this change, sphinx-pre-install needs to be called
   by hand:

    $ /usr/bin/python3.11 tools/docs/sphinx-pre-install
    Detected OS: openSUSE Leap 15.6.
    Sphinx version: 7.2.6

    All optional dependencies are met.
    Needed package dependencies are met.

2. sphinx-build-wrapper will auto-detect python3.11 and
   suggest a way to build the docs using the parameters passed
   via make variables. In this specific example:

   /usr/bin/python3.11 tools/docs/sphinx-build-wrapper htmldocs --sphinxdirs=. --conf=conf.py --theme= --css= --paper=

3. As this needs to be executed outside docs Makefile, it won't run
   the validation check scripts nor build Rust documentation if
   enabled, as the extra scripts are part of the docs Makefile.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <0635c311295300e9fb48c0ea607e2408910036e3.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:20:48 -06:00
Mauro Carvalho Chehab
2118ba7da6 tools/docs: sphinx-build-wrapper: move rust doc builder to wrapper
Simplify even further the docs Makefile by moving rust build logic
to the wrapper.

After this change, running make on an environment with rust enabled
works as expected.

With CONFIG_RUST:

    $ make O=/tmp/foo LLVM=1 SPHINXDIRS=peci htmldocs
    make[1]: Entrando no diretório '/tmp/foo'

    Using alabaster theme
    Using Python kernel-doc
      GEN     Makefile
      DESCEND objtool
      CC      arch/x86/kernel/asm-offsets.s
      INSTALL libsubcmd_headers
      CALL    /new_devel/docs/scripts/checksyscalls.sh
      RUSTC L rust/core.o
      BINDGEN rust/bindings/bindings_generated.rs
      BINDGEN rust/bindings/bindings_helpers_generated.rs
    ...

Without it:
    $ make SPHINXDIRS=peci htmldocs
    Using alabaster theme
    Using Python kernel-doc

Both work as it is it is supposed to do.

After the change, it is also possible to build directly with the script
by passing "--rustodoc".

if CONFIG_RUST, this works fine:

    $ ./tools/docs/sphinx-build-wrapper --sphinxdirs peci --rustdoc -- htmldocs
    Using alabaster theme
    Using Python kernel-doc
      SYNC    include/config/auto.conf
    ...
      RUSTC L rust/core.o
    ...

If not, it will produce a warning that RUST may be disabled:

    $ ./tools/docs/sphinx-build-wrapper --sphinxdirs peci --rustdoc -- htmldocs
    Using alabaster theme
    Using Python kernel-doc
    ***
    *** Configuration file ".config" not found!
    ***
    *** Please run some configurator (e.g. "make oldconfig" or
    *** "make menuconfig" or "make xconfig").
    ***
    make[1]: *** [/new_devel/docs/Makefile:829: .config] Error 1
    make: *** [Makefile:248: __sub-make] Error 2
    Ignored errors when building rustdoc: Command '['make', 'LLVM=1', 'rustdoc']' returned non-zero exit status 2.. Is RUST enabled?

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <fa1235ccf859f6ebfeef7ffba0ebde2015a75042.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:20:46 -06:00
Mauro Carvalho Chehab
ade9b9576e scripts: kdoc_parser.py: warn about Python version only once
When running kernel-doc over multiple documents, it emits
one error message per file with is not what we want:

	$ python3.6 scripts/kernel-doc.py . --none
	...
	Warning: ./include/trace/events/swiotlb.h:0 Python 3.7 or later is required for correct results
	Warning: ./include/trace/events/iommu.h:0 Python 3.7 or later is required for correct results
	Warning: ./include/trace/events/sock.h:0 Python 3.7 or later is required for correct results
	...

Change the logic to warn it only once at the library:

	$ python3.6 scripts/kernel-doc.py . --none
	Warning: Python 3.7 or later is required for correct results
	Warning: ./include/cxl/features.h:0 Python 3.7 or later is required for correct results

When running from command line, it warns twice, but that sounds
ok.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <68e54cf8b1201d1f683aad9bc710a99421910356.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:19:59 -06:00
Mauro Carvalho Chehab
104e0a682e tools: kernel-doc: add a see also section at man pages
While cross-references are complex, as related ones can be on
different files, we can at least correlate the ones that belong
to the same file, adding a SEE ALSO section for them.

The result is not bad. See for instance:

	$ tools/docs/sphinx-build-wrapper --sphinxdirs driver-api/media -- mandocs
	$ man Documentation/output/driver-api/man/edac_pci_add_device.9

	edac_pci_add_device(9)  Kernel Hacker's Manual  edac_pci_add_device(9)

	NAME
	       edac_pci_add_device  - Insert the 'edac_dev' structure into the
	       edac_pci global list and create sysfs entries  associated  with
	       edac_pci structure.

	SYNOPSIS
	       int  edac_pci_add_device  (struct  edac_pci_ctl_info *pci , int
	       edac_idx );

	ARGUMENTS
	       pci         pointer to the edac_device structure to be added to
	                   the list

	       edac_idx    A unique numeric identifier to be assigned to the

	RETURN
	       0 on Success, or an error code on failure

	SEE ALSO
	       edac_pci_alloc_ctl_info(9),          edac_pci_free_ctl_info(9),
	       edac_pci_alloc_index(9),  edac_pci_del_device(9), edac_pci_cre‐
	       ate_generic_ctl(9),            edac_pci_release_generic_ctl(9),
	       edac_pci_create_sysfs(9), edac_pci_remove_sysfs(9)

	August 2025               edac_pci_add_device   edac_pci_add_device(9)

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <fba25efb41eadad17a54e6275a6191173d702f00.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:19:58 -06:00
Mauro Carvalho Chehab
7e8a8143ec docs: add support to build manpages from kerneldoc output
Generating man files currently requires running a separate
script. The target also doesn't appear at the docs Makefile.

Add support for mandocs at the Makefile, adding the build
logic inside sphinx-build-wrapper, updating documentation
and dropping the ancillary script.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <3d248d724e7f3154f6e3a227e5923d7360201de9.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:19:57 -06:00
Mauro Carvalho Chehab
0d9abc7627 tools/docs: sphinx-build-wrapper: Fix output for duplicated names
When SPHINXDIRS is used, basename may be identical for different
files. If this happens, the summary and error detection won't be
accurate.

Fix it by using relative names from builddir.

While here, don't duplicate names. Report, instead:

- SUCCESS
    output PDF file was built
- FAILED
    latexmk/xelatex didn't build any PDF output
- FAILED: no .tex files were generated
    Sphinx didn't build any tex file for SPHINXDIRS directories
- FAILED ({python exception})
    When a concurrent.futures is catched. Usually indicates an
    internal error at the build logic.

With that, building multiple dirs with the same name is reported
properly:

    $ make V=1 SPHINXDIRS="admin-guide/media driver-api/media userspace-api/media" pdfdocs

    Summary
    =======
    admin-guide/media/pdf/media.pdf  : SUCCESS
    driver-api/media/pdf/media.pdf   : SUCCESS
    userspace-api/media/pdf/media.pdf: SUCCESS

And if at least one of them fails, return code will be 1.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <d4a4f16f6c0c423ad38531a490888be3bf01e574.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
82c294d453 tools/docs,scripts: sphinx-*: prevent sphinx-build crashes
On a properly set system, LANG and LC_ALL is always defined.
However, some distros like Debian, Gentoo and their variants
start with those undefioned.

When Sphinx tries to set a locale with:

	locale.setlocale(locale.LC_ALL, '')

It raises an exception, making Sphinx fail. This is more likely
to happen with test containers.

Add a logic to detect and workaround such issue by setting
locale to C.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <1d0afad8fe3d83182be3a08eb00dd71322e23e69.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
08e14bc17e tools/docs: sphinx-build-wrapper: allow building PDF files in parallel
Use POSIX jobserver when available or -j<number> to run PDF
builds in parallel, restoring pdf build performance. Yet,
running it when debugging troubles is a bad idea, so, when
calling directly via command line, except if "-j" is splicitly
requested, it will serialize the build.

With such change, a PDF doc builds now takes around 5 minutes
on a Ryzen 9 machine with 32 cpu threads:

	# Explicitly paralelize both Sphinx and LaTeX pdf builds
	$ make cleandocs; time scripts/sphinx-build-wrapper pdfdocs -j 33

	real	5m17.901s
	user	15m1.499s
	sys	2m31.482s

	# Use POSIX jobserver to paralelize both sphinx-build and LaTeX
	$ make cleandocs; time make pdfdocs

	real	5m22.369s
	user	15m9.076s
	sys	2m31.419s

	# Serializes PDF build, while keeping Sphinx parallelized.
	# it is equivalent of passing -jauto via command line
	$ make cleandocs; time scripts/sphinx-build-wrapper pdfdocs

	real	11m20.901s
	user	13m2.910s
	sys	1m44.553s

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <42eef319f9af6f9feb12bcd74ca6392c8119929d.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
2f99b85e22 tools/docs: sphinx-build-wrapper: add an argument for LaTeX interactive mode
By default, we use LaTeX batch mode to build docs. This way, when
an error happens, the build fails. This is good for normal builds,
but when debugging problems with pdf generation, the best is to
use interactive mode.

We already support it via LATEXOPTS, but having a command line
argument makes it easier:

Interactive mode:
	./scripts/sphinx-build-wrapper pdfdocs --sphinxdirs peci -v -i
	...
	Running 'xelatex --no-pdf  -no-pdf -recorder  ".../Documentation/output/peci/latex/peci.tex"'
	...

Default batch mode:
        ./scripts/sphinx-build-wrapper pdfdocs --sphinxdirs peci -v
	...
	Running 'xelatex --no-pdf  -no-pdf -interaction=batchmode -no-shell-escape -recorder  ".../Documentation/output/peci/latex/peci.tex"'
	...

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <9e5b9a8becc981b47ca3bf3ddce034f273400738.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
c6879037a1 docs: Makefile: document FONTS_CONF_DENY_VF= parameter
This parameter is there for some time, but it doesn't have anything
documenting it at make help.

Add some documentation, pointing to the place where one can find
more details.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <a84680c8f6f34e57c51829242ebc98a609af94c1.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
c514b13fd0 docs: Makefile: document latex/PDF PAPER= parameter
While the build system supports this for a long time, this was
never documented. Add a documentation for it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <9c7b34db18642081d22c36a4203f341c1100341e.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
2e1760999e docs: parallel-wrapper.sh: remove script
The only usage of this script was docs Makefile. Now that
it is using the new sphinx-build-wrapper, which has inside
the code from parallel-wrapper.sh, we can drop this script.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <c8ce0c78d0761370e4c9091a9423d9df9a357187.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:39 -06:00
Mauro Carvalho Chehab
819667bc3c tools/docs: sphinx-build-wrapper: add a wrapper for sphinx-build
There are too much magic inside docs Makefile to properly run
sphinx-build. Create an ancillary script that contains all
kernel-related sphinx-build call logic currently at Makefile.

Such script is designed to work both as an standalone command
and as part of a Makefile. As such, it properly handles POSIX
jobserver used by GNU make.

On a side note, there was a line number increase due to the
conversion (ignoring comments) is:

 Documentation/Makefile          |  131 +++----------
 tools/docs/sphinx-build-wrapper |  293 +++++++++++++++++++++++++++++++
 2 files changed, 323 insertions(+), 101 deletions(-)

Comments and descriptions adds:
 tools/docs/sphinx-build-wrapper | 261 +++++++++++++++++++++++++++++++-

So, about half of the script are comments/descriptions.

This is because some things are more verbosed on Python and because
it requires reading env vars from Makefile. Besides it, this script
has some extra features that don't exist at the Makefile:

- It can be called directly from command line;
- It properly return PDF build errors.

When running the script alone, it will only take handle sphinx-build
targets. On other words, it won't runn make rustdoc after building
htmlfiles, nor it will run the extra check scripts.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <80ae57b01fcfb1d338d93b8f8e26e57b69b5f16b.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:18:36 -06:00
Mauro Carvalho Chehab
adf9dc2592 tools/docs: python_version: move version check from sphinx-pre-install
The sphinx-pre-install code has some logic to deal with Python
version, which ensures that a minimal version will be enforced
for documentation build logic.

Move it to a separate library to allow re-using its code.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <d134ace64b55c827565ce68f0527e20c735f0d2e.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:19 -06:00
Mauro Carvalho Chehab
3f835cb123 tools/docs: sphinx-pre-install: allow check for alternatives and bail out
The caller script may not want an automatic execution of the new
version. Add two parameters to allow showing alternatives and to
bail out if version is incompatible.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <19777bc710bf901ffbb0ad0f1bb57b18fc01b163.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:19 -06:00
Mauro Carvalho Chehab
4880eac5bc tools/docs: sphinx-pre-install: drop a debug print
The version print at the lib was added for debugging purposes.
Get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <27f76a4df2b80c38d277d58a92c85c614544e013.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
abd61d1ff8 scripts: sphinx-pre-install: move it to tools/docs
As we're reorganizing the place where doc scripts are located,
move this one to tools/docs.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <5e2c40d3aebfd67b7ac7817f548bd1fa4ff661a8.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
92ea342ff6 check-variable-fonts.py: add a helper to display instructions
Use lib docstring to output the comments via --help/-h. With
that, update the default instructions to recomment it instead
of asking the user to read the source code.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <577162cf4e07de74c4a783f16e3404f0040e5e0a.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
4515ffdf3c tools/docs: check-variable-fonts.py: split into a lib and an exec file
As we'll be using the actual code inside sphinx-build-wrapper,
split the library from the executable, placing the exec at
the new place we've been using:

	tools/docs

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <8adbc22df1d43b1c5a673799d2333cc429ffe9fc.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
75539bec27 scripts: check-variable-fonts.sh: convert to Python
This script handle errors when trying to build translations
with make pdfdocs.

As part of our cleanup work to remove hacks from docs Makefile,
convert this to python, preparing it to be part of a library
to be called by sphinx-build-wrapper.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <d438fb01d2c00e2c2b4ac16f999d9a8ce848251b.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
a84a5d0b5a scripts/jobserver-exec: add a help message
Currently, calling it without an argument shows an ugly error
message. Instead, print a message using pythondoc as description.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <64b0339eac54ac0f2b3de3667a7f4f5becb1c6ae.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
fce6df7e73 scripts/jobserver-exec: move its class to the lib directory
To make it easier to be re-used, move the JobserverExec class
to the library directory.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <6be7b161b6c005a9807162ebfd239af6a4e6fa47.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Mauro Carvalho Chehab
2a14f02121 scripts/jobserver-exec: move the code to a class
Convert the code inside jobserver-exec to a class and
properly document it.

Using a class allows reusing the jobserver logic on other
scripts.

While the main code remains unchanged, being compatible with
Python 2.6 and 3.0+, its coding style now follows a more
modern standard, having tabs replaced by a 4-spaces
indent, passing autopep8, black and pylint.

The code allows using a pythonic way to enter/exit a python
code, e.g. it now supports:

	with JobserverExec() as jobserver:
	    jobserver.run(sys.argv[1:])

With the new code, the __exit__() function should ensure
that the jobserver slot will be closed at the end, even if
something bad happens somewhere.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-ID: <4749921b75d4e0bd85a25d4d94aa2c940fad084e.1758196090.git.mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-09-18 11:17:18 -06:00
Shuo Zhao
943b764cb6 docs/zh_CN: Add security ipe Chinese translation
Translate .../security/ipe.rst into Chinese.

Update the translation through commit ac6731870e
("documentation: add IPE documentation")

Signed-off-by: Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 22:03:23 +08:00
Sun yuxi
a513d96280 Docs/zh_CN: Translate generic-hdlc.rst to Simplified Chinese
translate the "generic-hdlc.rst" into Simplified Chinese.

Update the translation through commit 16128ad8f9
("docs: networking: convert generic-hdlc.txt to ReST")

Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Sun yuxi <sun.yuxi@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:52:54 +08:00
Wang Yaxin
478bb02b07 Docs/zh_CN: Translate skbuff.rst to Simplified Chinese
translate the "skbuff.rst" into Simplified Chinese.

Update the translation through commit 9facd94114
("skbuff: render the checksum comment to documentation")

Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Sun yuxi <sun.yuxi@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:52:54 +08:00
Sun yuxi
733b8bdfe7 Docs/zh_CN: Translate mptcp-sysctl.rst to Simplified Chinese
translate the "mptcp-sysctl.rst" into Simplified Chinese.

Update the translation through commit fa3ee9dd80
("mptcp: sysctl: add available_path_managers")

Signed-off-by: Sun yuxi <sun.yuxi@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:52:54 +08:00
Wang Longjie
e63b0d0e04 Docs/zh_CN: Translate inotify.rst to Simplified Chinese
translate the "inotify.rst" into Simplified Chinese.

Update to commit de389cf08d47("docs: filesystems: convert inotify.txt to
ReST")

Signed-off-by: Wang Longjie <wang.longjie1@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:22 +08:00
Wang Longjie
9a3d2d98f6 Docs/zh_CN: Translate dnotify.rst to Simplified Chinese
translate the "dnotify.rst" into Simplified Chinese.

Update to commit b31763cff488("docs: filesystems: convert dnotify.txt to
ReST")

Signed-off-by: Wang Longjie <wang.longjie1@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:22 +08:00
Shao Mingyin
1c8e194113 Docs/zh_CN: Translate gfs2-glocks.rst to Simplified Chinese
translate the "gfs2-glocks.rst" into Simplified Chinese.

Update to commit 713f8834389f("gfs2: Get rid of emote_ok
checks")

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: yang tao <yang.tao172@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:22 +08:00
Shao Mingyin
a1cce3f467 Docs/zh_CN: Translate gfs2-uevents.rst to Simplified Chinese
translate the "gfs2-uevents.rst" into Simplified Chinese.

Update to commit 5b7ac27a6e2c("docs: filesystems: convert
gfs2-uevents.txt to ReST")

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: yang tao <yang.tao172@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:22 +08:00
Shao Mingyin
8c6c6e9564 Docs/zh_CN: Translate gfs2.rst to Simplified Chinese
translate the "gfs2.rst" into Simplified Chinese.

Update to commit d9593868cd58("Documentation: Update
filesystems/gfs2.rst")

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: yang tao <yang.tao172@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:22 +08:00
Shao Mingyin
a46d47ae30 Docs/zh_CN: Translate ubifs-authentication.rst to Simplified Chinese
translate the "ubifs-authentication.rst" into Simplified Chinese.

Update to commit d56b699d76d1("Documentation: Fix typos")

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: yang tao <yang.tao172@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:21 +08:00
Shao Mingyin
0e6d01c464 Docs/zh_CN: Translate ubifs.rst to Simplified Chinese
translate the "ubifs.rst" into Simplified Chinese.

Update to commit 5f5cae9b0e81("Documentation: ubifs: Fix
compression idiom")

Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: yang tao <yang.tao172@zte.com.cn>
Signed-off-by: Alex Shi <alexs@kernel.org>
2025-09-16 21:32:21 +08:00
Konstantin Andreev
6ddd169d02 smack: fix kernel-doc warnings for smk_import_valid_label()
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506251712.x5SJiNlh-lkp@intel.com/
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-30 13:02:49 -07:00
Konstantin Andreev
674e2b2479 smack: fix bug: setting task label silently ignores input garbage
This command:
    # echo foo/bar >/proc/$$/attr/smack/current

gives the task a label 'foo' w/o indication
that label does not match input.
Setting the label with lsm_set_self_attr() syscall
behaves identically.

This occures because:

1) smk_parse_smack() is used to convert input to a label
2) smk_parse_smack() takes only that part from the
   beginning of the input that looks like a label.
3) `/' is prohibited in labels, so only "foo" is taken.

(2) is by design, because smk_parse_smack() is used
for parsing strings which are more than just a label.

Silent failure is not a good thing, and there are two
indicators that this was not done intentionally:

    (size >= SMK_LONGLABEL) ~> invalid

clause at the beginning of the do_setattr() and the
"Returns the length of the smack label" claim
in the do_setattr() description.

So I fixed this by adding one tiny check:
the taken label length == input length.

Since input length is now strictly controlled,
I changed the two ways of setting label

   smack_setselfattr(): lsm_set_self_attr() syscall
   smack_setprocattr(): > /proc/.../current

to accommodate the divergence in
what they understand by "input length":

  smack_setselfattr counts mandatory \0 into input length,
  smack_setprocattr does not.

  smack_setprocattr allows various trailers after label

Related changes:

* fixed description for smk_parse_smack

* allow unprivileged tasks validate label syntax.

* extract smk_parse_label_len() from smk_parse_smack()
  so parsing may be done w/o string allocation.

* extract smk_import_valid_label() from smk_import_entry()
  to avoid repeated parsing.

* smk_parse_smack(): scan null-terminated strings
  for no more than SMK_LONGLABEL(256) characters

* smack_setselfattr(): require struct lsm_ctx . flags == 0
  to reserve them for future.

Fixes: e114e47377 ("Smack: Simplified Mandatory Access Control Kernel")
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-24 16:30:24 -07:00
Konstantin Andreev
c147e13ea7 smack: fix bug: unprivileged task can create labels
If an unprivileged task is allowed to relabel itself
(/smack/relabel-self is not empty),
it can freely create new labels by writing their
names into own /proc/PID/attr/smack/current

This occurs because do_setattr() imports
the provided label in advance,
before checking "relabel-self" list.

This change ensures that the "relabel-self" list
is checked before importing the label.

Fixes: 38416e5393 ("Smack: limited capability for changing process label")
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-24 16:30:24 -07:00
Konstantin Andreev
78fc6a94be smack: fix bug: invalid label of unix socket file
According to [1], the label of a UNIX domain socket (UDS)
file (i.e., the filesystem object representing the socket)
is not supposed to participate in Smack security.

To achieve this, [1] labels UDS files with "*"
in smack_d_instantiate().

Before [2], smack_d_instantiate() was responsible
for initializing Smack security for all inodes,
except ones under /proc

[2] imposed the sole responsibility for initializing
inode security for newly created filesystem objects
on smack_inode_init_security().

However, smack_inode_init_security() lacks some logic
present in smack_d_instantiate().
In particular, it does not label UDS files with "*".

This patch adds the missing labeling of UDS files
with "*" to smack_inode_init_security().

Labeling UDS files with "*" in smack_d_instantiate()
still works for stale UDS files that already exist on
disk. Stale UDS files are useless, but I keep labeling
them for consistency and maybe to make easier for user
to delete them.

Compared to [1], this version introduces the following
improvements:

  * UDS file label is held inside inode only
    and not saved to xattrs.

  * relabeling UDS files (setxattr, removexattr, etc.)
    is blocked.

[1] 2010-11-24 Casey Schaufler
commit b4e0d5f079 ("Smack: UDS revision")

[2] 2023-11-16 roberto.sassu
Fixes: e63d86b8b7 ("smack: Initialize the in-memory inode in smack_inode_init_security()")
Link: https://lore.kernel.org/linux-security-module/20231116090125.187209-5-roberto.sassu@huaweicloud.com/

Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22 08:51:32 -07:00
Konstantin Andreev
69204f6cdb smack: always "instantiate" inode in smack_inode_init_security()
If memory allocation for the SMACK64TRANSMUTE
xattr value fails in smack_inode_init_security(),
the SMK_INODE_INSTANT flag is not set in
(struct inode_smack *issp)->smk_flags,
leaving the inode as not "instantiated".

It does not matter if fs frees the inode
after failed smack_inode_init_security() call,
but there is no guarantee for this.

To be safe, mark the inode as "instantiated",
even if allocation of xattr values fails.

Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22 08:51:32 -07:00
Konstantin Andreev
8e5d9f916a smack: deduplicate xattr setting in smack_inode_init_security()
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22 08:51:32 -07:00
Konstantin Andreev
195da3ff24 smack: fix bug: SMACK64TRANSMUTE set on non-directory
When a new file system object is created
and the conditions for label transmutation are met,
the SMACK64TRANSMUTE extended attribute is set
on the object regardless of its type:
file, pipe, socket, symlink, or directory.

However,
SMACK64TRANSMUTE may only be set on directories.

This bug is a combined effect of the commits [1] and [2]
which both transfer functionality
from smack_d_instantiate() to smack_inode_init_security(),
but only in part.

Commit [1] set blank  SMACK64TRANSMUTE on improper object types.
Commit [2] set "TRUE" SMACK64TRANSMUTE on improper object types.

[1] 2023-06-10,
Fixes: baed456a6a ("smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()")
Link: https://lore.kernel.org/linux-security-module/20230610075738.3273764-3-roberto.sassu@huaweicloud.com/

[2] 2023-11-16,
Fixes: e63d86b8b7 ("smack: Initialize the in-memory inode in smack_inode_init_security()")
Link: https://lore.kernel.org/linux-security-module/20231116090125.187209-5-roberto.sassu@huaweicloud.com/

Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22 08:51:32 -07:00
Konstantin Andreev
635a01da83 smack: deduplicate "does access rule request transmutation"
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2025-06-22 08:51:32 -07:00
846 changed files with 78594 additions and 10661 deletions

1
.gitignore vendored
View File

@@ -41,6 +41,7 @@
*.o.*
*.patch
*.pyc
*.rlib
*.rmeta
*.rpm
*.rsi

View File

@@ -589,8 +589,8 @@ Nicolas Pitre <nico@fluxnic.net> <nicolas.pitre@linaro.org>
Nicolas Pitre <nico@fluxnic.net> <nico@linaro.org>
Nicolas Saenz Julienne <nsaenz@kernel.org> <nsaenzjulienne@suse.de>
Nicolas Saenz Julienne <nsaenz@kernel.org> <nsaenzjulienne@suse.com>
Nicolas Schier <nicolas.schier@linux.dev> <n.schier@avm.de>
Nicolas Schier <nicolas.schier@linux.dev> <nicolas@fjasle.eu>
Nicolas Schier <nsc@kernel.org> <n.schier@avm.de>
Nicolas Schier <nsc@kernel.org> <nicolas@fjasle.eu>
Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Nikolay Aleksandrov <razor@blackwall.org> <naleksan@redhat.com>
Nikolay Aleksandrov <razor@blackwall.org> <nikolay@redhat.com>

View File

@@ -1,2 +1,2 @@
[MASTER]
init-hook='import sys; sys.path += ["scripts/lib/kdoc", "scripts/lib/abi", "tools/docs/lib"]'
init-hook='import sys; sys.path += ["tools/lib/python"]'

View File

@@ -20,9 +20,10 @@ Description:
rule format: action [condition ...]
action: measure | dont_measure | appraise | dont_appraise |
audit | hash | dont_hash
audit | dont_audit | hash | dont_hash
condition:= base | lsm [option]
base: [[func=] [mask=] [fsmagic=] [fsuuid=] [fsname=]
[fs_subtype=]
[uid=] [euid=] [gid=] [egid=]
[fowner=] [fgroup=]]
lsm: [[subj_user=] [subj_role=] [subj_type=]

View File

@@ -59,6 +59,8 @@ Description: Module taint flags:
F force-loaded module
C staging driver module
E unsigned module
K livepatch module
N in-kernel test module
== =====================
What: /sys/module/grant_table/parameters/free_per_iteration

View File

@@ -19,7 +19,7 @@ config WARN_ABI_ERRORS
described at Documentation/ABI/README. Yet, as they're manually
written, it would be possible that some of those files would
have errors that would break them for being parsed by
scripts/get_abi.pl. Add a check to verify them.
tools/docs/get_abi.py. Add a check to verify them.
If unsure, select 'N'.

View File

@@ -8,12 +8,12 @@ subdir- := devicetree/bindings
ifneq ($(MAKECMDGOALS),cleandocs)
# Check for broken documentation file references
ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
$(shell $(srctree)/scripts/documentation-file-ref-check --warn)
$(shell $(srctree)/tools/docs/documentation-file-ref-check --warn)
endif
# Check for broken ABI files
ifeq ($(CONFIG_WARN_ABI_ERRORS),y)
$(shell $(srctree)/scripts/get_abi.py --dir $(srctree)/Documentation/ABI validate)
$(shell $(srctree)/tools/docs/get_abi.py --dir $(srctree)/Documentation/ABI validate)
endif
endif
@@ -23,21 +23,22 @@ SPHINXOPTS =
SPHINXDIRS = .
DOCS_THEME =
DOCS_CSS =
_SPHINXDIRS = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
SPHINX_CONF = conf.py
RUSTDOC =
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode -no-shell-escape
PYTHONPYCACHEPREFIX ?= $(abspath $(BUILDDIR)/__pycache__)
# Wrapper for sphinx-build
BUILD_WRAPPER = $(srctree)/tools/docs/sphinx-build-wrapper
# For denylisting "variable font" files
# Can be overridden by setting as an env variable
FONTS_CONF_DENY_VF ?= $(HOME)/deny-vf
ifeq ($(findstring 1, $(KBUILD_VERBOSE)),)
SPHINXOPTS += "-q"
endif
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
@@ -46,141 +47,46 @@ ifeq ($(HAVE_SPHINX),0)
.DEFAULT:
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
@echo
@$(srctree)/scripts/sphinx-pre-install
@$(srctree)/tools/docs/sphinx-pre-install
@echo " SKIP Sphinx $@ target."
else # HAVE_SPHINX
# User-friendly check for pdflatex and latexmk
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
# Common documentation targets
htmldocs mandocs infodocs texinfodocs latexdocs epubdocs xmldocs pdfdocs linkcheckdocs:
$(Q)PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
$(srctree)/tools/docs/sphinx-pre-install --version-check
+$(Q)PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
$(PYTHON3) $(BUILD_WRAPPER) $@ \
--sphinxdirs="$(SPHINXDIRS)" $(RUSTDOC) \
--builddir="$(BUILDDIR)" --deny-vf=$(FONTS_CONF_DENY_VF) \
--theme=$(DOCS_THEME) --css=$(DOCS_CSS) --paper=$(PAPER)
ifeq ($(HAVE_LATEXMK),1)
PDFLATEX := latexmk -$(PDFLATEX)
endif #HAVE_LATEXMK
# Internal variables.
PAPEROPT_a4 = -D latex_elements.papersize=a4paper
PAPEROPT_letter = -D latex_elements.papersize=letterpaper
ALLSPHINXOPTS = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
ALLSPHINXOPTS += $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
ifneq ($(wildcard $(srctree)/.config),)
ifeq ($(CONFIG_RUST),y)
# Let Sphinx know we will include rustdoc
ALLSPHINXOPTS += -t rustdoc
endif
endif
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
# $2 sphinx builder e.g. "html"
# $3 name of the build subfolder / e.g. "userspace-api/media", used as:
# * dest folder relative to $(BUILDDIR) and
# * cache folder relative to $(BUILDDIR)/.doctrees
# $4 dest subfolder e.g. "man" for man pages at userspace-api/media/man
# $5 reST source folder relative to $(src),
# e.g. "userspace-api/media" for the linux-tv book-set at ./Documentation/userspace-api/media
PYTHONPYCACHEPREFIX ?= $(abspath $(BUILDDIR)/__pycache__)
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = \
PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(src)/$5/$(SPHINX_CONF)) \
$(PYTHON3) $(srctree)/scripts/jobserver-exec \
$(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(src)) \
-d $(abspath $(BUILDDIR)/.doctrees/$3) \
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
$(ALLSPHINXOPTS) \
$(abspath $(src)/$5) \
$(abspath $(BUILDDIR)/$3/$4) && \
if [ "x$(DOCS_CSS)" != "x" ]; then \
cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
fi
htmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
htmldocs-redirects: $(srctree)/Documentation/.renames.txt
@tools/docs/gen-redirects.py --output $(BUILDDIR) < $<
# If Rust support is available and .config exists, add rustdoc generated contents.
# If there are any, the errors from this make rustdoc will be displayed but
# won't stop the execution of htmldocs
ifneq ($(wildcard $(srctree)/.config),)
ifeq ($(CONFIG_RUST),y)
$(Q)$(MAKE) rustdoc || true
endif
endif
texinfodocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,texinfo,$(var),texinfo,$(var)))
# Note: the 'info' Make target is generated by sphinx itself when
# running the texinfodocs target define above.
infodocs: texinfodocs
$(MAKE) -C $(BUILDDIR)/texinfo info
linkcheckdocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
latexdocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
ifeq ($(HAVE_PDFLATEX),0)
pdfdocs:
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
pdfdocs: DENY_VF = XDG_CONFIG_HOME=$(FONTS_CONF_DENY_VF)
pdfdocs: latexdocs
@$(srctree)/scripts/sphinx-pre-install --version-check
$(foreach var,$(SPHINXDIRS), \
$(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" $(DENY_VF) -C $(BUILDDIR)/$(var)/latex || sh $(srctree)/scripts/check-variable-fonts.sh || exit; \
mkdir -p $(BUILDDIR)/$(var)/pdf; \
mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
)
endif # HAVE_PDFLATEX
epubdocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
xmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
endif # HAVE_SPHINX
# The following targets are independent of HAVE_SPHINX, and the rules should
# work or silently pass without Sphinx.
htmldocs-redirects: $(srctree)/Documentation/.renames.txt
@tools/docs/gen-redirects.py --output $(BUILDDIR) < $<
refcheckdocs:
$(Q)cd $(srctree);scripts/documentation-file-ref-check
$(Q)cd $(srctree); tools/docs/documentation-file-ref-check
cleandocs:
$(Q)rm -rf $(BUILDDIR)
# Used only on help
_SPHINXDIRS = $(shell printf "%s\n" $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)) | sort -f)
dochelp:
@echo ' Linux kernel internal documentation in different formats from ReST:'
@echo ' htmldocs - HTML'
@echo ' htmldocs-redirects - generate HTML redirects for moved pages'
@echo ' texinfodocs - Texinfo'
@echo ' infodocs - Info'
@echo ' mandocs - Man pages'
@echo ' latexdocs - LaTeX'
@echo ' pdfdocs - PDF'
@echo ' epubdocs - EPUB'
@@ -192,13 +98,17 @@ dochelp:
@echo ' cleandocs - clean all generated files'
@echo
@echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
@echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
@echo
@echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
@echo ' configuration. This is e.g. useful to build with nit-picking config.'
@echo ' top level values for SPHINXDIRS are: $(_SPHINXDIRS)'
@echo ' you may also use a subdirectory like SPHINXDIRS=userspace-api/media,'
@echo ' provided that there is an index.rst file at the subdirectory.'
@echo
@echo ' make DOCS_THEME={sphinx-theme} selects a different Sphinx theme.'
@echo
@echo ' make DOCS_CSS={a .css file} adds a DOCS_CSS override file for html/epub output.'
@echo
@echo ' make PAPER={a4|letter} Specifies the paper size used for LaTeX/PDF output.'
@echo
@echo ' make FONTS_CONF_DENY_VF={path} sets a deny list to block variable Noto CJK fonts'
@echo ' for PDF build. See tools/lib/python/kdoc/latex_fonts.py for more details'
@echo
@echo ' Default location for the generated documents is Documentation/output'

View File

@@ -2637,15 +2637,16 @@ synchronize_srcu() for some other domain ``ss1``, and if an
that was held across as ``ss``-domain synchronize_srcu(), deadlock
would again be possible. Such a deadlock cycle could extend across an
arbitrarily large number of different SRCU domains. Again, with great
power comes great responsibility.
power comes great responsibility, though lockdep is now able to detect
this sort of deadlock.
Unlike the other RCU flavors, SRCU read-side critical sections can run
on idle and even offline CPUs. This ability requires that
srcu_read_lock() and srcu_read_unlock() contain memory barriers,
which means that SRCU readers will run a bit slower than would RCU
readers. It also motivates the smp_mb__after_srcu_read_unlock() API,
which, in combination with srcu_read_unlock(), guarantees a full
memory barrier.
Unlike the other RCU flavors, SRCU read-side critical sections can run on
idle and even offline CPUs, with the exception of srcu_read_lock_fast()
and friends. This ability requires that srcu_read_lock() and
srcu_read_unlock() contain memory barriers, which means that SRCU
readers will run a bit slower than would RCU readers. It also motivates
the smp_mb__after_srcu_read_unlock() API, which, in combination with
srcu_read_unlock(), guarantees a full memory barrier.
Also unlike other RCU flavors, synchronize_srcu() may **not** be
invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
@@ -2681,15 +2682,15 @@ run some tests first. SRCU just might need a few adjustment to deal with
that sort of load. Of course, your mileage may vary based on the speed
of your CPUs and the size of your memory.
The `SRCU
API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
The `SRCU API
<https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
includes srcu_read_lock(), srcu_read_unlock(),
srcu_dereference(), srcu_dereference_check(),
synchronize_srcu(), synchronize_srcu_expedited(),
call_srcu(), srcu_barrier(), and srcu_read_lock_held(). It
also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and
init_srcu_struct() APIs for defining and initializing
``srcu_struct`` structures.
srcu_dereference(), srcu_dereference_check(), synchronize_srcu(),
synchronize_srcu_expedited(), call_srcu(), srcu_barrier(),
and srcu_read_lock_held(). It also includes DEFINE_SRCU(),
DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(), DEFINE_STATIC_SRCU_FAST(),
init_srcu_struct(), and init_srcu_struct_fast() APIs for defining and
initializing ``srcu_struct`` structures.
More recently, the SRCU API has added polling interfaces:

View File

@@ -417,11 +417,13 @@ over a rather long period of time, but improvements are always welcome!
you should be using RCU rather than SRCU, because RCU is almost
always faster and easier to use than is SRCU.
Also unlike other forms of RCU, explicit initialization and
cleanup is required either at build time via DEFINE_SRCU()
or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
and cleanup_srcu_struct(). These last two are passed a
"struct srcu_struct" that defines the scope of a given
Also unlike other forms of RCU, explicit initialization
and cleanup is required either at build time via
DEFINE_SRCU(), DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(),
or DEFINE_STATIC_SRCU_FAST() or at runtime via either
init_srcu_struct() or init_srcu_struct_fast() and
cleanup_srcu_struct(). These last three are passed a
`struct srcu_struct` that defines the scope of a given
SRCU domain. Once initialized, the srcu_struct is passed
to srcu_read_lock(), srcu_read_unlock() synchronize_srcu(),
synchronize_srcu_expedited(), and call_srcu(). A given

View File

@@ -1227,7 +1227,10 @@ SRCU: Initialization/cleanup/ordering::
DEFINE_SRCU
DEFINE_STATIC_SRCU
DEFINE_SRCU_FAST // for srcu_read_lock_fast() and friends
DEFINE_STATIC_SRCU_FAST // for srcu_read_lock_fast() and friends
init_srcu_struct
init_srcu_struct_fast
cleanup_srcu_struct
smp_mb__after_srcu_read_unlock

View File

@@ -76,41 +76,43 @@ The messages are in the format::
The taskstats payload is one of the following three kinds:
1. Commands: Sent from user to kernel. Commands to get data on
a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
the task/process for which userspace wants statistics.
a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
the task/process for which userspace wants statistics.
Commands to register/deregister interest in exit data from a set of cpus
consist of one attribute, of type
TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
attribute payload. The cpumask is specified as an ascii string of
comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
in cpus before closing the listening socket, the kernel cleans up its interest
set over time. However, for the sake of efficiency, an explicit deregistration
is advisable.
Commands to register/deregister interest in exit data from a set of cpus
consist of one attribute, of type
TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
attribute payload. The cpumask is specified as an ascii string of
comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
the cpumask would be "1-3,5,7-8". If userspace forgets to deregister
interest in cpus before closing the listening socket, the kernel cleans up
its interest set over time. However, for the sake of efficiency, an explicit
deregistration is advisable.
2. Response for a command: sent from the kernel in response to a userspace
command. The payload is a series of three attributes of type:
command. The payload is a series of three attributes of type:
a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
a pid/tgid will be followed by some stats.
a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload but
indicates a pid/tgid will be followed by some stats.
b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
are being returned.
b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose
stats are being returned.
c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
same structure is used for both per-pid and per-tgid stats.
c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
same structure is used for both per-pid and per-tgid stats.
3. New message sent by kernel whenever a task exits. The payload consists of a
series of attributes of the following type:
a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
b) TASKSTATS_TYPE_PID: contains exiting task's pid
c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
b) TASKSTATS_TYPE_PID: contains exiting task's pid
c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be
tgid+stats
e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's
process
per-tgid stats

View File

@@ -601,10 +601,15 @@ specification.
Task Attribute
~~~~~~~~~~~~~~
The Smack label of a process can be read from /proc/<pid>/attr/current. A
process can read its own Smack label from /proc/self/attr/current. A
The Smack label of a process can be read from ``/proc/<pid>/attr/current``. A
process can read its own Smack label from ``/proc/self/attr/current``. A
privileged process can change its own Smack label by writing to
/proc/self/attr/current but not the label of another process.
``/proc/self/attr/current`` but not the label of another process.
Format of writing is : only the label or the label followed by one of the
3 trailers: ``\n`` (by common agreement for ``/proc/...`` interfaces),
``\0`` (because some applications incorrectly include it),
``\n\0`` (because we think some applications may incorrectly include it).
File Attribute
~~~~~~~~~~~~~~
@@ -696,6 +701,11 @@ sockets.
A privileged program may set this to match the label of another
task with which it hopes to communicate.
UNIX domain socket (UDS) with a BSD address functions both as a file in a
filesystem and as a socket. As a file, it carries the SMACK64 attribute. This
attribute is not involved in Smack security enforcement and is immutably
assigned the label "*".
Smack Netlabel Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -95,7 +95,20 @@ languages when these scripts are invoked by passing these program files
to the interpreter. This is because the way interpreters execute these
files; the scripts themselves are not evaluated as executable code
through one of IPE's hooks, but they are merely text files that are read
(as opposed to compiled executables) [#interpreters]_.
(as opposed to compiled executables). However, with the introduction of the
``AT_EXECVE_CHECK`` flag (:doc:`AT_EXECVE_CHECK </userspace-api/check_exec>`),
interpreters can use it to signal the kernel that a script file will be executed,
and request the kernel to perform LSM security checks on it.
IPE's EXECUTE operation enforcement differs between compiled executables and
interpreted scripts: For compiled executables, enforcement is triggered
automatically by the kernel during ``execve()``, ``execveat()``, ``mmap()``
and ``mprotect()`` syscalls when loading executable content. For interpreted
scripts, enforcement requires explicit interpreter integration using
``execveat()`` with ``AT_EXECVE_CHECK`` flag. Unlike exec syscalls that IPE
intercepts during the execution process, this mechanism needs the interpreter
to take the initiative, and existing interpreters won't be automatically
supported unless the signal call is added.
Threat Model
------------
@@ -806,8 +819,6 @@ A:
.. [#digest_cache_lsm] https://lore.kernel.org/lkml/20240415142436.2545003-1-roberto.sassu@huaweicloud.com/
.. [#interpreters] There is `some interest in solving this issue <https://lore.kernel.org/lkml/20220321161557.495388-1-mic@digikod.net/>`_.
.. [#devdoc] Please see :doc:`the design docs </security/ipe>` for more on
this topic.

View File

@@ -53,7 +53,8 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-2. Memory
5-2-1. Memory Interface Files
5-2-2. Usage Guidelines
5-2-3. Memory Ownership
5-2-3. Reclaim Protection
5-2-4. Memory Ownership
5-3. IO
5-3-1. IO Interface Files
5-3-2. Writeback
@@ -1317,7 +1318,7 @@ PAGE_SIZE multiple when read back.
smaller overages.
Effective min boundary is limited by memory.min values of
all ancestor cgroups. If there is memory.min overcommitment
ancestor cgroups. If there is memory.min overcommitment
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to its
@@ -1326,9 +1327,6 @@ PAGE_SIZE multiple when read back.
Putting more memory than generally available under this
protection is discouraged and may lead to constant OOMs.
If a memory cgroup is not populated with processes,
its memory.min is ignored.
memory.low
A read-write single value file which exists on non-root
cgroups. The default is "0".
@@ -1343,7 +1341,7 @@ PAGE_SIZE multiple when read back.
smaller overages.
Effective low boundary is limited by memory.low values of
all ancestor cgroups. If there is memory.low overcommitment
ancestor cgroups. If there is memory.low overcommitment
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to its
@@ -1934,6 +1932,27 @@ memory - is necessary to determine whether a workload needs more
memory; unfortunately, memory pressure monitoring mechanism isn't
implemented yet.
Reclaim Protection
~~~~~~~~~~~~~~~~~~
The protection configured with "memory.low" or "memory.min" applies relatively
to the target of the reclaim (i.e. any of memory cgroup limits, proactive
memory.reclaim or global reclaim apparently located in the root cgroup).
The protection value configured for B applies unchanged to the reclaim
targeting A (i.e. caused by competition with the sibling E)::
root - ... - A - B - C
\ ` D
` E
When the reclaim targets ancestors of A, the effective protection of B is
capped by the protection value configured for A (and any other intermediate
ancestors between A and the target).
To express indifference about relative sibling protection, it is suggested to
use memory_recursiveprot. Configuring all descendants of a parent with finite
protection to "max" works but it may unnecessarily skew memory.events:low
field.
Memory Ownership
~~~~~~~~~~~~~~~~

View File

@@ -79,6 +79,9 @@ because the image we're executing is interpreted by the EFI shell,
which understands relative paths, whereas the rest of the command line
is passed to bzImage.efi.
.. hint::
It is also possible to provide an initrd using a Linux-specific UEFI
protocol at boot time. See :ref:`pe-coff-entry-point` for details.
The "dtb=" option
-----------------

View File

@@ -31,7 +31,7 @@ specifically opt into the feature to enable it.
Mitigation
----------
When PR_SET_L1D_FLUSH is enabled for a task a flush of the L1D cache is
When PR_SPEC_L1D_FLUSH is enabled for a task a flush of the L1D cache is
performed when the task is scheduled out and the incoming task belongs to a
different process and therefore to a different address space.

View File

@@ -406,7 +406,7 @@ The possible values in this file are:
- Single threaded indirect branch prediction (STIBP) status for protection
between different hyper threads. This feature can be controlled through
prctl per process, or through kernel command line options. This is x86
prctl per process, or through kernel command line options. This is an x86
only feature. For more details see below.
==================== ========================================================

View File

@@ -110,102 +110,7 @@ The parameters listed below are only valid if certain kernel build options
were enabled and if respective hardware is present. This list should be kept
in alphabetical order. The text in square brackets at the beginning
of each description states the restrictions within which a parameter
is applicable::
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
APPARMOR AppArmor support is enabled.
ARM ARM architecture is enabled.
ARM64 ARM64 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EARLY Parameter processed too early to be embedded in initrd.
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HIBERNATION HIBERNATION is enabled.
HW Appropriate hardware is enabled.
HYPER_V HYPERV support is enabled.
IMA Integrity measurement architecture is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
ISOL CPU Isolation is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LOONGARCH LoongArch architecture is enabled.
LOOP Loopback device support is enabled.
LP Printer support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/arch/m68k/kernel-options.rst.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NFS Appropriate NFS support is enabled.
NUMA NUMA support is enabled.
OF Devicetree is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
PV_OPS A paravirtualized kernel is enabled.
RAM RAM disk support is enabled.
RDT Intel Resource Director Technology.
RISCV RISCV architecture is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SDW SoundWire support is enabled.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SUSPEND System suspend states are enabled.
SWSUSP Software suspend (hibernation) is enabled.
TPM TPM drivers are enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VGA The VGA console has been enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
XTENSA xtensa architecture is enabled.
In addition, the following text indicates that the option::
BOOT Is a boot loader parameter.
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
is applicable.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.

View File

@@ -1,3 +1,101 @@
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
APPARMOR AppArmor support is enabled.
ARM ARM architecture is enabled.
ARM64 ARM64 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EARLY Parameter processed too early to be embedded in initrd.
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HIBERNATION HIBERNATION is enabled.
HW Appropriate hardware is enabled.
HYPER_V HYPERV support is enabled.
IMA Integrity measurement architecture is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
ISOL CPU Isolation is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LOONGARCH LoongArch architecture is enabled.
LOOP Loopback device support is enabled.
LP Printer support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/arch/m68k/kernel-options.rst.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NFS Appropriate NFS support is enabled.
NUMA NUMA support is enabled.
OF Devicetree is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
PV_OPS A paravirtualized kernel is enabled.
RAM RAM disk support is enabled.
RDT Intel Resource Director Technology.
RISCV RISCV architecture is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SDW SoundWire support is enabled.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SUSPEND System suspend states are enabled.
SWSUSP Software suspend (hibernation) is enabled.
TPM TPM drivers are enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VGA The VGA console has been enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
XTENSA xtensa architecture is enabled.
In addition, the following text indicates that the option
BOOT Is a boot loader parameter.
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
Kernel parameters
accept_memory= [MM]
Format: { eager | lazy }
default: lazy
@@ -6414,7 +6512,7 @@
that don't.
off - no mitigation
auto - automatically select a migitation
auto - automatically select a mitigation
auto,nosmt - automatically select a mitigation,
disabling SMT if necessary for
the full mitigation (only on Zen1
@@ -7164,7 +7262,7 @@
limit. Default value is 8191 pools.
stacktrace [FTRACE]
Enabled the stack tracer on boot up.
Enable the stack tracer on boot up.
stacktrace_filter=[function-list]
[FTRACE] Limit the functions that the stack tracer

View File

@@ -186,6 +186,6 @@ More detailed explanation for tainting
18) ``N`` if an in-kernel test, such as a KUnit test, has been run.
19) ``J`` if userpace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
19) ``J`` if userspace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
to use the devices debugging features. Device debugging features could
cause the device to malfunction in undefined ways.

View File

@@ -196,11 +196,11 @@ Lets checkout the latest Linux repository and build cscope database::
cscope -R -p10 # builds cscope.out database before starting browse session
cscope -d -p10 # starts browse session on cscope.out database
Note: Run "cscope -R -p10" to build the database and c"scope -d -p10" to
enter into the browsing session. cscope by default cscope.out database.
To get out of this mode press ctrl+d. -p option is used to specify the
number of file path components to display. -p10 is optimal for browsing
kernel sources.
Note: Run "cscope -R -p10" to build the database and "cscope -d -p10" to
enter into the browsing session. cscope by default uses the cscope.out
database. To get out of this mode press ctrl+d. -p option is used to
specify the number of file path components to display. -p10 is optimal
for browsing kernel sources.
What is perf and how do we use it?
==================================

View File

@@ -1431,12 +1431,34 @@ The boot loader *must* fill out the following fields in bp::
All other fields should be zero.
.. note::
The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
entry point, combined with the LINUX_EFI_INITRD_MEDIA_GUID based initrd
loading protocol (refer to [0] for an example of the bootloader side of
this), which removes the need for any knowledge on the part of the EFI
bootloader regarding the internal representation of boot_params or any
requirements/limitations regarding the placement of the command line
and ramdisk in memory, or the placement of the kernel image itself.
The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
entry point described below.
[0] https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0
.. _pe-coff-entry-point:
PE/COFF entry point
===================
When compiled with ``CONFIG_EFI_STUB=y``, the kernel can be executed as a
regular PE/COFF binary. See Documentation/admin-guide/efi-stub.rst for
implementation details.
The stub loader can request the initrd via a UEFI protocol. For this to work,
the firmware or bootloader needs to register a handle which carries
implementations of the ``EFI_LOAD_FILE2`` protocol and the device path
protocol exposing the ``LINUX_EFI_INITRD_MEDIA_GUID`` vendor media device path.
In this case, a kernel booting via the EFI stub will invoke
``LoadFile2::LoadFile()`` method on the registered protocol to instruct the
firmware to load the initrd into a memory location chosen by the kernel/EFI
stub.
This approach removes the need for any knowledge on the part of the EFI
bootloader regarding the internal representation of boot_params or any
requirements/limitations regarding the placement of the command line and
ramdisk in memory, or the placement of the kernel image itself.
For sample implementations, refer to `the original u-boot implementation`_ or
`the OVMF implementation`_.
.. _the original u-boot implementation: https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0
.. _the OVMF implementation: https://github.com/tianocore/edk2/blob/1780373897f12c25075f8883e073144506441168/OvmfPkg/LinuxInitrdDynamicShellCommand/LinuxInitrdDynamicShellCommand.c

View File

@@ -18,8 +18,6 @@ import sphinx
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath("sphinx"))
from load_config import loadConfig # pylint: disable=C0413,E0401
# Minimal supported version
needs_sphinx = "3.4.3"
@@ -93,8 +91,12 @@ def config_init(app, config):
# LaTeX and PDF output require a list of documents with are dependent
# of the app.srcdir. Add them here
# When SPHINXDIRS is used, we just need to get index.rst, if it exists
# Handle the case where SPHINXDIRS is used
if not os.path.samefile(doctree, app.srcdir):
# Add a tag to mark that the build is actually a subproject
tags.add("subproject")
# get index.rst, if it exists
doc = os.path.basename(app.srcdir)
fname = "index"
if os.path.exists(os.path.join(app.srcdir, fname + ".rst")):
@@ -583,13 +585,6 @@ pdf_documents = [
kerneldoc_bin = "../scripts/kernel-doc.py"
kerneldoc_srctree = ".."
# ------------------------------------------------------------------------------
# Since loadConfig overwrites settings from the global namespace, it has to be
# the last statement in the conf.py file
# ------------------------------------------------------------------------------
loadConfig(globals())
def setup(app):
"""Patterns need to be updated at init time on older Sphinx versions"""

View File

@@ -92,18 +92,18 @@ There are two functions for dealing with the script:
void assoc_array_apply_edit(struct assoc_array_edit *edit);
This will perform the edit functions, interpolating various write barriers
to permit accesses under the RCU read lock to continue. The edit script
will then be passed to ``call_rcu()`` to free it and any dead stuff it points
to.
This will perform the edit functions, interpolating various write barriers
to permit accesses under the RCU read lock to continue. The edit script
will then be passed to ``call_rcu()`` to free it and any dead stuff it
points to.
2. Cancel an edit script::
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
This frees the edit script and all preallocated memory immediately. If
this was for insertion, the new object is _not_ released by this function,
but must rather be released by the caller.
This frees the edit script and all preallocated memory immediately. If
this was for insertion, the new object is *not* released by this function,
but must rather be released by the caller.
These functions are guaranteed not to fail.
@@ -123,43 +123,43 @@ This points to a number of methods, all of which need to be provided:
unsigned long (*get_key_chunk)(const void *index_key, int level);
This should return a chunk of caller-supplied index key starting at the
*bit* position given by the level argument. The level argument will be a
multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible.
This should return a chunk of caller-supplied index key starting at the
*bit* position given by the level argument. The level argument will be a
multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible.
2. Get a chunk of an object's index key::
unsigned long (*get_object_key_chunk)(const void *object, int level);
As the previous function, but gets its data from an object in the array
rather than from a caller-supplied index key.
As the previous function, but gets its data from an object in the array
rather than from a caller-supplied index key.
3. See if this is the object we're looking for::
bool (*compare_object)(const void *object, const void *index_key);
Compare the object against an index key and return ``true`` if it matches and
``false`` if it doesn't.
Compare the object against an index key and return ``true`` if it matches
and ``false`` if it doesn't.
4. Diff the index keys of two objects::
int (*diff_objects)(const void *object, const void *index_key);
Return the bit position at which the index key of the specified object
differs from the given index key or -1 if they are the same.
Return the bit position at which the index key of the specified object
differs from the given index key or -1 if they are the same.
5. Free an object::
void (*free_object)(void *object);
Free the specified object. Note that this may be called an RCU grace period
after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be
necessary on module unloading.
Free the specified object. Note that this may be called an RCU grace period
after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may
be necessary on module unloading.
Manipulation Functions
@@ -171,7 +171,7 @@ There are a number of functions for manipulating an associative array:
void assoc_array_init(struct assoc_array *array);
This initialises the base structure for an associative array. It can't fail.
This initialises the base structure for an associative array. It can't fail.
2. Insert/replace an object in an associative array::
@@ -182,21 +182,21 @@ This initialises the base structure for an associative array. It can't fail.
const void *index_key,
void *object);
This inserts the given object into the array. Note that the least
significant bit of the pointer must be zero as it's used to type-mark
pointers internally.
This inserts the given object into the array. Note that the least
significant bit of the pointer must be zero as it's used to type-mark
pointers internally.
If an object already exists for that key then it will be replaced with the
new object and the old one will be freed automatically.
If an object already exists for that key then it will be replaced with the
new object and the old one will be freed automatically.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
The caller should lock exclusively against other modifiers of the array.
3. Delete an object from an associative array::
@@ -206,15 +206,15 @@ The caller should lock exclusively against other modifiers of the array.
const struct assoc_array_ops *ops,
const void *index_key);
This deletes an object that matches the specified data from the array.
This deletes an object that matches the specified data from the array.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error. ``NULL`` will be returned if the specified object is
not found within the array.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error. ``NULL`` will be returned if the specified object
is not found within the array.
The caller should lock exclusively against other modifiers of the array.
@@ -225,14 +225,14 @@ The caller should lock exclusively against other modifiers of the array.
assoc_array_clear(struct assoc_array *array,
const struct assoc_array_ops *ops);
This deletes all the objects from an associative array and leaves it
completely empty.
This deletes all the objects from an associative array and leaves it
completely empty.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
The caller should lock exclusively against other modifiers of the array.
5. Destroy an associative array, deleting all objects::
@@ -240,14 +240,14 @@ The caller should lock exclusively against other modifiers of the array.
void assoc_array_destroy(struct assoc_array *array,
const struct assoc_array_ops *ops);
This destroys the contents of the associative array and leaves it
completely empty. It is not permitted for another thread to be traversing
the array under the RCU read lock at the same time as this function is
destroying it as no RCU deferral is performed on memory release -
something that would require memory to be allocated.
This destroys the contents of the associative array and leaves it
completely empty. It is not permitted for another thread to be traversing
the array under the RCU read lock at the same time as this function is
destroying it as no RCU deferral is performed on memory release -
something that would require memory to be allocated.
The caller should lock exclusively against other modifiers and accessors
of the array.
The caller should lock exclusively against other modifiers and accessors
of the array.
6. Garbage collect an associative array::
@@ -257,24 +257,24 @@ of the array.
bool (*iterator)(void *object, void *iterator_data),
void *iterator_data);
This iterates over the objects in an associative array and passes each one to
``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it
returns ``false``, the object will be freed. If the ``iterator()`` function
returns ``true``, it must perform any appropriate refcount incrementing on the
object before returning.
This iterates over the objects in an associative array and passes each one
to ``iterator()``. If ``iterator()`` returns ``true``, the object is kept.
If it returns ``false``, the object will be freed. If the ``iterator()``
function returns ``true``, it must perform any appropriate refcount
incrementing on the object before returning.
The internal tree will be packed down if possible as part of the iteration
to reduce the number of nodes in it.
The internal tree will be packed down if possible as part of the iteration
to reduce the number of nodes in it.
The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
ignored by the function.
The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
ignored by the function.
The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
enough memory.
The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
enough memory.
It is possible for other threads to iterate over or search the array under
the RCU read lock while this function is in progress. The caller should
lock exclusively against other modifiers of the array.
It is possible for other threads to iterate over or search the array under
the RCU read lock while this function is in progress. The caller should
lock exclusively against other modifiers of the array.
Access Functions
@@ -289,19 +289,19 @@ There are two functions for accessing an associative array:
void *iterator_data),
void *iterator_data);
This passes each object in the array to the iterator callback function.
``iterator_data`` is private data for that function.
This passes each object in the array to the iterator callback function.
``iterator_data`` is private data for that function.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held. Under such circumstances,
it is possible for the iteration function to see some objects twice. If
this is a problem, then modification should be locked against. The
iteration algorithm should not, however, miss any objects.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held. Under such circumstances,
it is possible for the iteration function to see some objects twice. If
this is a problem, then modification should be locked against. The
iteration algorithm should not, however, miss any objects.
The function will return ``0`` if no objects were in the array or else it will
return the result of the last iterator function called. Iteration stops
immediately if any call to the iteration function results in a non-zero
return.
The function will return ``0`` if no objects were in the array or else it
will return the result of the last iterator function called. Iteration
stops immediately if any call to the iteration function results in a
non-zero return.
2. Find an object in an associative array::
@@ -310,14 +310,14 @@ return.
const struct assoc_array_ops *ops,
const void *index_key);
This walks through the array's internal tree directly to the object
specified by the index key..
This walks through the array's internal tree directly to the object
specified by the index key.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held.
The function will return the object if found (and set ``*_type`` to the object
type) or will return ``NULL`` if the object was not found.
The function will return the object if found (and set ``*_type`` to the
object type) or will return ``NULL`` if the object was not found.
Index Key Form
@@ -399,10 +399,11 @@ fixed levels. For example::
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
Assuming no other meta data nodes in the tree, the key space is divided
thusly::
thusly:
=========== ====
KEY PREFIX NODE
========== ====
=========== ====
137* D
138* E
13[0-69-f]* C
@@ -410,10 +411,12 @@ thusly::
e6* G
e[0-57-f]* F
[02-df]* A
=========== ====
So, for instance, keys with the following example index keys will be found in
the appropriate nodes::
the appropriate nodes:
=============== ======= ====
INDEX KEY PREFIX NODE
=============== ======= ====
13694892892489 13 C
@@ -422,12 +425,13 @@ the appropriate nodes::
138bbb89003093 138 E
1394879524789 12 C
1458952489 1 B
9431809de993ba - A
b4542910809cd - A
9431809de993ba \- A
b4542910809cd \- A
e5284310def98 e F
e68428974237 e6 G
e7fffcbd443 e F
f3842239082 - A
f3842239082 \- A
=============== ======= ====
To save memory, if a node can hold all the leaves in its portion of keyspace,
then the node will have all those leaves in it and will not have any metadata
@@ -441,8 +445,9 @@ metadata pointer. If the metadata pointer is there, any leaf whose key matches
the metadata key prefix must be in the subtree that the metadata pointer points
to.
In the above example list of index keys, node A will contain::
In the above example list of index keys, node A will contain:
==== =============== ==================
SLOT CONTENT INDEX KEY (PREFIX)
==== =============== ==================
1 PTR TO NODE B 1*
@@ -450,11 +455,16 @@ In the above example list of index keys, node A will contain::
any LEAF b4542910809cd
e PTR TO NODE F e*
any LEAF f3842239082
==== =============== ==================
and node B::
and node B:
3 PTR TO NODE C 13*
any LEAF 1458952489
==== =============== ==================
SLOT CONTENT INDEX KEY (PREFIX)
==== =============== ==================
3 PTR TO NODE C 13*
any LEAF 1458952489
==== =============== ==================
Shortcuts

View File

@@ -547,11 +547,13 @@ Time and date
%pt[RT]s YYYY-mm-dd HH:MM:SS
%pt[RT]d YYYY-mm-dd
%pt[RT]t HH:MM:SS
%pt[RT][dt][r][s]
%ptSp <seconds>.<nanoseconds>
%pt[RST][dt][r][s]
For printing date and time as represented by::
R struct rtc_time structure
R content of struct rtc_time
S content of struct timespec64
T time64_t type
in human readable format.
@@ -563,6 +565,11 @@ The %pt[RT]s (space) will override ISO 8601 separator by using ' ' (space)
instead of 'T' (Capital T) between date and time. It won't have any effect
when date or time is omitted.
The %ptSp is equivalent to %lld.%09ld for the content of the struct timespec64.
When the other specifiers are given, it becomes the respective equivalent of
%ptT[dt][r][s].%09ld. In other words, the seconds are being printed in
the human readable format followed by a dot and nanoseconds.
Passed by reference.
struct clk

View File

@@ -302,10 +302,9 @@ follows:
Depending on the RNG type, the RNG must be seeded. The seed is provided
using the setsockopt interface to set the key. For example, the
ansi_cprng requires a seed. The DRBGs do not require a seed, but may be
seeded. The seed is also known as a *Personalization String* in NIST SP 800-90A
standard.
using the setsockopt interface to set the key. The SP800-90A DRBGs do
not require a seed, but may be seeded. The seed is also known as a
*Personalization String* in NIST SP 800-90A standard.
Using the read()/recvmsg() system calls, random numbers can be obtained.
The kernel generates at most 128 bytes in one call. If user space

View File

@@ -461,16 +461,9 @@ Comments
line comments is::
/*
* This is the preferred style
* for multi line comments.
*/
The networking comment style is a bit different, with the first line
not empty like the former::
/* This is the preferred comment style
* for files in net/ and drivers/net/
*/
* This is the preferred style
* for multi line comments.
*/
See: https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting

View File

@@ -35,6 +35,12 @@ or be built into the kernel.
a good way of quickly testing everything applicable to the current
config.
KUnit can be enabled or disabled at boot time, and this behavior is
controlled by the kunit.enable kernel parameter.
By default, kunit.enable is set to 1 because KUNIT_DEFAULT_ENABLED is
enabled by default. To ensure that tests are executed as expected,
verify that kunit.enable=1 at boot time.
Once we have built our kernel (and/or modules), it is simple to run
the tests. If the tests are built-in, they will run automatically on the
kernel boot. The results will be written to the kernel log (``dmesg``)

View File

@@ -21,6 +21,9 @@ properties:
dma-coherent: true
iommus:
maxItems: 4
required:
- compatible
- reg

View File

@@ -13,6 +13,7 @@ properties:
compatible:
items:
- enum:
- qcom,kaanapali-inline-crypto-engine
- qcom,qcs8300-inline-crypto-engine
- qcom,sa8775p-inline-crypto-engine
- qcom,sc7180-inline-crypto-engine

View File

@@ -20,6 +20,7 @@ properties:
- qcom,ipq5332-trng
- qcom,ipq5424-trng
- qcom,ipq9574-trng
- qcom,kaanapali-trng
- qcom,qcs615-trng
- qcom,qcs8300-trng
- qcom,sa8255p-trng

View File

@@ -45,6 +45,7 @@ properties:
- items:
- enum:
- qcom,kaanapali-qce
- qcom,qcs615-qce
- qcom,qcs8300-qce
- qcom,sa8775p-qce

View File

@@ -1,17 +0,0 @@
* Microchip PIC32 Random Number Generator
The PIC32 RNG provides a pseudo random number generator which can be seeded by
another true random number generator.
Required properties:
- compatible : should be "microchip,pic32mzda-rng"
- reg : Specifies base physical address and size of the registers.
- clocks: clock phandle.
Example:
rng: rng@1f8e6000 {
compatible = "microchip,pic32mzda-rng";
reg = <0x1f8e6000 0x1000>;
clocks = <&PBCLK5>;
};

View File

@@ -0,0 +1,40 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/rng/microchip,pic32-rng.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Microchip PIC32 Random Number Generator
description: |
The PIC32 RNG provides a pseudo random number generator which can be seeded
by another true random number generator.
maintainers:
- Joshua Henderson <joshua.henderson@microchip.com>
properties:
compatible:
enum:
- microchip,pic32mzda-rng
reg:
maxItems: 1
clocks:
maxItems: 1
required:
- compatible
- reg
- clocks
additionalProperties: false
examples:
- |
rng: rng@1f8e6000 {
compatible = "microchip,pic32mzda-rng";
reg = <0x1f8e6000 0x1000>;
clocks = <&PBCLK5>;
};

View File

@@ -61,7 +61,7 @@ properties:
- qcom,msm8996-tsens
- qcom,msm8998-tsens
- qcom,qcm2290-tsens
- qcom,qcs8300-tsens
- qcom,qcs8300-tsens
- qcom,qcs615-tsens
- qcom,sa8255p-tsens
- qcom,sa8775p-tsens

View File

@@ -27,15 +27,15 @@ Usage
::
./scripts/checktransupdate.py --help
tools/docs/checktransupdate.py --help
Please refer to the output of argument parser for usage details.
Samples
- ``./scripts/checktransupdate.py -l zh_CN``
- ``tools/docs/checktransupdate.py -l zh_CN``
This will print all the files that need to be updated in the zh_CN locale.
- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
- ``tools/docs/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
This will only print the status of the specified file.
Then the output is something like:

View File

@@ -152,7 +152,7 @@ generate links to that documentation. Adding ``kernel-doc`` directives to
the documentation to bring those comments in can help the community derive
the full value of the work that has gone into creating them.
The ``scripts/find-unused-docs.sh`` tool can be used to find these
The ``tools/docs/find-unused-docs.sh`` tool can be used to find these
overlooked comments.
Note that the most value comes from pulling in the documentation for

View File

@@ -405,6 +405,10 @@ Domain`_ references.
``%CONST``
Name of a constant. (No cross-referencing, just formatting.)
Examples::
%0 %NULL %-1 %-EFAULT %-EINVAL %-ENOMEM
````literal````
A literal block that should be handled as-is. The output will use a
``monospaced font``.
@@ -579,20 +583,23 @@ source.
How to use kernel-doc to generate man pages
-------------------------------------------
If you just want to use kernel-doc to generate man pages you can do this
from the kernel git tree::
To generate man pages for all files that contain kernel-doc markups, run::
$ scripts/kernel-doc -man \
$(git grep -l '/\*\*' -- :^Documentation :^tools) \
| scripts/split-man.pl /tmp/man
$ make mandocs
Some older versions of git do not support some of the variants of syntax for
path exclusion. One of the following commands may work for those versions::
Or calling ``script-build-wrapper`` directly::
$ scripts/kernel-doc -man \
$(git grep -l '/\*\*' -- . ':!Documentation' ':!tools') \
| scripts/split-man.pl /tmp/man
$ ./tools/docs/sphinx-build-wrapper mandocs
$ scripts/kernel-doc -man \
$(git grep -l '/\*\*' -- . ":(exclude)Documentation" ":(exclude)tools") \
| scripts/split-man.pl /tmp/man
The output will be at ``/man`` directory inside the output directory
(by default: ``Documentation/output``).
Optionally, it is possible to generate a partial set of man pages by
using SPHINXDIRS:
$ make SPHINXDIRS=driver-api/media mandocs
.. note::
When SPHINXDIRS={subdir} is used, it will only generate man pages for
the files explicitly inside a ``Documentation/{subdir}/.../*.rst`` file.

View File

@@ -5,173 +5,168 @@ Including uAPI header files
Sometimes, it is useful to include header files and C example codes in
order to describe the userspace API and to generate cross-references
between the code and the documentation. Adding cross-references for
userspace API files has an additional vantage: Sphinx will generate warnings
userspace API files has an additional advantage: Sphinx will generate warnings
if a symbol is not found at the documentation. That helps to keep the
uAPI documentation in sync with the Kernel changes.
The :ref:`parse_headers.pl <parse_headers>` provide a way to generate such
The :ref:`parse_headers.py <parse_headers>` provides a way to generate such
cross-references. It has to be called via Makefile, while building the
documentation. Please see ``Documentation/userspace-api/media/Makefile`` for an example
about how to use it inside the Kernel tree.
.. _parse_headers:
parse_headers.pl
^^^^^^^^^^^^^^^^
tools/docs/parse_headers.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^
NAME
****
parse_headers.pl - parse a C file, in order to identify functions, structs,
parse_headers.py - parse a C file, in order to identify functions, structs,
enums and defines and create cross-references to a Sphinx book.
USAGE
*****
parse-headers.py [-h] [-d] [-t] ``FILE_IN`` ``FILE_OUT`` ``FILE_RULES``
SYNOPSIS
********
Converts a C header or source file ``FILE_IN`` into a ReStructured Text
included via ..parsed-literal block with cross-references for the
documentation files that describe the API. It accepts an optional
``FILE_RULES`` file to describe what elements will be either ignored or
be pointed to a non-default reference type/name.
\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
The output is written at ``FILE_OUT``.
Where <options> can be: --debug, --help or --usage.
It is capable of identifying ``define``, ``struct``, ``typedef``, ``enum``
and enum ``symbol``, creating cross-references for all of them.
It is also capable of distinguishing ``#define`` used for specifying
Linux-specific macros used to define ``ioctl``.
The optional ``FILE_RULES`` contains a set of rules like::
ignore ioctl VIDIOC_ENUM_FMT
replace ioctl VIDIOC_DQBUF vidioc_qbuf
replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`
POSITIONAL ARGUMENTS
********************
``FILE_IN``
Input C file
``FILE_OUT``
Output RST file
``FILE_RULES``
Exceptions file (optional)
OPTIONS
*******
\ **--debug**\
Put the script in verbose mode, useful for debugging.
\ **--usage**\
Prints a brief help message and exits.
\ **--help**\
Prints a more detailed help message and exits.
``-h``, ``--help``
show a help message and exit
``-d``, ``--debug``
Increase debug level. Can be used multiple times
``-t``, ``--toc``
instead of a literal block, outputs a TOC table at the RST file
DESCRIPTION
***********
Creates an enriched version of a Kernel header file with cross-links
to each C data structure type, from ``FILE_IN``, formatting it with
reStructuredText notation, either as-is or as a table of contents.
Convert a C header or source file (C_FILE), into a reStructuredText
included via ..parsed-literal block with cross-references for the
documentation files that describe the API. It accepts an optional
EXCEPTIONS_FILE with describes what elements will be either ignored or
be pointed to a non-default reference.
It accepts an optional ``FILE_RULES`` which describes what elements will be
either ignored or be pointed to a non-default reference, and optionally
defines the C namespace to be used.
The output is written at the (OUT_FILE).
It is meant to allow having more comprehensive documentation, where
uAPI headers will create cross-reference links to the code.
It is capable of identifying defines, functions, structs, typedefs,
enums and enum symbols and create cross-references for all of them.
It is also capable of distinguish #define used for specifying a Linux
ioctl.
The output is written at the ``FILE_OUT``.
The EXCEPTIONS_FILE contain two types of statements: \ **ignore**\ or \ **replace**\ .
The ``FILE_RULES`` may contain contain three types of statements:
**ignore**, **replace** and **namespace**.
The syntax for the ignore tag is:
By default, it create rules for all symbols and defines, but it also
allows parsing an exception file. Such file contains a set of rules
using the syntax below:
1. Ignore rules:
ignore \ **type**\ \ **name**\
ignore *type* *symbol*
The \ **ignore**\ means that it won't generate cross references for a
\ **name**\ symbol of type \ **type**\ .
Removes the symbol from reference generation.
The syntax for the replace tag is:
2. Replace rules:
replace *type* *old_symbol* *new_reference*
replace \ **type**\ \ **name**\ \ **new_value**\
Replaces *old_symbol* with a *new_reference*.
The *new_reference* can be:
The \ **replace**\ means that it will generate cross references for a
\ **name**\ symbol of type \ **type**\ , but, instead of using the default
replacement rule, it will use \ **new_value**\ .
- A simple symbol name;
- A full Sphinx reference.
For both statements, \ **type**\ can be either one of the following:
3. Namespace rules
namespace *namespace*
\ **ioctl**\
Sets C *namespace* to be used during cross-reference generation. Can
be overridden by replace rules.
The ignore or replace statement will apply to ioctl definitions like:
On ignore and replace rules, *type* can be:
#define VIDIOC_DBG_S_REGISTER _IOW('V', 79, struct v4l2_dbg_register)
- ioctl:
for defines of the form ``_IO*``, e.g., ioctl definitions
- define:
for other defines
- symbol:
for symbols defined within enums;
\ **define**\
- typedef:
for typedefs;
The ignore or replace statement will apply to any other #define found
at C_FILE.
\ **typedef**\
The ignore or replace statement will apply to typedef statements at C_FILE.
\ **struct**\
The ignore or replace statement will apply to the name of struct statements
at C_FILE.
\ **enum**\
The ignore or replace statement will apply to the name of enum statements
at C_FILE.
\ **symbol**\
The ignore or replace statement will apply to the name of enum value
at C_FILE.
For replace statements, \ **new_value**\ will automatically use :c:type:
references for \ **typedef**\ , \ **enum**\ and \ **struct**\ types. It will use :ref:
for \ **ioctl**\ , \ **define**\ and \ **symbol**\ types. The type of reference can
also be explicitly defined at the replace statement.
- enum:
for the name of a non-anonymous enum;
- struct:
for structs.
EXAMPLES
********
- Ignore a define ``_VIDEODEV2_H`` at ``FILE_IN``::
ignore define _VIDEODEV2_H
ignore define _VIDEODEV2_H
- On an data structure like this enum::
enum foo { BAR1, BAR2, PRIVATE };
It won't generate cross-references for ``PRIVATE``::
ignore symbol PRIVATE
At the same struct, instead of creating one cross reference per symbol,
make them all point to the ``enum foo`` C type::
replace symbol BAR1 :c:type:\`foo\`
replace symbol BAR2 :c:type:\`foo\`
Ignore a #define _VIDEODEV2_H at the C_FILE.
ignore symbol PRIVATE
On a struct like:
enum foo { BAR1, BAR2, PRIVATE };
It won't generate cross-references for \ **PRIVATE**\ .
replace symbol BAR1 :c:type:\`foo\`
replace symbol BAR2 :c:type:\`foo\`
On a struct like:
enum foo { BAR1, BAR2, PRIVATE };
It will make the BAR1 and BAR2 enum symbols to cross reference the foo
symbol at the C domain.
- Use C namespace ``MC`` for all symbols at ``FILE_IN``::
namespace MC
BUGS
****
@@ -184,7 +179,7 @@ COPYRIGHT
*********
Copyright (c) 2016 by Mauro Carvalho Chehab <mchehab+samsung@kernel.org>.
Copyright (c) 2016, 2025 by Mauro Carvalho Chehab <mchehab+huawei@kernel.org>.
License GPLv2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>.

View File

@@ -106,7 +106,7 @@ There's a script that automatically checks for Sphinx dependencies. If it can
recognize your distribution, it will also give a hint about the install
command line options for your distro::
$ ./scripts/sphinx-pre-install
$ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -116,7 +116,7 @@ command line options for your distro::
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
By default, it checks all the requirements for both html and PDF, including
the requirements for images, math expressions and LaTeX build, and assumes
@@ -149,7 +149,7 @@ a venv with it with, and install minimal requirements with::
A more comprehensive test can be done by using:
scripts/test_doc_build.py
tools/docs/test_doc_build.py
Such script create one Python venv per supported version,
optionally building documentation for a range of Sphinx versions.

View File

@@ -7,6 +7,7 @@ PARPORT interface documentation
Described here are the following functions:
Global functions::
parport_register_driver
parport_unregister_driver
parport_enumerate
@@ -34,6 +35,7 @@ Global functions::
Port functions (can be overridden by low-level drivers):
SPP::
port->ops->read_data
port->ops->write_data
port->ops->read_status
@@ -46,17 +48,20 @@ Port functions (can be overridden by low-level drivers):
port->ops->data_reverse
EPP::
port->ops->epp_write_data
port->ops->epp_read_data
port->ops->epp_write_addr
port->ops->epp_read_addr
ECP::
port->ops->ecp_write_data
port->ops->ecp_read_data
port->ops->ecp_write_addr
Other::
port->ops->nibble_read_data
port->ops->byte_read_data
port->ops->compat_write_data

View File

@@ -14,7 +14,6 @@ the PLDM for Firmware Update standard
file-format
driver-ops
==================================
Overview of the ``pldmfw`` library
==================================

View File

@@ -709,7 +709,7 @@ Resources
USB Home Page: https://www.usb.org
linux-usb Mailing List Archives: https://marc.info/?l=linux-usb
linux-usb Mailing List Archives: https://lore.kernel.org/linux-usb
USB On-the-Go Basics:
https://www.maximintegrated.com/app-notes/index.mvp/id/1822

View File

@@ -290,11 +290,11 @@ Why cpio rather than tar?
This decision was made back in December, 2001. The discussion started here:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
- https://lore.kernel.org/lkml/a03cke$640$1@cesium.transmeta.com/
And spawned a second thread (specifically on tar vs cpio), starting here:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
- https://lore.kernel.org/lkml/3C25A06D.7030408@zytor.com/
The quick and dirty summary version (which is no substitute for reading
the above threads) is:
@@ -310,7 +310,7 @@ the above threads) is:
either way about the archive format, and there are alternative tools,
such as:
http://freecode.com/projects/afio
https://linux.die.net/man/1/afio
2) The cpio archive format chosen by the kernel is simpler and cleaner (and
thus easier to create and parse) than any of the (literally dozens of)
@@ -331,12 +331,12 @@ the above threads) is:
5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
supported on the kernel side"):
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
- https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222109050.21702-100000@weyl.math.psu.edu/
explained his reasoning:
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
- https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222240530.21702-100000@weyl.math.psu.edu/
- https://lore.kernel.org/lkml/Pine.GSO.4.21.0112230849550.23300-100000@weyl.math.psu.edu/
and, most importantly, designed and implemented the initramfs code.

View File

@@ -249,7 +249,7 @@ sharing and lock acquisition rules as the regular filesystem.
This means that scrub cannot take *any* shortcuts to save time, because doing
so could lead to concurrency problems.
In other words, online fsck is not a complete replacement for offline fsck, and
a complete run of online fsck may take longer than online fsck.
a complete run of online fsck may take longer than offline fsck.
However, both of these limitations are acceptable tradeoffs to satisfy the
different motivations of online fsck, which are to **minimize system downtime**
and to **increase predictability of operation**.

View File

@@ -328,8 +328,14 @@ KBUILD_BUILD_TIMESTAMP
----------------------
Setting this to a date string overrides the timestamp used in the
UTS_VERSION definition (uname -v in the running kernel). The value has to
be a string that can be passed to date -d. The default value
is the output of the date command at one point during build.
be a string that can be passed to date -d. E.g.::
$ KBUILD_BUILD_TIMESTAMP="Mon Oct 13 00:00:00 UTC 2025" make
The default value is the output of the date command at one point during
build. If provided, this timestamp will also be used for mtime fields
within any initramfs archive. Initramfs mtimes are 32-bit, so dates before
the 1970 Unix epoch, or after 2106-02-07 06:28:15 UTC will fail.
KBUILD_BUILD_USER, KBUILD_BUILD_HOST
------------------------------------

View File

@@ -2182,9 +2182,11 @@ set_current_state() may be wrapped by:
which therefore also imply a general memory barrier after setting the state.
The whole sequence above is available in various canned forms, all of which
interpolate the memory barrier in the right place:
interpolate the memory barrier in the right place, for example:
wait_event();
wait_event_cmd();
wait_event_exclusive_cmd();
wait_event_interruptible();
wait_event_interruptible_exclusive();
wait_event_interruptible_timeout();
@@ -2192,8 +2194,6 @@ interpolate the memory barrier in the right place:
wait_event_timeout();
wait_on_bit();
wait_on_bit_lock();
wait_event_cmd();
wait_event_exclusive_cmd();
Secondly, code that performs a wake up normally follows something like this:

View File

@@ -28,8 +28,10 @@ MCAMSR and register xfer commands.
Register sets is common across APML protocols. IOCTL is providing synchronization
among protocols as transactions may create race condition.
$ ls -al /dev/sbrmi-3c
crw------- 1 root root 10, 53 Jul 10 11:13 /dev/sbrmi-3c
.. code-block:: bash
$ ls -al /dev/sbrmi-3c
crw------- 1 root root 10, 53 Jul 10 11:13 /dev/sbrmi-3c
apml_sbrmi driver registers hwmon sensors for monitoring power_cap_max,
current power consumption and managing power_cap.

View File

@@ -33,12 +33,12 @@ drivers/misc/mrvl_cn10k_dpi.c
Driver IOCTLs
=============
:c:macro::`DPI_MPS_MRRS_CFG`
:c:macro:`DPI_MPS_MRRS_CFG`
ioctl that sets max payload size & max read request size parameters of
a pem port to which DMA engines are wired.
:c:macro::`DPI_ENGINE_CFG`
:c:macro:`DPI_ENGINE_CFG`
ioctl that sets DMA engine's fifo sizes & max outstanding load request
thresholds.

View File

@@ -39,28 +39,28 @@ include/uapi/linux/tps6594_pfsm.h
Driver IOCTLs
=============
:c:macro::`PMIC_GOTO_STANDBY`
:c:macro:`PMIC_GOTO_STANDBY`
All device resources are powered down. The processor is off, and
no voltage domains are energized.
:c:macro::`PMIC_GOTO_LP_STANDBY`
:c:macro:`PMIC_GOTO_LP_STANDBY`
The digital and analog functions of the PMIC, which are not
required to be always-on, are turned off (low-power).
:c:macro::`PMIC_UPDATE_PGM`
:c:macro:`PMIC_UPDATE_PGM`
Triggers a firmware update.
:c:macro::`PMIC_SET_ACTIVE_STATE`
:c:macro:`PMIC_SET_ACTIVE_STATE`
One of the operational modes.
The PMICs are fully functional and supply power to all PDN loads.
All voltage domains are energized in both MCU and Main processor
sections.
:c:macro::`PMIC_SET_MCU_ONLY_STATE`
:c:macro:`PMIC_SET_MCU_ONLY_STATE`
One of the operational modes.
Only the power resources assigned to the MCU Safety Island are on.
:c:macro::`PMIC_SET_RETENTION_STATE`
:c:macro:`PMIC_SET_RETENTION_STATE`
One of the operational modes.
Depending on the triggers set, some DDR/GPIO voltage domains can
remain energized, while all other domains are off to minimize

View File

@@ -1,7 +1,10 @@
.. SPDX-License-Identifier: GPL-2.0
Introduction of Uacce
---------------------
Uacce (Unified/User-space-access-intended Accelerator Framework)
================================================================
Introduction
------------
Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
provide Shared Virtual Addressing (SVA) between accelerators and processes.

View File

@@ -92,4 +92,4 @@ helpers, which abstract this config option.
and register state is separate, the alpha PALcode joins the two, and you
need to switch both together).
(From http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
(From https://lore.kernel.org/lkml/Pine.LNX.4.10.9907301410280.752-100000@penguin.transmeta.com/)

View File

@@ -13,24 +13,19 @@ how the process works is required in order to be an effective part of it.
The big picture
---------------
The kernel developers use a loosely time-based release process, with a new
major kernel release happening every two or three months. The recent
release history looks like this:
The Linux kernel uses a loosely time-based, rolling release development
model. A new major kernel release (which we will call, as an example, 9.x)
[1]_ happens every two or three months, which comes with new features,
internal API changes, and more. A typical release can contain about 13,000
changesets with changes to several hundred thousand lines of code. Recent
releases, along with their dates, can be found at `Wikipedia
<https://en.wikipedia.org/wiki/Linux_kernel_version_history>`_.
====== =================
5.0 March 3, 2019
5.1 May 5, 2019
5.2 July 7, 2019
5.3 September 15, 2019
5.4 November 24, 2019
5.5 January 6, 2020
====== =================
Every 5.x release is a major kernel release with new features, internal
API changes, and more. A typical release can contain about 13,000
changesets with changes to several hundred thousand lines of code. 5.x is
the leading edge of Linux kernel development; the kernel uses a
rolling development model which is continually integrating major changes.
.. [1] Strictly speaking, the Linux kernel does not use semantic versioning
number scheme, but rather the 9.x pair identifies major release
version as a whole number. For each release, x is incremented,
but 9 is incremented only if x is deemed large enough (e.g.
Linux 5.0 is released following Linux 4.20).
A relatively straightforward discipline is followed with regard to the
merging of patches for each release. At the beginning of each development
@@ -48,9 +43,9 @@ detail later on).
The merge window lasts for approximately two weeks. At the end of this
time, Linus Torvalds will declare that the window is closed and release the
first of the "rc" kernels. For the kernel which is destined to be 5.6,
first of the "rc" kernels. For the kernel which is destined to be 9.x,
for example, the release which happens at the end of the merge window will
be called 5.6-rc1. The -rc1 release is the signal that the time to
be called 9.x-rc1. The -rc1 release is the signal that the time to
merge new features has passed, and that the time to stabilize the next
kernel has begun.
@@ -99,13 +94,15 @@ release is made. In the real world, this kind of perfection is hard to
achieve; there are just too many variables in a project of this size.
There comes a point where delaying the final release just makes the problem
worse; the pile of changes waiting for the next merge window will grow
larger, creating even more regressions the next time around. So most 5.x
kernels go out with a handful of known regressions though, hopefully, none
of them are serious.
larger, creating even more regressions the next time around. So most kernels
go out with a handful of known regressions, though, hopefully, none of them
are serious.
Once a stable release is made, its ongoing maintenance is passed off to the
"stable team," currently Greg Kroah-Hartman. The stable team will release
occasional updates to the stable release using the 5.x.y numbering scheme.
"stable team," currently consists of Greg Kroah-Hartman and Sasha Levin. The
stable team will release occasional updates to the stable release using the
9.x.y numbering scheme.
To be considered for an update release, a patch must (1) fix a significant
bug, and (2) already be merged into the mainline for the next development
kernel. Kernels will typically receive stable updates for a little more

View File

@@ -76,7 +76,7 @@ Don't use commas to avoid using braces:
if (condition)
do_this(), do_that();
Always uses braces for multiple statements:
Always use braces for multiple statements:
.. code-block:: c

View File

@@ -592,8 +592,9 @@ Both Tested-by and Reviewed-by tags, once received on mailing list from tester
or reviewer, should be added by author to the applicable patches when sending
next versions. However if the patch has changed substantially in following
version, these tags might not be applicable anymore and thus should be removed.
Usually removal of someone's Tested-by or Reviewed-by tags should be mentioned
in the patch changelog (after the '---' separator).
Usually removal of someone's Acked-by, Tested-by or Reviewed-by tags should be
mentioned in the patch changelog with an explanation (after the '---'
separator).
A Suggested-by: tag indicates that the patch idea is suggested by the person
named and ensures credit to the person for the idea: if we diligently credit

View File

@@ -39,8 +39,8 @@ of the box, e.g.::
Debian
******
Debian Testing and Debian Unstable (Sid), outside of the freeze period, provide
recent Rust releases and thus they should generally work out of the box, e.g.::
Debian 13 (Trixie), as well as Testing and Debian Unstable (Sid) provide recent
Rust releases and thus they should generally work out of the box, e.g.::
apt install rustc rust-src bindgen rustfmt rust-clippy

View File

@@ -10,6 +10,37 @@ of a Trust Source for greater security, while Encrypted Keys can be used on any
system. All user level blobs, are displayed and loaded in hex ASCII for
convenience, and are integrity verified.
Trusted Keys as Protected key
=============================
It is the secure way of keeping the keys in the kernel key-ring as Trusted-Key,
such that:
- Key-blob, an encrypted key-data, created to be stored, loaded and seen by
userspace.
- Key-data, the plain-key text in the system memory, to be used by
kernel space only.
Though key-data is not accessible to the user-space in plain-text, but it is in
plain-text in system memory, when used in kernel space. Even though kernel-space
attracts small surface attack, but with compromised kernel or side-channel
attack accessing the system memory can lead to a chance of the key getting
compromised/leaked.
In order to protect the key in kernel space, the concept of "protected-keys" is
introduced which will act as an added layer of protection. The key-data of the
protected keys is encrypted with Key-Encryption-Key(KEK), and decrypted inside
the trust source boundary. The plain-key text never available out-side in the
system memory. Thus, any crypto operation that is to be executed using the
protected key, can only be done by the trust source, which generated the
key blob.
Hence, if the protected-key is leaked or compromised, it is of no use to the
hacker.
Trusted keys as protected keys, with trust source having the capability of
generating:
- Key-Blob, to be loaded, stored and seen by user-space.
Trust Source
============
@@ -252,7 +283,7 @@ in bytes. Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
Trusted Keys usage: CAAM
------------------------
Usage::
Trusted Keys Usage::
keyctl add trusted name "new keylen" ring
keyctl add trusted name "load hex_blob" ring
@@ -262,6 +293,21 @@ Usage::
CAAM-specific format. The key length for new keys is always in bytes.
Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
Trusted Keys as Protected Keys Usage::
keyctl add trusted name "new keylen pk [options]" ring
keyctl add trusted name "load hex_blob [options]" ring
keyctl print keyid
where, 'pk' is used to direct trust source to generate protected key.
options:
key_enc_algo = For CAAM, supported enc algo are ECB(2), CCM(1).
"keyctl print" returns an ASCII hex copy of the sealed key, which is in a
CAAM-specific format. The key length for new keys is always in bytes.
Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
Trusted Keys usage: DCP
-----------------------
@@ -343,6 +389,46 @@ Load a trusted key from the saved blob::
f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
e4a8aea2b607ec96931e6f4d4fe563ba
Create and save a trusted key as protected key named "kmk" of length 32 bytes.
::
$ keyctl add trusted kmk "new 32 pk key_enc_algo=1" @u
440502848
$ keyctl show
Session Keyring
-3 --alswrv 500 500 keyring: _ses
97833714 --alswrv 500 -1 \_ keyring: _uid.500
440502848 --alswrv 500 500 \_ trusted: kmk
$ keyctl print 440502848
0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
e4a8aea2b607ec96931e6f4d4fe563ba
$ keyctl pipe 440502848 > kmk.blob
Load a trusted key from the saved blob::
$ keyctl add trusted kmk "load `cat kmk.blob` key_enc_algo=1" @u
268728824
$ keyctl print 268728824
0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
e4a8aea2b607ec96931e6f4d4fe563ba
Reseal (TPM specific) a trusted key under new PCR values::
$ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`"

View File

@@ -14,7 +14,7 @@
:license: GPL Version 2, June 1991 see Linux/COPYING for details.
The ``kernel-abi`` (:py:class:`KernelCmd`) directive calls the
scripts/get_abi.py script to parse the Kernel ABI files.
AbiParser class to parse the Kernel ABI files.
Overview of directive's argument and options.
@@ -43,9 +43,9 @@ from sphinx.util.docutils import switch_source_input
from sphinx.util import logging
srctree = os.path.abspath(os.environ["srctree"])
sys.path.insert(0, os.path.join(srctree, "scripts/lib/abi"))
sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
from abi_parser import AbiParser
from abi.abi_parser import AbiParser
__version__ = "1.0"

View File

@@ -13,7 +13,7 @@
:license: GPL Version 2, June 1991 see Linux/COPYING for details.
The ``kernel-feat`` (:py:class:`KernelFeat`) directive calls the
scripts/get_feat.pl script to parse the Kernel ABI files.
tools/docs/get_feat.pl script to parse the Kernel ABI files.
Overview of directive's argument and options.
@@ -34,7 +34,6 @@
import codecs
import os
import re
import subprocess
import sys
from docutils import nodes, statemachine
@@ -42,6 +41,11 @@ from docutils.statemachine import ViewList
from docutils.parsers.rst import directives, Directive
from sphinx.util.docutils import switch_source_input
srctree = os.path.abspath(os.environ["srctree"])
sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
from feat.parse_features import ParseFeature # pylint: disable=C0413
def ErrorString(exc): # Shamelessly stolen from docutils
return f'{exc.__class__.__name}: {exc}'
@@ -84,18 +88,16 @@ class KernelFeat(Directive):
srctree = os.path.abspath(os.environ["srctree"])
args = [
os.path.join(srctree, 'scripts/get_feat.pl'),
'rest',
'--enable-fname',
'--dir',
os.path.join(srctree, 'Documentation', self.arguments[0]),
]
feature_dir = os.path.join(srctree, 'Documentation', self.arguments[0])
feat = ParseFeature(feature_dir, False, True)
feat.parse()
if len(self.arguments) > 1:
args.extend(['--arch', self.arguments[1]])
lines = subprocess.check_output(args, cwd=os.path.dirname(doc.current_source)).decode('utf-8')
arch = self.arguments[1]
lines = feat.output_arch_table(arch)
else:
lines = feat.output_matrix()
line_regex = re.compile(r"^\.\. FILE (\S+)$")

View File

@@ -87,6 +87,8 @@ import os.path
import re
import sys
from difflib import get_close_matches
from docutils import io, nodes, statemachine
from docutils.statemachine import ViewList
from docutils.parsers.rst import Directive, directives
@@ -95,15 +97,17 @@ from docutils.parsers.rst.directives.body import CodeBlock, NumberLines
from sphinx.util import logging
srctree = os.path.abspath(os.environ["srctree"])
sys.path.insert(0, os.path.join(srctree, "tools/docs/lib"))
sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
from parse_data_structs import ParseDataStructs
from kdoc.parse_data_structs import ParseDataStructs
__version__ = "1.0"
logger = logging.getLogger(__name__)
RE_DOMAIN_REF = re.compile(r'\\ :(ref|c:type|c:func):`([^<`]+)(?:<([^>]+)>)?`\\')
RE_SIMPLE_REF = re.compile(r'`([^`]+)`')
RE_LINENO_REF = re.compile(r'^\s*-\s+LINENO_(\d+):\s+(.*)')
RE_SPLIT_DOMAIN = re.compile(r"(.*)\.(.*)")
def ErrorString(exc): # Shamelessly stolen from docutils
return f'{exc.__class__.__name}: {exc}'
@@ -212,14 +216,16 @@ class KernelInclude(Directive):
- a TOC table containing cross references.
"""
parser = ParseDataStructs()
parser.parse_file(path)
if 'exception-file' in self.options:
source_dir = os.path.dirname(os.path.abspath(
self.state_machine.input_lines.source(
self.lineno - self.state_machine.input_offset - 1)))
exceptions_file = os.path.join(source_dir, self.options['exception-file'])
parser.process_exceptions(exceptions_file)
else:
exceptions_file = None
parser.parse_file(path, exceptions_file)
# Store references on a symbol dict to be used at check time
if 'warn-broken' in self.options:
@@ -242,23 +248,32 @@ class KernelInclude(Directive):
# TOC output is a ReST file, not a literal. So, we can add line
# numbers
rawtext = parser.gen_toc()
include_lines = statemachine.string2lines(rawtext, tab_width,
convert_whitespace=True)
# Append line numbers data
startline = self.options.get('start-line', None)
endline = self.options.get('end-line', None)
relpath = os.path.relpath(path, srctree)
result = ViewList()
if startline and startline > 0:
offset = startline - 1
else:
offset = 0
for line in parser.gen_toc().split("\n"):
match = RE_LINENO_REF.match(line)
if not match:
result.append(line, path)
continue
for ln, line in enumerate(include_lines, start=offset):
result.append(line, path, ln)
ln, ref = match.groups()
ln = int(ln)
# Filter line range if needed
if startline and (ln < startline):
continue
if endline and (ln > endline):
continue
# Sphinx numerates starting with zero, but text editors
# and other tools start from one
realln = ln + 1
result.append(f"- {ref}: {relpath}#{realln}", path, ln)
self.state_machine.insert_input(result, path)
@@ -388,6 +403,63 @@ class KernelInclude(Directive):
# ==============================================================================
reported = set()
DOMAIN_INFO = {}
all_refs = {}
def fill_domain_info(env):
"""
Get supported reference types for each Sphinx domain and C namespaces
"""
if DOMAIN_INFO:
return
for domain_name, domain_instance in env.domains.items():
try:
object_types = list(domain_instance.object_types.keys())
DOMAIN_INFO[domain_name] = object_types
except AttributeError:
# Ignore domains that we can't retrieve object types, if any
pass
for domain in DOMAIN_INFO.keys():
domain_obj = env.get_domain(domain)
for name, dispname, objtype, docname, anchor, priority in domain_obj.get_objects():
ref_name = name.lower()
if domain == "c":
if '.' in ref_name:
ref_name = ref_name.split(".")[-1]
if not ref_name in all_refs:
all_refs[ref_name] = []
all_refs[ref_name].append(f"\t{domain}:{objtype}:`{name}` (from {docname})")
def get_suggestions(app, env, node,
original_target, original_domain, original_reftype):
"""Check if target exists in the other domain or with different reftypes."""
original_target = original_target.lower()
# Remove namespace if present
if original_domain == "c":
if '.' in original_target:
original_target = original_target.split(".")[-1]
suggestions = []
# If name exists, propose exact name match on different domains
if original_target in all_refs:
return all_refs[original_target]
# If not found, get a close match, using difflib.
# Such method is based on Ratcliff-Obershelp Algorithm, which seeks
# for a close match within a certain distance. We're using the defaults
# here, e.g. cutoff=0.6, proposing 3 alternatives
matches = get_close_matches(original_target, all_refs.keys())
for match in matches:
suggestions += all_refs[match]
return suggestions
def check_missing_refs(app, env, node, contnode):
"""Check broken refs for the files it creates xrefs"""
@@ -404,11 +476,13 @@ def check_missing_refs(app, env, node, contnode):
if node.source not in xref_files:
return None
fill_domain_info(env)
target = node.get('reftarget', '')
domain = node.get('refdomain', 'std')
reftype = node.get('reftype', '')
msg = f"can't link to: {domain}:{reftype}:: {target}"
msg = f"Invalid xref: {domain}:{reftype}:`{target}`"
# Don't duplicate warnings
data = (node.source, msg)
@@ -416,6 +490,10 @@ def check_missing_refs(app, env, node, contnode):
return None
reported.add(data)
suggestions = get_suggestions(app, env, node, target, domain, reftype)
if suggestions:
msg += ". Possible alternatives:\n" + '\n'.join(suggestions)
logger.warning(msg, location=node, type='ref', subtype='missing')
return None

View File

@@ -220,7 +220,7 @@
If you want them, please install non-variable ``Noto Sans CJK''
font families along with the texlive-xecjk package by following
instructions from
\sphinxcode{./scripts/sphinx-pre-install}.
\sphinxcode{./tools/docs/sphinx-pre-install}.
Having optional non-variable ``Noto Serif CJK'' font families will
improve the looks of those translations.
\end{sphinxadmonition}}

View File

@@ -42,10 +42,10 @@ from sphinx.util import logging
from pprint import pformat
srctree = os.path.abspath(os.environ["srctree"])
sys.path.insert(0, os.path.join(srctree, "scripts/lib/kdoc"))
sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
from kdoc_files import KernelFiles
from kdoc_output import RestFormat
from kdoc.kdoc_files import KernelFiles
from kdoc.kdoc_output import RestFormat
__version__ = '1.0'
kfiles = None

View File

@@ -1,60 +0,0 @@
# -*- coding: utf-8; mode: python -*-
# SPDX-License-Identifier: GPL-2.0
# pylint: disable=R0903, C0330, R0914, R0912, E0401
import os
import sys
from sphinx.util.osutil import fs_encoding
# ------------------------------------------------------------------------------
def loadConfig(namespace):
# ------------------------------------------------------------------------------
"""Load an additional configuration file into *namespace*.
The name of the configuration file is taken from the environment
``SPHINX_CONF``. The external configuration file extends (or overwrites) the
configuration values from the origin ``conf.py``. With this you are able to
maintain *build themes*. """
config_file = os.environ.get("SPHINX_CONF", None)
if (config_file is not None
and os.path.normpath(namespace["__file__"]) != os.path.normpath(config_file) ):
config_file = os.path.abspath(config_file)
# Let's avoid one conf.py file just due to latex_documents
start = config_file.find('Documentation/')
if start >= 0:
start = config_file.find('/', start + 1)
end = config_file.rfind('/')
if start >= 0 and end > 0:
dir = config_file[start + 1:end]
print("source directory: %s" % dir)
new_latex_docs = []
latex_documents = namespace['latex_documents']
for l in latex_documents:
if l[0].find(dir + '/') == 0:
has = True
fn = l[0][len(dir) + 1:]
new_latex_docs.append((fn, l[1], l[2], l[3], l[4]))
break
namespace['latex_documents'] = new_latex_docs
# If there is an extra conf.py file, load it
if os.path.isfile(config_file):
sys.stdout.write("load additional sphinx-config: %s\n" % config_file)
config = namespace.copy()
config['__file__'] = config_file
with open(config_file, 'rb') as f:
code = compile(f.read(), fs_encoding, 'exec')
exec(code, config)
del config['__file__']
namespace.update(config)
else:
config = namespace.copy()
config['tags'].add("subproject")
namespace.update(config)

View File

@@ -1,33 +0,0 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
#
# Figure out if we should follow a specific parallelism from the make
# environment (as exported by scripts/jobserver-exec), or fall back to
# the "auto" parallelism when "-jN" is not specified at the top-level
# "make" invocation.
sphinx="$1"
shift || true
parallel="$PARALLELISM"
if [ -z "$parallel" ] ; then
# If no parallelism is specified at the top-level make, then
# fall back to the expected "-jauto" mode that the "htmldocs"
# target has had.
auto=$(perl -e 'open IN,"'"$sphinx"' --version 2>&1 |";
while (<IN>) {
if (m/([\d\.]+)/) {
print "auto" if ($1 >= "1.7")
}
}
close IN')
if [ -n "$auto" ] ; then
parallel="$auto"
fi
fi
# Only if some parallelism has been determined do we add the -jN option.
if [ -n "$parallel" ] ; then
parallel="-j$parallel"
fi
exec "$sphinx" $parallel "$@"

View File

@@ -1,11 +1,15 @@
**-c**, **--cpus** *cpu-list*
Set the osnoise tracer to run the sample threads in the cpu-list.
Set the |tool| tracer to run the sample threads in the cpu-list.
By default, the |tool| tracer runs the sample threads on all CPUs.
**-H**, **--house-keeping** *cpu-list*
Run rtla control threads only on the given cpu-list.
If omitted, rtla will attempt to auto-migrate its main thread to any CPU that is not running any workload threads.
**-d**, **--duration** *time[s|m|h|d]*
Set the duration of the session.
@@ -35,17 +39,21 @@
**-P**, **--priority** *o:prio|r:prio|f:prio|d:runtime:period*
Set scheduling parameters to the osnoise tracer threads, the format to set the priority are:
Set scheduling parameters to the |tool| tracer threads, the format to set the priority are:
- *o:prio* - use SCHED_OTHER with *prio*;
- *r:prio* - use SCHED_RR with *prio*;
- *f:prio* - use SCHED_FIFO with *prio*;
- *d:runtime[us|ms|s]:period[us|ms|s]* - use SCHED_DEADLINE with *runtime* and *period* in nanoseconds.
If not set, tracer threads keep their default priority. For rtla user threads, it is set to SCHED_FIFO with priority 95. For kernel threads, see *osnoise* and *timerlat* tracer documentation for the running kernel version.
**-C**, **--cgroup**\[*=cgroup*]
Set a *cgroup* to the tracer's threads. If the **-C** option is passed without arguments, the tracer's thread will inherit **rtla**'s *cgroup*. Otherwise, the threads will be placed on the *cgroup* passed to the option.
If not set, the behavior differs between workload types. User workloads created by rtla will inherit rtla's cgroup. Kernel workloads are assigned the root cgroup.
**--warm-up** *s*
After starting the workload, let it run for *s* seconds before starting collecting the data, allowing the system to warm-up. Statistical data generated during warm-up is discarded.
@@ -53,6 +61,8 @@
**--trace-buffer-size** *kB*
Set the per-cpu trace buffer size in kB for the tracing output.
If not set, the default tracefs buffer size is used.
**--on-threshold** *action*
Defines an action to be executed when tracing is stopped on a latency threshold
@@ -67,7 +77,7 @@
- *trace[,file=<filename>]*
Saves trace output, optionally taking a filename. Alternative to -t/--trace.
Note that nlike -t/--trace, specifying this multiple times will result in
Note that unlike -t/--trace, specifying this multiple times will result in
the trace being saved multiple times.
- *signal,num=<sig>,pid=<pid>*

View File

@@ -13,7 +13,7 @@
Set the automatic trace mode. This mode sets some commonly used options
while debugging the system. It is equivalent to use **-T** *us* **-s** *us*
**-t**. By default, *timerlat* tracer uses FIFO:95 for *timerlat* threads,
thus equilavent to **-P** *f:95*.
thus equivalent to **-P** *f:95*.
**-p**, **--period** *us*
@@ -56,7 +56,7 @@
**-u**, **--user-threads**
Set timerlat to run without a workload, and then dispatches user-space workloads
to wait on the timerlat_fd. Once the workload is awakes, it goes to sleep again
to wait on the timerlat_fd. Once the workload is awakened, it goes to sleep again
adding so the measurement for the kernel-to-user and user-to-kernel to the tracer
output. **--user-threads** will be used unless the user specify **-k**.

View File

@@ -29,11 +29,11 @@ collection of the tracer output.
OPTIONS
=======
.. include:: common_osnoise_options.rst
.. include:: common_osnoise_options.txt
.. include:: common_top_options.rst
.. include:: common_top_options.txt
.. include:: common_options.rst
.. include:: common_options.txt
EXAMPLE
=======
@@ -106,4 +106,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -15,7 +15,7 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_osnoise_description.rst
.. include:: common_osnoise_description.txt
The **rtla osnoise hist** tool collects all **osnoise:sample_threshold**
occurrence in a histogram, displaying the results in a user-friendly way.
@@ -24,11 +24,11 @@ collection of the tracer output.
OPTIONS
=======
.. include:: common_osnoise_options.rst
.. include:: common_osnoise_options.txt
.. include:: common_hist_options.rst
.. include:: common_hist_options.txt
.. include:: common_options.rst
.. include:: common_options.txt
EXAMPLE
=======
@@ -65,4 +65,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -15,7 +15,7 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_osnoise_description.rst
.. include:: common_osnoise_description.txt
**rtla osnoise top** collects the periodic summary from the *osnoise* tracer,
including the counters of the occurrence of the interference source,
@@ -26,11 +26,11 @@ collection of the tracer output.
OPTIONS
=======
.. include:: common_osnoise_options.rst
.. include:: common_osnoise_options.txt
.. include:: common_top_options.rst
.. include:: common_top_options.txt
.. include:: common_options.rst
.. include:: common_options.txt
EXAMPLE
=======
@@ -60,4 +60,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -14,7 +14,7 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_osnoise_description.rst
.. include:: common_osnoise_description.txt
The *osnoise* tracer outputs information in two ways. It periodically prints
a summary of the noise of the operating system, including the counters of
@@ -56,4 +56,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -16,7 +16,7 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_timerlat_description.rst
.. include:: common_timerlat_description.txt
The **rtla timerlat hist** displays a histogram of each tracer event
occurrence. This tool uses the periodic information, and the
@@ -25,13 +25,13 @@ occurrence. This tool uses the periodic information, and the
OPTIONS
=======
.. include:: common_timerlat_options.rst
.. include:: common_timerlat_options.txt
.. include:: common_hist_options.rst
.. include:: common_hist_options.txt
.. include:: common_options.rst
.. include:: common_options.txt
.. include:: common_timerlat_aa.rst
.. include:: common_timerlat_aa.txt
EXAMPLE
=======
@@ -110,4 +110,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -16,23 +16,23 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_timerlat_description.rst
.. include:: common_timerlat_description.txt
The **rtla timerlat top** displays a summary of the periodic output
from the *timerlat* tracer. It also provides information for each
operating system noise via the **osnoise:** tracepoints that can be
seem with the option **-T**.
seen with the option **-T**.
OPTIONS
=======
.. include:: common_timerlat_options.rst
.. include:: common_timerlat_options.txt
.. include:: common_top_options.rst
.. include:: common_top_options.txt
.. include:: common_options.rst
.. include:: common_options.txt
.. include:: common_timerlat_aa.rst
.. include:: common_timerlat_aa.txt
**--aa-only** *us*
@@ -133,4 +133,4 @@ AUTHOR
------
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -14,7 +14,7 @@ SYNOPSIS
DESCRIPTION
===========
.. include:: common_timerlat_description.rst
.. include:: common_timerlat_description.txt
The **rtla timerlat top** mode displays a summary of the periodic output
from the *timerlat* tracer. The **rtla timerlat hist** mode displays
@@ -51,4 +51,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -45,4 +45,4 @@ AUTHOR
======
Daniel Bristot de Oliveira <bristot@kernel.org>
.. include:: common_appendix.rst
.. include:: common_appendix.txt

View File

@@ -43,12 +43,12 @@ It is possible to follow the trace by reading the trace file::
<...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field serves to relate the *irq* execution to its respective *thread*
execution.
The tracer creates a per-cpu kernel thread with real-time priority
SCHED_FIFO:95 that prints two lines at every activation. The first is
the *timer latency* observed at the *hardirq* context before the activation
of the thread. The second is the *timer latency* observed by the thread.
The ACTIVATION ID field serves to relate the *irq* execution to its
respective *thread* execution.
The *irq*/*thread* splitting is important to clarify in which context
the unexpected high value is coming from. The *irq* context can be

View File

@@ -13,28 +13,28 @@ dello spazio utente ha ulteriori vantaggi: Sphinx genererà dei messaggi
d'avviso se un simbolo non viene trovato nella documentazione. Questo permette
di mantenere allineate la documentazione della uAPI (API spazio utente)
con le modifiche del kernel.
Il programma :ref:`parse_headers.pl <it_parse_headers>` genera questi riferimenti.
Il programma :ref:`parse_headers.py <it_parse_headers>` genera questi riferimenti.
Esso dev'essere invocato attraverso un Makefile, mentre si genera la
documentazione. Per avere un esempio su come utilizzarlo all'interno del kernel
consultate ``Documentation/userspace-api/media/Makefile``.
.. _it_parse_headers:
parse_headers.pl
parse_headers.py
^^^^^^^^^^^^^^^^
NOME
****
parse_headers.pl - analizza i file C al fine di identificare funzioni,
parse_headers.py - analizza i file C al fine di identificare funzioni,
strutture, enumerati e definizioni, e creare riferimenti per Sphinx
SINTASSI
********
\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
\ **parse_headers.py**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
Dove <options> può essere: --debug, --usage o --help.

View File

@@ -109,7 +109,7 @@ Sphinx. Se lo script riesce a riconoscere la vostra distribuzione, allora
sarà in grado di darvi dei suggerimenti su come procedere per completare
l'installazione::
$ ./scripts/sphinx-pre-install
$ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -119,7 +119,7 @@ l'installazione::
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
L'impostazione predefinita prevede il controllo dei requisiti per la generazione
di documenti html e PDF, includendo anche il supporto per le immagini, le

View File

@@ -132,6 +132,25 @@ http://savannah.nongnu.org/projects/quilt
platform_set_drvdata(), but left the variable "dev" unused,
delete it.
特定のコミットで導入された不具合を修正する場合(例えば ``git bisect`` で原因となった
コミットを特定したときなど)は、コミットの SHA-1 の先頭12文字と1行の要約を添えた
「Fixes:」タグを付けてください。この行は75文字を超えても構いませんが、途中で
改行せず、必ず1行で記述してください。
例:
Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
以下の git の設定を使うと、git log や git show で上記形式を出力するための
専用の出力形式を追加できます::
[core]
abbrev = 12
[pretty]
fixes = Fixes: %h (\"%s\")
使用例::
$ git log -1 --pretty=fixes 54a4f0239f2e
Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
3) パッチの分割
@@ -409,7 +428,7 @@ Acked-by: が必ずしもパッチ全体の承認を示しているわけでは
このタグはパッチに関心があると思われる人達がそのパッチの議論に含まれていたこと
を明文化します。
14) Reported-by:, Tested-by:, Reviewed-by: および Suggested-by: の利用
14) Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: および Fixes: の利用
他の誰かによって報告された問題を修正するパッチであれば、問題報告者という寄与を
クレジットするために、Reported-by: タグを追加することを検討してください。
@@ -465,6 +484,13 @@ Suggested-by: タグは、パッチのアイデアがその人からの提案に
クレジットしていけば、望むらくはその人たちが将来別の機会に再度力を貸す気に
なってくれるかもしれません。
Fixes: タグは、そのパッチが以前のコミットにあった問題を修正することを示します。
これは、バグがどこで発生したかを特定しやすくし、バグ修正のレビューに役立ちます。
また、このタグはstableカーネルチームが、あなたの修正をどのstableカーネル
バージョンに適用すべきか判断する手助けにもなります。パッチによって修正された
バグを示すには、この方法が推奨されます。前述の、「2) パッチに対する説明」の
セクションを参照してください。
15) 標準的なパッチのフォーマット
標準的なパッチのサブジェクトは以下のとおりです。

View File

@@ -288,4 +288,4 @@ Documentation/translations/zh_CN/admin-guide/bug-hunting.rst 。
更多用GDB调试内核的信息请参阅
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
和 Documentation/dev-tools/kgdb.rst 。
和 Documentation/process/debugging/kgdb.rst 。

View File

@@ -0,0 +1,130 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/block/blk-mq.rst
:翻译:
柯子杰 kezijie <kezijie@leap-io-kernel.com>
:校译:
================================================
多队列块设备 I/O 排队机制 (blk-mq)
================================================
多队列块设备 I/O 排队机制提供了一组 API使高速存储设备能够同时在多个队列中
处理并发的 I/O 请求并将其提交到块设备,从而实现极高的每秒输入/输出操作次数
(IOPS),充分发挥现代存储设备的并行能力。
介绍
====
背景
----
磁盘从 Linux 内核开发初期就已成为事实上的标准。块 I/O 子系统的目标是尽可能
为此类设备提供最佳性能,因为它们在进行随机访问时代价极高,性能瓶颈主要在机械
运动部件上,其速度远低于存储栈中其他任何层。其中一个软件优化例子是根据硬盘磁
头当前的位置重新排序读/写请求。
然而,随着固态硬盘和非易失性存储的发展,它们没有机械部件,也不存在随机访问代
码,并能够进行高速并行访问,存储栈的瓶颈从存储设备转移到了操作系统。为了充分
利用这些设备设计中的并行性,引入了多队列机制。
原来的设计只有一个队列来存储块设备 I/O 请求,并且只使用一个锁。由于缓存中的
脏数据和多处理器共享单锁的瓶颈,这种设计在 SMP 系统中扩展性不佳。当不同进程
(或同一进程在不同 CPU 上)同时执行块设备 I/O 时,该单队列模型还会出现严重
的拥塞问题。为了解决这些问题blk-mq API 引入了多个队列,每个队列在本地 CPU
上拥有独立的入口点,从而消除了对全局锁的需求。关于其具体工作机制的更深入说明,
请参见下一节( `工作原理`_ )。
工作原理
--------
当用户空间执行对块设备的 I/O例如读写文件blk-mq 便会介入:它将存储和
管理发送到块设备的 I/O 请求,充当用户空间(文件系统,如果存在的话)与块设备驱
动之间的中间层。
blk-mq 由两组队列组成:软件暂存队列和硬件派发队列。当请求到达块层时,它会尝
试最短路径:直接发送到硬件队列。然而,有两种情况下可能不会这样做:如果该层有
IO 调度器或者是希望合并请求。在这两种情况下,请求将被发送到软件队列。
随后,在软件队列中的请求被处理后,请求会被放置到硬件队列。硬件队列是第二阶段
的队列,硬件可以直接访问并处理这些请求。然而,如果硬件没有足够的资源来接受更
多请求blk-mq 会将请求放置在临时队列中,待硬件资源充足时再发送。
软件暂存队列
~~~~~~~~~~~~
在这些请求未直接发送到驱动时,块设备 I/O 子系统会将请求添加到软件暂存队列中
(由 struct blk_mq_ctx 表示)。一个请求可能包含一个或多个 BIO。它们通过 struct bio
数据结构到达块层。块层随后会基于这些 BIO 构建新的结构体 struct request用于
与设备驱动通信。每个队列都有自己的锁,队列数量由每个 CPU 和每个 node 为基础
来决定。
暂存队列可用于合并相邻扇区的请求。例如对扇区3-6、6-7、7-9的请求可以合并
为对扇区3-9的一个请求。即便 SSD 或 NVM 的随机访问和顺序访问响应时间相同,
合并顺序访问的请求仍可减少单独请求的数量。这种合并请求的技术称为 plugging。
此外I/O 调度器还可以对请求进行重新排序以确保系统资源的公平性(例如防止某
个应用出现“饥饿”现象)或是提高 I/O 性能。
I/O 调度器
^^^^^^^^^^
块层实现了多种调度器,每种调度器都遵循一定启发式规则以提高 I/O 性能。它们是
“可插拔”的(plug and play),可在运行时通过 sysfs 选择。你可以在这里阅读更
多关于 Linux IO 调度器知识 `here
<https://www.kernel.org/doc/html/latest/block/index.html>`_。调度只发
生在同一队列内的请求之间,因此无法合并不同队列的请求,否则会造成缓存冲突并需
要为每个队列加锁。调度后,请求即可发送到硬件。可能选择的调度器之一是 NONE 调
度器,这是最直接的调度器:它只将请求放到进程所在的软件队列,不进行重新排序。
当设备开始处理硬件队列中的请求时(运行硬件队列),映射到该硬件队列的软件队列
会按映射顺序依次清空。
硬件派发队列
~~~~~~~~~~~~~
硬件队列(由 struct blk_mq_hw_ctx 表示)是设备驱动用来映射设备提交队列
(或设备 DMA 环缓存)的结构体,它是块层提交路径在底层设备驱动接管请求之前的
最后一个阶段。运行此队列时,块层会从相关软件队列中取出请求,并尝试派发到硬件。
如果请求无法直接发送到硬件,它们会被加入到请求的链表(``hctx->dispatch``) 中。
随后,当块层下次运行该队列时,会优先发送位于 ``dispatch`` 链表中的请求,
以确保那些最早准备好发送的请求能够得到公平调度。硬件队列的数量取决于硬件及
其设备驱动所支持的硬件上下文数但不会超过系统的CPU核心数。在这个阶段不
会发生重新排序,每个软件队列都有一组硬件队列来用于提交请求。
.. note::
块层和设备协议都不保证请求完成顺序。此问题需由更高层处理,例如文件系统。
基于标识的完成机制
~~~~~~~~~~~~~~~~~~~
为了指示哪一个请求已经完成,每个请求都会被分配一个整数标识,该标识的取值范围
是从0到分发队列的大小。这个标识由块层生成并在之后由设备驱动使用从而避
免了为每个请求再单独创建冗余的标识符。当请求在驱动中完成时,驱动会将该标识返
回给块层,以通知该请求已完成。这样,块层就无需再进行线性搜索来确定是哪一个
I/O 请求完成了。
更多阅读
--------
- `Linux 块 I/O多队列 SSD 并发访问简介 <http://kernel.dk/blk-mq.pdf>`_
- `NOOP 调度器 <https://en.wikipedia.org/wiki/Noop_scheduler>`_
- `Null 块设备驱动程序 <https://www.kernel.org/doc/html/latest/block/null_blk.html>`_
源代码
======
该API在以下内核代码中:
include/linux/blk-mq.h
block/blk-mq.c

View File

@@ -0,0 +1,192 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/block/data-integrity.rst
:翻译:
柯子杰 kezijie <kezijie@leap-io-kernel.com>
:校译:
==========
数据完整性
==========
1. 引言
=======
现代文件系统对数据和元数据都进行了校验和保护以防止数据损坏。然而,这种损坏的
检测是在读取时才进行,这可能发生在数据写入数月之后。到那时,应用程序尝试写入
的原始数据很可能已经丢失。
解决方案是确保磁盘实际存储的内容就是应用程序想存储的。SCSI 协议族(如 SBC
数据完整性字段、SCC 保护提案)以及 SATA/T13外部路径保护最近新增的功能
通过在 I/O 中附加完整性元数据的方式,试图解决这一问题。完整性元数据(在
SCSI 术语中称为保护信息)包括每个扇区的校验和,以及一个递增计数器,用于确保
各扇区按正确顺序被写入盘。在某些保护方案中,还能保证 I/O 写入磁盘的正确位置。
当前的存储控制器和设备实现了多种保护措施,例如校验和和数据清理。但这些技术通
常只在各自的独立域内工作,或最多仅在 I/O 路径的相邻节点之间发挥作用。DIF 及
其它数据完整性拓展有意思的点在于保护格式定义明确I/O 路径上的每个节点都可以
验证 I/O 的完整性,如检测到损坏可直接拒绝。这不仅可以防止数据损坏,还能够隔
离故障点。
2. 数据完整性拓展
=================
如上所述,这些协议扩展只保护控制器与存储设备之间的路径。然而,许多控制器实际
上允许操作系统与完整性元数据(IMD)交互。我们一直与多家 FC/SAS HBA 厂商合作,
使保护信息能够在其控制器与操作系统之间传输。
SCSI 数据完整性字段通过在每个扇区后附加8字节的保护信息来实现。数据 + 完整
性元数据存储在磁盘的520字节扇区中。数据 + IMD 在控制器与目标设备之间传输
时是交错组合在一起的。T13 提案的方式类似。
由于操作系统处理520字节甚至 4104 字节)扇区非常不便,我们联系了多家 HBA
厂商,并鼓励它们分离数据与完整性元数据的 scatter-gather lists。
控制器在写入时会将数据缓冲区和完整性元数据缓冲区的数据交错在一起,并在读取时
会拆分它们。这样Linux 就能直接通过 DMA 将数据缓冲区传输到主机内存或从主机
内存读取,而无需修改页缓存。
此外SCSI 与 SATA 规范要求的16位 CRC 校验在软件中计算代价较高。基准测试发
现,计算此校验在高负载情形下显著影响系统性能。一些控制器允许在操作系统接口处
使用轻量级校验。例如 Emulex 支持 TCP/IP 校验。操作系统提供的 IP 校验在写入
时会转换为16位 CRC读取时则相反。这允许 Linux 或应用程序以极低的开销生成
完整性元数据(与软件 RAID5 相当)。
IP 校验在检测位错误方面比 CRC 弱,但关键在于数据缓冲区与完整性元数据缓冲区
的分离。只有这两个不同的缓冲区匹配I/O 才能完成。
数据与完整性元数据缓冲区的分离以及校验选择被称为数据完整性扩展。由于这些扩展
超出了协议主体(T10、T13)的范围Oracle 及其合作伙伴正尝试在存储网络行业协
会内对其进行标准化。
3. 内核变更
===========
Linux 中的数据完整性框架允许将保护信息固定到 I/O 上,并在支持该功能的控制器
之间发送和接收。
SCSI 和 SATA 中完整性扩展的优势在于,它们能够保护从应用程序到存储设备的整个
路径。然而,这同时也是最大的劣势。这意味着保护信息必须采用磁盘可以理解的格式。
通常Linux/POSIX 应用程序并不关心所访问存储设备的具体细节。虚拟文件系统层
和块层会让硬件扇区大小和传输协议对应用程序完全透明。
然而,在准备发送到磁盘的保护信息时,就需要这种细节。因此,端到端保护方案的概
念实际上违反了层次结构。应用程序完全不应该知道它访问的是 SCSI 还是 SATA 磁盘。
Linux 中实现的数据完整性支持尝试将这些细节对应用程序隐藏。就应用程序(以及在
某种程度上内核)而言,完整性元数据是附加在 I/O 上的不透明信息。
当前实现允许块层自动为任何 I/O 生成保护信息。最终目标是将用户数据的完整性元
数据计算移至用户空间。内核中产生的元数据和其他 I/O 仍将使用自动生成接口。
一些存储设备允许为每个硬件扇区附加一个16位的标识值。这个标识空间的所有者是
块设备的所有者,也就是在多数情况下由文件系统掌控。文件系统可以利用这额外空间
按需为扇区附加标识。由于标识空间有限,块接口允许通过交错方式对更大的数据块标
识。这样8*16位的信息可以附加到典型的 4KB 文件系统块上。
这也意味着诸如 fsck 和 mkfs 等应用程序需要能够从用户空间访问并操作这些标记。
为此,正在开发一个透传接口。
4. 块层实现细节
===============
4.1 Bio
--------
当启用 CONFIG_BLK_DEV_INTEGRITY 时,数据完整性补丁会在 struct bio 中添加
一个新字段。调用 bio_integrity(bio) 会返回一个指向 struct bip 的指针,该
结构体包含了该 bio 的完整性负载。本质上bip 是一个精简版的 struct bio
中包含一个 bio_vec用于保存完整性元数据以及所需的维护信息bvec 池、向量计
数等)。
内核子系统可以通过调用 bio_integrity_alloc(bio) 来为某个 bio 启用数据完整
性保护。该函数会分配并附加一个 bip 到该 bio 上。
随后使用 bio_integrity_add_page() 将包含完整性元数据的单独页面附加到该 bio。
调用 bio_free() 会自动释放bip。
4.2 块设备
-----------
块设备可以在 queue_limits 结构中的 integrity 子结构中设置完整性信息。
对于分层块设备,需要选择一个适用于所有子设备的完整性配置文件。可以使用
queue_limits_stack_integrity() 来协助完成该操作。目前DM 和 MD linear、
RAID0 和 RAID1 已受支持。而RAID4/5/6因涉及应用标签仍需额外的开发工作。
5.0 块层完整性API
==================
5.1 普通文件系统
-----------------
普通文件系统并不知道其下层块设备具备发送或接收完整性元数据的能力。
在执行写操作时,块层会在调用 submit_bio() 时自动生成完整性元数据。
在执行读操作时I/O 完成后会触发完整性验证。
IMD 的生成与验证行为可以通过以下开关控制::
/sys/block/<bdev>/integrity/write_generate
and::
/sys/block/<bdev>/integrity/read_verify
flags.
5.2 具备完整性感知的文件系统
----------------------------
具备完整性感知能力的文件系统可以在准备 I/O 时附加完整性元数据,
并且如果底层块设备支持应用标签空间,也可以加以利用。
`bool bio_integrity_prep(bio);`
要为写操作生成完整性元数据或为读操作设置缓冲区,文件系统必须调用
bio_integrity_prep(bio)。
在调用此函数之前,必须先设置好 bio 的数据方向和起始扇区,并确
保该 bio 已经添加完所有的数据页。调用者需要自行保证,在 I/O 进行
期间 bio 不会被修改。如果由于某种原因准备失败,则应当以错误状态
完成该 bio。
5.3 传递已有的完整性元数据
--------------------------
能够自行生成完整性元数据或可以从用户空间传输完整性元数据的文件系统,
可以使用如下接口:
`struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);`
为 bio 分配完整性负载并挂载到 bio 上。nr_pages 表示需要在
integrity bio_vec list 中存储多少页保护数据(类似 bio_alloc
完整性负载将在 bio_free() 被调用时释放。
`int bio_integrity_add_page(bio, page, len, offset);`
将包含完整性元数据的一页附加到已有的 bio 上。该 bio 必须已有 bip
即必须先调用 bio_integrity_alloc()。对于写操作,页中的完整
性元数据必须采用目标设备可识别的格式,但有一个例外,当请求在 I/O 栈
中传递时,扇区号会被重新映射。这意味着通过此接口添加的页在 I/O 过程
中可能会被修改!完整性元数据中的第一个引用标签必须等于 bip->bip_sector。
只要 bip bio_vec arraynr_pages有空间就可以继续通过
bio_integrity_add_page()添加页。
当读操作完成后,附加的页将包含从存储设备接收到的完整性元数据。
接收方需要处理这些元数据,并在操作完成时验证数据完整性
----------------------------------------------------------------------
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>

View File

@@ -0,0 +1,35 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/block/index.rst
:翻译:
柯子杰 ke zijie <kezijie@leap-io-kernel.com>
:校译:
=====
Block
=====
.. toctree::
:maxdepth: 1
blk-mq
data-integrity
TODOList:
* bfq-iosched
* biovecs
* cmdline-partition
* deadline-iosched
* inline-encryption
* ioprio
* kyber-iosched
* null_blk
* pr
* stat
* switching-sched
* writeback_cache_control
* ublk

View File

@@ -2,7 +2,7 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/dev-tools/gdb-kernel-debugging.rst
:Original: Documentation/process/debugging/gdb-kernel-debugging.rst
:Translator: 高超 gao chao <gaochao49@huawei.com>
通过gdb调试内核和模块

View File

@@ -28,15 +28,15 @@
::
./scripts/checktransupdate.py --help
tools/docs/checktransupdate.py --help
具体用法请参考参数解析器的输出
示例
- ``./scripts/checktransupdate.py -l zh_CN``
- ``tools/docs/checktransupdate.py -l zh_CN``
这将打印 zh_CN 语言中需要更新的所有文件。
- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
- ``tools/docs/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
这将只打印指定文件的状态。
然后输出类似如下的内容:

View File

@@ -124,7 +124,7 @@ C代码编译器发出的警告常常会被视为误报从而导致出现了
这使得这些信息更难找到例如使Sphinx无法生成指向该文档的链接。将 ``kernel-doc``
指令添加到文档中以引入这些注释可以帮助社区获得为编写注释所做工作的全部价值。
``scripts/find-unused-docs.sh`` 工具可以用来找到这些被忽略的评论。
``tools/docs/find-unused-docs.sh`` 工具可以用来找到这些被忽略的评论。
请注意,将导出的函数和数据结构引入文档是最有价值的。许多子系统还具有供内部
使用的kernel-doc注释除非这些注释放在专门针对相关子系统开发人员的文档中

View File

@@ -13,20 +13,20 @@
有时为了描述用户空间API并在代码和文档之间生成交叉引用需要包含头文件和示例
C代码。为用户空间API文件添加交叉引用还有一个好处如果在文档中找不到相应符号
Sphinx将生成警告。这有助于保持用户空间API文档与内核更改同步。
:ref:`parse_headers.pl <parse_headers_zh>` 提供了生成此类交叉引用的一种方法。
:ref:`parse_headers.py <parse_headers_zh>` 提供了生成此类交叉引用的一种方法。
在构建文档时必须通过Makefile调用它。有关如何在内核树中使用它的示例请参阅
``Documentation/userspace-api/media/Makefile``
.. _parse_headers_zh:
parse_headers.pl
parse_headers.py
----------------
脚本名称
~~~~~~~~
parse_headers.pl——解析一个C文件识别函数、结构体、枚举、定义并对Sphinx文档
parse_headers.py——解析一个C文件识别函数、结构体、枚举、定义并对Sphinx文档
创建交叉引用。
@@ -34,7 +34,7 @@ parse_headers.pl——解析一个C文件识别函数、结构体、枚举、
~~~~~~~~
\ **parse_headers.pl**\ [<选项>] <C文件> <输出文件> [<例外文件>]
\ **parse_headers.py**\ [<选项>] <C文件> <输出文件> [<例外文件>]
<选项> 可以是: --debug, --help 或 --usage 。

View File

@@ -84,7 +84,7 @@ PDF和LaTeX构建
这有一个脚本可以自动检查Sphinx依赖项。如果它认得您的发行版还会提示您所用发行
版的安装命令::
$ ./scripts/sphinx-pre-install
$ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -94,7 +94,7 @@ PDF和LaTeX构建
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
默认情况下它会检查html和PDF的所有依赖项包括图像、数学表达式和LaTeX构建的
需求并假设将使用虚拟Python环境。html构建所需的依赖项被认为是必需的其他依

View File

@@ -0,0 +1,67 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/filesystems/dnotify.rst
:翻译:
王龙杰 Wang Longjie <wang.longjie1@zte.com.cn>
==============
Linux 目录通知
==============
Stephen Rothwell <sfr@canb.auug.org.au>
目录通知的目的是使用户应用程序能够在目录或目录中的任何文件发生变更时收到通知。基本机制包括应用程序
通过 fcntl(2) 调用在目录上注册通知,通知本身则通过信号传递。
应用程序可以决定希望收到哪些 “事件” 的通知。当前已定义的事件如下:
========= =====================================
DN_ACCESS 目录中的文件被访问read
DN_MODIFY 目录中的文件被修改write,truncate
DN_CREATE 目录中创建了文件
DN_DELETE 目录中的文件被取消链接
DN_RENAME 目录中的文件被重命名
DN_ATTRIB 目录中的文件属性被更改chmod,chown
========= =====================================
通常,应用程序必须在每次通知后重新注册,但如果将 DN_MULTISHOT 与事件掩码进行或运算,则注册
将一直保持有效,直到被显式移除(通过注册为不接收任何事件)。
默认情况下SIGIO 信号将被传递给进程,且不附带其他有用的信息。但是,如果使用 F_SETSIG fcntl(2)
调用让内核知道要传递哪个信号,一个 siginfo 结构体将被传递给信号处理程序,该结构体的 si_fd 成员将
包含与发生事件的目录相关联的文件描述符。
应用程序最好选择一个实时信号SIGRTMIN + <n>),以便通知可以被排队。如果指定了 DN_MULTISHOT
这一点尤为重要。注意SIGRTMIN 通常是被阻塞的因此最好使用至少SIGRTMIN + 1。
实现预期(特性与缺陷 :-)
--------------------------
对于文件的任何本地访问,通知都应能正常工作,即使实际文件系统位于远程服务器上。这意味着,对本地用户
模式服务器提供的文件的远程访问应能触发通知。同样的,对本地内核 NFS 服务器提供的文件的远程访问
也应能触发通知。
为了尽可能减小对文件系统代码的影响文件硬链接的问题已被忽略。因此如果一个文件x存在于两个
目录a 和 b通过名称”a/x”对该文件进行的更改应通知给期望接收目录“a”通知的程序但不会
通知给期望接收目录“b”通知的程序。
此外,取消链接的文件仍会在它们链接到的最后一个目录中触发通知。
配置
----
Dnotify 由 CONFIG_DNOTIFY 配置选项控制。禁用该选项时fcntl(fd, F_NOTIFY, ...) 将返
回 -EINVAL。
示例
----
具体示例可参见 tools/testing/selftests/filesystems/dnotify_test.c。
注意
----
从 Linux 2.6.13 开始dnotify 已被 inotify 取代。有关 inotify 的更多信息,请参见
Documentation/filesystems/inotify.rst。

View File

@@ -0,0 +1,211 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/filesystems/gfs2-glocks.rst
:翻译:
邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
:校译:
杨涛 yang tao <yang.tao172@zte.com.cn>
==================
Glock 内部加锁规则
==================
本文档阐述 glock 状态机内部运作的基本原理。每个 glock
fs/gfs2/incore.h 中的 struct gfs2_glock包含两把主要的内部锁
1. 自旋锁gl_lockref.lock用于保护内部状态
gl_state、gl_target和持有者列表gl_holders
2. 非阻塞的位锁GLF_LOCK用于防止其他线程同时调用
DLM 等操作。若某线程获取此锁,则在释放时必须调用
run_queue通常通过工作队列以确保所有待处理任务
得以完成。
gl_holders 列表包含与该 glock 关联的所有排队锁请求(不
仅是持有者)。若存在已持有的锁,它们将位于列表开头的连
续条目中。锁的授予严格遵循排队顺序。
glock 层用户可请求三种锁状态共享SH、延迟DF
排他EX。它们对应以下 DLM 锁模式:
========== ====== =====================================================
Glock 模式 DLM 锁模式
========== ====== =====================================================
UN IV/NL 未加锁(无关联的 DLM 锁)或 NL
SH PR 受保护读Protected read
DF CW 并发写Concurrent write
EX EX 排他Exclusive
========== ====== =====================================================
因此DF 本质上是一种与“常规”共享锁模式SH互斥的共
享模式。在 GFS2 中DF 模式专用于直接 I/O 操作。Glock
本质上是锁加缓存管理例程的组合,其缓存规则如下:
========== ============== ========== ========== ==============
Glock 模式 缓存元数据 缓存数据 脏数据 脏元数据
========== ============== ========== ========== ==============
UN 否 否 否 否
DF 是 否 否 否
SH 是 是 否 否
EX 是 是 是 是
========== ============== ========== ========== ==============
这些规则通过为每种 glock 定义的操作函数实现。并非所有
glock 类型都使用全部的模式,例如仅 inode glock 使用 DF 模
式。
glock 操作函数及类型常量说明表:
============== ========================================================
字段 用途
============== ========================================================
go_sync 远程状态变更前调用(如同步脏数据)
go_xmote_bh 远程状态变更后调用(如刷新缓存)
go_inval 远程状态变更需使缓存失效时调用
go_instantiate 获取 glock 时调用
go_held 每次获取 glock 持有者时调用
go_dump 为 debugfs 文件打印对象内容,或出错时将 glock 转储至日志
go_callback 若 DLM 发送回调以释放此锁时调用
go_unlocked 当 glock 解锁时调用dlm_unlock()
go_type glock 类型,``LM_TYPE_*``
go_flags 若 glock 关联地址空间则设置GLOF_ASPACE 标志
============== ========================================================
每种锁的最短持有时间是指在远程锁授予后忽略远程降级请求
的时间段。此举旨在防止锁在集群节点间持续弹跳而无实质进
展的情况,此现象常见于多节点写入的共享内存映射文件。通
过延迟响应远程回调的降级操作,为用户空间程序争取页面取
消映射前的处理时间。
未来计划将 glock 的 "EX" 模式设为本地共享,使本地锁通
过 i_mutex 实现而非 glock。
glock 操作函数的加锁规则:
============== ====================== =============================
操作 GLF_LOCK 位锁持有 gl_lockref.lock 自旋锁持有
============== ====================== =============================
go_sync 是 否
go_xmote_bh 是 否
go_inval 是 否
go_instantiate 否 否
go_held 否 否
go_dump 有时 是
go_callback 有时N/A
go_unlocked 是 否
============== ====================== =============================
.. Note::
若入口处持有锁则操作期间不得释放位锁或自旋锁。
go_dump 和 do_demote_ok 严禁阻塞。
仅当 glock 状态指示其缓存最新数据时才会调用 go_dump。
GFS2 内部的 glock 加锁顺序:
1. i_rwsem如需要
2. 重命名 glock仅用于重命名
3. Inode glock
(父级优先于子级,同级 inode 按锁编号排序)
4. Rgrp glock用于分配操作
5. 事务 glock通过 gfs2_trans_begin非读操作
6. i_rw_mutex如需要
7. 页锁(始终最后,至关重要!)
每个 inode 对应两把 glock一把管理 inode 本身(加锁顺
序如上),另一把(称为 iopen glock结合 inode 的
i_nlink 字段决定 inode 生命周期。inode 加锁基于单个
inodergrp 加锁基于单个 rgrp。通常优先获取本地锁再获
取集群锁。
Glock 统计
----------
统计分为两类:超级块相关统计和单个 glock 相关统计。超级
块统计按每 CPU 执行以减少收集开销,并进一步按 glock 类
型细分。所有时间单位为纳秒。
超级块和 glock 统计收集相同信息。超级块时序统计为 glock
时序统计提供默认值,使新建 glock 具有合理的初始值。每个
glock 的计数器在创建时初始化为零,当 glock 从内存移除时
统计丢失。
统计包含三组均值/方差对及两个计数器。均值/方差对为平滑
指数估计,算法与网络代码中的往返时间计算类似(参见《
TCP/IP详解 卷1》第21.3节及《卷2》第25.10节)。与 TCP/IP
案例不同,此处均值/方差未缩放且单位为整数纳秒。
三组均值/方差对测量以下内容:
1. DLM 锁时间(非阻塞请求)
2. DLM 锁时间(阻塞请求)
3. 请求间隔时间(指向 DLM
非阻塞请求指无论目标 DLM 锁处于何种状态均能立即完成的请求。
当前满足条件的请求包括:(a)锁当前状态为互斥(如锁降级)、
(b)请求状态为空置或解锁(同样如锁降级)、或(c)设置"try lock"
标志的请求。其余锁请求均属阻塞请求。
两个计数器分别统计:
1. 锁请求总数(决定均值/方差计算的数据量)
2. glock 代码顶层的持有者排队数(通常远大于 DLM 锁请求数)
为什么收集这些统计数据?我们需深入分析时序参数的动因如下:
1. 更精准设置 glock "最短持有时间"
2. 快速识别性能问题
3. 改进资源组分配算法(基于锁等待时间而非盲目 "try lock"
因平滑更新的特性,采样量的阶跃变化需经 8 次采样(方差需
4 次)才能完全体现,解析结果时需审慎考虑。
通过锁请求完成时间和 glock 平均锁请求间隔时间,可计算节
点使用 glock 时长与集群共享时长的占比,对设置锁最短持有
时间至关重要。
我们已采取严谨措施,力求精准测量目标量值。任何测量系统均
存在误差,但我期望当前方案已达到合理精度极限。
超级块状态统计路径::
/sys/kernel/debug/gfs2/<fsname>/sbstats
Glock 状态统计路径::
/sys/kernel/debug/gfs2/<fsname>/glstats
(假设 debugfs 挂载于 /sys/kernel/debug且 <fsname> 替
换为对应 GFS2 文件系统名)
输出缩写说明:
========= ============================================
srtt 非阻塞 DLM 请求的平滑往返时间
srttvar srtt 的方差估计
srttb (潜在)阻塞 DLM 请求的平滑往返时间
srttvarb srttb 的方差估计
sirt DLM 请求的平滑请求间隔时间
sirtvar sirt 的方差估计
dlm DLM 请求数glstats 文件中的 dcnt
queue 排队的 glock 请求数glstats 文件中的 qcnt
========= ============================================
sbstats文件按glock类型每种类型8行和CPU核心每CPU一列
记录统计数据集。glstats文件则为每个glock提供统计集其格式
与glocks文件类似但所有时序统计量均采用均值/方差格式存储。
gfs2_glock_lock_time 跟踪点实时输出目标 glock 的当前统计
并附带每次接收到的dlm响应附加信息
====== ============
status DLM 请求状态
flags DLM 请求标志
tdiff 该请求的耗时
====== ============
(其余字段同上表)

View File

@@ -0,0 +1,97 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/filesystems/gfs2-uevents.rst
:翻译:
邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
:校译:
杨涛 yang tao <yang.tao172@zte.com.cn>
===============
uevents 与 GFS2
===============
在 GFS2 文件系统的挂载生命周期内,会生成多个 uevent。
本文档解释了这些事件的含义及其用途(被 gfs2-utils 中的 gfs_controld 使用)。
GFS2 uevents 列表
=================
1. ADD
------
ADD 事件发生在挂载时。它始终是新建文件系统生成的第一个 uevent。如果挂载成
功,随后会生成 ONLINE uevent。如果挂载失败则随后会生成 REMOVE uevent。
ADD uevent 包含两个环境变量SPECTATOR=[0|1] 和 RDONLY=[0|1],分别用
于指定文件系统的观察者状态(一种未分配日志的只读挂载)和只读状态(已分配日志)。
2. ONLINE
---------
ONLINE uevent 在成功挂载或重新挂载后生成。它具有与 ADD uevent 相同的环
境变量。ONLINE uevent 及其用于标识观察者和 RDONLY 状态的两个环境变量是较
新版本内核引入的功能2.6.32-rc+ 及以上),旧版本内核不会生成此事件。
3. CHANGE
---------
CHANGE uevent 在两种场景下使用。一是报告第一个节点成功挂载文件系统时
FIRSTMOUNT=Done。这作为信号告知 gfs_controld此时集群中其他节点可以
安全挂载该文件系统。
另一个 CHANGE uevent 用于通知文件系统某个日志的日志恢复已完成。它包含两个
环境变量JID= 指定刚恢复的日志 IDRECOVERY=[Done|Failed] 表示操作成
功与否。这些 uevent 会在每次日志恢复时生成,无论是在初始挂载过程中,还是
gfs_controld 通过 /sys/fs/gfs2/<fsname>/lock_module/recovery 文件
请求特定日志恢复的结果。
由于早期版本的 gfs_controld 使用 CHANGE uevent 时未检查环境变量以确定状
态,若为其添加新功能,存在用户工具版本过旧导致集群故障的风险。因此,在新增用
于标识成功挂载或重新挂载的 uevent 时,选择了使用 ONLINE uevent。
4. OFFLINE
----------
OFFLINE uevent 仅在文件系统发生错误时生成,是 "withdraw" 机制的一部分。
当前该事件未提供具体错误信息,此问题有待修复。
5. REMOVE
---------
REMOVE uevent 在挂载失败结束或卸载文件系统时生成。所有 REMOVE uevent
之前都至少存在同一文件系统的 ADD uevent。与其他 uevent 不同,它由内核的
kobject 子系统自动生成。
所有 GFS2 uevents 的通用信息uevent 环境变量)
===============================================
1. LOCKTABLE=
--------------
LOCKTABLE 是一个字符串其值来源于挂载命令行locktable=)或 fstab 文件。
它用作文件系统标签,并为 lock_dlm 类型的挂载提供加入集群所需的信息。
2. LOCKPROTO=
-------------
LOCKPROTO 是一个字符串,其值取决于挂载命令行或 fstab 中的设置。其值将是
lock_nolock 或 lock_dlm。未来可能支持其他锁管理器。
3. JOURNALID=
-------------
如果文件系统正在使用日志(观察者挂载不分配日志),则所有 GFS2 uevent 中都
会包含此变量,其值为数字形式的日志 ID。
4. UUID=
--------
在较新版本的 gfs2-utils 中mkfs.gfs2 会向文件系统超级块写入 UUID。若存
在 UUID所有与该文件系统相关的 uevent 中均会包含此信息。

View File

@@ -0,0 +1,57 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/filesystems/gfs2.rst
:翻译:
邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
:校译:
杨涛 yang tao <yang.tao172@zte.com.cn>
=====================================
全局文件系统 2 (Global File System 2)
=====================================
GFS2 是一个集群文件系统。它允许一组计算机同时使用在它们之间共享的块设备(通
过 FC、iSCSI、NBD 等。GFS2 像本地文件系统一样读写块设备,但也使用一个锁
模块来让计算机协调它们的 I/O 操作从而维护文件系统的一致性。GFS2 的出色特
性之一是完美一致性——在一台机器上对文件系统所做的更改会立即显示在集群中的所
有其他机器上。
GFS2 使用可互换的节点间锁定机制,当前支持的机制有:
lock_nolock
- 允许将 GFS2 用作本地文件系统
lock_dlm
- 使用分布式锁管理器 (dlm) 进行节点间锁定。
该 dlm 位于 linux/fs/dlm/
lock_dlm 依赖于在上述 URL 中找到的用户空间集群管理系统。
若要将 GFS2 用作本地文件系统,则不需要外部集群系统,只需::
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
$ mount -t gfs2 /dev/block_device /dir
在所有集群节点上都需要安装 gfs2-utils 软件包;对于 lock_dlm您还需要按
照文档配置 dlm 和 corosync 用户空间工具。
gfs2-utils 可在 https://pagure.io/gfs2-utils 找到。
GFS2 在磁盘格式上与早期版本的 GFS 不兼容,但它已相当接近。
以下手册页 (man pages) 可在 gfs2-utils 中找到:
============ =============================================
fsck.gfs2 用于修复文件系统
gfs2_grow 用于在线扩展文件系统
gfs2_jadd 用于在线向文件系统添加日志
tunegfs2 用于操作、检查和调优文件系统
gfs2_convert 用于将 gfs 文件系统原地转换为 GFS2
mkfs.gfs2 用于创建文件系统
============ =============================================

View File

@@ -15,6 +15,16 @@ Linux Kernel中的文件系统
文件系统VFS层以及基于其上的各种文件系统如何工作呈现给大家。当前\
可以看到下面的内容。
核心 VFS 文档
=============
有关 VFS 层本身以及其算法工作方式的文档,请参阅这些手册。
.. toctree::
:maxdepth: 1
dnotify
文件系统
========
@@ -26,4 +36,9 @@ Linux Kernel中的文件系统
virtiofs
debugfs
tmpfs
ubifs
ubifs-authentication
gfs2
gfs2-uevents
gfs2-glocks
inotify

View File

@@ -0,0 +1,80 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/filesystems/inotify.rst
:翻译:
王龙杰 Wang Longjie <wang.longjie1@zte.com.cn>
==========================================
Inotify - 一个强大且简单的文件变更通知系统
==========================================
文档由 Robert Love <rml@novell.com> 于 2005 年 3 月 15 日开始撰写
文档由 Zhang Zhen <zhenzhang.zhang@huawei.com> 于 2015 年 1 月 4 日更新
- 删除了已废弃的接口,关于用户接口请参考手册页。
(i) 基本原理
问:
不将监控项与被监控对象打开的文件描述符fd绑定这背后的设计决策是什么
答:
监控项会与打开的 inotify 设备相关联,而非与打开的文件相关联。这解决了 dnotify 的主要问题:
保持文件打开会锁定文件更糟的是还会锁定挂载点。因此dnotify 在带有可移动介质的桌面系统
上难以使用,因为介质将无法被卸载。监控文件不应要求文件处于打开状态。
问:
与每个监控项一个文件描述符的方式相比,采用每个实例一个文件描述符的设计决策是出于什么
考虑?
答:
每个监控项一个文件描述符会很快的消耗掉超出允许数量的文件描述符,其数量会超出实际可管理的范
围,也会超出 select() 能高效处理的范围。诚然root 用户可以提高每个进程的文件描述符限制,
用户也可以使用 epoll但同时要求这两者是不合理且多余的。一个监控项所消耗的内存比一个打开的文
件要少,因此将这两个数量空间分开是合理的。当前的设计正是用户空间开发者所期望的:用户只需初始
化一次 inotify然后添加 n 个监控项,而这只需要一个文件描述符,无需调整文件描述符限制。初
始化 inotify 实例初始化两千次是很荒谬的。如果我们能够简洁地实现用户空间的偏好——而且我们
确实可以idr 层让这类事情变得轻而易举——那么我们就应该这么做。
还有其他合理的理由。如果只有一个文件描述符,那就只需要在该描述符上阻塞,它对应着一个事件队列。
这个单一文件描述符会返回所有的监控事件以及任何可能的带外数据。而如果每个文件描述符都是一个独
立的监控项,
- 将无法知晓事件的顺序。文件 foo 和文件 bar 上的事件会触发两个文件描述符上的 poll()
但无法判断哪个事件先发生。而用单个队列就可以很容易的提供事件的顺序。这种顺序对现有的应用程
序(如 Beagle至关重要。想象一下如果“mv a b ; mv b a”这样的事件没有顺序会是什么
情况。
- 我们将不得不维护 n 个文件描述符和 n 个带有状态的内部队列,而不是仅仅一个。这在 kernel 中
会混乱得多。单个线性队列是合理的数据结构。
- 用户空间开发者更青睐当前的 API。例如Beagle 的开发者们就很喜欢它。相信我,我问过他们。
这并不奇怪:谁会想通过 select 来管理以及阻塞在 1000 个文件描述符上呢?
- 无法获取带外数据。
- 1024 这个数量仍然太少。 ;-)
当要设计一个可扩展到数千个目录的文件变更通知系统时,处理数千个文件描述符似乎并不是合适的接口。
这太繁琐了。
此外,创建多个实例、处理多个队列以及相应的多个文件描述符是可行的。不必是每个进程对应一个文件描
述符;而是每个队列对应一个文件描述符,一个进程完全可能需要多个队列。
问:
为什么采用系统调用的方式?
答:
糟糕的用户空间接口是 dnotify 的第二大问题。信号对于文件通知来说是一种非常糟糕的接口。其实对
于其他任何事情,信号也都不是好的接口。从各个角度来看,理想的解决方案是基于文件描述符的,它允许
基本的文件 I/O 操作以及 poll/select 操作。获取文件描述符和管理监控项既可以通过设备文件来
实现,也可以通过一系列新的系统调用来实现。我们决定采用一系列系统调用,因为这是提供新的内核接口
的首选方法。两者之间唯一真正的区别在于,我们是想使用 open(2) 和 ioctl(2),还是想使用几
个新的系统调用。系统调用比 ioctl 更有优势。

Some files were not shown because too many files have changed in this diff Show More