Commit Graph

239 Commits

Author SHA1 Message Date
Linus Torvalds
1896ce8eb6 Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull interleaved SHA-256 hashing support from Eric Biggers:
 "Optimize fsverity with 2-way interleaved hashing

  Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and
  make fsverity use it for faster file data verification. This improves
  fsverity performance on many x86_64 and arm64 processors.

  Later, I plan to make dm-verity use this too"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: Use 2-way interleaved SHA-256 hashing when supported
  fsverity: Remove inode parameter from fsverity_hash_block()
  lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
  lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
  lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
  lib/crypto: sha256: Add support for 2-way interleaved hashing
2025-09-29 15:55:20 -07:00
Linus Torvalds
d8768fb12a Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:

 - Add a RISC-V optimized implementation of Poly1305. This code was
   written by Andy Polyakov and contributed by Zhihang Shao.

 - Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5.

   Yes, it's still the 90s, and several kernel subsystems are still
   using MD5 for legacy use cases. As long as that remains the case,
   it's helpful to clean it up in the same way as I've been doing for
   other algorithms.

   Later, I plan to convert most of these users of MD5 to use the new
   MD5 library API instead of the generic crypto API.

 - Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and
   Curve25519 code.

   Consolidate these into one module per algorithm, and centralize the
   configuration and build process. This is the same reorganization that
   has already been successful for SHA-1 and SHA-2.

 - Remove the unused crypto_kpp API for Curve25519.

 - Migrate the BLAKE2s and Curve25519 self-tests to KUnit.

 - Always enable the architecture-optimized BLAKE2s code.

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits)
  crypto: md5 - Implement export_core() and import_core()
  wireguard: kconfig: simplify crypto kconfig selections
  lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS
  lib/crypto: curve25519: Consolidate into single module
  lib/crypto: curve25519: Move a couple functions out-of-line
  lib/crypto: tests: Add Curve25519 benchmark
  lib/crypto: tests: Migrate Curve25519 self-test to KUnit
  crypto: curve25519 - Remove unused kpp support
  crypto: testmgr - Remove curve25519 kpp tests
  crypto: x86/curve25519 - Remove unused kpp support
  crypto: powerpc/curve25519 - Remove unused kpp support
  crypto: arm/curve25519 - Remove unused kpp support
  crypto: hisilicon/hpre - Remove unused curve25519 kpp support
  lib/crypto: tests: Add KUnit tests for BLAKE2s
  lib/crypto: blake2s: Consolidate into single C translation unit
  lib/crypto: blake2s: Move generic code into blake2s.c
  lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
  lib/crypto: blake2s: Remove obsolete self-test
  lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2
  lib/crypto: chacha: Consolidate into single module
  ...
2025-09-29 15:48:56 -07:00
Linus Torvalds
e2fffe1d95 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
 "Update crc_kunit to test the CRC functions in softirq and hardirq
  contexts, similar to what the lib/crypto/ KUnit tests do. Move the
  helper function needed to do this into a common header.

  This is useful mainly to test fallback code paths for when
  FPU/SIMD/vector registers are unusable"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  Documentation/staging: Fix typo and incorrect citation in crc32.rst
  lib/crc: Drop inline from all *_mod_init_arch() functions
  lib/crc: Use underlying functions instead of crypto_simd_usable()
  lib/crc: crc_kunit: Test CRC computation in interrupt contexts
  kunit, lib/crypto: Move run_irq_test() to common header
2025-09-29 15:36:42 -07:00
Eric Biggers
6733968be7 lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
Update sha256_kunit to include test cases and a benchmark for the new
sha256_finup_2x() function.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:40 -05:00
Eric Biggers
bc6d6a4172 lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
Add an implementation of sha256_finup_2x_arch() for x86_64.  It
interleaves the computation of two SHA-256 hashes using the x86 SHA-NI
instructions.  dm-verity and fs-verity will take advantage of this for
greatly improved performance on capable CPUs.

This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:

    Intel Ice Lake (server):        4%
    Intel Sapphire Rapids:          38%
    Intel Emerald Rapids:           38%
    AMD Zen 1 (Threadripper 1950X): 84%
    AMD Zen 4 (EPYC 9B14):          98%
    AMD Zen 5 (Ryzen 9 9950X):      64%

For now, this seems to benefit AMD more than Intel.  This seems to be
because current AMD CPUs support concurrent execution of the SHA-NI
instructions, but unfortunately current Intel CPUs don't, except for the
sha256msg2 instruction.  Hopefully future Intel CPUs will support SHA-NI
on more execution ports.  Zen 1 supports 2 concurrent sha256rnds2, and
Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better
performance may be achievable on Zen 4 by interleaving more than two
hashes.  However, doing so poses a number of trade-offs, and furthermore
Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:40 -05:00
Eric Biggers
34c3f1e346 lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
Add an implementation of sha256_finup_2x_arch() for arm64.  It
interleaves the computation of two SHA-256 hashes using the ARMv8
SHA-256 instructions.  dm-verity and fs-verity will take advantage of
this for greatly improved performance on capable CPUs.

This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:

    ARM Cortex-X1: 70%
    ARM Cortex-X3: 68%
    ARM Cortex-A76: 65%
    ARM Cortex-A715: 43%
    ARM Cortex-A510: 25%
    ARM Cortex-A55: 8%

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:39 -05:00
Eric Biggers
4ca24d6abb lib/crypto: sha256: Add support for 2-way interleaved hashing
Many arm64 and x86_64 CPUs can compute two SHA-256 hashes in nearly the
same speed as one, if the instructions are interleaved.  This is because
SHA-256 is serialized block-by-block, and two interleaved hashes take
much better advantage of the CPU's instruction-level parallelism.

Meanwhile, a very common use case for SHA-256 hashing in the Linux
kernel is dm-verity and fs-verity.  Both use a Merkle tree that has a
fixed block size, usually 4096 bytes with an empty or 32-byte salt
prepended.  Usually, many blocks need to be hashed at a time.  This is
an ideal scenario for 2-way interleaved hashing.

To enable this optimization, add a new function sha256_finup_2x() to the
SHA-256 library API.  It computes the hash of two equal-length messages,
starting from a common initial context.

For now it always falls back to sequential processing.  Later patches
will wire up arm64 and x86_64 optimized implementations.

Note that the interleaving factor could in principle be higher than 2x.
However, that runs into many practical difficulties and CPU throughput
limitations.  Thus, both the implementations I'm adding are 2x.  In the
interest of using the simplest solution, the API matches that.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:39 -05:00
Eric Biggers
cb2d6b132a lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS
Now that the Curve25519 library has been disentangled from CRYPTO,
adding CRYPTO_SELFTESTS as a default value of
CRYPTO_LIB_CURVE25519_KUNIT_TEST no longer causes a recursive kconfig
dependency.  Do this, which makes this option consistent with the other
crypto KUnit test options in the same file.

Link: https://lore.kernel.org/r/20250906213523.84915-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:43 -07:00
Eric Biggers
68546e5632 lib/crypto: curve25519: Consolidate into single module
Reorganize the Curve25519 library code:

- Build a single libcurve25519 module, instead of up to three modules:
  libcurve25519, libcurve25519-generic, and an arch-specific module.

- Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to
  lib/crypto/$(SRCARCH)/.  Centralize the build rules into
  lib/crypto/Makefile and lib/crypto/Kconfig.

- Include the arch-specific code directly in lib/crypto/curve25519.c via
  a header, rather than using a separate .c file.

- Eliminate the entanglement with CRYPTO.  CRYPTO_LIB_CURVE25519 no
  longer selects CRYPTO, and the arch-specific Curve25519 code no longer
  depends on CRYPTO.

This brings Curve25519 in line with the latest conventions for
lib/crypto/, used by other algorithms.  The exception is that I kept the
generic code in separate translation units for now.  (Some of the
function names collide between the x86 and generic Curve25519 code.  And
the Curve25519 functions are very long anyway, so inlining doesn't
matter as much for Curve25519 as it does for some other algorithms.)

Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:43 -07:00
Eric Biggers
8c06b330e8 lib/crypto: curve25519: Move a couple functions out-of-line
Move curve25519() and curve25519_generate_public() from curve25519.h to
curve25519.c.  There's no good reason for them to be inline.

Link: https://lore.kernel.org/r/20250906213523.84915-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:43 -07:00
Eric Biggers
643d79e531 lib/crypto: tests: Add Curve25519 benchmark
Add a benchmark to curve25519_kunit.  This brings it in line with the
other crypto KUnit tests and provides an easy way to measure
performance.

Link: https://lore.kernel.org/r/20250906213523.84915-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:42 -07:00
Eric Biggers
afc4e4a5f1 lib/crypto: tests: Migrate Curve25519 self-test to KUnit
Move the Curve25519 test from an ad-hoc self-test to a KUnit test.

Generally keep the same test logic for now, just translated to KUnit.
There's one exception, which is that I dropped the incomplete test of
curve25519_generic().  The approach I'm taking to cover the different
implementations with the KUnit tests is to just rely on booting kernels
in QEMU with different '-cpu' options, rather than try to make the tests
(incompletely) test multiple implementations on one CPU.  This way, both
the test and the library API are simpler.

This commit makes the file lib/crypto/curve25519.c no longer needed, as
its only purpose was to call the self-test.  However, keep it for now,
since a later commit will add code to it again.

Temporarily omit the default value of CRYPTO_SELFTESTS that the other
lib/crypto/ KUnit tests have.  It would cause a recursive kconfig
dependency, since the Curve25519 code is still entangled with CRYPTO.  A
later commit will fix that.

Link: https://lore.kernel.org/r/20250906213523.84915-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06 16:32:19 -07:00
Eric Biggers
362f922860 lib/crypto: tests: Add KUnit tests for BLAKE2s
Add a KUnit test suite for BLAKE2s.  Most of the core test logic is in
the previously-added hash-test-template.h.  This commit just adds the
actual KUnit suite, commits the generated test vectors to the tree so
that gen-hash-testvecs.py won't have to be run at build time, and adds a
few BLAKE2s-specific test cases.

This is the replacement for blake2s-selftest, which an earlier commit
removed.  Improvements over blake2s-selftest include integration with
KUnit, more comprehensive test cases, and support for benchmarking.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
39ee3970f2 lib/crypto: blake2s: Consolidate into single C translation unit
As was done with the other algorithms, reorganize the BLAKE2s code so
that the generic implementation and the arch-specific "glue" code is
consolidated into a single translation unit, so that the compiler will
inline the functions and automatically decide whether to include the
generic code in the resulting binary or not.

Similarly, also consolidate the build rules into
lib/crypto/{Makefile,Kconfig}.  This removes the last uses of
lib/crypto/{arm,x86}/{Makefile,Kconfig}, so remove those too.

Don't keep the !KMSAN dependency.  It was needed only for other
algorithms such as ChaCha that initialize memory from assembly code.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
5d313a7625 lib/crypto: blake2s: Move generic code into blake2s.c
Move blake2s_compress_generic() from blake2s-generic.c to blake2s.c.

For now it's still guarded by CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC, but
this prepares for changing it to a 'static __maybe_unused' function and
just using the compiler to automatically decide its inclusion.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
56e48d4e13 lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
When support for a crypto algorithm is enabled, the arch-optimized
implementation of that algorithm should be enabled too.  We've learned
this the hard way many times over the years: people regularly forget to
enable the arch-optimized implementations of the crypto algorithms,
resulting in significant performance being left on the table.

Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
ARM and x86_64, should be always enabled too.  Let's do that.

Note that the effect on kernel image size is very small and should not
be a concern.  On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks*
the kernel size by about 1200 bytes, since the ARM-optimized
blake2s_compress() completely replaces the generic blake2s_compress().
On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by
about 1400 bytes, as the generic blake2s_compress() is still included as
a fallback; however, for context, that is only about a quarter the size
of the generic blake2s_compress().  The x86_64 optimized BLAKE2s code
uses much less icache at runtime than the generic code.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
126f5d90f6 lib/crypto: blake2s: Remove obsolete self-test
Remove the original BLAKE2s self-test, since it will be superseded by
blake2s_kunit.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
453eda46b7 lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2
Save 480 bytes of .rodata by replacing the .long constants with .bytes,
and using the vpmovzxbd instruction to expand them.

Also update the code to do the loads before incrementing %rax rather
than after.  This avoids the need for the first load to use an offset.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
13cecc526d lib/crypto: chacha: Consolidate into single module
Consolidate the ChaCha code into a single module (excluding
chacha-block-generic.c which remains always built-in for random.c),
similar to various other algorithms:

- Each arch now provides a header file lib/crypto/$(SRCARCH)/chacha.h,
  replacing lib/crypto/$(SRCARCH)/chacha*.c.  The header defines
  chacha_crypt_arch() and hchacha_block_arch().  It is included by
  lib/crypto/chacha.c, and thus the code gets built into the single
  libchacha module, with improved inlining in some cases.

- Whether arch-optimized ChaCha is buildable is now controlled centrally
  by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig.
  The conditions for enabling it remain the same as before, and it
  remains enabled by default.

- Any additional arch-specific translation units for the optimized
  ChaCha code, such as assembly files, are now compiled by
  lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.

This removes the last use for the Makefile and Kconfig files in the
arm64, mips, powerpc, riscv, and s390 subdirectories of lib/crypto/.  So
also remove those files and the references to them.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
1ae46b6eb5 lib/crypto: chacha: Rename libchacha.c to chacha.c
Rename libchacha.c to chacha.c to make the naming consistent with other
algorithms and allow additional source files to be added to the
libchacha module.  This file currently contains chacha_crypt_generic(),
but it will soon be updated to contain chacha_crypt().

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
20a1acb68d lib/crypto: chacha: Rename chacha.c to chacha-block-generic.c
Rename chacha.c to chacha-block-generic.c to free up the name chacha.c
for the high-level API entry points (chacha_crypt() and
hchacha_block()), similar to the other algorithms.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Eric Biggers
c4b846ff6e lib/crypto: chacha: Remove unused function chacha_is_arch_optimized()
chacha_is_arch_optimized() is no longer used, so remove it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:50:19 -07:00
Zhihang Shao
bef9c75598 lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementation
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305
implementation for riscv authored by Andy Polyakov.  The file
'poly1305-riscv.pl' is taken straight from
https://github.com/dot-asm/cryptogams commit
5e3fba73576244708a752fa61a8e93e587f271bb.  This patch was tested on
SpacemiT X60, with 2~2.5x improvement over generic implementation.

Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Signed-off-by: Zhihang Shao <zhihang.shao.iscas@gmail.com>
[EB: ported to lib/crypto/riscv/]
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250829152513.92459-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:49:18 -07:00
Eric Biggers
b646b782e5 lib/crypto: poly1305: Consolidate into single module
Consolidate the Poly1305 code into a single module, similar to various
other algorithms (SHA-1, SHA-256, SHA-512, etc.):

- Each arch now provides a header file lib/crypto/$(SRCARCH)/poly1305.h,
  replacing lib/crypto/$(SRCARCH)/poly1305*.c.  The header defines
  poly1305_block_init(), poly1305_blocks(), poly1305_emit(), and
  optionally poly1305_mod_init_arch().  It is included by
  lib/crypto/poly1305.c, and thus the code gets built into the single
  libpoly1305 module, with improved inlining in some cases.

- Whether arch-optimized Poly1305 is buildable is now controlled
  centrally by lib/crypto/Kconfig instead of by
  lib/crypto/$(SRCARCH)/Kconfig.  The conditions for enabling it remain
  the same as before, and it remains enabled by default.  (The PPC64 one
  remains unconditionally disabled due to 'depends on BROKEN'.)

- Any additional arch-specific translation units for the optimized
  Poly1305 code, such as assembly files, are now compiled by
  lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.

A special consideration is needed because the Adiantum code uses the
poly1305_core_*() functions directly.  For now, just carry forward that
approach.  This means retaining the CRYPTO_LIB_POLY1305_GENERIC kconfig
symbol, and keeping the poly1305_core_*() functions in separate
translation units.  So it's not quite as streamlined I've done with the
other hash functions, but we still get a single libpoly1305 module.

Note: to see the diff from the arm, arm64, and x86 .c files to the new
.h files, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250829152513.92459-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:49:18 -07:00
Eric Biggers
df220cc5e6 lib/crypto: poly1305: Remove unused function poly1305_is_arch_optimized()
poly1305_is_arch_optimized() is unused, so remove it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250829152513.92459-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29 09:49:18 -07:00
Eric Biggers
5012bd2dc6 lib/crypto: Drop inline from all *_mod_init_arch() functions
Drop 'inline' from all the *_mod_init_arch() functions so that the
compiler will warn about any bugs where they are unused due to not being
wired up properly.  (There are no such bugs currently, so this just
establishes a more robust convention for the future.  Of course, these
functions also tend to get inlined anyway, regardless of the keyword.)

Link: https://lore.kernel.org/r/20250816020457.432040-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-27 08:15:35 -07:00
Eric Biggers
d6b6aac0cd lib/crypto: tests: Add KUnit tests for MD5 and HMAC-MD5
Add a KUnit test suite for the MD5 library functions, including the
corresponding HMAC support.  The core test logic is in the
previously-added hash-test-template.h.  This commit just adds the actual
KUnit suite, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.

Link: https://lore.kernel.org/r/20250805222855.10362-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-27 08:15:35 -07:00
Eric Biggers
a1848f6e38 lib/crypto: sparc/md5: Migrate optimized code into library
Instead of exposing the sparc-optimized MD5 code via sparc-specific
crypto_shash algorithms, instead just implement the md5_blocks() library
function.  This is much simpler, it makes the MD5 library functions be
sparc-optimized, and it fixes the longstanding issue where the
sparc-optimized MD5 code was disabled by default.  MD5 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Note: to see the diff from arch/sparc/crypto/md5_glue.c to
lib/crypto/sparc/md5.h, view this commit with 'git show -M10'.

Link: https://lore.kernel.org/r/20250805222855.10362-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:28 -04:00
Eric Biggers
09371e1349 lib/crypto: powerpc/md5: Migrate optimized code into library
Instead of exposing the powerpc-optimized MD5 code via powerpc-specific
crypto_shash algorithms, instead just implement the md5_blocks() library
function.  This is much simpler, it makes the MD5 library functions be
powerpc-optimized, and it fixes the longstanding issue where the
powerpc-optimized MD5 code was disabled by default.  MD5 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Link: https://lore.kernel.org/r/20250805222855.10362-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:28 -04:00
Eric Biggers
c9e5ac0ab9 lib/crypto: mips/md5: Migrate optimized code into library
Instead of exposing the mips-optimized MD5 code via mips-specific
crypto_shash algorithms, instead just implement the md5_blocks() library
function.  This is much simpler, it makes the MD5 library functions be
mips-optimized, and it fixes the longstanding issue where the
mips-optimized MD5 code was disabled by default.  MD5 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Note: to see the diff from arch/mips/cavium-octeon/crypto/octeon-md5.c
to lib/crypto/mips/md5.h, view this commit with 'git show -M10'.

Link: https://lore.kernel.org/r/20250805222855.10362-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:28 -04:00
Eric Biggers
e164461349 lib/crypto: md5: Add MD5 and HMAC-MD5 library functions
Add library functions for MD5, including HMAC support.  The MD5
implementation is derived from crypto/md5.c.  This closely mirrors the
corresponding SHA-1 and SHA-2 changes.

Like SHA-1 and SHA-2, support for architecture-optimized MD5
implementations is included.  I originally proposed dropping those, but
unfortunately there is an AF_ALG user of the PowerPC MD5 code
(https://lore.kernel.org/r/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/),
and dropping that code would be viewed as a performance regression.  We
don't add new software algorithm implementations purely for AF_ALG, as
escalating to kernel mode merely to do calculations that could be done
in userspace is inefficient and is completely the wrong design.  But
since this one already existed, it gets grandfathered in for now.  An
objection was also raised to dropping the SPARC64 MD5 code because it
utilizes the CPU's direct support for MD5, although it remains unclear
that anyone is using that.  Regardless, we'll keep these around for now.

Note that while MD5 is a legacy algorithm that is vulnerable to
practical collision attacks, it still has various in-kernel users that
implement legacy protocols.  Switching to a simple library API, which is
the way the code should have been organized originally, will greatly
simplify their code.  For example:

    MD5:
        drivers/md/dm-crypt.c (for lmk IV generation)
        fs/nfsd/nfs4recover.c
        fs/ecryptfs/
        fs/smb/client/
        net/{ipv4,ipv6}/ (for TCP-MD5 signatures)

    HMAC-MD5:
        fs/smb/client/
        fs/smb/server/

(Also net/sctp/ if it continues using HMAC-MD5 for cookie generation.
However, that use case has the flexibility to upgrade to a more modern
algorithm, which I'll be proposing instead.)

As usual, the "md5" and "hmac(md5)" crypto_shash algorithms will also be
reimplemented on top of these library functions.  For "hmac(md5)" this
will provide a faster, more streamlined implementation.

Link: https://lore.kernel.org/r/20250805222855.10362-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:27 -04:00
Eric Biggers
bce5816672 lib/crypto: sha512: Use underlying functions instead of crypto_simd_usable()
Since sha512_kunit tests the fallback code paths without using
crypto_simd_disabled_for_test, make the SHA-512 code just use the
underlying may_use_simd() and irq_fpu_usable() functions directly
instead of crypto_simd_usable().  This eliminates an unnecessary layer.

Link: https://lore.kernel.org/r/20250731223651.136939-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:27 -04:00
Eric Biggers
640d31ea83 lib/crypto: sha256: Use underlying functions instead of crypto_simd_usable()
Since sha256_kunit tests the fallback code paths without using
crypto_simd_disabled_for_test, make the SHA-256 code just use the
underlying may_use_simd() and irq_fpu_usable() functions directly
instead of crypto_simd_usable().  This eliminates an unnecessary layer.

While doing this, also add likely() annotations, and fix a minor
inconsistency where the static keys in the sha256.h files were in a
different place than in the corresponding sha1.h and sha512.h files.

Link: https://lore.kernel.org/r/20250731223510.136650-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26 12:52:27 -04:00
Tal Zussman
fd7e5de4b2 lib/crypto: ensure generated *.S files are removed on make clean
make clean does not check the kernel config when removing files. As
such, additions to clean-files under CONFIG_ARM or CONFIG_ARM64 are not
evaluated. For example, when building on arm64, this means that
lib/crypto/arm64/sha{256,512}-core.S are left over after make clean.

Set clean-files unconditionally to ensure that make clean removes these
files.

Fixes: e96cb9507f ("lib/crypto: sha256: Consolidate into single module")
Fixes: 24c91b62ac ("lib/crypto: arm/sha512: Migrate optimized SHA-512 code to library")
Fixes: 60e3f1e9b7 ("lib/crypto: arm64/sha512: Migrate optimized SHA-512 code to library")
Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Link: https://lore.kernel.org/r/20250814-crypto_clean-v2-1-659a2dc86302@columbia.edu
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-14 18:01:03 -07:00
Eric Biggers
d73915fdc0 lib/crypto: sha: Update Kconfig help for SHA1 and SHA256
Update the help text for CRYPTO_LIB_SHA1 and CRYPTO_LIB_SHA256 to
reflect the addition of HMAC support, and to be consistent with
CRYPTO_LIB_SHA512.

Link: https://lore.kernel.org/r/20250731224218.137947-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-14 18:00:47 -07:00
Eric Biggers
b41dc83f07 kunit, lib/crypto: Move run_irq_test() to common header
Rename run_irq_test() to kunit_run_irq_test() and move it to a public
header so that it can be reused by crc_kunit.

Link: https://lore.kernel.org/r/20250811182631.376302-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-11 11:28:00 -07:00
Linus Torvalds
bc46b7cbc5 Merge tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:

 - Standardize on the __ASSEMBLER__ macro that is provided by GCC and
   Clang compilers and replace __ASSEMBLY__ with __ASSEMBLER__ in both
   uapi and non-uapi headers

 - Explicitly include <linux/export.h> in architecture and driver files
   which contain an EXPORT_SYMBOL() and remove the include from the
   files which do not contain the EXPORT_SYMBOL()

 - Use the full title of "z/Architecture Principles of Operation" manual
   and the name of a section where facility bits are listed

 - Use -D__DISABLE_EXPORTS for files in arch/s390/boot to avoid
   unnecessary slowing down of the build and confusing external kABI
   tools that process symtypes data

 - Print additional unrecoverable machine check information to make the
   root cause analysis easier

 - Move cmpxchg_user_key() handling to uaccess library code, since the
   generated code is large anyway and there is no benefit if it is
   inlined

 - Fix a problem when cmpxchg_user_key() is executing a code with a
   non-default key: if a system is IPL-ed with "LOAD NORMAL", and the
   previous system used storage keys where the fetch-protection bit was
   set for some pages, and the cmpxchg_user_key() is located within such
   page, a protection exception happens

 - Either the external call or emergency signal order is used to send an
   IPI to a remote CPU. Use the external order only, since it is at
   least as good and sometimes even better, than the emergency signal

 - In case of an early crash the early program check handler prints more
   or less random value of the last breaking event address, since it is
   not initialized properly. Copy the last breaking event address from
   the lowcore to pt_regs to address this

 - During STP synchronization check udelay() can not be used, since the
   first CPU modifies tod_clock_base and get_tod_clock_monotonic() might
   return a non-monotonic time. Instead, busy-loop on other CPUs, while
   the the first CPU actually handles the synchronization operation

 - When debugging the early kernel boot using QEMU with the -S flag and
   GDB attached, skip the decompressor and start directly in kernel

 - Rename PAI Crypto event 4210 according to z16 and z17 "z/Architecture
   Principles of Operation" manual

 - Remove the in-kernel time steering support in favour of the new s390
   PTP driver, which allows the kernel clock steered more precisely

 - Remove a possible false-positive warning in pte_free_defer(), which
   could be triggered in a valid case KVM guest process is initializing

* tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
  s390/mm: Remove possible false-positive warning in pte_free_defer()
  s390/stp: Default to enabled
  s390/stp: Remove leap second support
  s390/time: Remove in-kernel time steering
  s390/sclp: Use monotonic clock in sclp_sync_wait()
  s390/smp: Use monotonic clock in smp_emergency_stop()
  s390/time: Use monotonic clock in get_cycles()
  s390/pai_crypto: Rename PAI Crypto event 4210
  scripts/gdb/symbols: make lx-symbols skip the s390 decompressor
  s390/boot: Introduce jump_to_kernel() function
  s390/stp: Remove udelay from stp_sync_clock()
  s390/early: Copy last breaking event address to pt_regs
  s390/smp: Remove conditional emergency signal order code usage
  s390/uaccess: Merge cmpxchg_user_key() inline assemblies
  s390/uaccess: Prevent kprobes on cmpxchg_user_key() functions
  s390/uaccess: Initialize code pages executed with non-default access key
  s390/skey: Provide infrastructure for executing with non-default access key
  s390/uaccess: Make cmpxchg_user_key() library code
  s390/page: Add memory clobber to page_set_storage_key()
  s390/page: Cleanup page_set_storage_key() inline assemblies
  ...
2025-07-29 20:17:08 -07:00
Linus Torvalds
f2f573ebd4 Merge tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library test updates from Eric Biggers:
 "Add KUnit test suites for the Poly1305, SHA-1, SHA-224, SHA-256,
  SHA-384, and SHA-512 library functions.

  These are the first KUnit tests for lib/crypto/. So in addition to
  being useful tests for these specific algorithms, they also establish
  some conventions for lib/crypto/ testing going forwards.

  The new tests are fairly comprehensive: more comprehensive than the
  generic crypto infrastructure's tests. They use a variety of
  techniques to check for the types of implementation bugs that tend to
  occur in the real world, rather than just naively checking some test
  vectors. (Interestingly, poly1305_kunit found a bug in QEMU)

  The core test logic is shared by all six algorithms, rather than being
  duplicated for each algorithm.

  Each algorithm's test suite also optionally includes a benchmark"

* tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: tests: Annotate worker to be on stack
  lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1
  lib/crypto: tests: Add KUnit tests for Poly1305
  lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512
  lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256
  lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py
2025-07-28 18:02:58 -07:00
Guenter Roeck
8cd876e783 lib/crypto: tests: Annotate worker to be on stack
The following warning traceback is seen if object debugging is enabled
with the new crypto test code.

ODEBUG: object 9000000106237c50 is on stack 9000000106234000, but NOT annotated.
------------[ cut here ]------------
WARNING: lib/debugobjects.c:655 at lookup_object_or_alloc.part.0+0x19c/0x1f4, CPU#0: kunit_try_catch/468
...

This also results in a boot stall when running the code in qemu:loongarch.

Initializing the worker with INIT_WORK_ONSTACK() fixes the problem.

Fixes: 950a81224e ("lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250721231917.3182029-1-linux@roeck-us.net
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-21 20:10:36 -07:00
Eric Biggers
debc1e5a43 lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
Now that the oldest supported binutils version is 2.30, the macros that
emit the SHA-512 instructions as '.inst' words are no longer needed.  So
drop them.  No change in the generated machine code.

Changed from the original patch by Ard Biesheuvel:
(https://lore.kernel.org/r/20250515142702.2592942-2-ardb+git@google.com):
 - Reduced scope to just SHA-512
 - Added comment that explains why "sha3" is used instead of "sha2"

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250718220706.475240-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20 21:43:27 -07:00
Eric Biggers
42e3376e09 lib/crypto: x86/sha1-ni: Convert to use rounds macros
The assembly code that does all 80 rounds of SHA-1 is highly repetitive.
Replace it with 20 expansions of a macro that does 4 rounds, using the
macro arguments and .if directives to handle the slight variations
between rounds.  This reduces the length of sha1-ni-asm.S by 129 lines
while still producing the exact same object file.  This mirrors
sha256-ni-asm.S which uses this same strategy.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250718191900.42877-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20 21:42:42 -07:00
Eric Biggers
f88ed14aa0 lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
- Store the previous state in %xmm8-%xmm9 instead of spilling it to the
  stack.  There are plenty of unused XMM registers here, so there is no
  reason to spill to the stack.  (While 32-bit code is limited to
  %xmm0-%xmm7, this is 64-bit code, so it's free to use %xmm8-%xmm15.)

- Remove the unnecessary check for nblocks == 0.  sha1_ni_transform() is
  always passed a positive nblocks.

- To get an XMM register with 'e' in the high dword and the rest zeroes,
  just zeroize the register using pxor, then load 'e'.  Previously the
  code loaded 'e', then zeroized the lower dwords by AND-ing with a
  constant, which was slightly less efficient.

- Instead of computing &DATA_PTR[NBLOCKS << 6] and stopping when
  DATA_PTR reaches that value, instead just decrement NBLOCKS on each
  iteration and stop when it reaches 0.  This is fewer instructions.

- Rename DIGEST_PTR to STATE_PTR.  It points to the SHA-1 internal
  state, not a SHA-1 digest value.

This commit shrinks the code size of sha1_ni_transform() from 624 bytes
to 589 bytes and also shrinks rodata by 16 bytes.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250718191900.42877-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20 21:42:34 -07:00
Eric Biggers
66b1306079 lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1
Add a KUnit test suite for the SHA-1 library functions, including the
corresponding HMAC support.  The core test logic is in the
previously-added hash-test-template.h.  This commit just adds the actual
KUnit suite, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-16-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:36 -07:00
Eric Biggers
6dd4d9f791 lib/crypto: tests: Add KUnit tests for Poly1305
Add a KUnit test suite for the Poly1305 functions.  Most of its test
cases are instantiated from hash-test-template.h, which is also used by
the SHA-2 tests.  A couple additional test cases are also included to
test edge cases specific to Poly1305.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250709200112.258500-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:36 -07:00
Eric Biggers
571eaeddb6 lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512
Add KUnit test suites for the SHA-384 and SHA-512 library functions,
including the corresponding HMAC support.  The core test logic is in the
previously-added hash-test-template.h.  This commit just adds the actual
KUnit suites, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250709200112.258500-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:36 -07:00
Eric Biggers
4dcf6cadda lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256
Add KUnit test suites for the SHA-224 and SHA-256 library functions,
including the corresponding HMAC support.  The core test logic is in the
previously-added hash-test-template.h.  This commit just adds the actual
KUnit suites, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250709200112.258500-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:36 -07:00
Eric Biggers
950a81224e lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py
Add hash-test-template.h which generates the following KUnit test cases
for hash functions:

    test_hash_test_vectors
    test_hash_all_lens_up_to_4096
    test_hash_incremental_updates
    test_hash_buffer_overruns
    test_hash_overlaps
    test_hash_alignment_consistency
    test_hash_ctx_zeroization
    test_hash_interrupt_context_1
    test_hash_interrupt_context_2
    test_hmac  (when HMAC is supported)
    benchmark_hash  (when CONFIG_CRYPTO_LIB_BENCHMARK=y)

The initial use cases for this will be sha224_kunit, sha256_kunit,
sha384_kunit, sha512_kunit, and poly1305_kunit.

Add a Python script gen-hash-testvecs.py which generates the test
vectors required by test_hash_test_vectors,
test_hash_all_lens_up_to_4096, and test_hmac.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250709200112.258500-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:36 -07:00
Eric Biggers
f3d6cb3dc0 lib/crypto: x86/sha1: Migrate optimized code into library
Instead of exposing the x86-optimized SHA-1 code via x86-specific
crypto_shash algorithms, instead just implement the sha1_blocks()
library function.  This is much simpler, it makes the SHA-1 library
functions be x86-optimized, and it fixes the longstanding issue where
the x86-optimized SHA-1 code was disabled by default.  SHA-1 still
remains available through crypto_shash, but individual architectures no
longer need to handle it.

To match sha1_blocks(), change the type of the nblocks parameter of the
assembly functions from int to size_t.  The assembly functions actually
already treated it as size_t.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:28:35 -07:00
Eric Biggers
c751059985 lib/crypto: sparc/sha1: Migrate optimized code into library
Instead of exposing the sparc-optimized SHA-1 code via sparc-specific
crypto_shash algorithms, instead just implement the sha1_blocks()
library function.  This is much simpler, it makes the SHA-1 library
functions be sparc-optimized, and it fixes the longstanding issue where
the sparc-optimized SHA-1 code was disabled by default.  SHA-1 still
remains available through crypto_shash, but individual architectures no
longer need to handle it.

Note: to see the diff from arch/sparc/crypto/sha1_glue.c to
lib/crypto/sparc/sha1.h, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:11:49 -07:00
Eric Biggers
377982d561 lib/crypto: s390/sha1: Migrate optimized code into library
Instead of exposing the s390-optimized SHA-1 code via s390-specific
crypto_shash algorithms, instead just implement the sha1_blocks()
library function.  This is much simpler, it makes the SHA-1 library
functions be s390-optimized, and it fixes the longstanding issue where
the s390-optimized SHA-1 code was disabled by default.  SHA-1 still
remains available through crypto_shash, but individual architectures no
longer need to handle it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:11:49 -07:00