282 Commits

Author SHA1 Message Date
Linus Torvalds
a619fe35ab Merge tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Rewrite memcpy_sglist from scratch
   - Add on-stack AEAD request allocation
   - Fix partial block processing in ahash

  Algorithms:
   - Remove ansi_cprng
   - Remove tcrypt tests for poly1305
   - Fix EINPROGRESS processing in authenc
   - Fix double-free in zstd

  Drivers:
   - Use drbg ctr helper when reseeding xilinx-trng
   - Add support for PCI device 0x115A to ccp
   - Add support of paes in caam
   - Add support for aes-xts in dthev2

  Others:
   - Use likely in rhashtable lookup
   - Fix lockdep false-positive in padata by removing a helper"

* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
  crypto: zstd - fix double-free in per-CPU stream cleanup
  crypto: ahash - Zero positive err value in ahash_update_finish
  crypto: ahash - Fix crypto_ahash_import with partial block data
  crypto: lib/mpi - use min() instead of min_t()
  crypto: ccp - use min() instead of min_t()
  hwrng: core - use min3() instead of nested min_t()
  crypto: aesni - ctr_crypt() use min() instead of min_t()
  crypto: drbg - Delete unused ctx from struct sdesc
  crypto: testmgr - Add missing DES weak and semi-weak key tests
  Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
  crypto: scatterwalk - Fix memcpy_sglist() to always succeed
  crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
  crypto: tcrypt - Remove unused poly1305 support
  crypto: ansi_cprng - Remove unused ansi_cprng algorithm
  crypto: asymmetric_keys - fix uninitialized pointers with free attribute
  KEYS: Avoid -Wflex-array-member-not-at-end warning
  crypto: ccree - Correctly handle return of sg_nents_for_len
  crypto: starfive - Correctly handle return of sg_nents_for_len
  crypto: iaa - Fix incorrect return value in save_iaa_wq()
  crypto: zstd - Remove unnecessary size_t cast
  ...
2025-12-03 11:28:38 -08:00
Linus Torvalds
f617d24606 Merge tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers:
 "This is a core arm64 change. However, I was asked to take this because
  most uses of kernel-mode FPSIMD are in crypto or CRC code.

  In v6.8, the size of task_struct on arm64 increased by 528 bytes due
  to the new 'kernel_fpsimd_state' field. This field was added to allow
  kernel-mode FPSIMD code to be preempted.

  Unfortunately, 528 bytes is kind of a lot for task_struct. This
  regression in the task_struct size was noticed and reported.

  Recover that space by making this state be allocated on the stack at
  the beginning of each kernel-mode FPSIMD section.

  To make it easier for all the users of kernel-mode FPSIMD to do that
  correctly, introduce and use a 'scoped_ksimd' abstraction"

* tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits)
  lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
  lib/crypto: arm/blake2b: Move to scoped ksimd API
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
  arm64/fpu: Enforce task-context only for generic kernel mode FPU
  net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
  arm64/xorblocks:  Switch to 'ksimd' scoped guard API
  crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
  crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
  raid6: Move to more abstract 'ksimd' guard API
  crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  ...
2025-12-02 18:53:50 -08:00
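
The scoped-guard idea in the FPSIMD pull above (state buffer on the stack, begin and end tied to a scope) can be sketched in user space roughly as follows. This is not the kernel's implementation: it assumes the GCC/Clang `cleanup` attribute, and `fpsimd_buf`, `ksimd_begin_sketch()`, `ksimd_end_sketch()` and `scoped_ksimd_sketch()` are hypothetical names. Only the 528-byte figure is taken from the message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-section save area (528 bytes per the text). */
struct fpsimd_buf {
	unsigned char regs[528];
};

static int sections_open;   /* sections currently active */
static int sections_closed; /* sections that have ended */

static void ksimd_begin_sketch(struct fpsimd_buf *buf)
{
	memset(buf, 0, sizeof(*buf));
	sections_open++;
}

/* Runs automatically when the guarded scope is left (cleanup attribute). */
static void ksimd_end_sketch(struct fpsimd_buf *buf)
{
	(void)buf;
	sections_open--;
	sections_closed++;
}

/*
 * Hypothetical guard macro: executes the statement that follows exactly
 * once, with the buffer living on the caller's stack for that scope only.
 */
#define scoped_ksimd_sketch()                                            \
	for (struct fpsimd_buf ksimd_buf                                  \
		     __attribute__((cleanup(ksimd_end_sketch))),          \
	     *ksimd_once = (ksimd_begin_sketch(&ksimd_buf), &ksimd_buf);  \
	     ksimd_once; ksimd_once = NULL)

int sum_in_simd_section(const int *v, int n)
{
	int sum = 0;

	scoped_ksimd_sketch() {
		for (int i = 0; i < n; i++)
			sum += v[i];
	}
	return sum;
}
```

Because the end call is attached to the scope, leaving the block early (for example with `break` or `return`) still runs it, which is what makes a guard of this kind harder to misuse than a manual begin/end pair.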
Linus Torvalds
906003e151 Merge tag 'libcrypto-at-least-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull 'at_least' array size update from Eric Biggers:
 "C supports lower bounds on the sizes of array parameters, using the
  static keyword as follows: 'void f(int a[static 32]);'. This allows
  the compiler to warn about a too-small array being passed.

  As discussed, this reuse of the 'static' keyword, while standard, is a
  bit obscure. Therefore, add an alias 'at_least' to compiler_types.h.

  Then, add this 'at_least' annotation to the array parameters of
  various crypto library functions"

* tag 'libcrypto-at-least-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: sha2: Add at_least decoration to fixed-size array params
  lib/crypto: sha1: Add at_least decoration to fixed-size array params
  lib/crypto: poly1305: Add at_least decoration to fixed-size array params
  lib/crypto: md5: Add at_least decoration to fixed-size array params
  lib/crypto: curve25519: Add at_least decoration to fixed-size array params
  lib/crypto: chacha: Add at_least decoration to fixed-size array params
  lib/crypto: chacha20poly1305: Statically check fixed array lengths
  compiler_types: introduce at_least parameter decoration pseudo keyword
  wifi: iwlwifi: trans: rename at_least variable to min_mode
2025-12-02 18:26:54 -08:00
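
A minimal sketch of the array-parameter lower bound described in the pull above. The `at_least` define here is an assumed stand-in for the alias added to compiler_types.h, and `key_byte_sum()` and `SKETCH_KEY_LEN` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for the kernel alias; the real one lives in compiler_types.h. */
#define at_least static

#define SKETCH_KEY_LEN 32

/* Hypothetical function: callers must pass an array of at least 32 bytes. */
unsigned int key_byte_sum(const uint8_t key[at_least SKETCH_KEY_LEN])
{
	unsigned int sum = 0;

	for (size_t i = 0; i < SKETCH_KEY_LEN; i++)
		sum += key[i];
	return sum;
}
```

Passing a 31-byte array to a function declared this way is the kind of call a compiler can warn about at build time, instead of the overread going unnoticed until runtime.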
Linus Torvalds
db425f7a0b Merge tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library test updates from Eric Biggers:

 - Add KUnit test suites for SHA-3, BLAKE2b, and POLYVAL. These are the
   algorithms that have new crypto library interfaces this cycle.

 - Remove the crypto_shash POLYVAL tests. They're no longer needed
   because POLYVAL support was removed from crypto_shash. Better POLYVAL
   test coverage is now provided via the KUnit test suite.

* tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  crypto: testmgr - Remove polyval tests
  lib/crypto: tests: Add KUnit tests for POLYVAL
  lib/crypto: tests: Add additional SHAKE tests
  lib/crypto: tests: Add SHA3 kunit tests
  lib/crypto: tests: Add KUnit tests for BLAKE2b
2025-12-02 18:20:06 -08:00
Linus Torvalds
5abe8d8efc Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
 "This is the main crypto library pull request for 6.19. It includes:

   - Add SHA-3 support to lib/crypto/, including support for both the
     hash functions and the extendable-output functions. Reimplement the
     existing SHA-3 crypto_shash support on top of the library.

     This is motivated mainly by the upcoming support for the ML-DSA
     signature algorithm, which needs the SHAKE128 and SHAKE256
     functions. But even on its own it's a useful cleanup.

     This also fixes the longstanding issue where the
     architecture-optimized SHA-3 code was disabled by default.

   - Add BLAKE2b support to lib/crypto/, and reimplement the existing
     BLAKE2b crypto_shash support on top of the library.

     This is motivated mainly by btrfs, which supports BLAKE2b
     checksums. With this change, all btrfs checksum algorithms now have
     library APIs. btrfs is planned to start just using the library
     directly.

     This refactor also improves consistency between the BLAKE2b code
     and BLAKE2s code. And as usual, it also fixes the issue where the
     architecture-optimized BLAKE2b code was disabled by default.

   - Add POLYVAL support to lib/crypto/, replacing the existing POLYVAL
     support in crypto_shash. Reimplement HCTR2 on top of the library.

     This simplifies the code and improves HCTR2 performance. As usual,
     it also makes the architecture-optimized code be enabled by
     default. The generic implementation of POLYVAL is greatly improved
     as well.

   - Clean up the BLAKE2s code

   - Add FIPS self-tests for SHA-1, SHA-2, and SHA-3"

* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (37 commits)
  fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
  crypto: polyval - Remove the polyval crypto_shash
  crypto: hctr2 - Convert to use POLYVAL library
  lib/crypto: x86/polyval: Migrate optimized code into library
  lib/crypto: arm64/polyval: Migrate optimized code into library
  lib/crypto: polyval: Add POLYVAL library
  crypto: polyval - Rename conflicting functions
  lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs
  lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value
  lib/crypto: x86/blake2s: Improve readability
  lib/crypto: x86/blake2s: Use local labels for data
  lib/crypto: x86/blake2s: Drop check for nblocks == 0
  lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit
  lib/crypto: arm, arm64: Drop filenames from file comments
  lib/crypto: arm/blake2s: Fix some comments
  crypto: s390/sha3 - Remove superseded SHA-3 code
  crypto: sha3 - Reimplement using library API
  crypto: jitterentropy - Use default sha3 implementation
  lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
  lib/crypto: sha3: Support arch overrides of one-shot digest functions
  ...
2025-12-02 18:01:03 -08:00
David Laight
80b61046b6 crypto: lib/mpi - use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-24 17:44:14 +08:00
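
The difference between the two forms in the commit above can be illustrated with simplified macros. These are hypothetical sketches, not the kernel's min() and min_t(), which perform additional type checking; fixed-width types stand in for 'unsigned int' and 'unsigned long' on a 64-bit build.

```c
#include <stdint.h>

/* Hypothetical simplified macros (they evaluate their arguments twice). */
#define min_t_sketch(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define min_sketch(a, b) ((a) < (b) ? (a) : (b))

/* Casting both operands to 32 bits discards the high bits of 'big'. */
uint32_t min_with_cast(uint32_t small, uint64_t big)
{
	return min_t_sketch(uint32_t, small, big);
}

/* A plain comparison promotes 'small' to 64 bits, so no bits are lost. */
uint64_t min_with_promotion(uint32_t small, uint64_t big)
{
	return min_sketch(small, big);
}
```

With small = 100 and big = 0x100000005, the cast version compares 100 against 5 and returns 5, while the promoting version returns 100.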
Jason A. Donenfeld
ac653d57ad lib/crypto: chacha20poly1305: Statically check fixed array lengths
Several parameters of the chacha20poly1305 functions require arrays of
an exact length. Use the new at_least keyword to instruct gcc and
clang to statically check that the caller is passing an object of at
least that length.

Here it is in action, with this faulty patch to wireguard's cookie.h:

     struct cookie_checker {
     	u8 secret[NOISE_HASH_LEN];
    -	u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN];
    +	u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN - 1];
     	u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN];

If I try compiling this code, I get this helpful warning:

  CC      drivers/net/wireguard/cookie.o
drivers/net/wireguard/cookie.c: In function ‘wg_cookie_message_create’:
drivers/net/wireguard/cookie.c:193:9: warning: ‘xchacha20poly1305_encrypt’ reading 32 bytes from a region of size 31 [-Wstringop-overread]
  193 |         xchacha20poly1305_encrypt(dst->encrypted_cookie, cookie, COOKIE_LEN,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  194 |                                   macs->mac1, COOKIE_LEN, dst->nonce,
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  195 |                                   checker->cookie_encryption_key);
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireguard/cookie.c:193:9: note: referencing argument 7 of type ‘const u8 *’ {aka ‘const unsigned char *’}
In file included from drivers/net/wireguard/messages.h:10,
                 from drivers/net/wireguard/cookie.h:9,
                 from drivers/net/wireguard/cookie.c:6:
include/crypto/chacha20poly1305.h:28:6: note: in a call to function ‘xchacha20poly1305_encrypt’
   28 | void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251123054819.2371989-4-Jason@zx2c4.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-23 12:19:21 -08:00
Eric Biggers
141fbbecec lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()
Fully initialize *ctx, including the buf field which sha256_init()
doesn't initialize, to avoid a KMSAN warning when comparing *ctx to
orig_ctx.  This KMSAN warning slipped in while KMSAN was not working
reliably due to a stackdepot bug, which has now been fixed.

Fixes: 6733968be7 ("lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251121033431.34406-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-21 10:22:24 -08:00
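
The pattern behind the fix above, sketched with a hypothetical context struct: an init function that leaves a buffer field untouched, and a caller that zeroes the whole struct first so a later whole-struct comparison never reads uninitialized bytes. `hash_ctx_sketch` and `hash_init_sketch()` are illustrative names, and the state values are placeholders, not real SHA-256 constants.

```c
#include <stdint.h>
#include <string.h>

struct hash_ctx_sketch {
	uint32_t state[8];
	uint64_t bytecount;
	uint8_t buf[64];
};

/* Like the init described above, this deliberately leaves 'buf' untouched. */
static void hash_init_sketch(struct hash_ctx_sketch *ctx)
{
	for (int i = 0; i < 8; i++)
		ctx->state[i] = 0x1000u + (uint32_t)i; /* placeholder values */
	ctx->bytecount = 0;
}

int ctx_copies_match(void)
{
	struct hash_ctx_sketch ctx, orig_ctx;

	memset(&ctx, 0, sizeof(ctx)); /* fully initialize, including buf */
	hash_init_sketch(&ctx);
	orig_ctx = ctx;

	/* Safe: every byte compared here has been written. */
	return memcmp(&ctx, &orig_ctx, sizeof(ctx)) == 0;
}
```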
Ard Biesheuvel
8dcac98a47 lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
Move the arm64 implementations of SHA-3 and POLYVAL to the newly
introduced scoped ksimd API, which replaces kernel_neon_begin() and
kernel_neon_end(). On arm64, this is needed because the latter API
will change in an incompatible manner.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12 10:14:11 -08:00
Ard Biesheuvel
c0d597e016 lib/crypto: arm/blake2b: Move to scoped ksimd API
Even though ARM's versions of kernel_neon_begin()/_end() are not being
changed, update the newly migrated ARM blake2b to the scoped ksimd API
so that all ARM and arm64 code in lib/crypto remains consistent in this
respect.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12 09:57:52 -08:00
Eric Biggers
065f040010 Merge tag 'scoped-ksimd-for-arm-arm64' into libcrypto-fpsimd-on-stack
Pull scoped ksimd API for ARM and arm64 from Ard Biesheuvel:

  "Introduce a more strict replacement API for
   kernel_neon_begin()/kernel_neon_end() on both ARM and arm64, and
   replace occurrences of the latter pair appearing in lib/crypto"

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12 09:55:55 -08:00
Ard Biesheuvel
f53d18a4e6 lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard API
Before modifying the prototypes of kernel_neon_begin() and
kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
which encapsulates the calls to those functions.

For symmetry, do the same for 32-bit ARM too.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12 09:51:13 +01:00
Eric Biggers
b3aed551b3 lib/crypto: tests: Add KUnit tests for POLYVAL
Add a test suite for the POLYVAL library, including:

- All the standard tests and the benchmark from hash-test-template.h
- Comparison with a test vector from the RFC
- Test with key and message containing all one bits
- Additional tests related to the key struct

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:07:52 -08:00
Eric Biggers
b2210f3516 lib/crypto: tests: Add additional SHAKE tests
Add the following test cases to cover gaps in the SHAKE testing:

    - test_shake_all_lens_up_to_4096()
    - test_shake_multiple_squeezes()
    - test_shake_with_guarded_bufs()

Remove test_shake256_tiling() and test_shake256_tiling2() since they are
superseded by test_shake_multiple_squeezes().  It provides better test
coverage by using randomized testing.  E.g., it's able to generate a
zero-length squeeze followed by a nonzero-length squeeze, which the
first 7 versions of the SHA-3 patchset handled incorrectly.

Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:07:36 -08:00
David Howells
15c64c47e4 lib/crypto: tests: Add SHA3 kunit tests
Add a SHA3 kunit test suite, providing the following:

 (*) A simple test of each of SHA3-224, SHA3-256, SHA3-384, SHA3-512,
     SHAKE128 and SHAKE256.

 (*) NIST 0- and 1600-bit test vectors for SHAKE128 and SHAKE256.

 (*) Output tiling (multiple squeezing) tests for SHAKE256.

 (*) Standard hash template test for SHA3-256.  To make this possible,
     gen-hash-testvecs.py is modified to support sha3-256.

 (*) Standard benchmark test for SHA3-256.

[EB: dropped some unnecessary changes to gen-hash-testvecs.py, moved
     addition of Testing section in doc file into this commit, and
     other small cleanups]

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:07:36 -08:00
Eric Biggers
6401fd334d lib/crypto: tests: Add KUnit tests for BLAKE2b
Add a KUnit test suite for the BLAKE2b library API, mirroring the
BLAKE2s test suite very closely.

As with the BLAKE2s test suite, a benchmark is included.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:07:36 -08:00
Eric Biggers
4d8da35579 lib/crypto: x86/polyval: Migrate optimized code into library
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:03:38 -08:00
Eric Biggers
37919e239e lib/crypto: arm64/polyval: Migrate optimized code into library
Migrate the arm64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on arm64.

This drops the arm64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:03:38 -08:00
Eric Biggers
3d176751e5 lib/crypto: polyval: Add POLYVAL library
Add support for POLYVAL to lib/crypto/.

This will replace the polyval crypto_shash algorithm and its use in the
hctr2 template, simplifying the code and reducing overhead.

Specifically, this commit introduces the POLYVAL library API and a
generic implementation of it.  Later commits will migrate the existing
architecture-optimized implementations of POLYVAL into lib/crypto/ and
add a KUnit test suite.

I've also rewritten the generic implementation completely, using a more
modern approach instead of the traditional table-based approach.  It's
now constant-time, requires no precomputation or dynamic memory
allocations, decreases the per-key memory usage from 4096 bytes to 16
bytes, and is faster than the old polyval-generic even on bulk data
reusing the same key (at least on x86_64, where I measured 15% faster).
We should do this for GHASH too, but for now just do it for POLYVAL.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251109234726.638437-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11 11:03:38 -08:00
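
To illustrate what "constant-time, no precomputation" can look like for the carryless multiplication that underlies POLYVAL, here is a small bit-masking sketch. It is not the approach taken in the commit above, whose details are not given here; it omits the field reduction entirely, and `clmul32_sketch()` is a hypothetical name.

```c
#include <stdint.h>

/* Carryless (XOR-based) multiply of two 32-bit polynomials over GF(2). */
uint64_t clmul32_sketch(uint32_t a, uint32_t b)
{
	uint64_t acc = 0;

	for (int i = 0; i < 32; i++) {
		/* All-ones if bit i of b is set, else zero: no branch, no table. */
		uint64_t mask = (uint64_t)0 - ((b >> i) & 1);

		acc ^= ((uint64_t)a << i) & mask;
	}
	return acc;
}
```

The loop always performs the same operations regardless of the operand values, so there is no secret-dependent branch or table index to leak timing information.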
Eric Biggers
8ba60c5914 lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96.  This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:52 -08:00
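
Why the immediate in the commit above is 0x96 can be checked with a short sketch. The ternary-logic immediate is an 8-entry truth table indexed by the three input bits; the bit ordering used below is an assumption, but it does not affect the result because XOR is symmetric. `xor3_truth_table()` and `ternlog32_sketch()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Build the 8-bit truth table for a three-input XOR. */
uint8_t xor3_truth_table(void)
{
	uint8_t imm = 0;

	for (unsigned int idx = 0; idx < 8; idx++) {
		unsigned int a = (idx >> 2) & 1;
		unsigned int b = (idx >> 1) & 1;
		unsigned int c = idx & 1;

		imm |= (uint8_t)((a ^ b ^ c) << idx);
	}
	return imm;
}

/* Bit-by-bit emulation of a ternary-logic operation on one 32-bit lane. */
uint32_t ternlog32_sketch(uint8_t imm, uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t out = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		unsigned int idx = (((a >> bit) & 1) << 2) |
				   (((b >> bit) & 1) << 1) |
				   ((c >> bit) & 1);

		out |= (uint32_t)((imm >> idx) & 1) << bit;
	}
	return out;
}
```

The table has ones at indices 1, 2, 4 and 7 (the inputs with odd parity), which is 0b10010110, i.e. 0x96.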
Eric Biggers
cd5528621a lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value
Just before returning, blake2s_compress_ssse3() and
blake2s_compress_avx512() store updated values to the 'h', 't', and 'f'
fields of struct blake2s_ctx.  But 'f' is always unchanged (which is
correct; only the C code changes it).  So, there's no need to write to
'f'.  Use 64-bit stores (movq and vmovq) instead of 128-bit stores
(movdqu and vmovdqu) so that only 't' is written.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:52 -08:00
Eric Biggers
a7acd77ebd lib/crypto: x86/blake2s: Improve readability
Various cleanups for readability.  No change to the generated code:

- Add some comments
- Add #defines for arguments
- Rename some labels
- Use decimal constants instead of hex where it makes sense.
  (The pshufd immediates intentionally remain as hex.)
- Add blank lines when there's a logical break

The round loop still could use some work, but this is at least a start.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:52 -08:00
Eric Biggers
83c1a867c9 lib/crypto: x86/blake2s: Use local labels for data
Following the usual practice, prefix the names of the data labels with
".L" so that the assembler treats them as truly local.  This more
clearly expresses the intent and is less error-prone.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:52 -08:00
Eric Biggers
c19bdf24cc lib/crypto: x86/blake2s: Drop check for nblocks == 0
Since blake2s_compress() is always passed nblocks != 0, remove the
unnecessary check for nblocks == 0 from blake2s_compress_ssse3().

Note that this makes it consistent with blake2s_compress_avx512() in the
same file as well as the arm32 blake2s_compress().

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:52 -08:00
Eric Biggers
2f22115709 lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit
In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress().  The assembly code then reads it
from the 64-bit %rcx.  However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it.  Therefore, fix these
functions to read this argument from the 32-bit %ecx.

In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes.  In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem.  In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits.  That results
in zero-extension in nearly all situations.  I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64).  But that's not what the real code does, of course.

Fixes: ed0356eda1 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:51 -08:00
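
The contrived case from the commit above can be modeled in plain C by treating a 64-bit integer as the register contents. This is only a model with hypothetical function names; it does not reproduce the calling convention itself.

```c
#include <stdint.h>

/* Models a callee that reads the whole 64-bit register (the old %rcx read). */
uint64_t read_full_register(uint64_t reg)
{
	return reg;
}

/* Models a callee that reads only the low 32 bits (the fixed %ecx read). */
uint32_t read_low_half(uint64_t reg)
{
	return (uint32_t)reg;
}
```

If the register still holds 0x1234567800000040 when a u32 argument of 0x40 is intended, the low-half read yields 0x40 while the full-register read sees the stale upper bits as well.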
Eric Biggers
95ce85de0b lib/crypto: arm, arm64: Drop filenames from file comments
Remove self-references to filenames from assembly files in
lib/crypto/arm/ and lib/crypto/arm64/.  This follows the recommended
practice and eliminates an outdated reference to sha2-ce-core.S.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102014809.170713-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:51 -08:00
Eric Biggers
b8b816ec04 lib/crypto: arm/blake2s: Fix some comments
Fix the indices in some comments in blake2s-core.S.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251102021553.176587-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:51 -08:00
Eric Biggers
862445d3b9 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
Some z/Architecture processors can compute a SHA-3 digest in a single
instruction.  arch/s390/crypto/ already uses this capability to optimize
the SHA-3 crypto_shash algorithms.

Use this capability to implement the sha3_224(), sha3_256(), sha3_384(),
and sha3_512() library functions too.

SHA3-256 benchmark results provided by Harald Freudenberger
(https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/)
on a z/Architecture machine with "facility 86" (MSA level 12):

    Length (bytes)    Before (MB/s)   After (MB/s)
    ==============    =============   ============
          16                212             225
          64                820             915
         256               1850            3350
        1024               5400            8300
        4096              11200           11300

Note: the original data from Harald was given in the form of a graph for
each length, showing the distribution of throughputs from 500 runs.  I
guesstimated the peak of each one.

Harald also reported that the generic SHA-3 code was at most 259 MB/s
(https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/).
So as expected, the earlier commit that optimized sha3_absorb_blocks()
and sha3_keccakf() is the more important one; it optimized the Keccak
permutation which is the most performance-critical part of SHA-3.
Still, this additional commit does notably improve performance further
on some lengths.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:30:41 -08:00
Eric Biggers
0354d3c1f1 lib/crypto: sha3: Support arch overrides of one-shot digest functions
Add support for architecture-specific overrides of sha3_224(),
sha3_256(), sha3_384(), and sha3_512().  This will be used to implement
these functions more efficiently on s390 than is possible via the usual
init + update + final flow.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20251026055032.1413733-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
Eric Biggers
04171105d3 lib/crypto: s390/sha3: Add optimized Keccak functions
Implement sha3_absorb_blocks() and sha3_keccakf() using the hardware-
accelerated SHA-3 support in Message-Security-Assist Extension 6.

This accelerates the SHA3-224, SHA3-256, SHA3-384, SHA3-512, and
SHAKE256 library functions.

Note that arch/s390/crypto/ already has SHA-3 code that uses this
extension, but it is exposed only via crypto_shash.  This commit brings
the same acceleration to the SHA-3 library.  The arch/s390/crypto/
version will become redundant and be removed in later changes.

Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
Eric Biggers
1e29a75057 lib/crypto: arm64/sha3: Migrate optimized code into library
Instead of exposing the arm64-optimized SHA-3 code via arm64-specific
crypto_shash algorithms, just implement the sha3_absorb_blocks()
and sha3_keccakf() library functions.  This is much simpler, it makes
the SHA-3 library functions be arm64-optimized, and it fixes the
longstanding issue where the arm64-optimized SHA-3 code was disabled by
default.  SHA-3 still remains available through crypto_shash, but
individual architectures no longer need to handle it.

Note: to see the diff from arch/arm64/crypto/sha3-ce-glue.c to
lib/crypto/arm64/sha3.h, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
Eric Biggers
6fa873641c lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
Since the SHA-3 algorithms are FIPS-approved, add the boot-time
self-test which is apparently required.  This closely follows the
corresponding SHA-1, SHA-256, and SHA-512 tests.

Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
David Howells
c0db39e253 lib/crypto: sha3: Move SHA3 Iota step mapping into round function
In crypto/sha3_generic.c, the keccakf() function calls keccakf_round()
to do four of Keccak-f's five step mappings.  However, it does not do
the Iota step mapping - presumably because that is dependent on round
number, whereas Theta, Rho, Pi and Chi are not.

Note that the keccakf_round() function needs to be explicitly
non-inlined on certain architectures as gcc's produced output will (or
used to) use over 1KiB of stack space if inlined.

Now, this code was copied more or less verbatim into lib/crypto/sha3.c,
so that has the same aesthetic issue.  Fix this there by passing the
round number into sha3_keccakf_one_round_generic() and doing the Iota
step mapping there.

crypto/sha3_generic.c is left untouched as that will be converted to use
lib/crypto/sha3.c at some point.

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:35 -08:00
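
The restructuring described in the commit above, with the round number passed into the round function so that Iota is applied there, can be sketched as follows. This is a compact generic Keccak-f[1600] written for illustration, not the code in lib/crypto/sha3.c; `keccakf_one_round_sketch()` and `keccakf_sketch()` are hypothetical names.

```c
#include <stdint.h>

static const uint64_t round_constants[24] = {
	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};
static const int rho_offsets[24] = {
	1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
	27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44
};
static const int pi_lanes[24] = {
	10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
	15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1
};

/* Rotate left; n is always in 1..63 here, so the shifts are well defined. */
static uint64_t rol64_sketch(uint64_t x, int n)
{
	return (x << n) | (x >> (64 - n));
}

/* One full round: Theta, Rho, Pi, Chi, and Iota (which needs the round number). */
void keccakf_one_round_sketch(uint64_t st[25], int round)
{
	uint64_t bc[5], t;
	int i, j;

	/* Theta */
	for (i = 0; i < 5; i++)
		bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
	for (i = 0; i < 5; i++) {
		t = bc[(i + 4) % 5] ^ rol64_sketch(bc[(i + 1) % 5], 1);
		for (j = 0; j < 25; j += 5)
			st[j + i] ^= t;
	}

	/* Rho and Pi */
	t = st[1];
	for (i = 0; i < 24; i++) {
		j = pi_lanes[i];
		bc[0] = st[j];
		st[j] = rol64_sketch(t, rho_offsets[i]);
		t = bc[0];
	}

	/* Chi */
	for (j = 0; j < 25; j += 5) {
		for (i = 0; i < 5; i++)
			bc[i] = st[j + i];
		for (i = 0; i < 5; i++)
			st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
	}

	/* Iota: the only step that depends on the round number */
	st[0] ^= round_constants[round];
}

void keccakf_sketch(uint64_t st[25])
{
	for (int round = 0; round < 24; round++)
		keccakf_one_round_sketch(st, round);
}
```

With Iota inside the round function, the outer loop reduces to a plain call per round, which is the aesthetic point the commit message makes.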
David Howells
0593447248 lib/crypto: sha3: Add SHA-3 support
Add SHA-3 support to lib/crypto/.  All six algorithms in the SHA-3
family are supported: four digests (SHA3-224, SHA3-256, SHA3-384, and
SHA3-512) and two extendable-output functions (SHAKE128 and SHAKE256).

The SHAKE algorithms will be required for ML-DSA.

[EB: simplified the API to use fewer types and functions, fixed bug that
     sometimes caused incorrect SHAKE output, cleaned up the
     documentation, dropped an ad-hoc test that was inconsistent with
     the rest of lib/crypto/, and many other cleanups]

Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251026055032.1413733-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05 20:02:32 -08:00
Eric Biggers
44e8241c51 lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test.  This has been true
ever since this code was added.

It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels.  But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.

Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN.  Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.

Fixes: d8f1308a02 ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-04 09:36:22 -08:00
Nathan Chancellor
2b81082ad3 lib/crypto: curve25519-hacl64: Fix older clang KASAN workaround for GCC
Commit 2f13daee2a ("lib/crypto/curve25519-hacl64: Disable KASAN with
clang-17 and older") inadvertently disabled KASAN in curve25519-hacl64.o
for GCC unconditionally because clang-min-version will always evaluate
to nothing for GCC. Add a check for CONFIG_CC_IS_CLANG to avoid applying
the workaround for GCC, which is only needed for clang-17 and older.

Cc: stable@vger.kernel.org
Fixes: 2f13daee2a ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251103-curve25519-hacl64-fix-kasan-workaround-v2-1-ab581cbd8035@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-04 09:35:58 -08:00
Eric Biggers
ba6617bd47 lib/crypto: arm/blake2b: Migrate optimized code into library
Migrate the arm-optimized BLAKE2b code from arch/arm/crypto/ to
lib/crypto/arm/.  This makes the BLAKE2b library able to use it, and it
also simplifies the code because it's easier to integrate with the
library than crypto_shash.

This temporarily makes the arm-optimized BLAKE2b code unavailable via
crypto_shash.  A later commit reimplements the blake2b-* crypto_shash
algorithms on top of the BLAKE2b library API, making it available again.

Note that as per the lib/crypto/ convention, the optimized code is now
enabled by default.  So, this also fixes the longstanding issue where
the optimized BLAKE2b code was not enabled by default.

To see the diff from arch/arm/crypto/blake2b-neon-glue.c to
lib/crypto/arm/blake2b.h, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
23a16c9533 lib/crypto: blake2b: Add BLAKE2b library functions
Add a library API for BLAKE2b, closely modeled after the BLAKE2s API.

This will allow in-kernel users such as btrfs to use BLAKE2b without
going through the generic crypto layer.  In addition, as usual the
BLAKE2b crypto_shash algorithms will be reimplemented on top of this.

Note: to create lib/crypto/blake2b.c I made a copy of
lib/crypto/blake2s.c and made the updates from BLAKE2s => BLAKE2b.  This
way, the BLAKE2s and BLAKE2b code is kept consistent.  Therefore, it
borrows the SPDX-License-Identifier and Copyright from
lib/crypto/blake2s.c rather than crypto/blake2b_generic.c.

The library API uses 'struct blake2b_ctx', consistent with other
lib/crypto/ APIs.  The existing 'struct blake2b_state' will be removed
once the blake2b crypto_shash algorithms are updated to stop using it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
5385bcbffe lib/crypto: blake2s: Drop excessive const & rename block => data
A couple more small cleanups to the BLAKE2s code before these things get
propagated into the BLAKE2b code:

- Drop 'const' from some non-pointer function parameters.  It was a bit
  excessive and not conventional.

- Rename 'block' argument of blake2s_compress*() to 'data'.  This is for
  consistency with the SHA-* code, and also to avoid the implication
  that it points to a singular "block".

No functional changes.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
5e0ec8e46d lib/crypto: blake2s: Rename blake2s_state to blake2s_ctx
For consistency with the SHA-1, SHA-2, SHA-3 (in development), and MD5
library APIs, rename blake2s_state to blake2s_ctx.

As a refresher, the ctx name:

- Is a bit shorter.
- Avoids confusion with the compression function state, which is also
  often called the state (but is just part of the full context).
- Is consistent with OpenSSL.

Not a big deal, of course.  But consistency is nice.  With a BLAKE2b
library API about to be added, this is a convenient time to update this.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
50b8e36994 lib/crypto: blake2s: Adjust parameter order of blake2s()
Reorder the parameters of blake2s() from (out, in, key, outlen, inlen,
keylen) to (key, keylen, in, inlen, out, outlen).

This aligns BLAKE2s with the common conventions of pairing buffers and
their lengths, and having outputs follow inputs.  This is widely used
elsewhere in lib/crypto/ and crypto/, and even elsewhere in the BLAKE2s
code itself such as blake2s_init_key() and blake2s_final().  So
blake2s() was a bit of an exception.

Notably, this results in the same order as hmac_*_usingrawkey().

Note that since the type signature changed, it's not possible for a
blake2s() call site to be silently missed.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251018043106.375964-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
04cadb4fe0 lib/crypto: Add FIPS self-tests for SHA-1 and SHA-2
Add FIPS cryptographic algorithm self-tests for all SHA-1 and SHA-2
algorithms.  Following the "Implementation Guidance for FIPS 140-3"
document, to achieve this it's sufficient to just test a single test
vector for each of HMAC-SHA1, HMAC-SHA256, and HMAC-SHA512.

Just run these tests in the initcalls, following the example of e.g.
crypto/kdf_sp800108.c.  Note that this should meet the FIPS self-test
requirement even in the built-in case, given that the initcalls run
before userspace, storage, network, etc. are accessible.

This does not fix a regression, seeing as lib/ has had SHA-1 support
since 2005 and SHA-256 support since 2018.  Neither ever had FIPS
self-tests.  Moreover, fips=1 support has always been an unfinished
feature upstream.  However, with lib/ being used more widely, it's
seeing more scrutiny, and people seem to want these tests now [1][2].

[1] https://lore.kernel.org/r/3226361.1758126043@warthog.procyon.org.uk/
[2] https://lore.kernel.org/r/f31dbb22-0add-481c-aee0-e337a7731f8e@oracle.com/

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251011001047.51886-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29 22:04:24 -07:00
Eric Biggers
1af424b154 lib/crypto: poly1305: Restore dependency of arch code on !KMSAN
Restore the dependency of the architecture-optimized Poly1305 code on
!KMSAN.  It was dropped by commit b646b782e5 ("lib/crypto: poly1305:
Consolidate into single module").

Unlike the other hash algorithms in lib/crypto/ (e.g., SHA-512), the way
the architecture-optimized Poly1305 code is integrated results in
assembly code initializing memory, for several different architectures.
Thus, it generates false positive KMSAN warnings.  These could be
suppressed with kmsan_unpoison_memory(), but it would be needed in quite
a few places.  For now let's just restore the dependency on !KMSAN.

Note: this should have been caught by running poly1305_kunit with
CONFIG_KMSAN=y, which I did.  However, due to an unrelated KMSAN bug
(https://lore.kernel.org/r/20251022030213.GA35717@sol/), KMSAN currently
isn't working reliably.  Thus, the warning wasn't noticed until later.

Fixes: b646b782e5 ("lib/crypto: poly1305: Consolidate into single module")
Reported-by: syzbot+01fcd39a0d90cdb0e3df@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68f6a48f.050a0220.91a22.0452.GAE@google.com/
Reported-by: Pei Xiao <xiaopei01@kylinos.cn>
Closes: https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f623c7da0.1761026343.git.xiaopei01@kylinos.cn/
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251022033405.64761-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-22 10:52:10 -07:00
Linus Torvalds
1896ce8eb6 Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull interleaved SHA-256 hashing support from Eric Biggers:
 "Optimize fsverity with 2-way interleaved hashing

  Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and
  make fsverity use it for faster file data verification. This improves
  fsverity performance on many x86_64 and arm64 processors.

  Later, I plan to make dm-verity use this too"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: Use 2-way interleaved SHA-256 hashing when supported
  fsverity: Remove inode parameter from fsverity_hash_block()
  lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
  lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
  lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
  lib/crypto: sha256: Add support for 2-way interleaved hashing
2025-09-29 15:55:20 -07:00
Linus Torvalds
d8768fb12a Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:

 - Add a RISC-V optimized implementation of Poly1305. This code was
   written by Andy Polyakov and contributed by Zhihang Shao.

 - Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5.

   Yes, it's still the 90s, and several kernel subsystems are still
   using MD5 for legacy use cases. As long as that remains the case,
   it's helpful to clean it up in the same way as I've been doing for
   other algorithms.

   Later, I plan to convert most of these users of MD5 to use the new
   MD5 library API instead of the generic crypto API.

 - Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and
   Curve25519 code.

   Consolidate these into one module per algorithm, and centralize the
   configuration and build process. This is the same reorganization that
   has already been successful for SHA-1 and SHA-2.

 - Remove the unused crypto_kpp API for Curve25519.

 - Migrate the BLAKE2s and Curve25519 self-tests to KUnit.

 - Always enable the architecture-optimized BLAKE2s code.

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits)
  crypto: md5 - Implement export_core() and import_core()
  wireguard: kconfig: simplify crypto kconfig selections
  lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS
  lib/crypto: curve25519: Consolidate into single module
  lib/crypto: curve25519: Move a couple functions out-of-line
  lib/crypto: tests: Add Curve25519 benchmark
  lib/crypto: tests: Migrate Curve25519 self-test to KUnit
  crypto: curve25519 - Remove unused kpp support
  crypto: testmgr - Remove curve25519 kpp tests
  crypto: x86/curve25519 - Remove unused kpp support
  crypto: powerpc/curve25519 - Remove unused kpp support
  crypto: arm/curve25519 - Remove unused kpp support
  crypto: hisilicon/hpre - Remove unused curve25519 kpp support
  lib/crypto: tests: Add KUnit tests for BLAKE2s
  lib/crypto: blake2s: Consolidate into single C translation unit
  lib/crypto: blake2s: Move generic code into blake2s.c
  lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
  lib/crypto: blake2s: Remove obsolete self-test
  lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2
  lib/crypto: chacha: Consolidate into single module
  ...
2025-09-29 15:48:56 -07:00
Linus Torvalds
e2fffe1d95 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
 "Update crc_kunit to test the CRC functions in softirq and hardirq
  contexts, similar to what the lib/crypto/ KUnit tests do. Move the
  helper function needed to do this into a common header.

  This is useful mainly to test fallback code paths for when
  FPU/SIMD/vector registers are unusable"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  Documentation/staging: Fix typo and incorrect citation in crc32.rst
  lib/crc: Drop inline from all *_mod_init_arch() functions
  lib/crc: Use underlying functions instead of crypto_simd_usable()
  lib/crc: crc_kunit: Test CRC computation in interrupt contexts
  kunit, lib/crypto: Move run_irq_test() to common header
2025-09-29 15:36:42 -07:00
Eric Biggers
6733968be7 lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
Update sha256_kunit to include test cases and a benchmark for the new
sha256_finup_2x() function.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:40 -05:00
Eric Biggers
bc6d6a4172 lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
Add an implementation of sha256_finup_2x_arch() for x86_64.  It
interleaves the computation of two SHA-256 hashes using the x86 SHA-NI
instructions.  dm-verity and fs-verity will take advantage of this for
greatly improved performance on capable CPUs.

This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:

    Intel Ice Lake (server):        4%
    Intel Sapphire Rapids:          38%
    Intel Emerald Rapids:           38%
    AMD Zen 1 (Threadripper 1950X): 84%
    AMD Zen 4 (EPYC 9B14):          98%
    AMD Zen 5 (Ryzen 9 9950X):      64%

For now, this seems to benefit AMD more than Intel.  This seems to be
because current AMD CPUs support concurrent execution of the SHA-NI
instructions, but unfortunately current Intel CPUs don't, except for the
sha256msg2 instruction.  Hopefully future Intel CPUs will support SHA-NI
on more execution ports.  Zen 1 supports 2 concurrent sha256rnds2, and
Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better
performance may be achievable on Zen 4 by interleaving more than two
hashes.  However, doing so poses a number of trade-offs, and furthermore
Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:40 -05:00
Eric Biggers
34c3f1e346 lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
Add an implementation of sha256_finup_2x_arch() for arm64.  It
interleaves the computation of two SHA-256 hashes using the ARMv8
SHA-256 instructions.  dm-verity and fs-verity will take advantage of
this for greatly improved performance on capable CPUs.

This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:

    ARM Cortex-X1: 70%
    ARM Cortex-X3: 68%
    ARM Cortex-A76: 65%
    ARM Cortex-A715: 43%
    ARM Cortex-A510: 25%
    ARM Cortex-A55: 8%

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:39 -05:00
Eric Biggers
4ca24d6abb lib/crypto: sha256: Add support for 2-way interleaved hashing
Many arm64 and x86_64 CPUs can compute two SHA-256 hashes at nearly the
same speed as one, if the instructions are interleaved.  This is because
SHA-256 is serialized block-by-block, and two interleaved hashes take
much better advantage of the CPU's instruction-level parallelism.

Meanwhile, a very common use case for SHA-256 hashing in the Linux
kernel is dm-verity and fs-verity.  Both use a Merkle tree that has a
fixed block size, usually 4096 bytes with an empty or 32-byte salt
prepended.  Usually, many blocks need to be hashed at a time.  This is
an ideal scenario for 2-way interleaved hashing.

To enable this optimization, add a new function sha256_finup_2x() to the
SHA-256 library API.  It computes the hash of two equal-length messages,
starting from a common initial context.

For now it always falls back to sequential processing.  Later patches
will wire up arm64 and x86_64 optimized implementations.

Note that the interleaving factor could in principle be higher than 2x.
However, that runs into many practical difficulties and CPU throughput
limitations.  Thus, both the implementations I'm adding are 2x.  In the
interest of using the simplest solution, the API matches that.
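
The contract described above (two equal-length messages, one shared
starting context, sequential fallback) can be sketched in userspace as
follows.  This is only an illustration under stated assumptions: the
toy_* names are hypothetical, and a 64-bit FNV-1a hash stands in for
SHA-256 to keep the example short.  The parameter order is assumed to
mirror sha256_finup_2x(), which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a hash context: the running FNV-1a 64-bit state. */
struct toy_ctx {
	uint64_t h;
};

static void toy_init(struct toy_ctx *ctx)
{
	ctx->h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
}

static void toy_update(struct toy_ctx *ctx, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		ctx->h ^= data[i];
		ctx->h *= 0x100000001b3ULL;	/* FNV-1a prime */
	}
}

static uint64_t toy_final(const struct toy_ctx *ctx)
{
	return ctx->h;
}

/*
 * Finish two hashes that share the same starting context (for example,
 * a context that has already absorbed a salt).  Both messages must be
 * 'len' bytes long.  This is the sequential fallback: an optimized
 * version would interleave the two computations instead.
 */
static void toy_finup_2x(const struct toy_ctx *ctx,
			 const uint8_t *data1, const uint8_t *data2,
			 size_t len, uint64_t *out1, uint64_t *out2)
{
	struct toy_ctx c1 = *ctx;	/* private copies, so the */
	struct toy_ctx c2 = *ctx;	/* shared context is untouched */

	toy_update(&c1, data1, len);
	*out1 = toy_final(&c1);

	toy_update(&c2, data2, len);
	*out2 = toy_final(&c2);
}
```

A caller in the Merkle-tree style would absorb the salt into one
context once, then pass that context together with two data blocks of
the same size; the context is taken by const pointer so it can be
reused for the next pair of blocks.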

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250915160819.140019-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17 13:09:39 -05:00