linux

mirror of https://github.com/torvalds/linux.git synced 2025-12-07 20:06:24 +00:00

Author	SHA1	Message	Date
Linus Torvalds	509d3f4584	Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...	2025-12-06 14:01:20 -08:00
Linus Torvalds	9368f0f941	Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...	2025-12-01 09:02:34 -08:00
Guan-Chun Wu	7794510e20	fscrypt: replace local base64url helpers with lib/base64 Replace the base64url encoding and decoding functions in fscrypt with the generic base64_encode() and base64_decode() helpers from lib/base64. This removes the custom implementation in fscrypt, reduces code duplication, and relies on the shared Base64 implementation in lib. The helpers preserve RFC 4648-compliant URL-safe Base64 encoding without padding, so there are no functional changes. This change also improves performance: encoding is about 2.7x faster and decoding achieves 43-52x speedups compared to the previous implementation. Link: https://lkml.kernel.org/r/20251114060221.89734-1-409411716@gms.tku.edu.tw Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Cc: Christoph Hellwig <hch@lst.de> Cc: David Laight <david.laight.linux@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Yongpeng Yang	1e39da974c	fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT When simulating an nvme device on qemu with both logical_block_size and physical_block_size set to 8 KiB, an error trace appears during partition table reading at boot time. The issue is caused by inode->i_blkbits being larger than PAGE_SHIFT, which leads to a left shift of -1 and triggering a UBSAN warning. [ 2.697306] ------------[ cut here ]------------ [ 2.697309] UBSAN: shift-out-of-bounds in fs/crypto/inline_crypt.c:336:37 [ 2.697311] shift exponent -1 is negative [ 2.697315] CPU: 3 UID: 0 PID: 274 Comm: (udev-worker) Not tainted 6.18.0-rc2+ #34 PREEMPT(voluntary) [ 2.697317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 2.697320] Call Trace: [ 2.697324] <TASK> [ 2.697325] dump_stack_lvl+0x76/0xa0 [ 2.697340] dump_stack+0x10/0x20 [ 2.697342] __ubsan_handle_shift_out_of_bounds+0x1e3/0x390 [ 2.697351] bh_get_inode_and_lblk_num.cold+0x12/0x94 [ 2.697359] fscrypt_set_bio_crypt_ctx_bh+0x44/0x90 [ 2.697365] submit_bh_wbc+0xb6/0x190 [ 2.697370] block_read_full_folio+0x194/0x270 [ 2.697371] ? __pfx_blkdev_get_block+0x10/0x10 [ 2.697375] ? __pfx_blkdev_read_folio+0x10/0x10 [ 2.697377] blkdev_read_folio+0x18/0x30 [ 2.697379] filemap_read_folio+0x40/0xe0 [ 2.697382] filemap_get_pages+0x5ef/0x7a0 [ 2.697385] ? mmap_region+0x63/0xd0 [ 2.697389] filemap_read+0x11d/0x520 [ 2.697392] blkdev_read_iter+0x7c/0x180 [ 2.697393] vfs_read+0x261/0x390 [ 2.697397] ksys_read+0x71/0xf0 [ 2.697398] __x64_sys_read+0x19/0x30 [ 2.697399] x64_sys_call+0x1e88/0x26a0 [ 2.697405] do_syscall_64+0x80/0x670 [ 2.697410] ? __x64_sys_newfstat+0x15/0x20 [ 2.697414] ? x64_sys_call+0x204a/0x26a0 [ 2.697415] ? do_syscall_64+0xb8/0x670 [ 2.697417] ? irqentry_exit_to_user_mode+0x2e/0x2a0 [ 2.697420] ? irqentry_exit+0x43/0x50 [ 2.697421] ? exc_page_fault+0x90/0x1b0 [ 2.697422] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2.697425] RIP: 0033:0x75054cba4a06 [ 2.697426] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 [ 2.697427] RSP: 002b:00007fff973723a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 [ 2.697430] RAX: ffffffffffffffda RBX: 00005ea9a2c02760 RCX: 000075054cba4a06 [ 2.697432] RDX: 0000000000002000 RSI: 000075054c190000 RDI: 000000000000001b [ 2.697433] RBP: 00007fff973723c0 R08: 0000000000000000 R09: 0000000000000000 [ 2.697434] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 2.697434] R13: 00005ea9a2c027c0 R14: 00005ea9a2be5608 R15: 00005ea9a2be55f0 [ 2.697436] </TASK> [ 2.697436] ---[ end trace ]--- This situation can happen for block devices because when CONFIG_TRANSPARENT_HUGEPAGE is enabled, the maximum logical_block_size is 64 KiB. set_init_blocksize() then sets the block device inode->i_blkbits to 13, which is within this limit. File I/O does not trigger this problem because for filesystems that do not support the FS_LBS feature, sb_set_blocksize() prevents sb->s_blocksize_bits from being larger than PAGE_SHIFT. During inode allocation, alloc_inode()->inode_init_always() assigns inode->i_blkbits from sb->s_blocksize_bits. Currently, only xfs_fs_type has the FS_LBS flag, and since xfs I/O paths do not reach submit_bh_wbc(), it does not hit the left-shift underflow issue. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Fixes: `47dd675323` ("block/bdev: lift block size restrictions to 64k") Cc: stable@vger.kernel.org [EB: use folio_pos() and consolidate the two shifts by i_blkbits] Link: https://lore.kernel.org/r/20251105003642.42796-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-11-04 16:37:38 -08:00
Mateusz Guzik	b4dbfd8653	Coccinelle-based conversion to use ->i_state accessors All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 \| flag2) @@ expression inode, flags; @@ - inode->i_state \|= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-20 20:22:26 +02:00
Linus Torvalds	d60ac92c10	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Make fs/crypto/ use the HMAC-SHA512 library functions instead of crypto_shash. This is simpler, faster, and more reliable" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: use HMAC-SHA512 library for HKDF fscrypt: Remove redundant __GFP_NOWARN	2025-09-29 15:33:50 -07:00
Eric Biggers	19591f7e78	fscrypt: use HMAC-SHA512 library for HKDF For the HKDF-SHA512 key derivation needed by fscrypt, just use the HMAC-SHA512 library functions directly. These functions were introduced in v6.17, and they provide simple and efficient direct support for HMAC-SHA512. This ends up being quite a bit simpler and more efficient than using crypto/hkdf.c, as it avoids the generic crypto layer: - The HMAC library can't fail, so callers don't need to handle errors - No inefficient indirect calls - No inefficient and error-prone dynamic allocations - No inefficient and error-prone loading of algorithm by name - Less stack usage Benchmarks on x86_64 show that deriving a per-file key gets about 30% faster, and FS_IOC_ADD_ENCRYPTION_KEY gets nearly twice as fast. The only small downside is the HKDF-Expand logic gets duplicated again. Then again, even considering that, the new fscrypt_hkdf_expand() is only 7 lines longer than the version that called hkdf_expand(). Later we could add HKDF support to lib/crypto/, but for now let's just do this. Link: https://lore.kernel.org/r/20250906035913.1141532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-09-05 21:01:51 -07:00
Eric Biggers	93221de31a	fscrypt: add support for info in fs-specific part of inode Add an inode_info_offs field to struct fscrypt_operations, and update fs/crypto/ to support it. When set to a nonzero value, it specifies the offset to the fscrypt_inode_info pointer within the filesystem-specific part of the inode structure, to be used instead of inode::i_crypt_info. Since this makes inode::i_crypt_info no longer necessarily used, update comments that mentioned it. This is a prerequisite for a later commit that removes inode::i_crypt_info, saving memory and improving cache efficiency with filesystems that don't support fscrypt. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-3-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-21 13:58:07 +02:00
Eric Biggers	6c9468aad2	fscrypt: replace raw loads of info pointer with helper function Add and use a helper function fscrypt_get_inode_info_raw(). It loads an inode's fscrypt info pointer using a raw dereference, which is appropriate when the caller knows the key setup already happened. This eliminates most occurrences of inode::i_crypt_info in the source, in preparation for replacing that with a filesystem-specific field. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-2-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-21 13:58:06 +02:00
Qianfeng Rong	0e6608d493	fscrypt: Remove redundant __GFP_NOWARN GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Link: https://lore.kernel.org/r/20250803102243.623705-3-rongqianfeng@vivo.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-08-11 10:27:32 -07:00
Eric Biggers	47462586f9	fscrypt: Remove gfp_t argument from fscrypt_encrypt_block_inplace() This argument is no longer used, so remove it. Reviewed-by: Alex Markuze <amarkuze@redhat.com> Link: https://lore.kernel.org/r/20250710060754.637098-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-10 12:33:13 -07:00
Eric Biggers	a9a95ecd9d	fscrypt: Remove gfp_t argument from fscrypt_crypt_data_unit() This argument is no longer used, so remove it. Link: https://lore.kernel.org/r/20250710060754.637098-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-10 12:33:08 -07:00
Eric Biggers	52e7e0d889	fscrypt: Switch to sync_skcipher and on-stack requests Now that fscrypt uses only synchronous skciphers, switch to the actual sync_skcipher API and the corresponding on-stack requests. This eliminates a heap allocation per en/decryption operation. Link: https://lore.kernel.org/r/20250710060754.637098-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-10 12:33:08 -07:00
Eric Biggers	53d9218d8d	fscrypt: Drop FORBID_WEAK_KEYS flag for AES-ECB This flag only has an effect for DES, 3DES, and XTS mode. It does nothing for AES-ECB, as there is no concept of weak keys for AES. Link: https://lore.kernel.org/r/20250710060754.637098-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-10 12:33:08 -07:00
Eric Biggers	71ffd1dc52	fscrypt: Don't use asynchronous CryptoAPI algorithms Now that fscrypt's incomplete support for non-inline crypto engines has been removed, and none of the CPU-based algorithms have the CRYPTO_ALG_ASYNC flag set anymore, there is no need to accommodate asynchronous algorithms. Therefore, explicitly allocate only synchronous algorithms. Then, remove the code that handled waiting for asynchronous en/decryption operations to complete. This commit should not be backported to kernels that lack commit `0ba6ec5b29` ("crypto: x86/aes - stop using the SIMD helper"), as then it would disable the use of the optimized AES code on x86. Link: https://lore.kernel.org/r/20250710060754.637098-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-10 12:33:08 -07:00
Eric Biggers	b41c1d8d07	fscrypt: Don't use problematic non-inline crypto engines Make fscrypt no longer use Crypto API drivers for non-inline crypto engines, even when the Crypto API prioritizes them over CPU-based code (which unfortunately it often does). These drivers tend to be really problematic, especially for fscrypt's workload. This commit has no effect on inline crypto engines, which are different and do work well. Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be excluded too. That's omitted for now to keep this commit backportable, since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.) There are two major issues with these drivers: bugs and performance. First, these drivers tend to be buggy. They're fundamentally much more error-prone and harder to test than the CPU-based code. They often don't get tested before kernel releases, and even if they do, the crypto self-tests don't properly test these drivers. Released drivers have en/decrypted or hashed data incorrectly. These bugs cause issues for fscrypt users who often didn't even want to use these drivers, e.g.: - https://github.com/google/fscryptctl/issues/32 - https://github.com/google/fscryptctl/issues/9 - https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com These drivers have also similarly caused issues for dm-crypt users, including data corruption and deadlocks. Since Linux v5.10, dm-crypt has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY. Second, these drivers tend to be much slower than the CPU-based code. This may seem counterintuitive, but benchmarks clearly show it. There's a lot of overhead associated with going to a hardware driver, off the CPU, and back again. To prove this, I gathered as many systems with this type of crypto engine as I could, and I measured synchronous encryption of 4096-byte messages (which matches fscrypt's workload): Intel Emerald Rapids server: AES-256-XTS: xts-aes-vaes-avx512 16171 MB/s [CPU-based, Vector AES] qat_aes_xts 289 MB/s [Offload, Intel QuickAssist] Qualcomm SM8650 HDK: AES-256-XTS: xts-aes-ce 4301 MB/s [CPU-based, ARMv8 Crypto Extensions] xts-aes-qce 73 MB/s [Offload, Qualcomm Crypto Engine] i.MX 8M Nano LPDDR4 EVK: AES-256-XTS: xts-aes-ce 647 MB/s [CPU-based, ARMv8 Crypto Extensions] xts(ecb-aes-caam) 20 MB/s [Offload, CAAM] AES-128-CBC-ESSIV: essiv(cbc-aes-caam,sha256-lib) 23 MB/s [Offload, CAAM] STM32MP157F-DK2: AES-256-XTS: xts-aes-neonbs 13.2 MB/s [CPU-based, ARM NEON] xts(stm32-ecb-aes) 3.1 MB/s [Offload, STM32 crypto engine] AES-128-CBC-ESSIV: essiv(cbc-aes-neonbs,sha256-lib) 14.7 MB/s [CPU-based, ARM NEON] essiv(stm32-cbc-aes,sha256-lib) 3.2 MB/s [Offload, STM32 crypto engine] Adiantum: adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon) 52.8 MB/s [CPU-based, ARM scalar + NEON] So, there was no case in which the crypto engine was even close to being faster. On the first three, which have AES instructions in the CPU, the CPU was 30 to 55 times faster (!). Even on STM32MP157F-DK2 which has a Cortex-A7 CPU that doesn't have AES instructions, AES was over 4 times faster on the CPU. And Adiantum encryption, which is what actually should be used on CPUs like that, was over 17 times faster. Other justifications that have been given for these non-inline crypto engines (almost always coming from the hardware vendors, not actual users) don't seem very plausible either: - The crypto engine throughput could be improved by processing multiple requests concurrently. Currently irrelevant to fscrypt, since it doesn't do that. This would also be complex, and unhelpful in many cases. 2 of the 4 engines I tested even had only one queue. - Some of the engines, e.g. STM32, support hardware keys. Also currently irrelevant to fscrypt, since it doesn't support these. Interestingly, the STM32 driver itself doesn't support this either. - Free up CPU for other tasks and/or reduce energy usage. Not very plausible considering the "short" message length, driver overhead, and scheduling overhead. There's just very little time for the CPU to do something else like run another task or enter low-power state, before the message finishes and it's time to process the next one. - Some of these engines resist power analysis and electromagnetic attacks, while the CPU-based crypto generally does not. In theory, this sounds great. In practice, if this benefit requires the use of an off-CPU offload that massively regresses performance and has a low-quality, buggy driver, the price for this hardening (which is not relevant to most fscrypt users, and tends to be incomplete) is just too high. Inline crypto engines are much more promising here, as are on-CPU solutions like RISC-V High Assurance Cryptography. Fixes: `b30ab0e034` ("ext4 crypto: add ext4 encryption facilities") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250704070322.20692-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-04 10:25:26 -07:00
Eric Biggers	c6a14b32c9	fscrypt: Explicitly include <linux/export.h> Fix build warnings with W=1 that started appearing after commit `a934a57a42` ("scripts/misc-check: check missing #include <linux/export.h> when W=1"). While at it, also sort the include lists alphabetically. Link: https://lore.kernel.org/r/20250614221301.100803-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-06-20 14:30:23 -07:00
Eric Biggers	c07d3aede2	fscrypt: add support for hardware-wrapped keys Add support for hardware-wrapped keys to fscrypt. Such keys are protected from certain attacks, such as cold boot attacks. For more information, see the "Hardware-wrapped keys" section of Documentation/block/inline-encryption.rst. To support hardware-wrapped keys in fscrypt, we allow the fscrypt master keys to be hardware-wrapped. File contents encryption is done by passing the wrapped key to the inline encryption hardware via blk-crypto. Other fscrypt operations such as filenames encryption continue to be done by the kernel, using the "software secret" which the hardware derives. For more information, see the documentation which this patch adds to Documentation/filesystems/fscrypt.rst. Note that this feature doesn't require any filesystem-specific changes. However it does depend on inline encryption support, and thus currently it is only applicable to ext4 and f2fs. The version of this feature introduced by this patch is mostly equivalent to the version that has existed downstream in the Android Common Kernels since 2020. However, a couple fixes are included. First, the flags field in struct fscrypt_add_key_arg is now placed in the proper location. Second, key identifiers for HW-wrapped keys are now derived using a distinct HKDF context byte; this fixes a bug where a raw key could have the same identifier as a HW-wrapped key. Note that as a result of these fixes, the version of this feature introduced by this patch is not UAPI or on-disk format compatible with the version in the Android Common Kernels, though the divergence is limited to just those specific fixes. This version should be used going forwards. This patch has been heavily rewritten from the original version by Gaurav Kashyap <quic_gaurkash@quicinc.com> and Barani Muthukumaran <bmuthuku@codeaurora.org>. Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250404225859.172344-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-04-08 19:32:11 -07:00
Linus Torvalds	9b960d8cd6	Merge tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - Fixes for integrity handling - NVMe pull request via Keith: - Secure concatenation for TCP transport (Hannes) - Multipath sysfs visibility (Nilay) - Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li) - Correct use of 64-bit BARs for pci-epf target (Niklas) - Socket fix for selinux when used in containers (Peijie) - MD pull request via Yu: - fix recovery can preempt resync (Li Nan) - fix md-bitmap IO limit (Su Yue) - fix raid10 discard with REQ_NOWAIT (Xiao Ni) - fix raid1 memory leak (Zheng Qixing) - fix mddev uaf (Yu Kuai) - fix raid1,raid10 IO flags (Yu Kuai) - some refactor and cleanup (Yu Kuai) - Series cleaning up and fixing bugs in the bad block handling code - Improve support for write failure simulation in null_blk - Various lock ordering fixes - Fixes for locking for debugfs attributes - Various ublk related fixes and improvements - Cleanups for blk-rq-qos wait handling - blk-throttle fixes - Fixes for loop dio and sync handling - Fixes and cleanups for the auto-PI code - Block side support for hardware encryption keys in blk-crypto - Various cleanups and fixes * tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits) nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi) nvme-tcp: fix selinux denied when calling sock_sendmsg nvmet: pci-epf: Always configure BAR0 as 64-bit nvmet: Remove duplicate uuid_copy nvme: zns: Simplify nvme_zone_parse_entry() nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls nvmet-fc: Remove unused functions nvme-pci: remove stale comment nvme-fc: Utilise min3() to simplify queue count calculation nvme-multipath: Add visibility for queue-depth io-policy nvme-multipath: Add visibility for numa io-policy nvme-multipath: Add visibility for round-robin io-policy nvmet: add tls_concat and tls_key debugfs entries nvmet-tcp: support secure channel concatenation nvmet: Add 'sq' argument to alloc_ctrl_args nvme-fabrics: reset admin connection for secure concatenation nvme-tcp: request secure channel concatenation nvme-keyring: add nvme_tls_psk_refresh() nvme: add nvme_auth_derive_tls_psk() nvme: add nvme_auth_generate_digest() ...	2025-03-26 18:08:55 -07:00
Linus Torvalds	a86c6d0b2a	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without some of its dependencies, and a small documentation update" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: mention init_on_free instead of page poisoning fscrypt: drop obsolete recommendation to enable optimized ChaCha20 Revert "fscrypt: relax Kconfig dependencies for crypto API algorithms"	2025-03-25 18:31:38 -07:00
Hannes Reinecke	3241cd0c6c	crypto,fs: Separate out hkdf_extract() and hkdf_expand() Separate out the HKDF functions into a separate module to to make them available to other callers. And add a testsuite to the module with test vectors from RFC 5869 (and additional vectors for SHA384 and SHA512) to ensure the integrity of the algorithm. Signed-off-by: Hannes Reinecke <hare@kernel.org> Acked-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Keith Busch <kbusch@kernel.org>	2025-03-20 16:53:53 -07:00
Matthew Wilcox (Oracle)	59b59a9431	fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio ext4 and ceph already have a folio to pass; f2fs needs to be properly converted but this will do for now. This removes a reference to page->index and page->mapping as well as removing a call to compound_head(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250304170224.523141-1-willy@infradead.org Acked-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 12:57:15 +01:00
Eric Biggers	75eb8b9410	Revert "fscrypt: relax Kconfig dependencies for crypto API algorithms" This mostly reverts commit `a0fc20333e`. Keep the relevant parts of the comment added by that commit. The problem with that commit is that it allowed people to create broken configurations that enabled FS_ENCRYPTION but not the mandatory algorithms. An example of this can be found here: https://lore.kernel.org/r/1207325.1737387826@warthog.procyon.org.uk/ The commit did allow people to disable specific generic algorithm implementations that aren't needed. But that at best allowed saving a bit of code. In the real world people are unlikely to intentionally and correctly make such a tweak anyway, as they tend to just be confused by what all the different crypto kconfig options mean. Of course we really need the crypto API to enable the correct implementations automatically, but that's for a later fix. Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20250217185314.27345-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-02-17 11:33:32 -08:00
Eric Biggers	ebc4176551	blk-crypto: add basic hardware-wrapped key support To prevent keys from being compromised if an attacker acquires read access to kernel memory, some inline encryption hardware can accept keys which are wrapped by a per-boot hardware-internal key. This avoids needing to keep the raw keys in kernel memory, without limiting the number of keys that can be used. Such hardware also supports deriving a "software secret" for cryptographic tasks that can't be handled by inline encryption; this is needed for fscrypt to work properly. To support this hardware, allow struct blk_crypto_key to represent a hardware-wrapped key as an alternative to a raw key, and make drivers set flags in struct blk_crypto_profile to indicate which types of keys they support. Also add the ->derive_sw_secret() low-level operation, which drivers supporting wrapped keys must implement. For more information, see the detailed documentation which this patch adds to Documentation/block/inline-encryption.rst. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250204060041.409950-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-02-10 09:54:19 -07:00
Al Viro	c4a9fe6319	fscrypt_d_revalidate(): use stable parent inode passed by caller The only thing it's using is parent directory inode and we are already given a stable reference to that - no need to bother with boilerplate. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-01-27 19:25:23 -05:00
Al Viro	5be1fa8abd	Pass parent directory inode and expected name to ->d_revalidate() ->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-01-27 19:25:23 -05:00
Linus Torvalds	8a7fa81137	Merge tag 'random-6.13-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This contains a single series from Uros to replace uses of <linux/random.h> with prandom.h or other more specific headers as needed, in order to avoid a circular header issue. Uros' goal is to be able to use percpu.h from prandom.h, which will then allow him to define __percpu in percpu.h rather than in compiler_types.h" * tag 'random-6.13-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: Include <linux/percpu.h> in <linux/prandom.h> random: Do not include <linux/prandom.h> in <linux/random.h> netem: Include <linux/prandom.h> in sch_netem.c lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h> lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h> bpf/tests: Include <linux/prandom.h> instead of <linux/random.h> lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h> random32: Include <linux/prandom.h> instead of <linux/random.h> kunit: string-stream-test: Include <linux/prandom.h> lib/interval_tree_test.c: Include <linux/prandom.h> instead of <linux/random.h> bpf: Include <linux/prandom.h> instead of <linux/random.h> scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h> fscrypt: Include <linux/once.h> in fs/crypto/keyring.c mtd: tests: Include <linux/prandom.h> instead of <linux/random.h> media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c drm/lib: Include <linux/prandom.h> instead of <linux/random.h> drm/i915/selftests: Include <linux/prandom.h> instead of <linux/random.h> crypto: testmgr: Include <linux/prandom.h> instead of <linux/random.h> x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>	2024-11-19 10:43:44 -08:00
Uros Bizjak	b27e03ee6f	fscrypt: Include <linux/once.h> in fs/crypto/keyring.c Include <linux/once.h> header to allow the removal of legacy inclusion of <linux/prandom.h> from <linux/random.h>. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Theodore Y. Ts'o <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-10-03 18:19:56 +02:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Linus Torvalds	61307b7be4	Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...	2024-05-19 09:21:03 -07:00
Mateusz Guzik	7f016edaa0	fscrypt: try to avoid refing parent dentry in fscrypt_file_open Merely checking if the directory is encrypted happens for every open when using ext4, at the moment refing and unrefing the parent, costing 2 atomics and serializing opens of different files. The most common case of encryption not being used can be checked for with RCU instead. Sample result from open1_processes -t 20 ("Separate file open/close") from will-it-scale on Sapphire Rapids (ops/s): before: 12539898 after: 25575494 (+103%) v2: - add a comment justifying rcu usage, submitted by Eric Biggers - whack spurious IS_ENCRYPTED check from the refed case Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240508081400.422212-1-mjguzik@gmail.com Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-05-08 10:28:58 -07:00
Matthew Wilcox (Oracle)	262f014dd7	fscrypt: convert bh_get_inode_and_lblk_num to use a folio Patch series "Remove page_mapping()". There are only a few users left. Convert them all to either call folio_mapping() or just use folio->mapping directly. This patch (of 6): Remove uses of page->index, page_mapping() and b_page. Saves a call to compound_head(). Link: https://lkml.kernel.org/r/20240423225552.4113447-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240423225552.4113447-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-05 17:53:47 -07:00
Linus Torvalds	3bf95d567d	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: shrink the size of struct fscrypt_inode_info slightly fscrypt: write CBC-CTS instead of CTS-CBC fscrypt: clear keyring before calling key_put() fscrypt: explicitly require that inode->i_blkbits be set	2024-03-12 13:17:36 -07:00
Christian Brauner	09406ad8e5	Merge tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc Merge case-insensitive updates from Gabriel Krisman Bertazi: - Patch case-insensitive lookup by trying the case-exact comparison first, before falling back to costly utf8 casefolded comparison. - Fix to forbid using a case-insensitive directory as part of an overlayfs mount. - Patchset to ensure d_op are set at d_alloc time for fscrypt and casefold volumes, ensuring filesystem dentries will all have the correct ops, whether they come from a lookup or not. * tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-03-07 11:55:41 +01:00
Gabriel Krisman Bertazi	8b6bb995d3	fscrypt: Factor out a helper to configure the lookup dentry Both fscrypt_prepare_lookup_partial and fscrypt_prepare_lookup will set DCACHE_NOKEY_NAME for dentries when the key is not available. Extract out a helper to set this flag in a single place, in preparation to also add the optimization that will disable ->d_revalidate if possible. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-3-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>	2024-02-27 16:55:34 -05:00
Eric Biggers	8c62f31edd	fscrypt: shrink the size of struct fscrypt_inode_info slightly Shrink the size of struct fscrypt_inode_info by 8 bytes by packing the small fields into the 64 bits after ci_enc_key. Link: https://lore.kernel.org/r/20240224060103.91037-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-02-23 22:03:48 -08:00
Eric Biggers	2f944c66ae	fscrypt: write CBC-CTS instead of CTS-CBC Calling CBC with ciphertext stealing "CBC-CTS" seems to be more common than calling it "CTS-CBC". E.g., CBC-CTS is used by OpenSSL, Crypto++, RFC3962, and RFC6803. The NIST SP800-38A addendum uses CBC-CS1, CBC-CS2, and CBC-CS3, distinguishing between different CTS conventions but similarly putting the CBC part first. In the interest of avoiding any idiosyncratic terminology, update the fscrypt documentation and the fscrypt_mode "friendly names" to align with the more common convention. Changing the "friendly names" only affects some log messages. The actual mode constants in the API are unchanged; those call it simply "CTS". Add a note to the documentation that clarifies that "CBC" and "CTS" in the API really mean CBC-ESSIV and CBC-CTS, respectively. Link: https://lore.kernel.org/r/20240224053550.44659-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-02-23 21:38:59 -08:00
Luis Henriques	d3a7bd4200	fscrypt: clear keyring before calling key_put() Now that the key quotas are handled immediately on key_put() instead of being postponed to the key management garbage collection worker, a call to keyring_clear() is all that is required in fscrypt_put_master_key() so that the keyring clean-up is also done synchronously. This patch should fix the fstest generic/581 flakiness. Signed-off-by: Luis Henriques <lhenriques@suse.de> Link: https://lore.kernel.org/r/20240206101619.8083-1-lhenriques@suse.de [ebiggers: added comment] Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-02-06 16:55:35 -08:00
Xiubo Li	5befc19cae	fscrypt: explicitly require that inode->i_blkbits be set Document that fscrypt_prepare_new_inode() requires inode->i_blkbits to be set, and make it WARN if it's not. This would have made the CephFS bug https://tracker.ceph.com/issues/64035 a bit easier to debug. Signed-off-by: Xiubo Li <xiubli@redhat.com> Link: https://lore.kernel.org/r/20240201003525.1788594-1-xiubli@redhat.com Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-01-31 18:44:59 -08:00
Christian Brauner	0000ff2523	Merge tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux Merge exportfs fixes from Chuck Lever: * tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs: Create a generic is_dot_dotdot() utility exportfs: fix the fallback implementation of the get_name export operation Link: https://lore.kernel.org/r/BDC2AEB4-7085-4A7C-8DE8-A659FE1DBA6A@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-23 17:56:30 +01:00
Chuck Lever	42c3732fa8	fs: Create a generic is_dot_dotdot() utility De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-23 10:58:56 -05:00
Eric Biggers	c1f1f5bf41	fscrypt: document that CephFS supports fscrypt now The help text for CONFIG_FS_ENCRYPTION and the fscrypt.rst documentation file both list the filesystems that support fscrypt. CephFS added support for fscrypt in v6.6, so add CephFS to the list. Link: https://lore.kernel.org/r/20231227045158.87276-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-12-26 22:55:42 -06:00
Eric Biggers	0fc24a6549	fscrypt: update comment for do_remove_key() Adjust a comment that was missed during commit `15baf55481` ("fscrypt: track master key presence separately from secret"). Link: https://lore.kernel.org/r/20231206002127.14790-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-12-09 12:38:16 -08:00
Linus Torvalds	bc3012f4e3	Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add virtual-address based lskcipher interface - Optimise ahash/shash performance in light of costly indirect calls - Remove ahash alignmask attribute Algorithms: - Improve AES/XTS performance of 6-way unrolling for ppc - Remove some uses of obsolete algorithms (md4, md5, sha1) - Add FIPS 202 SHA-3 support in pkcs1pad - Add fast path for single-page messages in adiantum - Remove zlib-deflate Drivers: - Add support for S4 in meson RNG driver - Add STM32MP13x support in stm32 - Add hwrng interface support in qcom-rng - Add support for deflate algorithm in hisilicon/zip" * tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits) crypto: adiantum - flush destination page before unmapping crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place Documentation/module-signing.txt: bring up to date module: enable automatic module signing with FIPS 202 SHA-3 crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support crypto: FIPS 202 SHA-3 register in hash info for IMA x509: Add OIDs for FIPS 202 SHA-3 hash and signatures crypto: ahash - optimize performance when wrapping shash crypto: ahash - check for shash type instead of not ahash type crypto: hash - move "ahash wrapping shash" functions to ahash.c crypto: talitos - stop using crypto_ahash::init crypto: chelsio - stop using crypto_ahash::init crypto: ahash - improve file comment crypto: ahash - remove struct ahash_request_priv crypto: ahash - remove crypto_ahash_alignmask crypto: gcm - stop using alignmask of ahash crypto: chacha20poly1305 - stop using alignmask of ahash crypto: ccm - stop using alignmask of ahash net: ipv6: stop checking crypto_ahash_alignmask ...	2023-11-02 16:15:30 -10:00
Eric Biggers	15baf55481	fscrypt: track master key presence separately from secret Master keys can be in one of three states: present, incompletely removed, and absent (as per FSCRYPT_KEY_STATUS_* used in the UAPI). Currently, the way that "present" is distinguished from "incompletely removed" internally is by whether ->mk_secret exists or not. With extent-based encryption, it will be necessary to allow per-extent keys to be derived while the master key is incompletely removed, so that I/O on open files will reliably continue working after removal of the key has been initiated. (We could allow I/O to sometimes fail in that case, but that seems problematic for reasons such as writes getting silently thrown away and diverging from the existing fscrypt semantics.) Therefore, when the filesystem is using extent-based encryption, ->mk_secret can't be wiped when the key becomes incompletely removed. As a prerequisite for doing that, this patch makes the "present" state be tracked using a new field, ->mk_present. No behavior is changed yet. The basic idea here is borrowed from Josef Bacik's patch "fscrypt: use a flag to indicate that the master key is being evicted" (https://lore.kernel.org/r/e86c16dddc049ff065f877d793ad773e4c6bfad9.1696970227.git.josef@toxicpanda.com). I reimplemented it using a "present" bool instead of an "evicted" flag, fixed a couple bugs, and tried to update everything to be consistent. Note: I considered adding a ->mk_status field instead, holding one of FSCRYPT_KEY_STATUS_*. At first that seemed nice, but it ended up being more complex (despite simplifying FS_IOC_GET_ENCRYPTION_KEY_STATUS), since it would have introduced redundancy and had weird locking rules. Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231015061055.62673-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-10-16 21:23:45 -07:00
Josef Bacik	3e7807d5a7	fscrypt: rename fscrypt_info => fscrypt_inode_info We are going to track per-extent information, so it'll be necessary to distinguish between inode infos and extent infos. Rename fscrypt_info to fscrypt_inode_info, adjusting any lines that now exceed 80 characters. Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ebiggers: rebased onto fscrypt tree, renamed fscrypt_get_info(), adjusted two comments, and fixed some lines over 80 characters] Link: https://lore.kernel.org/r/20231005025757.33521-1-ebiggers@kernel.org Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-10-08 20:44:26 -07:00
Eric Biggers	5b11888471	fscrypt: support crypto data unit size less than filesystem block size Until now, fscrypt has always used the filesystem block size as the granularity of file contents encryption. Two scenarios have come up where a sub-block granularity of contents encryption would be useful: 1. Inline crypto hardware that only supports a crypto data unit size that is less than the filesystem block size. 2. Support for direct I/O at a granularity less than the filesystem block size, for example at the block device's logical block size in order to match the traditional direct I/O alignment requirement. (1) first came up with older eMMC inline crypto hardware that only supports a crypto data unit size of 512 bytes. That specific case ultimately went away because all systems with that hardware continued using out of tree code and never actually upgraded to the upstream inline crypto framework. But, now it's coming back in a new way: some current UFS controllers only support a data unit size of 4096 bytes, and there is a proposal to increase the filesystem block size to 16K. (2) was discussed as a "nice to have" feature, though not essential, when support for direct I/O on encrypted files was being upstreamed. Still, the fact that this feature has come up several times does suggest it would be wise to have available. Therefore, this patch implements it by using one of the reserved bytes in fscrypt_policy_v2 to allow users to select a sub-block data unit size. Supported data unit sizes are powers of 2 between 512 and the filesystem block size, inclusively. Support is implemented for both the FS-layer and inline crypto cases. This patch focuses on the basic support for sub-block data units. Some things are out of scope for this patch but may be addressed later: - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this combination usually causes data unit indices to exceed 32 bits, and thus fscrypt_supported_policy() correctly disallows it. The users who potentially need this combination are using f2fs. To support it, f2fs would need to provide an option to slightly reduce its max file size. - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem described above, but also it will need special code to make DUN wraparound still happen on a FS block boundary. - Supporting use case (2) mentioned above. The encrypted direct I/O code will need to stop requiring and assuming FS block alignment. This won't be hard, but it belongs in a separate patch. - Supporting this feature on filesystems other than ext4 and f2fs. (Filesystems declare support for it via their fscrypt_operations.) On UBIFS, sub-block data units don't make sense because UBIFS encrypts variable-length blocks as a result of compression. CephFS could support it, but a bit more work would be needed to make the fscrypt_*_block_inplace functions play nicely with sub-block data units. I don't think there's a use case for this on CephFS anyway. Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-09-25 22:34:33 -07:00
Eric Biggers	7a0263dc90	fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum file size, it is no longer necessary for filesystems to provide lblk_bits via fscrypt_operations::get_ino_and_lblk_bits. It is still necessary for fs/crypto/ to retrieve ino_bits from the filesystem. However, this is used only to decide whether inode numbers fit in 32 bits. Also, ino_bits is static for all relevant filesystems, i.e. it doesn't depend on the filesystem instance. Therefore, in the interest of keeping things as simple as possible, replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This can always be changed back to a function if a filesystem needs it to be dynamic, but for now a static flag is all that's needed. Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-09-25 22:34:33 -07:00
Eric Biggers	f0904e8bc3	fscrypt: compute max_lblk_bits from s_maxbytes and block size For a given filesystem, the number of bits used by the maximum file logical block number is computable from the maximum file size and the block size. These values are always present in struct super_block. Therefore, compute it this way instead of using the value from fscrypt_operations::get_ino_and_lblk_bits. Since filesystems always have to set the super_block fields anyway, this avoids having to provide this information redundantly via fscrypt_operations. This change is in preparation for adding support for sub-block data units. For that, the value that is needed will become "the maximum file data unit index". A hardcoded value won't suffice for that; it will need to be computed anyway. Link: https://lore.kernel.org/r/20230925055451.59499-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-09-25 22:34:30 -07:00
Eric Biggers	40e13e1816	fscrypt: make the bounce page pool opt-in instead of opt-out Replace FS_CFLG_OWN_PAGES with a bit flag 'needs_bounce_pages' which has the opposite meaning. I.e., filesystems now opt into the bounce page pool instead of opt out. Make fscrypt_alloc_bounce_page() check that the bounce page pool has been initialized. I believe the opt-in makes more sense, since nothing else in fscrypt_operations is opt-out, and these days filesystems can choose to use blk-crypto which doesn't need the fscrypt bounce page pool. Also, I happen to be planning to add two more flags, and I wanted to fix the "FS_CFLG_" name anyway as it wasn't prefixed with "FSCRYPT_". Link: https://lore.kernel.org/r/20230925055451.59499-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2023-09-24 23:03:09 -07:00

1 2 3 4 5 ...

351 Commits