mirror of
https://github.com/torvalds/linux.git
synced 2025-12-07 11:56:58 +00:00
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOMIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
Improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improve the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcgs linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
1156 lines
30 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright IBM Corp. 2007, 2011
 * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
 */

#include <linux/cpufeature.h>
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/smp.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/leafops.h>
#include <linux/sysctl.h>
#include <linux/ksm.h>
#include <linux/mman.h>

#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/page-states.h>
#include <asm/pgtable.h>
#include <asm/machine.h>

pgprot_t pgprot_writecombine(pgprot_t prot)
{
	/*
	 * mio_wb_bit_mask may be set on a different CPU, but it is only set
	 * once at init and only read afterwards.
	 */
	return __pgprot(pgprot_val(prot) | mio_wb_bit_mask);
}
EXPORT_SYMBOL_GPL(pgprot_writecombine);

static inline void ptep_ipte_local(struct mm_struct *mm, unsigned long addr,
				   pte_t *ptep, int nodat)
{
	unsigned long opt, asce;

	if (machine_has_tlb_guest()) {
		opt = 0;
		asce = READ_ONCE(mm->context.gmap_asce);
		if (asce == 0UL || nodat)
			opt |= IPTE_NODAT;
		if (asce != -1UL) {
			asce = asce ? : mm->context.asce;
			opt |= IPTE_GUEST_ASCE;
		}
		__ptep_ipte(addr, ptep, opt, asce, IPTE_LOCAL);
	} else {
		__ptep_ipte(addr, ptep, 0, 0, IPTE_LOCAL);
	}
}

static inline void ptep_ipte_global(struct mm_struct *mm, unsigned long addr,
				    pte_t *ptep, int nodat)
{
	unsigned long opt, asce;

	if (machine_has_tlb_guest()) {
		opt = 0;
		asce = READ_ONCE(mm->context.gmap_asce);
		if (asce == 0UL || nodat)
			opt |= IPTE_NODAT;
		if (asce != -1UL) {
			asce = asce ? : mm->context.asce;
			opt |= IPTE_GUEST_ASCE;
		}
		__ptep_ipte(addr, ptep, opt, asce, IPTE_GLOBAL);
	} else {
		__ptep_ipte(addr, ptep, 0, 0, IPTE_GLOBAL);
	}
}

static inline pte_t ptep_flush_direct(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep,
				      int nodat)
{
	pte_t old;

	old = *ptep;
	if (unlikely(pte_val(old) & _PAGE_INVALID))
		return old;
	atomic_inc(&mm->context.flush_count);
	if (cpu_has_tlb_lc() &&
	    cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
		ptep_ipte_local(mm, addr, ptep, nodat);
	else
		ptep_ipte_global(mm, addr, ptep, nodat);
	atomic_dec(&mm->context.flush_count);
	return old;
}

static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
				    unsigned long addr, pte_t *ptep,
				    int nodat)
{
	pte_t old;

	old = *ptep;
	if (unlikely(pte_val(old) & _PAGE_INVALID))
		return old;
	atomic_inc(&mm->context.flush_count);
	if (cpumask_equal(&mm->context.cpu_attach_mask,
			  cpumask_of(smp_processor_id()))) {
		set_pte(ptep, set_pte_bit(*ptep, __pgprot(_PAGE_INVALID)));
		mm->context.flush_mm = 1;
	} else
		ptep_ipte_global(mm, addr, ptep, nodat);
	atomic_dec(&mm->context.flush_count);
	return old;
}

static inline pgste_t pgste_get(pte_t *ptep)
{
	unsigned long pgste = 0;
#ifdef CONFIG_PGSTE
	pgste = *(unsigned long *)(ptep + PTRS_PER_PTE);
#endif
	return __pgste(pgste);
}

static inline void pgste_set(pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
	*(pgste_t *)(ptep + PTRS_PER_PTE) = pgste;
#endif
}

static inline pgste_t pgste_update_all(pte_t pte, pgste_t pgste,
				       struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
	unsigned long address, bits, skey;

	if (!mm_uses_skeys(mm) || pte_val(pte) & _PAGE_INVALID)
		return pgste;
	address = pte_val(pte) & PAGE_MASK;
	skey = (unsigned long) page_get_storage_key(address);
	bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
	/* Transfer page changed & referenced bit to guest bits in pgste */
	pgste = set_pgste_bit(pgste, bits << 48); /* GR bit & GC bit */
	/* Copy page access key and fetch protection bit to pgste */
	pgste = clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT);
	pgste = set_pgste_bit(pgste, (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56);
#endif
	return pgste;

}

static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry,
				 struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
	unsigned long address;
	unsigned long nkey;

	if (!mm_uses_skeys(mm) || pte_val(entry) & _PAGE_INVALID)
		return;
	VM_BUG_ON(!(pte_val(*ptep) & _PAGE_INVALID));
	address = pte_val(entry) & PAGE_MASK;
	/*
	 * Set page access key and fetch protection bit from pgste.
	 * The guest C/R information is still in the PGSTE, set real
	 * key C/R to 0.
	 */
	nkey = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
	nkey |= (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
	page_set_storage_key(address, nkey, 0);
#endif
}

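The bit shuffling in pgste_update_all() and pgste_set_key() can be modelled in userspace. The following is only a sketch: the numeric constants are assumptions meant to mirror the s390 storage-key and PGSTE layouts (they are not taken from this file), and the helper names pgste_pack_skey() and pgste_unpack_skey() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed storage-key bit layout (one byte per page). */
#define SKEY_CHANGED     0x02u
#define SKEY_REFERENCED  0x04u
#define SKEY_FP_BIT      0x08u
#define SKEY_ACC_BITS    0xf0u

/* Assumed PGSTE bit layout (64-bit word behind the PTE table). */
#define MODEL_PGSTE_ACC_BITS 0xf000000000000000ULL
#define MODEL_PGSTE_FP_BIT   0x0800000000000000ULL
#define MODEL_PGSTE_GR_BIT   0x0004000000000000ULL
#define MODEL_PGSTE_GC_BIT   0x0002000000000000ULL

/* Hypothetical model of pgste_update_all(): fold a storage key into a PGSTE. */
static uint64_t pgste_pack_skey(uint64_t pgste, uint8_t skey)
{
	uint64_t bits = skey & (SKEY_CHANGED | SKEY_REFERENCED);

	/* R/C bits are OR-ed in, so earlier guest R/C state is preserved. */
	pgste |= bits << 48;
	/* ACC and FP are replaced, not accumulated. */
	pgste &= ~(MODEL_PGSTE_ACC_BITS | MODEL_PGSTE_FP_BIT);
	pgste |= (uint64_t)(skey & (SKEY_ACC_BITS | SKEY_FP_BIT)) << 56;
	return pgste;
}

/* Hypothetical model of the key computation in pgste_set_key(). */
static uint8_t pgste_unpack_skey(uint64_t pgste)
{
	uint64_t nkey;

	nkey = (pgste & (MODEL_PGSTE_ACC_BITS | MODEL_PGSTE_FP_BIT)) >> 56;
	nkey |= (pgste & (MODEL_PGSTE_GR_BIT | MODEL_PGSTE_GC_BIT)) >> 48;
	return (uint8_t)nkey;
}
```

Under these assumed values, packing a key and unpacking it again returns the same ACC, FP, R and C bits, while R and C accumulate across repeated packs.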
static inline pgste_t pgste_set_pte(pte_t *ptep, pgste_t pgste, pte_t entry)
{
#ifdef CONFIG_PGSTE
	if ((pte_val(entry) & _PAGE_PRESENT) &&
	    (pte_val(entry) & _PAGE_WRITE) &&
	    !(pte_val(entry) & _PAGE_INVALID)) {
		if (!machine_has_esop()) {
			/*
			 * Without enhanced suppression-on-protection force
			 * the dirty bit on for all writable ptes.
			 */
			entry = set_pte_bit(entry, __pgprot(_PAGE_DIRTY));
			entry = clear_pte_bit(entry, __pgprot(_PAGE_PROTECT));
		}
		if (!(pte_val(entry) & _PAGE_PROTECT))
			/* This pte allows write access, set user-dirty */
			pgste = set_pgste_bit(pgste, PGSTE_UC_BIT);
	}
#endif
	set_pte(ptep, entry);
	return pgste;
}

static inline pgste_t pgste_pte_notify(struct mm_struct *mm,
				       unsigned long addr,
				       pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
	unsigned long bits;

	bits = pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
	if (bits) {
		pgste = __pgste(pgste_val(pgste) ^ bits);
		ptep_notify(mm, addr, ptep, bits);
	}
#endif
	return pgste;
}

static inline pgste_t ptep_xchg_start(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep)
{
	pgste_t pgste = __pgste(0);

	if (mm_has_pgste(mm)) {
		pgste = pgste_get_lock(ptep);
		pgste = pgste_pte_notify(mm, addr, ptep, pgste);
	}
	return pgste;
}

static inline pte_t ptep_xchg_commit(struct mm_struct *mm,
				     unsigned long addr, pte_t *ptep,
				     pgste_t pgste, pte_t old, pte_t new)
{
	if (mm_has_pgste(mm)) {
		if (pte_val(old) & _PAGE_INVALID)
			pgste_set_key(ptep, pgste, new, mm);
		if (pte_val(new) & _PAGE_INVALID) {
			pgste = pgste_update_all(old, pgste, mm);
			if ((pgste_val(pgste) & _PGSTE_GPS_USAGE_MASK) ==
			    _PGSTE_GPS_USAGE_UNUSED)
				old = set_pte_bit(old, __pgprot(_PAGE_UNUSED));
		}
		pgste = pgste_set_pte(ptep, pgste, new);
		pgste_set_unlock(ptep, pgste);
	} else {
		set_pte(ptep, new);
	}
	return old;
}

pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
		       pte_t *ptep, pte_t new)
{
	pgste_t pgste;
	pte_t old;
	int nodat;

	preempt_disable();
	pgste = ptep_xchg_start(mm, addr, ptep);
	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
	old = ptep_flush_direct(mm, addr, ptep, nodat);
	old = ptep_xchg_commit(mm, addr, ptep, pgste, old, new);
	preempt_enable();
	return old;
}
EXPORT_SYMBOL(ptep_xchg_direct);

/*
 * Caller must check that new PTE only differs in _PAGE_PROTECT HW bit, so that
 * RDP can be used instead of IPTE. See also comments at pte_allow_rdp().
 */
void ptep_reset_dat_prot(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
			 pte_t new)
{
	preempt_disable();
	atomic_inc(&mm->context.flush_count);
	if (cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
		__ptep_rdp(addr, ptep, 1);
	else
		__ptep_rdp(addr, ptep, 0);
	/*
	 * PTE is not invalidated by RDP, only _PAGE_PROTECT is cleared. That
	 * means it is still valid and active, and must not be changed according
	 * to the architecture. But writing a new value that only differs in SW
	 * bits is allowed.
	 */
	set_pte(ptep, new);
	atomic_dec(&mm->context.flush_count);
	preempt_enable();
}
EXPORT_SYMBOL(ptep_reset_dat_prot);

pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t new)
{
	pgste_t pgste;
	pte_t old;
	int nodat;

	preempt_disable();
	pgste = ptep_xchg_start(mm, addr, ptep);
	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
	old = ptep_flush_lazy(mm, addr, ptep, nodat);
	old = ptep_xchg_commit(mm, addr, ptep, pgste, old, new);
	preempt_enable();
	return old;
}
EXPORT_SYMBOL(ptep_xchg_lazy);

pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
			     pte_t *ptep)
{
	pgste_t pgste;
	pte_t old;
	int nodat;
	struct mm_struct *mm = vma->vm_mm;

	pgste = ptep_xchg_start(mm, addr, ptep);
	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
	old = ptep_flush_lazy(mm, addr, ptep, nodat);
	if (mm_has_pgste(mm)) {
		pgste = pgste_update_all(old, pgste, mm);
		pgste_set(ptep, pgste);
	}
	return old;
}

void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
			     pte_t *ptep, pte_t old_pte, pte_t pte)
{
	pgste_t pgste;
	struct mm_struct *mm = vma->vm_mm;

	if (mm_has_pgste(mm)) {
		pgste = pgste_get(ptep);
		pgste_set_key(ptep, pgste, pte, mm);
		pgste = pgste_set_pte(ptep, pgste, pte);
		pgste_set_unlock(ptep, pgste);
	} else {
		set_pte(ptep, pte);
	}
}

static inline void pmdp_idte_local(struct mm_struct *mm,
				   unsigned long addr, pmd_t *pmdp)
{
	if (machine_has_tlb_guest())
		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
			    mm->context.asce, IDTE_LOCAL);
	else
		__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
	if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
		gmap_pmdp_idte_local(mm, addr);
}

static inline void pmdp_idte_global(struct mm_struct *mm,
				    unsigned long addr, pmd_t *pmdp)
{
	if (machine_has_tlb_guest()) {
		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
			    mm->context.asce, IDTE_GLOBAL);
		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
			gmap_pmdp_idte_global(mm, addr);
	} else {
		__pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL);
		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
			gmap_pmdp_idte_global(mm, addr);
	}
}

static inline pmd_t pmdp_flush_direct(struct mm_struct *mm,
				      unsigned long addr, pmd_t *pmdp)
{
	pmd_t old;

	old = *pmdp;
	if (pmd_val(old) & _SEGMENT_ENTRY_INVALID)
		return old;
	atomic_inc(&mm->context.flush_count);
	if (cpu_has_tlb_lc() &&
	    cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
		pmdp_idte_local(mm, addr, pmdp);
	else
		pmdp_idte_global(mm, addr, pmdp);
	atomic_dec(&mm->context.flush_count);
	return old;
}

static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
				    unsigned long addr, pmd_t *pmdp)
{
	pmd_t old;

	old = *pmdp;
	if (pmd_val(old) & _SEGMENT_ENTRY_INVALID)
		return old;
	atomic_inc(&mm->context.flush_count);
	if (cpumask_equal(&mm->context.cpu_attach_mask,
			  cpumask_of(smp_processor_id()))) {
		set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_INVALID)));
		mm->context.flush_mm = 1;
		if (mm_has_pgste(mm))
			gmap_pmdp_invalidate(mm, addr);
	} else {
		pmdp_idte_global(mm, addr, pmdp);
	}
	atomic_dec(&mm->context.flush_count);
	return old;
}

#ifdef CONFIG_PGSTE
static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
{
	struct vm_area_struct *vma;
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;

	/* We need a valid VMA, otherwise this is clearly a fault. */
	vma = vma_lookup(mm, addr);
	if (!vma)
		return -EFAULT;

	pgd = pgd_offset(mm, addr);
	if (!pgd_present(*pgd))
		return -ENOENT;

	p4d = p4d_offset(pgd, addr);
	if (!p4d_present(*p4d))
		return -ENOENT;

	pud = pud_offset(p4d, addr);
	if (!pud_present(*pud))
		return -ENOENT;

	/* Large PUDs are not supported yet. */
	if (pud_leaf(*pud))
		return -EFAULT;

	*pmdp = pmd_offset(pud, addr);
	return 0;
}
#endif

pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr,
		       pmd_t *pmdp, pmd_t new)
{
	pmd_t old;

	preempt_disable();
	old = pmdp_flush_direct(mm, addr, pmdp);
	set_pmd(pmdp, new);
	preempt_enable();
	return old;
}
EXPORT_SYMBOL(pmdp_xchg_direct);

pmd_t pmdp_xchg_lazy(struct mm_struct *mm, unsigned long addr,
		     pmd_t *pmdp, pmd_t new)
{
	pmd_t old;

	preempt_disable();
	old = pmdp_flush_lazy(mm, addr, pmdp);
	set_pmd(pmdp, new);
	preempt_enable();
	return old;
}
EXPORT_SYMBOL(pmdp_xchg_lazy);

static inline void pudp_idte_local(struct mm_struct *mm,
				   unsigned long addr, pud_t *pudp)
{
	if (machine_has_tlb_guest())
		__pudp_idte(addr, pudp, IDTE_NODAT | IDTE_GUEST_ASCE,
			    mm->context.asce, IDTE_LOCAL);
	else
		__pudp_idte(addr, pudp, 0, 0, IDTE_LOCAL);
}

static inline void pudp_idte_global(struct mm_struct *mm,
				    unsigned long addr, pud_t *pudp)
{
	if (machine_has_tlb_guest())
		__pudp_idte(addr, pudp, IDTE_NODAT | IDTE_GUEST_ASCE,
			    mm->context.asce, IDTE_GLOBAL);
	else
		__pudp_idte(addr, pudp, 0, 0, IDTE_GLOBAL);
}

static inline pud_t pudp_flush_direct(struct mm_struct *mm,
				      unsigned long addr, pud_t *pudp)
{
	pud_t old;

	old = *pudp;
	if (pud_val(old) & _REGION_ENTRY_INVALID)
		return old;
	atomic_inc(&mm->context.flush_count);
	if (cpu_has_tlb_lc() &&
	    cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
		pudp_idte_local(mm, addr, pudp);
	else
		pudp_idte_global(mm, addr, pudp);
	atomic_dec(&mm->context.flush_count);
	return old;
}

pud_t pudp_xchg_direct(struct mm_struct *mm, unsigned long addr,
		       pud_t *pudp, pud_t new)
{
	pud_t old;

	preempt_disable();
	old = pudp_flush_direct(mm, addr, pudp);
	set_pud(pudp, new);
	preempt_enable();
	return old;
}
EXPORT_SYMBOL(pudp_xchg_direct);

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
				pgtable_t pgtable)
{
	struct list_head *lh = (struct list_head *) pgtable;

	assert_spin_locked(pmd_lockptr(mm, pmdp));

	/* FIFO */
	if (!pmd_huge_pte(mm, pmdp))
		INIT_LIST_HEAD(lh);
	else
		list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp));
	pmd_huge_pte(mm, pmdp) = pgtable;
}

pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
	struct list_head *lh;
	pgtable_t pgtable;
	pte_t *ptep;

	assert_spin_locked(pmd_lockptr(mm, pmdp));

	/* FIFO */
	pgtable = pmd_huge_pte(mm, pmdp);
	lh = (struct list_head *) pgtable;
	if (list_empty(lh))
		pmd_huge_pte(mm, pmdp) = NULL;
	else {
		pmd_huge_pte(mm, pmdp) = (pgtable_t) lh->next;
		list_del(lh);
	}
	ptep = (pte_t *) pgtable;
	set_pte(ptep, __pte(_PAGE_INVALID));
	ptep++;
	set_pte(ptep, __pte(_PAGE_INVALID));
	return pgtable;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

#ifdef CONFIG_PGSTE
void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t entry)
{
	pgste_t pgste;

	/* the mm_has_pgste() check is done in set_pte_at() */
	preempt_disable();
	pgste = pgste_get_lock(ptep);
	pgste = clear_pgste_bit(pgste, _PGSTE_GPS_ZERO);
	pgste_set_key(ptep, pgste, entry, mm);
	pgste = pgste_set_pte(ptep, pgste, entry);
	pgste_set_unlock(ptep, pgste);
	preempt_enable();
}

void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
	pgste_t pgste;

	preempt_disable();
	pgste = pgste_get_lock(ptep);
	pgste = set_pgste_bit(pgste, PGSTE_IN_BIT);
	pgste_set_unlock(ptep, pgste);
	preempt_enable();
}

/**
 * ptep_force_prot - change access rights of a locked pte
 * @mm: pointer to the process mm_struct
 * @addr: virtual address in the guest address space
 * @ptep: pointer to the page table entry
 * @prot: indicates guest access rights: PROT_NONE, PROT_READ or PROT_WRITE
 * @bit: pgste bit to set (e.g. for notification)
 *
 * Returns 0 if the access rights were changed and -EAGAIN if the current
 * and requested access rights are incompatible.
 */
int ptep_force_prot(struct mm_struct *mm, unsigned long addr,
		    pte_t *ptep, int prot, unsigned long bit)
{
	pte_t entry;
	pgste_t pgste;
	int pte_i, pte_p, nodat;

	pgste = pgste_get_lock(ptep);
	entry = *ptep;
	/* Check pte entry after all locks have been acquired */
	pte_i = pte_val(entry) & _PAGE_INVALID;
	pte_p = pte_val(entry) & _PAGE_PROTECT;
	if ((pte_i && (prot != PROT_NONE)) ||
	    (pte_p && (prot & PROT_WRITE))) {
		pgste_set_unlock(ptep, pgste);
		return -EAGAIN;
	}
	/* Change access rights and set pgste bit */
	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
	if (prot == PROT_NONE && !pte_i) {
		ptep_flush_direct(mm, addr, ptep, nodat);
		pgste = pgste_update_all(entry, pgste, mm);
		entry = set_pte_bit(entry, __pgprot(_PAGE_INVALID));
	}
	if (prot == PROT_READ && !pte_p) {
		ptep_flush_direct(mm, addr, ptep, nodat);
		entry = clear_pte_bit(entry, __pgprot(_PAGE_INVALID));
		entry = set_pte_bit(entry, __pgprot(_PAGE_PROTECT));
	}
	pgste = set_pgste_bit(pgste, bit);
	pgste = pgste_set_pte(ptep, pgste, entry);
	pgste_set_unlock(ptep, pgste);
	return 0;
}

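The compatibility test at the top of ptep_force_prot() is pure boolean logic and can be exercised on its own. The sketch below is a userspace model only: force_prot_check() and the MODEL_PROT_* names are hypothetical, and the numeric values are assumed to match the usual Linux PROT_NONE, PROT_READ and PROT_WRITE definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror PROT_NONE / PROT_READ / PROT_WRITE on Linux. */
enum {
	MODEL_PROT_NONE  = 0x0,
	MODEL_PROT_READ  = 0x1,
	MODEL_PROT_WRITE = 0x2,
};

/*
 * Hypothetical model of the early-exit check in ptep_force_prot():
 * - an invalid pte can only be "forced" to PROT_NONE
 * - a write-protected pte cannot be forced to allow writes
 * Returns 0 when the request is compatible, -EAGAIN otherwise.
 */
static int force_prot_check(bool pte_invalid, bool pte_protect, int prot)
{
	if ((pte_invalid && prot != MODEL_PROT_NONE) ||
	    (pte_protect && (prot & MODEL_PROT_WRITE)))
		return -EAGAIN;
	return 0;
}
```

In this model a request never upgrades access: it either leaves the rights as they are, downgrades them, or is rejected with -EAGAIN.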
int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr,
		    pte_t *sptep, pte_t *tptep, pte_t pte)
{
	pgste_t spgste, tpgste;
	pte_t spte, tpte;
	int rc = -EAGAIN;

	if (!(pte_val(*tptep) & _PAGE_INVALID))
		return 0;	/* already shadowed */
	spgste = pgste_get_lock(sptep);
	spte = *sptep;
	if (!(pte_val(spte) & _PAGE_INVALID) &&
	    !((pte_val(spte) & _PAGE_PROTECT) &&
	      !(pte_val(pte) & _PAGE_PROTECT))) {
		spgste = set_pgste_bit(spgste, PGSTE_VSIE_BIT);
		tpgste = pgste_get_lock(tptep);
		tpte = __pte((pte_val(spte) & PAGE_MASK) |
			     (pte_val(pte) & _PAGE_PROTECT));
		/* don't touch the storage key - it belongs to parent pgste */
		tpgste = pgste_set_pte(tptep, tpgste, tpte);
		pgste_set_unlock(tptep, tpgste);
		rc = 1;
	}
	pgste_set_unlock(sptep, spgste);
	return rc;
}

void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep)
{
	pgste_t pgste;
	int nodat;

	pgste = pgste_get_lock(ptep);
	/* notifier is called by the caller */
	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
	ptep_flush_direct(mm, saddr, ptep, nodat);
	/* don't touch the storage key - it belongs to parent pgste */
	pgste = pgste_set_pte(ptep, pgste, __pte(_PAGE_INVALID));
	pgste_set_unlock(ptep, pgste);
}

static void ptep_zap_softleaf_entry(struct mm_struct *mm, softleaf_t entry)
{
	if (softleaf_is_swap(entry))
		dec_mm_counter(mm, MM_SWAPENTS);
	else if (softleaf_is_migration(entry)) {
		struct folio *folio = softleaf_to_folio(entry);

		dec_mm_counter(mm, mm_counter(folio));
	}
	free_swap_and_cache(entry);
}

void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, int reset)
{
	unsigned long pgstev;
	pgste_t pgste;
	pte_t pte;

	/* Zap unused and logically-zero pages */
	preempt_disable();
	pgste = pgste_get_lock(ptep);
	pgstev = pgste_val(pgste);
	pte = *ptep;
	if (!reset && pte_swap(pte) &&
	    ((pgstev & _PGSTE_GPS_USAGE_MASK) == _PGSTE_GPS_USAGE_UNUSED ||
	     (pgstev & _PGSTE_GPS_ZERO))) {
		ptep_zap_softleaf_entry(mm, softleaf_from_pte(pte));
		pte_clear(mm, addr, ptep);
	}
	if (reset)
		pgste = clear_pgste_bit(pgste, _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT);
	pgste_set_unlock(ptep, pgste);
	preempt_enable();
}

void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
	unsigned long ptev;
	pgste_t pgste;

	/* Clear storage key ACC and F, but set R/C */
	preempt_disable();
	pgste = pgste_get_lock(ptep);
	pgste = clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT);
	pgste = set_pgste_bit(pgste, PGSTE_GR_BIT | PGSTE_GC_BIT);
	ptev = pte_val(*ptep);
	if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE))
		page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 0);
	pgste_set_unlock(ptep, pgste);
	preempt_enable();
}

/*
 * Test and reset if a guest page is dirty
 */
bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep)
{
	pgste_t pgste;
	pte_t pte;
	bool dirty;
	int nodat;

	pgste = pgste_get_lock(ptep);
	dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
	pgste = clear_pgste_bit(pgste, PGSTE_UC_BIT);
	pte = *ptep;
	if (dirty && (pte_val(pte) & _PAGE_PRESENT)) {
		pgste = pgste_pte_notify(mm, addr, ptep, pgste);
		nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
		ptep_ipte_global(mm, addr, ptep, nodat);
		if (machine_has_esop() || !(pte_val(pte) & _PAGE_WRITE))
			pte = set_pte_bit(pte, __pgprot(_PAGE_PROTECT));
		else
			pte = set_pte_bit(pte, __pgprot(_PAGE_INVALID));
		set_pte(ptep, pte);
	}
	pgste_set_unlock(ptep, pgste);
	return dirty;
}
EXPORT_SYMBOL_GPL(ptep_test_and_clear_uc);

int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
|
|
unsigned char key, bool nq)
|
|
{
|
|
unsigned long keyul, paddr;
|
	spinlock_t *ptl;
	pgste_t old, new;
	pmd_t *pmdp;
	pte_t *ptep;

	/*
	 * If we don't have a PTE table and if there is no huge page mapped,
	 * we can ignore attempts to set the key to 0, because it already is 0.
	 */
	switch (pmd_lookup(mm, addr, &pmdp)) {
	case -ENOENT:
		return key ? -EFAULT : 0;
	case 0:
		break;
	default:
		return -EFAULT;
	}
again:
	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
		spin_unlock(ptl);
		return key ? -EFAULT : 0;
	}

	if (pmd_leaf(*pmdp)) {
		paddr = pmd_val(*pmdp) & HPAGE_MASK;
		paddr |= addr & ~HPAGE_MASK;
		/*
		 * Huge pmds need quiescing operations, they are
		 * always mapped.
		 */
		page_set_storage_key(paddr, key, 1);
		spin_unlock(ptl);
		return 0;
	}
	spin_unlock(ptl);

	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	if (!ptep)
		goto again;
	new = old = pgste_get_lock(ptep);
	new = clear_pgste_bit(new, PGSTE_GR_BIT | PGSTE_GC_BIT |
				   PGSTE_ACC_BITS | PGSTE_FP_BIT);
	keyul = (unsigned long) key;
	new = set_pgste_bit(new, (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) << 48);
	new = set_pgste_bit(new, (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56);
	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
		unsigned long bits, skey;

		paddr = pte_val(*ptep) & PAGE_MASK;
		skey = (unsigned long) page_get_storage_key(paddr);
		bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
		skey = key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
		/* Set storage key ACC and FP */
		page_set_storage_key(paddr, skey, !nq);
		/* Merge host changed & referenced into pgste */
		new = set_pgste_bit(new, bits << 52);
	}
	/* changing the guest storage key is considered a change of the page */
	if ((pgste_val(new) ^ pgste_val(old)) &
	    (PGSTE_ACC_BITS | PGSTE_FP_BIT | PGSTE_GR_BIT | PGSTE_GC_BIT))
		new = set_pgste_bit(new, PGSTE_UC_BIT);

	pgste_set_unlock(ptep, new);
	pte_unmap_unlock(ptep, ptl);
	return 0;
}
EXPORT_SYMBOL(set_guest_storage_key);

/*
 * Conditionally set a guest storage key (handling csske).
 * oldkey will be updated when either mr or mc is set and a pointer is given.
 *
 * Returns 0 if a guest storage key update wasn't necessary, 1 if the guest
 * storage key was updated and -EFAULT on access errors.
 */
int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
			       unsigned char key, unsigned char *oldkey,
			       bool nq, bool mr, bool mc)
{
	unsigned char tmp, mask = _PAGE_ACC_BITS | _PAGE_FP_BIT;
	int rc;

	/* we can drop the pgste lock between getting and setting the key */
	if (mr | mc) {
		rc = get_guest_storage_key(current->mm, addr, &tmp);
		if (rc)
			return rc;
		if (oldkey)
			*oldkey = tmp;
		if (!mr)
			mask |= _PAGE_REFERENCED;
		if (!mc)
			mask |= _PAGE_CHANGED;
		if (!((tmp ^ key) & mask))
			return 0;
	}
	rc = set_guest_storage_key(current->mm, addr, key, nq);
	return rc < 0 ? rc : 1;
}
EXPORT_SYMBOL(cond_set_guest_storage_key);

/*
 * Reset a guest reference bit (rrbe), returning the reference and changed bit.
 *
 * Returns < 0 in case of error, otherwise the cc to be reported to the guest.
 */
int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
{
	spinlock_t *ptl;
	unsigned long paddr;
	pgste_t old, new;
	pmd_t *pmdp;
	pte_t *ptep;
	int cc = 0;

	/*
	 * If we don't have a PTE table and if there is no huge page mapped,
	 * the storage key is 0 and there is nothing for us to do.
	 */
	switch (pmd_lookup(mm, addr, &pmdp)) {
	case -ENOENT:
		return 0;
	case 0:
		break;
	default:
		return -EFAULT;
	}
again:
	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
		spin_unlock(ptl);
		return 0;
	}

	if (pmd_leaf(*pmdp)) {
		paddr = pmd_val(*pmdp) & HPAGE_MASK;
		paddr |= addr & ~HPAGE_MASK;
		cc = page_reset_referenced(paddr);
		spin_unlock(ptl);
		return cc;
	}
	spin_unlock(ptl);

	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	if (!ptep)
		goto again;
	new = old = pgste_get_lock(ptep);
	/* Reset guest reference bit only */
	new = clear_pgste_bit(new, PGSTE_GR_BIT);

	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
		paddr = pte_val(*ptep) & PAGE_MASK;
		cc = page_reset_referenced(paddr);
		/* Merge real referenced bit into host-set */
		new = set_pgste_bit(new, ((unsigned long)cc << 53) & PGSTE_HR_BIT);
	}
	/* Reflect guest's logical view, not physical */
	cc |= (pgste_val(old) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 49;
	/* Changing the guest storage key is considered a change of the page */
	if ((pgste_val(new) ^ pgste_val(old)) & PGSTE_GR_BIT)
		new = set_pgste_bit(new, PGSTE_UC_BIT);

	pgste_set_unlock(ptep, new);
	pte_unmap_unlock(ptep, ptl);
	return cc;
}
EXPORT_SYMBOL(reset_guest_reference_bit);

int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
			  unsigned char *key)
{
	unsigned long paddr;
	spinlock_t *ptl;
	pgste_t pgste;
	pmd_t *pmdp;
	pte_t *ptep;

	/*
	 * If we don't have a PTE table and if there is no huge page mapped,
	 * the storage key is 0.
	 */
	*key = 0;

	switch (pmd_lookup(mm, addr, &pmdp)) {
	case -ENOENT:
		return 0;
	case 0:
		break;
	default:
		return -EFAULT;
	}
again:
	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
		spin_unlock(ptl);
		return 0;
	}

	if (pmd_leaf(*pmdp)) {
		paddr = pmd_val(*pmdp) & HPAGE_MASK;
		paddr |= addr & ~HPAGE_MASK;
		*key = page_get_storage_key(paddr);
		spin_unlock(ptl);
		return 0;
	}
	spin_unlock(ptl);

	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	if (!ptep)
		goto again;
	pgste = pgste_get_lock(ptep);
	*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
	paddr = pte_val(*ptep) & PAGE_MASK;
	if (!(pte_val(*ptep) & _PAGE_INVALID))
		*key = page_get_storage_key(paddr);
	/* Reflect guest's logical view, not physical */
	*key |= (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
	pgste_set_unlock(ptep, pgste);
	pte_unmap_unlock(ptep, ptl);
	return 0;
}
EXPORT_SYMBOL(get_guest_storage_key);

/**
 * pgste_perform_essa - perform ESSA actions on the PGSTE.
 * @mm: the memory context. It must have PGSTEs, no check is performed here!
 * @hva: the host virtual address of the page whose PGSTE is to be processed
 * @orc: the specific action to perform, see the ESSA_SET_* macros.
 * @oldpte: the PTE will be saved there if the pointer is not NULL.
 * @oldpgste: the old PGSTE will be saved there if the pointer is not NULL.
 *
 * Return: 1 if the page is to be added to the CBRL, otherwise 0,
 *	   or < 0 in case of error. -EINVAL is returned for invalid values
 *	   of orc, -EFAULT for invalid addresses.
 */
int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
			unsigned long *oldpte, unsigned long *oldpgste)
{
	struct vm_area_struct *vma;
	unsigned long pgstev;
	spinlock_t *ptl;
	pgste_t pgste;
	pte_t *ptep;
	int res = 0;

	WARN_ON_ONCE(orc > ESSA_MAX);
	if (unlikely(orc > ESSA_MAX))
		return -EINVAL;

	vma = vma_lookup(mm, hva);
	if (!vma || is_vm_hugetlb_page(vma))
		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
	pgste = pgste_get_lock(ptep);
	pgstev = pgste_val(pgste);
	if (oldpte)
		*oldpte = pte_val(*ptep);
	if (oldpgste)
		*oldpgste = pgstev;

	switch (orc) {
	case ESSA_GET_STATE:
		break;
	case ESSA_SET_STABLE:
		pgstev &= ~(_PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT);
		pgstev |= _PGSTE_GPS_USAGE_STABLE;
		break;
	case ESSA_SET_UNUSED:
		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
		pgstev |= _PGSTE_GPS_USAGE_UNUSED;
		if (pte_val(*ptep) & _PAGE_INVALID)
			res = 1;
		break;
	case ESSA_SET_VOLATILE:
		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
		pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
		if (pte_val(*ptep) & _PAGE_INVALID)
			res = 1;
		break;
	case ESSA_SET_POT_VOLATILE:
		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
		if (!(pte_val(*ptep) & _PAGE_INVALID)) {
			pgstev |= _PGSTE_GPS_USAGE_POT_VOLATILE;
			break;
		}
		if (pgstev & _PGSTE_GPS_ZERO) {
			pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
			break;
		}
		if (!(pgstev & PGSTE_GC_BIT)) {
			pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
			res = 1;
			break;
		}
		break;
	case ESSA_SET_STABLE_RESIDENT:
		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
		pgstev |= _PGSTE_GPS_USAGE_STABLE;
		/*
		 * Since the resident state can go away any time after this
		 * call, we will not make this page resident. We can revisit
		 * this decision if a guest will ever start using this.
		 */
		break;
	case ESSA_SET_STABLE_IF_RESIDENT:
		if (!(pte_val(*ptep) & _PAGE_INVALID)) {
			pgstev &= ~_PGSTE_GPS_USAGE_MASK;
			pgstev |= _PGSTE_GPS_USAGE_STABLE;
		}
		break;
	case ESSA_SET_STABLE_NODAT:
		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
		pgstev |= _PGSTE_GPS_USAGE_STABLE | _PGSTE_GPS_NODAT;
		break;
	default:
		/* we should never get here! */
		break;
	}
	/* If we are discarding a page, set it to logical zero */
	if (res)
		pgstev |= _PGSTE_GPS_ZERO;

	pgste = __pgste(pgstev);
	pgste_set_unlock(ptep, pgste);
	pte_unmap_unlock(ptep, ptl);
	return res;
}
EXPORT_SYMBOL(pgste_perform_essa);

/**
 * set_pgste_bits - set specific PGSTE bits.
 * @mm: the memory context. It must have PGSTEs, no check is performed here!
 * @hva: the host virtual address of the page whose PGSTE is to be processed
 * @bits: a bitmask representing the bits that will be touched
 * @value: the values of the bits to be written. Only the bits in the mask
 *	   will be written.
 *
 * Return: 0 on success, < 0 in case of error.
 */
int set_pgste_bits(struct mm_struct *mm, unsigned long hva,
			unsigned long bits, unsigned long value)
{
	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pgste_t new;
	pte_t *ptep;

	vma = vma_lookup(mm, hva);
	if (!vma || is_vm_hugetlb_page(vma))
		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
	new = pgste_get_lock(ptep);

	new = clear_pgste_bit(new, bits);
	new = set_pgste_bit(new, value & bits);

	pgste_set_unlock(ptep, new);
	pte_unmap_unlock(ptep, ptl);
	return 0;
}
EXPORT_SYMBOL(set_pgste_bits);

/**
 * get_pgste - get the current PGSTE for the given address.
 * @mm: the memory context. It must have PGSTEs, no check is performed here!
 * @hva: the host virtual address of the page whose PGSTE is to be processed
 * @pgstep: will be written with the current PGSTE for the given address.
 *
 * Return: 0 on success, < 0 in case of error.
 */
int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep)
{
	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pte_t *ptep;

	vma = vma_lookup(mm, hva);
	if (!vma || is_vm_hugetlb_page(vma))
		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
	*pgstep = pgste_val(pgste_get(ptep));
	pte_unmap_unlock(ptep, ptl);
	return 0;
}
EXPORT_SYMBOL(get_pgste);
#endif