linux/arch/x86/include/asm/svm.h
Linus Torvalds 51d90a15fe Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs, which may occur on a different vCPU from
     the one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  LoongArch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC-V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate an IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is an increased number of
     exits (and worse performance) on z10 and earlier processors; ESCA
     was introduced with z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper over a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM would corrupt the guest code stream when
     re-injecting a soft interrupt if the guest patched the underlying
     code after the VM-Exit, e.g. when Linux patches code with a
     temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertently trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX tests to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easier to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-05 17:01:20 -08:00


/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __SVM_H
#define __SVM_H
#include <uapi/asm/svm.h>
#include <uapi/asm/kvm.h>
#include <hyperv/hvhdk.h>
/*
 * 32-bit intercept words in the VMCB Control Area, starting
 * at Byte offset 000h.
 */
enum intercept_words {
        INTERCEPT_CR = 0,
        INTERCEPT_DR,
        INTERCEPT_EXCEPTION,
        INTERCEPT_WORD3,
        INTERCEPT_WORD4,
        INTERCEPT_WORD5,
        MAX_INTERCEPT,
};
enum {
        /* Byte offset 000h (word 0) */
        INTERCEPT_CR0_READ = 0,
        INTERCEPT_CR3_READ = 3,
        INTERCEPT_CR4_READ = 4,
        INTERCEPT_CR8_READ = 8,
        INTERCEPT_CR0_WRITE = 16,
        INTERCEPT_CR3_WRITE = 16 + 3,
        INTERCEPT_CR4_WRITE = 16 + 4,
        INTERCEPT_CR8_WRITE = 16 + 8,
        /* Byte offset 004h (word 1) */
        INTERCEPT_DR0_READ = 32,
        INTERCEPT_DR1_READ,
        INTERCEPT_DR2_READ,
        INTERCEPT_DR3_READ,
        INTERCEPT_DR4_READ,
        INTERCEPT_DR5_READ,
        INTERCEPT_DR6_READ,
        INTERCEPT_DR7_READ,
        INTERCEPT_DR0_WRITE = 48,
        INTERCEPT_DR1_WRITE,
        INTERCEPT_DR2_WRITE,
        INTERCEPT_DR3_WRITE,
        INTERCEPT_DR4_WRITE,
        INTERCEPT_DR5_WRITE,
        INTERCEPT_DR6_WRITE,
        INTERCEPT_DR7_WRITE,
        /* Byte offset 008h (word 2) */
        INTERCEPT_EXCEPTION_OFFSET = 64,
        /* Byte offset 00Ch (word 3) */
        INTERCEPT_INTR = 96,
        INTERCEPT_NMI,
        INTERCEPT_SMI,
        INTERCEPT_INIT,
        INTERCEPT_VINTR,
        INTERCEPT_SELECTIVE_CR0,
        INTERCEPT_STORE_IDTR,
        INTERCEPT_STORE_GDTR,
        INTERCEPT_STORE_LDTR,
        INTERCEPT_STORE_TR,
        INTERCEPT_LOAD_IDTR,
        INTERCEPT_LOAD_GDTR,
        INTERCEPT_LOAD_LDTR,
        INTERCEPT_LOAD_TR,
        INTERCEPT_RDTSC,
        INTERCEPT_RDPMC,
        INTERCEPT_PUSHF,
        INTERCEPT_POPF,
        INTERCEPT_CPUID,
        INTERCEPT_RSM,
        INTERCEPT_IRET,
        INTERCEPT_INTn,
        INTERCEPT_INVD,
        INTERCEPT_PAUSE,
        INTERCEPT_HLT,
        INTERCEPT_INVLPG,
        INTERCEPT_INVLPGA,
        INTERCEPT_IOIO_PROT,
        INTERCEPT_MSR_PROT,
        INTERCEPT_TASK_SWITCH,
        INTERCEPT_FERR_FREEZE,
        INTERCEPT_SHUTDOWN,
        /* Byte offset 010h (word 4) */
        INTERCEPT_VMRUN = 128,
        INTERCEPT_VMMCALL,
        INTERCEPT_VMLOAD,
        INTERCEPT_VMSAVE,
        INTERCEPT_STGI,
        INTERCEPT_CLGI,
        INTERCEPT_SKINIT,
        INTERCEPT_RDTSCP,
        INTERCEPT_ICEBP,
        INTERCEPT_WBINVD,
        INTERCEPT_MONITOR,
        INTERCEPT_MWAIT,
        INTERCEPT_MWAIT_COND,
        INTERCEPT_XSETBV,
        INTERCEPT_RDPRU,
        TRAP_EFER_WRITE,
        TRAP_CR0_WRITE,
        TRAP_CR1_WRITE,
        TRAP_CR2_WRITE,
        TRAP_CR3_WRITE,
        TRAP_CR4_WRITE,
        TRAP_CR5_WRITE,
        TRAP_CR6_WRITE,
        TRAP_CR7_WRITE,
        TRAP_CR8_WRITE,
        /* Byte offset 014h (word 5) */
        INTERCEPT_INVLPGB = 160,
        INTERCEPT_INVLPGB_ILLEGAL,
        INTERCEPT_INVPCID,
        INTERCEPT_MCOMMIT,
        INTERCEPT_TLBSYNC,
        INTERCEPT_BUSLOCK,
        INTERCEPT_IDLE_HLT = 166,
};
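/*
 * Illustrative sketch only: the flat bit numbers above index six 32-bit
 * intercept words, so a given bit lives in word (bit / 32) at position
 * (bit % 32); e.g. INTERCEPT_CPUID (114) is word 3, bit 18.  The
 * example_* helpers below are hypothetical, not KVM's own accessors; they
 * merely show how such a bit maps onto the intercepts[] array of
 * struct vmcb_control_area defined below.
 */
static inline void example_set_intercept(u32 *intercepts, unsigned int bit)
{
        intercepts[bit / 32] |= 1u << (bit % 32);
}

static inline bool example_is_intercept(const u32 *intercepts, unsigned int bit)
{
        return intercepts[bit / 32] & (1u << (bit % 32));
}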
struct __attribute__ ((__packed__)) vmcb_control_area {
        u32 intercepts[MAX_INTERCEPT];
        u32 reserved_1[15 - MAX_INTERCEPT];
        u16 pause_filter_thresh;
        u16 pause_filter_count;
        u64 iopm_base_pa;
        u64 msrpm_base_pa;
        u64 tsc_offset;
        u32 asid;
        u8 tlb_ctl;
        u8 reserved_2[3];
        u32 int_ctl;
        u32 int_vector;
        u32 int_state;
        u8 reserved_3[4];
        u32 exit_code;
        u32 exit_code_hi;
        u64 exit_info_1;
        u64 exit_info_2;
        u32 exit_int_info;
        u32 exit_int_info_err;
        u64 nested_ctl;
        u64 avic_vapic_bar;
        u64 ghcb_gpa;
        u32 event_inj;
        u32 event_inj_err;
        u64 nested_cr3;
        u64 virt_ext;
        u32 clean;
        u32 reserved_5;
        u64 next_rip;
        u8 insn_len;
        u8 insn_bytes[15];
        u64 avic_backing_page; /* Offset 0xe0 */
        u8 reserved_6[8]; /* Offset 0xe8 */
        u64 avic_logical_id; /* Offset 0xf0 */
        u64 avic_physical_id; /* Offset 0xf8 */
        u8 reserved_7[8];
        u64 vmsa_pa; /* Used for an SEV-ES guest */
        u8 reserved_8[16];
        u16 bus_lock_counter; /* Offset 0x120 */
        u8 reserved_9[22];
        u64 allowed_sev_features; /* Offset 0x138 */
        u64 guest_sev_features; /* Offset 0x140 */
        u8 reserved_10[664];
        /*
         * Offset 0x3e0, 32 bytes reserved
         * for use by hypervisor/software.
         */
        union {
                struct hv_vmcb_enlightenments hv_enlightenments;
                u8 reserved_sw[32];
        };
};
#define TLB_CONTROL_DO_NOTHING 0
#define TLB_CONTROL_FLUSH_ALL_ASID 1
#define TLB_CONTROL_FLUSH_ASID 3
#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
#define V_TPR_MASK 0x0f
#define V_IRQ_SHIFT 8
#define V_IRQ_MASK (1 << V_IRQ_SHIFT)
#define V_GIF_SHIFT 9
#define V_GIF_MASK (1 << V_GIF_SHIFT)
#define V_NMI_PENDING_SHIFT 11
#define V_NMI_PENDING_MASK (1 << V_NMI_PENDING_SHIFT)
#define V_NMI_BLOCKING_SHIFT 12
#define V_NMI_BLOCKING_MASK (1 << V_NMI_BLOCKING_SHIFT)
#define V_INTR_PRIO_SHIFT 16
#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
#define V_IGN_TPR_SHIFT 20
#define V_IGN_TPR_MASK (1 << V_IGN_TPR_SHIFT)
#define V_IRQ_INJECTION_BITS_MASK (V_IRQ_MASK | V_INTR_PRIO_MASK | V_IGN_TPR_MASK)
#define V_INTR_MASKING_SHIFT 24
#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
#define V_GIF_ENABLE_SHIFT 25
#define V_GIF_ENABLE_MASK (1 << V_GIF_ENABLE_SHIFT)
#define V_NMI_ENABLE_SHIFT 26
#define V_NMI_ENABLE_MASK (1 << V_NMI_ENABLE_SHIFT)
#define AVIC_ENABLE_SHIFT 31
#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
#define X2APIC_MODE_SHIFT 30
#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
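/*
 * Illustrative sketch (hypothetical helper, not KVM's injection code) of
 * how the V_* fields above combine into an int_ctl value that requests a
 * virtual interrupt: only the bits covered by V_IRQ_INJECTION_BITS_MASK
 * change per injection, and the vector itself goes in the separate
 * int_vector field of vmcb_control_area.
 */
static inline u32 example_build_int_ctl(u32 old_int_ctl, u8 prio, bool ign_tpr)
{
        u32 int_ctl = old_int_ctl & ~V_IRQ_INJECTION_BITS_MASK;

        int_ctl |= V_IRQ_MASK;
        int_ctl |= (prio << V_INTR_PRIO_SHIFT) & V_INTR_PRIO_MASK;
        if (ign_tpr)
                int_ctl |= V_IGN_TPR_MASK;

        return int_ctl;
}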
#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
#define SVM_IOIO_STR_SHIFT 2
#define SVM_IOIO_REP_SHIFT 3
#define SVM_IOIO_SIZE_SHIFT 4
#define SVM_IOIO_ASIZE_SHIFT 7
#define SVM_IOIO_TYPE_MASK 1
#define SVM_IOIO_STR_MASK (1 << SVM_IOIO_STR_SHIFT)
#define SVM_IOIO_REP_MASK (1 << SVM_IOIO_REP_SHIFT)
#define SVM_IOIO_SIZE_MASK (7 << SVM_IOIO_SIZE_SHIFT)
#define SVM_IOIO_ASIZE_MASK (7 << SVM_IOIO_ASIZE_SHIFT)
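/*
 * Illustrative decode (hypothetical helper) of the IOIO intercept's
 * EXITINFO1 using the masks above: bit 0 distinguishes IN from OUT, and
 * the size field is a one-hot encoding of the operand size in bytes
 * (1, 2 or 4).  The port number lives in the upper bits of EXITINFO1 and
 * is not covered by these definitions.
 */
struct example_ioio_info {
        bool in;     /* IN if set, OUT otherwise */
        bool string; /* INS/OUTS */
        bool rep;    /* REP prefix */
        u8 size;     /* operand size in bytes: 1, 2 or 4 */
};

static inline struct example_ioio_info example_decode_ioio(u64 exit_info_1)
{
        struct example_ioio_info io = {
                .in     = exit_info_1 & SVM_IOIO_TYPE_MASK,
                .string = exit_info_1 & SVM_IOIO_STR_MASK,
                .rep    = exit_info_1 & SVM_IOIO_REP_MASK,
                .size   = (exit_info_1 & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT,
        };
        return io;
}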
#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
#define SVM_TSC_RATIO_MIN 0x0000000000000001ULL
#define SVM_TSC_RATIO_MAX 0x000000ffffffffffULL
#define SVM_TSC_RATIO_DEFAULT 0x0100000000ULL
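/*
 * The TSC ratio is an 8.32 fixed-point multiplier: bits 39:32 hold the
 * integer part and bits 31:0 the fraction, so SVM_TSC_RATIO_DEFAULT
 * (0x0100000000ULL) is 1.0.  Illustrative sketch (hypothetical helper,
 * not KVM's scaling code) of deriving a ratio that presents a guest_khz
 * clock on a host_khz TSC:
 */
static inline u64 example_tsc_ratio(u64 guest_khz, u64 host_khz)
{
        u64 ratio = (guest_khz << 32) / host_khz;

        if (ratio < SVM_TSC_RATIO_MIN || ratio > SVM_TSC_RATIO_MAX)
                return SVM_TSC_RATIO_DEFAULT; /* fall back to 1.0 */
        return ratio;
}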
/* AVIC */
#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFFULL)
#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31
#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
/*
 * GA_LOG_INTR is a synthetic flag that's never propagated to hardware-visible
 * tables. GA_LOG_INTR is set if the vCPU needs device posted IRQs to generate
 * GA log interrupts to wake the vCPU (because it's blocking or about to block).
 */
#define AVIC_PHYSICAL_ID_ENTRY_GA_LOG_INTR BIT_ULL(61)
#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK GENMASK_ULL(51, 12)
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
#define AVIC_PHYSICAL_ID_TABLE_SIZE_MASK (0xFFULL)
#define AVIC_DOORBELL_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
#define AVIC_UNACCEL_ACCESS_WRITE_MASK 1
#define AVIC_UNACCEL_ACCESS_OFFSET_MASK 0xFF0
#define AVIC_UNACCEL_ACCESS_VECTOR_MASK 0xFFFFFFFF
enum avic_ipi_failure_cause {
        AVIC_IPI_FAILURE_INVALID_INT_TYPE,
        AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
        AVIC_IPI_FAILURE_INVALID_TARGET,
        AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
        AVIC_IPI_FAILURE_INVALID_IPI_VECTOR,
};
#define AVIC_PHYSICAL_MAX_INDEX_MASK GENMASK_ULL(11, 0)
/*
 * For AVIC, the max index allowed for physical APIC ID table is 0xfe (254), as
 * 0xff is a broadcast to all CPUs, i.e. can't be targeted individually.
 */
#define AVIC_MAX_PHYSICAL_ID 0XFEULL
/*
 * For x2AVIC, the max index allowed for physical APIC ID table is 0x1ff (511).
 * With X86_FEATURE_X2AVIC_EXT, the max index is increased to 0xfff (4095).
 */
#define X2AVIC_MAX_PHYSICAL_ID 0x1FFUL
#define X2AVIC_4K_MAX_PHYSICAL_ID 0xFFFUL
static_assert((AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == AVIC_MAX_PHYSICAL_ID);
static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_MAX_PHYSICAL_ID);
static_assert((X2AVIC_4K_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_4K_MAX_PHYSICAL_ID);
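/*
 * Illustrative sketch (hypothetical helper, not KVM's AVIC code) of how a
 * physical APIC ID table entry is assembled from the masks above: the
 * page-aligned backing page address occupies bits 51:12, the host
 * physical APIC ID bits 11:0, with IS_RUNNING and VALID on top.
 */
static inline u64 example_avic_physid_entry(u64 backing_page_pa,
                                            u32 host_apic_id, bool running)
{
        u64 entry = (backing_page_pa & AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
                    (host_apic_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK) |
                    AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;

        if (running)
                entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
        return entry;
}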
#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
#define SVM_SEV_FEAT_SECURE_TSC BIT(9)
#define VMCB_ALLOWED_SEV_FEATURES_VALID BIT_ULL(63)
struct vmcb_seg {
        u16 selector;
        u16 attrib;
        u32 limit;
        u64 base;
} __packed;
/* Save area definition for legacy and SEV-MEM guests */
struct vmcb_save_area {
        struct vmcb_seg es;
        struct vmcb_seg cs;
        struct vmcb_seg ss;
        struct vmcb_seg ds;
        struct vmcb_seg fs;
        struct vmcb_seg gs;
        struct vmcb_seg gdtr;
        struct vmcb_seg ldtr;
        struct vmcb_seg idtr;
        struct vmcb_seg tr;
        /* Reserved fields are named following their struct offset */
        u8 reserved_0xa0[42];
        u8 vmpl;
        u8 cpl;
        u8 reserved_0xcc[4];
        u64 efer;
        u8 reserved_0xd8[112];
        u64 cr4;
        u64 cr3;
        u64 cr0;
        u64 dr7;
        u64 dr6;
        u64 rflags;
        u64 rip;
        u8 reserved_0x180[88];
        u64 rsp;
        u64 s_cet;
        u64 ssp;
        u64 isst_addr;
        u64 rax;
        u64 star;
        u64 lstar;
        u64 cstar;
        u64 sfmask;
        u64 kernel_gs_base;
        u64 sysenter_cs;
        u64 sysenter_esp;
        u64 sysenter_eip;
        u64 cr2;
        u8 reserved_0x248[32];
        u64 g_pat;
        u64 dbgctl;
        u64 br_from;
        u64 br_to;
        u64 last_excp_from;
        u64 last_excp_to;
        u8 reserved_0x298[72];
        u64 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
} __packed;
/* Save area definition for SEV-ES and SEV-SNP guests */
struct sev_es_save_area {
        struct vmcb_seg es;
        struct vmcb_seg cs;
        struct vmcb_seg ss;
        struct vmcb_seg ds;
        struct vmcb_seg fs;
        struct vmcb_seg gs;
        struct vmcb_seg gdtr;
        struct vmcb_seg ldtr;
        struct vmcb_seg idtr;
        struct vmcb_seg tr;
        u64 pl0_ssp;
        u64 pl1_ssp;
        u64 pl2_ssp;
        u64 pl3_ssp;
        u64 u_cet;
        u8 reserved_0xc8[2];
        u8 vmpl;
        u8 cpl;
        u8 reserved_0xcc[4];
        u64 efer;
        u8 reserved_0xd8[104];
        u64 xss;
        u64 cr4;
        u64 cr3;
        u64 cr0;
        u64 dr7;
        u64 dr6;
        u64 rflags;
        u64 rip;
        u64 dr0;
        u64 dr1;
        u64 dr2;
        u64 dr3;
        u64 dr0_addr_mask;
        u64 dr1_addr_mask;
        u64 dr2_addr_mask;
        u64 dr3_addr_mask;
        u8 reserved_0x1c0[24];
        u64 rsp;
        u64 s_cet;
        u64 ssp;
        u64 isst_addr;
        u64 rax;
        u64 star;
        u64 lstar;
        u64 cstar;
        u64 sfmask;
        u64 kernel_gs_base;
        u64 sysenter_cs;
        u64 sysenter_esp;
        u64 sysenter_eip;
        u64 cr2;
        u8 reserved_0x248[32];
        u64 g_pat;
        u64 dbgctl;
        u64 br_from;
        u64 br_to;
        u64 last_excp_from;
        u64 last_excp_to;
        u8 reserved_0x298[80];
        u32 pkru;
        u32 tsc_aux;
        u64 tsc_scale;
        u64 tsc_offset;
        u8 reserved_0x300[8];
        u64 rcx;
        u64 rdx;
        u64 rbx;
        u64 reserved_0x320; /* rsp already available at 0x01d8 */
        u64 rbp;
        u64 rsi;
        u64 rdi;
        u64 r8;
        u64 r9;
        u64 r10;
        u64 r11;
        u64 r12;
        u64 r13;
        u64 r14;
        u64 r15;
        u8 reserved_0x380[16];
        u64 guest_exit_info_1;
        u64 guest_exit_info_2;
        u64 guest_exit_int_info;
        u64 guest_nrip;
        u64 sev_features;
        u64 vintr_ctrl;
        u64 guest_exit_code;
        u64 virtual_tom;
        u64 tlb_id;
        u64 pcpu_id;
        u64 event_inj;
        u64 xcr0;
        u8 reserved_0x3f0[16];
        /* Floating point area */
        u64 x87_dp;
        u32 mxcsr;
        u16 x87_ftw;
        u16 x87_fsw;
        u16 x87_fcw;
        u16 x87_fop;
        u16 x87_ds;
        u16 x87_cs;
        u64 x87_rip;
        u8 fpreg_x87[80];
        u8 fpreg_xmm[256];
        u8 fpreg_ymm[256];
} __packed;
struct ghcb_save_area {
        u8 reserved_0x0[203];
        u8 cpl;
        u8 reserved_0xcc[116];
        u64 xss;
        u8 reserved_0x148[24];
        u64 dr7;
        u8 reserved_0x168[16];
        u64 rip;
        u8 reserved_0x180[88];
        u64 rsp;
        u8 reserved_0x1e0[24];
        u64 rax;
        u8 reserved_0x200[264];
        u64 rcx;
        u64 rdx;
        u64 rbx;
        u8 reserved_0x320[8];
        u64 rbp;
        u64 rsi;
        u64 rdi;
        u64 r8;
        u64 r9;
        u64 r10;
        u64 r11;
        u64 r12;
        u64 r13;
        u64 r14;
        u64 r15;
        u8 reserved_0x380[16];
        u64 sw_exit_code;
        u64 sw_exit_info_1;
        u64 sw_exit_info_2;
        u64 sw_scratch;
        u8 reserved_0x3b0[56];
        u64 xcr0;
        u8 valid_bitmap[16];
        u64 x87_state_gpa;
} __packed;
#define GHCB_SHARED_BUF_SIZE 2032
struct ghcb {
        struct ghcb_save_area save;
        u8 reserved_save[2048 - sizeof(struct ghcb_save_area)];
        u8 shared_buffer[GHCB_SHARED_BUF_SIZE];
        u8 reserved_0xff0[10];
        u16 protocol_version; /* negotiated SEV-ES/GHCB protocol version */
        u32 ghcb_usage;
} __packed;
struct vmcb {
        struct vmcb_control_area control;
        union {
                struct vmcb_save_area save;
                /*
                 * For SEV-ES VMs, the save area in the VMCB is used only to
                 * save/load host state. Guest state resides in a separate
                 * page, the aptly named VM Save Area (VMSA), that is encrypted
                 * with the guest's private key.
                 */
                struct sev_es_save_area host_sev_es_save;
        };
} __packed;
#define EXPECTED_VMCB_SAVE_AREA_SIZE 744
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1648
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
#define BUILD_BUG_RESERVED_OFFSET(x, y) \
        ASSERT_STRUCT_OFFSET(struct x, reserved ## _ ## y, y)
static inline void __unused_size_checks(void)
{
        BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
        BUILD_BUG_ON(sizeof(struct ghcb_save_area) != EXPECTED_GHCB_SAVE_AREA_SIZE);
        BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
        BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
        BUILD_BUG_ON(offsetof(struct vmcb, save) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
        BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
        /* Check offsets of reserved fields */
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
        BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xd8);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x1c0);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x248);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x298);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x300);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x320);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x380);
        BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0x3f0);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x0);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0xcc);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x148);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x168);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x180);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x1e0);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x200);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x320);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x380);
        BUILD_BUG_RESERVED_OFFSET(ghcb_save_area, 0x3b0);
        BUILD_BUG_RESERVED_OFFSET(ghcb, 0xff0);
}
#define SVM_CPUID_FUNC 0x8000000a
#define SVM_SELECTOR_S_SHIFT 4
#define SVM_SELECTOR_DPL_SHIFT 5
#define SVM_SELECTOR_P_SHIFT 7
#define SVM_SELECTOR_AVL_SHIFT 8
#define SVM_SELECTOR_L_SHIFT 9
#define SVM_SELECTOR_DB_SHIFT 10
#define SVM_SELECTOR_G_SHIFT 11
#define SVM_SELECTOR_TYPE_MASK (0xf)
#define SVM_SELECTOR_S_MASK (1 << SVM_SELECTOR_S_SHIFT)
#define SVM_SELECTOR_DPL_MASK (3 << SVM_SELECTOR_DPL_SHIFT)
#define SVM_SELECTOR_P_MASK (1 << SVM_SELECTOR_P_SHIFT)
#define SVM_SELECTOR_AVL_MASK (1 << SVM_SELECTOR_AVL_SHIFT)
#define SVM_SELECTOR_L_MASK (1 << SVM_SELECTOR_L_SHIFT)
#define SVM_SELECTOR_DB_MASK (1 << SVM_SELECTOR_DB_SHIFT)
#define SVM_SELECTOR_G_MASK (1 << SVM_SELECTOR_G_SHIFT)
#define SVM_SELECTOR_WRITE_MASK (1 << 1)
#define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
#define SVM_SELECTOR_CODE_MASK (1 << 3)
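/*
 * Illustrative decode (hypothetical helpers) of the packed 12-bit
 * vmcb_seg.attrib format described by the shifts/masks above, e.g. for
 * pulling the DPL out of a saved segment or checking for a code segment.
 */
static inline unsigned int example_seg_dpl(const struct vmcb_seg *seg)
{
        return (seg->attrib & SVM_SELECTOR_DPL_MASK) >> SVM_SELECTOR_DPL_SHIFT;
}

static inline bool example_seg_is_code(const struct vmcb_seg *seg)
{
        return (seg->attrib & SVM_SELECTOR_S_MASK) &&
               (seg->attrib & SVM_SELECTOR_CODE_MASK);
}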
#define SVM_EVTINJ_VEC_MASK 0xff
#define SVM_EVTINJ_TYPE_SHIFT 8
#define SVM_EVTINJ_TYPE_MASK (7 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_TYPE_INTR (0 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_TYPE_NMI (2 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_TYPE_EXEPT (3 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_TYPE_SOFT (4 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_VALID (1 << 31)
#define SVM_EVTINJ_VALID_ERR (1 << 11)
#define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
#define SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
#define SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
#define SVM_EXITINTINFO_TYPE_EXEPT SVM_EVTINJ_TYPE_EXEPT
#define SVM_EXITINTINFO_TYPE_SOFT SVM_EVTINJ_TYPE_SOFT
#define SVM_EXITINTINFO_VALID SVM_EVTINJ_VALID
#define SVM_EXITINTINFO_VALID_ERR SVM_EVTINJ_VALID_ERR
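/*
 * Illustrative sketch (hypothetical helper) of composing an EVENTINJ value
 * from the fields above, e.g. to inject an exception with an error code.
 * EXITINTINFO uses the same layout, which is why its definitions alias the
 * EVTINJ ones; the error code itself goes in the separate event_inj_err
 * field of vmcb_control_area.
 */
static inline u32 example_eventinj_exception(u8 vector, bool has_error_code)
{
        u32 evtinj = vector | SVM_EVTINJ_TYPE_EXEPT | SVM_EVTINJ_VALID;

        if (has_error_code)
                evtinj |= SVM_EVTINJ_VALID_ERR;
        return evtinj;
}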
#define SVM_EXITINFOSHIFT_TS_REASON_IRET 36
#define SVM_EXITINFOSHIFT_TS_REASON_JMP 38
#define SVM_EXITINFOSHIFT_TS_HAS_ERROR_CODE 44
#define SVM_EXITINFO_REG_MASK 0x0F
#define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
/* GHCB Accessor functions */
#define GHCB_BITMAP_IDX(field) \
        (offsetof(struct ghcb_save_area, field) / sizeof(u64))

#define DEFINE_GHCB_ACCESSORS(field) \
        static __always_inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
        { \
                return test_bit(GHCB_BITMAP_IDX(field), \
                                (unsigned long *)&ghcb->save.valid_bitmap); \
        } \
        \
        static __always_inline u64 ghcb_get_##field(struct ghcb *ghcb) \
        { \
                return ghcb->save.field; \
        } \
        \
        static __always_inline u64 ghcb_get_##field##_if_valid(struct ghcb *ghcb) \
        { \
                return ghcb_##field##_is_valid(ghcb) ? ghcb->save.field : 0; \
        } \
        \
        static __always_inline void ghcb_set_##field(struct ghcb *ghcb, u64 value) \
        { \
                __set_bit(GHCB_BITMAP_IDX(field), \
                          (unsigned long *)&ghcb->save.valid_bitmap); \
                ghcb->save.field = value; \
        }
DEFINE_GHCB_ACCESSORS(cpl)
DEFINE_GHCB_ACCESSORS(rip)
DEFINE_GHCB_ACCESSORS(rsp)
DEFINE_GHCB_ACCESSORS(rax)
DEFINE_GHCB_ACCESSORS(rcx)
DEFINE_GHCB_ACCESSORS(rdx)
DEFINE_GHCB_ACCESSORS(rbx)
DEFINE_GHCB_ACCESSORS(rbp)
DEFINE_GHCB_ACCESSORS(rsi)
DEFINE_GHCB_ACCESSORS(rdi)
DEFINE_GHCB_ACCESSORS(r8)
DEFINE_GHCB_ACCESSORS(r9)
DEFINE_GHCB_ACCESSORS(r10)
DEFINE_GHCB_ACCESSORS(r11)
DEFINE_GHCB_ACCESSORS(r12)
DEFINE_GHCB_ACCESSORS(r13)
DEFINE_GHCB_ACCESSORS(r14)
DEFINE_GHCB_ACCESSORS(r15)
DEFINE_GHCB_ACCESSORS(sw_exit_code)
DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
DEFINE_GHCB_ACCESSORS(sw_scratch)
DEFINE_GHCB_ACCESSORS(xcr0)
DEFINE_GHCB_ACCESSORS(xss)
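/*
 * Illustrative usage of the generated accessors (example_ghcb_prepare_cpuid
 * is a hypothetical caller, and this is a trimmed sketch rather than the
 * full GHCB protocol): each ghcb_set_*() marks the field in valid_bitmap so
 * the other side of the guest/hypervisor exchange knows which save-area
 * slots carry data.  SVM_EXIT_CPUID comes from <uapi/asm/svm.h> above.
 */
static inline void example_ghcb_prepare_cpuid(struct ghcb *ghcb, u32 leaf, u32 subleaf)
{
        ghcb_set_rax(ghcb, leaf);
        ghcb_set_rcx(ghcb, subleaf);
        ghcb_set_sw_exit_code(ghcb, SVM_EXIT_CPUID);
        ghcb_set_sw_exit_info_1(ghcb, 0);
        ghcb_set_sw_exit_info_2(ghcb, 0);
}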
#endif