diff --git a/share/security/advisories/FreeBSD-SA-18:03.speculative_execution.asc b/share/security/advisories/FreeBSD-SA-18:03.speculative_execution.asc
new file mode 100644
index 0000000000..bad3639944
--- /dev/null
+++ b/share/security/advisories/FreeBSD-SA-18:03.speculative_execution.asc
@@ -0,0 +1,206 @@
+-----BEGIN PGP SIGNED MESSAGE-----
+Hash: SHA512
+
+=============================================================================
+FreeBSD-SA-18:03.speculative_execution Security Advisory
+ The FreeBSD Project
+
+Topic: Speculative Execution Vulnerabilities
+
+Category: core
+Module: kernel
+Announced: 2018-03-14
+Credits: Jann Horn (Google Project Zero); Werner Haas, Thomas
+ Prescher (Cyberus Technology); Daniel Gruss, Moritz Lipp,
+ Stefan Mangard, Michael Schwarz (Graz University of
+ Technology); Paul Kocher; Daniel Genkin (University of
+ Pennsylvania and University of Maryland), Mike Hamburg
+ (Rambus); Yuval Yarom (University of Adelaide and Data61)
+Affects: All supported versions of FreeBSD.
+Corrected: 2018-02-17 18:00:01 UTC (stable/11, 11.1-STABLE)
+ 2018-03-14 04:00:00 UTC (releng/11.1, 11.1-RELEASE-p8)
+CVE Name: CVE-2017-5715, CVE-2017-5754
+
+Special Note: Speculative execution vulnerability mitigation is a work
+ in progress. This advisory addresses the most significant
+ issues for FreeBSD 11.1 on amd64 CPUs. We expect to update
+ this advisory to include 10.x for amd64 CPUs. Future FreeBSD
+ releases will address this issue on i386 and other CPUs.
+ freebsd-update will include changes on i386 as part of this
+ update due to common code changes shared between amd64 and
+ i386; however, it contains no functional changes for i386 (in
+ particular, it does not mitigate the issue on i386).
+
+For general information regarding FreeBSD Security Advisories,
+including descriptions of the fields above, security branches, and the
+following sections, please visit <URL:https://security.FreeBSD.org/>.
+
+I. Background
+
+Many modern processors have implementation issues that allow unprivileged
+attackers to bypass user-kernel or inter-process memory access restrictions
+by exploiting speculative execution and shared resources (for example,
+caches).
+
+II. Problem Description
+
+A number of issues relating to speculative execution were found last year
+and publicly announced on January 3rd, 2018. Two of these, known as
+Meltdown and Spectre V2, are addressed here.
+
+CVE-2017-5754 (Meltdown)
+- ------------------------
+
+This issue relies on an affected CPU speculatively executing instructions
+beyond a faulting instruction. When this happens, changes to architectural
+state are not committed, but observable changes may be left in micro-
+architectural state (for example, cache). This may be used to infer
+privileged data.
+
+CVE-2017-5715 (Spectre V2)
+- --------------------------
+
+Spectre V2 uses branch target injection to speculatively execute kernel code
+at an address under the control of an attacker.
+
+III. Impact
+
+An attacker may be able to read secret data from the kernel or from a
+process when executing untrusted code (for example, in a web browser).
+
+IV. Workaround
+
+No workaround is available.
+
+V. Solution
+
+Perform one of the following:
+
+1) Upgrade your vulnerable system to a supported FreeBSD stable or
+release / security branch (releng) dated after the correction date,
+and reboot.
+
+2) To update your vulnerable system via a binary patch:
+
+Systems running a RELEASE version of FreeBSD on the i386 or amd64
+platforms can be updated via the freebsd-update(8) utility, followed
+by a reboot into the new kernel:
+
+# freebsd-update fetch
+# freebsd-update install
+# shutdown -r now
+
+3) To update your vulnerable system via a source code patch:
+
+The following patches have been verified to apply to the applicable
+FreeBSD release branches.
+
+a) Download the relevant patch from the location below, and verify the
+detached PGP signature using your PGP utility.
+
+[FreeBSD 11.1]
+# fetch https://security.FreeBSD.org/patches/SA-18:03/speculative_execution-amd64-11.patch
+# fetch https://security.FreeBSD.org/patches/SA-18:03/speculative_execution-amd64-11.patch.asc
+# gpg --verify speculative_execution-amd64-11.patch.asc
+
+b) Apply the patch. Execute the following commands as root:
+
+# cd /usr/src
+# patch < /path/to/patch
+
+c) Recompile your kernel as described in
+<URL:https://www.FreeBSD.org/handbook/kernelconfig.html> and reboot the
+system.
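+
+For example, on a system with kernel sources in /usr/src, a typical
+rebuild (assuming the stock GENERIC kernel configuration is in use)
+looks like:
+
+# cd /usr/src
+# make buildkernel KERNCONF=GENERIC
+# make installkernel KERNCONF=GENERIC
+# shutdown -r now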
+
+VI. Correction details
+
+CVE-2017-5754 (Meltdown)
+- ------------------------
+
+The mitigation is known as Page Table Isolation (PTI). PTI largely separates
+kernel and user mode page tables, so that even during speculative execution
+most of the kernel's data is unmapped and not accessible.
+
+A demonstration of the Meltdown vulnerability is available at
+https://github.com/dag-erling/meltdown. A positive result is definitive
+(that is, the vulnerability exists with certainty). A negative result
+indicates either that the CPU is not affected, or that the test is not
+capable of demonstrating the issue on the CPU (and may need to be modified).
+
+A patched kernel will automatically enable PTI on Intel CPUs. The status can
+be checked via the vm.pmap.pti sysctl:
+
+# sysctl vm.pmap.pti
+vm.pmap.pti: 1
+
+The default setting can be overridden by setting the loader tunable
+vm.pmap.pti to 1 or 0 in /boot/loader.conf. This setting takes effect only
+at boot.
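+
+For example, to force PTI off at the next boot (which disables the
+Meltdown mitigation), add the following line to /boot/loader.conf:
+
+vm.pmap.pti=0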
+
+PTI introduces a performance regression. The observed performance loss is
+significant in microbenchmarks of system call overhead, but is much smaller
+for many real workloads.
+
+CVE-2017-5715 (Spectre V2)
+- --------------------------
+
+There are two common mitigations for Spectre V2. This patch includes a
+mitigation using Indirect Branch Restricted Speculation, a feature available
+via a microcode update from processor manufacturers. The alternate
+mitigation, Retpoline, is a feature available in newer compilers. The
+feasibility of applying Retpoline to stable branches and/or releases is under
+investigation.
+
+The patch includes the IBRS mitigation for Spectre V2. To use the mitigation,
+the system must have updated microcode; with older microcode a patched
+kernel will function without the mitigation.
+
+IBRS can be disabled via the hw.ibrs_disable sysctl (and tunable), and the
+status can be checked via the hw.ibrs_active sysctl. IBRS may be enabled or
+disabled at runtime. Additional detail on microcode updates will follow.
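+
+For example, to check whether IBRS is currently active and to disable it
+at runtime (the values shown are illustrative; they depend on the CPU
+and the installed microcode):
+
+# sysctl hw.ibrs_active
+hw.ibrs_active: 1
+# sysctl hw.ibrs_disable=1
+hw.ibrs_disable: 0 -> 1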
+
+The following list contains the correction revision numbers for each
+affected branch.
+
+Branch/path Revision
+- -------------------------------------------------------------------------
+stable/11/ r329462
+releng/11.1/ r330908
+- -------------------------------------------------------------------------
+
+To see which files were modified by a particular revision, run the
+following command, replacing NNNNNN with the revision number, on a
+machine with Subversion installed:
+
+# svn diff -cNNNNNN --summarize svn://svn.freebsd.org/base
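+
+For example, to list the files changed by the stable/11 fix (r329462):
+
+# svn diff -c329462 --summarize svn://svn.freebsd.org/base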
+
+Or visit the following URL, replacing NNNNNN with the revision number:
+
+<URL:https://svnweb.freebsd.org/base?view=revision&revision=NNNNNN>
+
+VII. References
+
+<URL:https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5715>
+<URL:https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5754>
+
+
+The latest revision of this advisory is available at
+<URL:https://security.FreeBSD.org/advisories/FreeBSD-SA-18:03.speculative_execution.asc>
+-----BEGIN PGP SIGNATURE-----
+
+iQKTBAEBCgB9FiEE/A6HiuWv54gCjWNV05eS9J6n5cIFAlqon0RfFIAAAAAALgAo
+aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEZD
+MEU4NzhBRTVBRkU3ODgwMjhENjM1NUQzOTc5MkY0OUVBN0U1QzIACgkQ05eS9J6n
+5cKORw/+Lc5lxLhDgU1rQ0JF6sb2b80Ly5k+rJLXFWBvmEQt0uVyVkF4TMJ99M04
+bcmrLbT4Pl0Csh/iEYvZQ4el12KvPDApHszsLTBgChD+KfCLvCZvBZzasgDWGD0E
+JhL4eIX0wjJ4oGGsT+TAqkmwXyAMJgWW/ZgZPFVXocylZTL3fV4g52VdG1Jnd2yu
+hnkViH2kVlVJqXX9AHlenIUfEmUiRUGrMh5oPPpFYDDmfJ+enZ8QLxfZtOKIliD7
+u+2GP8V/bvaErkxqF5wwobybrBOMXpq9Y/fWw0EH/om7myevj/oORqK+ZmGZ17bl
+IRbdWxgjc1hN2TIMVn9q9xX6i0I0wSPwbpLYagKnSnE8WNVUTZUteaj1GKGTG1rj
+DFH2zOLlbRr/IXUFldM9b6VbZX6G5Ijxwy1DJzB/0KL5ZTbAReUR0pqHR7xpulbJ
+eDv8SKCwYiUpMuwPOXNdVlVLZSsH5/9A0cyjH3+E+eIhM8qyxw7iRFwA0DxnGVkr
+tkMo51Vl3Gl7JFFimGKljsE9mBh00m8B0WYJwknvfhdehO4WripcwI7/V5zL6cwj
+s018kaW4Xm77LOz6P1iN8nbcjZ9gN2AsPYUYYZqJxjCcZ7r489Hg9BhbDf0QtC0R
+gnwZWiZ/KuAy0C6vaHljsm0xPEM5nBz/yScFXDbuhEdmEgBBD6w=
+=fqrI
+-----END PGP SIGNATURE-----
diff --git a/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch b/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch
new file mode 100644
index 0000000000..466edbcc15
--- /dev/null
+++ b/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch
@@ -0,0 +1,4618 @@
+--- sys/amd64/amd64/apic_vector.S.orig
++++ sys/amd64/amd64/apic_vector.S
+@@ -2,7 +2,13 @@
+ * Copyright (c) 1989, 1990 William F. Jolitz.
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
++ * Copyright (c) 2014-2018 The FreeBSD Foundation
++ * All rights reserved.
+ *
++ * Portions of this software were developed by
++ * Konstantin Belousov under sponsorship from
++ * the FreeBSD Foundation.
++ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+@@ -38,12 +44,12 @@
+
+ #include "opt_smp.h"
+
++#include "assym.s"
++
+ #include
+ #include
+ #include
+
+-#include "assym.s"
+-
+ #ifdef SMP
+ #define LK lock ;
+ #else
+@@ -73,30 +79,28 @@
+ * translates that into a vector, and passes the vector to the
+ * lapic_handle_intr() function.
+ */
+-#define ISR_VEC(index, vec_name) \
+- .text ; \
+- SUPERALIGN_TEXT ; \
+-IDTVEC(vec_name) ; \
+- PUSH_FRAME ; \
+- FAKE_MCOUNT(TF_RIP(%rsp)) ; \
+- cmpl $0,x2apic_mode ; \
+- je 1f ; \
+- movl $(MSR_APIC_ISR0 + index),%ecx ; \
+- rdmsr ; \
+- jmp 2f ; \
+-1: ; \
+- movq lapic_map, %rdx ; /* pointer to local APIC */ \
+- movl LA_ISR + 16 * (index)(%rdx), %eax ; /* load ISR */ \
+-2: ; \
+- bsrl %eax, %eax ; /* index of highest set bit in ISR */ \
+- jz 3f ; \
+- addl $(32 * index),%eax ; \
+- movq %rsp, %rsi ; \
+- movl %eax, %edi ; /* pass the IRQ */ \
+- call lapic_handle_intr ; \
+-3: ; \
+- MEXITCOUNT ; \
++ .macro ISR_VEC index, vec_name
++ INTR_HANDLER \vec_name
++ FAKE_MCOUNT(TF_RIP(%rsp))
++ cmpl $0,x2apic_mode
++ je 1f
++ movl $(MSR_APIC_ISR0 + \index),%ecx
++ rdmsr
++ jmp 2f
++1:
++ movq lapic_map, %rdx /* pointer to local APIC */
++ movl LA_ISR + 16 * (\index)(%rdx), %eax /* load ISR */
++2:
++ bsrl %eax, %eax /* index of highest set bit in ISR */
++ jz 3f
++ addl $(32 * \index),%eax
++ movq %rsp, %rsi
++ movl %eax, %edi /* pass the IRQ */
++ call lapic_handle_intr
++3:
++ MEXITCOUNT
+ jmp doreti
++ .endm
+
+ /*
+ * Handle "spurious INTerrupts".
+@@ -108,26 +112,21 @@
+ .text
+ SUPERALIGN_TEXT
+ IDTVEC(spuriousint)
+-
+ /* No EOI cycle used here */
+-
+ jmp doreti_iret
+
+- ISR_VEC(1, apic_isr1)
+- ISR_VEC(2, apic_isr2)
+- ISR_VEC(3, apic_isr3)
+- ISR_VEC(4, apic_isr4)
+- ISR_VEC(5, apic_isr5)
+- ISR_VEC(6, apic_isr6)
+- ISR_VEC(7, apic_isr7)
++ ISR_VEC 1, apic_isr1
++ ISR_VEC 2, apic_isr2
++ ISR_VEC 3, apic_isr3
++ ISR_VEC 4, apic_isr4
++ ISR_VEC 5, apic_isr5
++ ISR_VEC 6, apic_isr6
++ ISR_VEC 7, apic_isr7
+
+ /*
+ * Local APIC periodic timer handler.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(timerint)
+- PUSH_FRAME
++ INTR_HANDLER timerint
+ FAKE_MCOUNT(TF_RIP(%rsp))
+ movq %rsp, %rdi
+ call lapic_handle_timer
+@@ -137,10 +136,7 @@
+ /*
+ * Local APIC CMCI handler.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(cmcint)
+- PUSH_FRAME
++ INTR_HANDLER cmcint
+ FAKE_MCOUNT(TF_RIP(%rsp))
+ call lapic_handle_cmc
+ MEXITCOUNT
+@@ -149,10 +145,7 @@
+ /*
+ * Local APIC error interrupt handler.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(errorint)
+- PUSH_FRAME
++ INTR_HANDLER errorint
+ FAKE_MCOUNT(TF_RIP(%rsp))
+ call lapic_handle_error
+ MEXITCOUNT
+@@ -163,10 +156,7 @@
+ * Xen event channel upcall interrupt handler.
+ * Only used when the hypervisor supports direct vector callbacks.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(xen_intr_upcall)
+- PUSH_FRAME
++ INTR_HANDLER xen_intr_upcall
+ FAKE_MCOUNT(TF_RIP(%rsp))
+ movq %rsp, %rdi
+ call xen_intr_handle_upcall
+@@ -183,59 +173,59 @@
+ SUPERALIGN_TEXT
+ invltlb_ret:
+ call as_lapic_eoi
+- POP_FRAME
+- jmp doreti_iret
++ jmp ld_regs
+
+ SUPERALIGN_TEXT
+-IDTVEC(invltlb)
+- PUSH_FRAME
+-
++ INTR_HANDLER invltlb
+ call invltlb_handler
+ jmp invltlb_ret
+
+-IDTVEC(invltlb_pcid)
+- PUSH_FRAME
+-
++ INTR_HANDLER invltlb_pcid
+ call invltlb_pcid_handler
+ jmp invltlb_ret
+
+-IDTVEC(invltlb_invpcid)
+- PUSH_FRAME
+-
++ INTR_HANDLER invltlb_invpcid_nopti
+ call invltlb_invpcid_handler
+ jmp invltlb_ret
+
++ INTR_HANDLER invltlb_invpcid_pti
++ call invltlb_invpcid_pti_handler
++ jmp invltlb_ret
++
+ /*
+ * Single page TLB shootdown
+ */
+- .text
++ INTR_HANDLER invlpg
++ call invlpg_handler
++ jmp invltlb_ret
+
+- SUPERALIGN_TEXT
+-IDTVEC(invlpg)
+- PUSH_FRAME
++ INTR_HANDLER invlpg_invpcid
++ call invlpg_invpcid_handler
++ jmp invltlb_ret
+
+- call invlpg_handler
++ INTR_HANDLER invlpg_pcid
++ call invlpg_pcid_handler
+ jmp invltlb_ret
+
+ /*
+ * Page range TLB shootdown.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(invlrng)
+- PUSH_FRAME
+-
++ INTR_HANDLER invlrng
+ call invlrng_handler
+ jmp invltlb_ret
+
++ INTR_HANDLER invlrng_invpcid
++ call invlrng_invpcid_handler
++ jmp invltlb_ret
++
++ INTR_HANDLER invlrng_pcid
++ call invlrng_pcid_handler
++ jmp invltlb_ret
++
+ /*
+ * Invalidate cache.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(invlcache)
+- PUSH_FRAME
+-
++ INTR_HANDLER invlcache
+ call invlcache_handler
+ jmp invltlb_ret
+
+@@ -242,15 +232,9 @@
+ /*
+ * Handler for IPIs sent via the per-cpu IPI bitmap.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(ipi_intr_bitmap_handler)
+- PUSH_FRAME
+-
++ INTR_HANDLER ipi_intr_bitmap_handler
+ call as_lapic_eoi
+-
+ FAKE_MCOUNT(TF_RIP(%rsp))
+-
+ call ipi_bitmap_handler
+ MEXITCOUNT
+ jmp doreti
+@@ -258,13 +242,8 @@
+ /*
+ * Executed by a CPU when it receives an IPI_STOP from another CPU.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(cpustop)
+- PUSH_FRAME
+-
++ INTR_HANDLER cpustop
+ call as_lapic_eoi
+-
+ call cpustop_handler
+ jmp doreti
+
+@@ -271,11 +250,7 @@
+ /*
+ * Executed by a CPU when it receives an IPI_SUSPEND from another CPU.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(cpususpend)
+- PUSH_FRAME
+-
++ INTR_HANDLER cpususpend
+ call cpususpend_handler
+ call as_lapic_eoi
+ jmp doreti
+@@ -285,10 +260,7 @@
+ *
+ * - Calls the generic rendezvous action function.
+ */
+- .text
+- SUPERALIGN_TEXT
+-IDTVEC(rendezvous)
+- PUSH_FRAME
++ INTR_HANDLER rendezvous
+ #ifdef COUNT_IPIS
+ movl PCPU(CPUID), %eax
+ movq ipi_rendezvous_counts(,%rax,8), %rax
+@@ -328,4 +300,8 @@
+ popq %rax
+ jmp doreti_iret
+
++ INTR_HANDLER justreturn1
++ call as_lapic_eoi
++ jmp doreti
++
+ #endif /* SMP */
+--- sys/amd64/amd64/atpic_vector.S.orig
++++ sys/amd64/amd64/atpic_vector.S
+@@ -36,38 +36,35 @@
+ * master and slave interrupt controllers.
+ */
+
++#include "assym.s"
+ #include
+
+-#include "assym.s"
+-
+ /*
+ * Macros for interrupt entry, call to handler, and exit.
+ */
+-#define INTR(irq_num, vec_name) \
+- .text ; \
+- SUPERALIGN_TEXT ; \
+-IDTVEC(vec_name) ; \
+- PUSH_FRAME ; \
+- FAKE_MCOUNT(TF_RIP(%rsp)) ; \
+- movq %rsp, %rsi ; \
+- movl $irq_num, %edi; /* pass the IRQ */ \
+- call atpic_handle_intr ; \
+- MEXITCOUNT ; \
++ .macro INTR irq_num, vec_name
++ INTR_HANDLER \vec_name
++ FAKE_MCOUNT(TF_RIP(%rsp))
++ movq %rsp, %rsi
++ movl $\irq_num, %edi /* pass the IRQ */
++ call atpic_handle_intr
++ MEXITCOUNT
+ jmp doreti
++ .endm
+
+- INTR(0, atpic_intr0)
+- INTR(1, atpic_intr1)
+- INTR(2, atpic_intr2)
+- INTR(3, atpic_intr3)
+- INTR(4, atpic_intr4)
+- INTR(5, atpic_intr5)
+- INTR(6, atpic_intr6)
+- INTR(7, atpic_intr7)
+- INTR(8, atpic_intr8)
+- INTR(9, atpic_intr9)
+- INTR(10, atpic_intr10)
+- INTR(11, atpic_intr11)
+- INTR(12, atpic_intr12)
+- INTR(13, atpic_intr13)
+- INTR(14, atpic_intr14)
+- INTR(15, atpic_intr15)
++ INTR 0, atpic_intr0
++ INTR 1, atpic_intr1
++ INTR 2, atpic_intr2
++ INTR 3, atpic_intr3
++ INTR 4, atpic_intr4
++ INTR 5, atpic_intr5
++ INTR 6, atpic_intr6
++ INTR 7, atpic_intr7
++ INTR 8, atpic_intr8
++ INTR 9, atpic_intr9
++ INTR 10, atpic_intr10
++ INTR 11, atpic_intr11
++ INTR 12, atpic_intr12
++ INTR 13, atpic_intr13
++ INTR 14, atpic_intr14
++ INTR 15, atpic_intr15
+--- sys/amd64/amd64/cpu_switch.S.orig
++++ sys/amd64/amd64/cpu_switch.S
+@@ -191,9 +191,11 @@
+ done_tss:
+ movq %r8,PCPU(RSP0)
+ movq %r8,PCPU(CURPCB)
+- /* Update the TSS_RSP0 pointer for the next interrupt */
++ /* Update the COMMON_TSS_RSP0 pointer for the next interrupt */
++ cmpb $0,pti(%rip)
++ jne 1f
+ movq %r8,COMMON_TSS_RSP0(%rdx)
+- movq %r12,PCPU(CURTHREAD) /* into next thread */
++1: movq %r12,PCPU(CURTHREAD) /* into next thread */
+
+ /* Test if debug registers should be restored. */
+ testl $PCB_DBREGS,PCB_FLAGS(%r8)
+@@ -270,7 +272,12 @@
+ shrq $8,%rcx
+ movl %ecx,8(%rax)
+ movb $0x89,5(%rax) /* unset busy */
+- movl $TSSSEL,%eax
++ cmpb $0,pti(%rip)
++ je 1f
++ movq PCPU(PRVSPACE),%rax
++ addq $PC_PTI_STACK+PC_PTI_STACK_SZ*8,%rax
++ movq %rax,COMMON_TSS_RSP0(%rdx)
++1: movl $TSSSEL,%eax
+ ltr %ax
+ jmp done_tss
+
+--- sys/amd64/amd64/db_trace.c.orig
++++ sys/amd64/amd64/db_trace.c
+@@ -200,6 +200,7 @@
+ if (name != NULL) {
+ if (strcmp(name, "calltrap") == 0 ||
+ strcmp(name, "fork_trampoline") == 0 ||
++ strcmp(name, "mchk_calltrap") == 0 ||
+ strcmp(name, "nmi_calltrap") == 0 ||
+ strcmp(name, "Xdblfault") == 0)
+ frame_type = TRAP;
+--- sys/amd64/amd64/exception.S.orig
++++ sys/amd64/amd64/exception.S
+@@ -1,12 +1,16 @@
+ /*-
+ * Copyright (c) 1989, 1990 William F. Jolitz.
+ * Copyright (c) 1990 The Regents of the University of California.
+- * Copyright (c) 2007 The FreeBSD Foundation
++ * Copyright (c) 2007-2018 The FreeBSD Foundation
+ * All rights reserved.
+ *
+ * Portions of this software were developed by A. Joseph Koshy under
+ * sponsorship from the FreeBSD Foundation and Google, Inc.
+ *
++ * Portions of this software were developed by
++ * Konstantin Belousov under sponsorship from
++ * the FreeBSD Foundation.
++ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+@@ -38,13 +42,13 @@
+ #include "opt_compat.h"
+ #include "opt_hwpmc_hooks.h"
+
++#include "assym.s"
++
+ #include
+ #include
+ #include
+ #include
+
+-#include "assym.s"
+-
+ #ifdef KDTRACE_HOOKS
+ .bss
+ .globl dtrace_invop_jump_addr
+@@ -100,69 +104,62 @@
+ MCOUNT_LABEL(user)
+ MCOUNT_LABEL(btrap)
+
+-/* Traps that we leave interrupts disabled for.. */
+-#define TRAP_NOEN(a) \
+- subq $TF_RIP,%rsp; \
+- movl $(a),TF_TRAPNO(%rsp) ; \
+- movq $0,TF_ADDR(%rsp) ; \
+- movq $0,TF_ERR(%rsp) ; \
++/* Traps that we leave interrupts disabled for. */
++ .macro TRAP_NOEN l, trapno
++ PTI_ENTRY \l,X\l
++ .globl X\l
++ .type X\l,@function
++X\l: subq $TF_RIP,%rsp
++ movl $\trapno,TF_TRAPNO(%rsp)
++ movq $0,TF_ADDR(%rsp)
++ movq $0,TF_ERR(%rsp)
+ jmp alltraps_noen
+-IDTVEC(dbg)
+- TRAP_NOEN(T_TRCTRAP)
+-IDTVEC(bpt)
+- TRAP_NOEN(T_BPTFLT)
++ .endm
++
++ TRAP_NOEN dbg, T_TRCTRAP
++ TRAP_NOEN bpt, T_BPTFLT
+ #ifdef KDTRACE_HOOKS
+-IDTVEC(dtrace_ret)
+- TRAP_NOEN(T_DTRACE_RET)
++ TRAP_NOEN dtrace_ret, T_DTRACE_RET
+ #endif
+
+ /* Regular traps; The cpu does not supply tf_err for these. */
+-#define TRAP(a) \
+- subq $TF_RIP,%rsp; \
+- movl $(a),TF_TRAPNO(%rsp) ; \
+- movq $0,TF_ADDR(%rsp) ; \
+- movq $0,TF_ERR(%rsp) ; \
++ .macro TRAP l, trapno
++ PTI_ENTRY \l,X\l
++ .globl X\l
++ .type X\l,@function
++X\l:
++ subq $TF_RIP,%rsp
++ movl $\trapno,TF_TRAPNO(%rsp)
++ movq $0,TF_ADDR(%rsp)
++ movq $0,TF_ERR(%rsp)
+ jmp alltraps
+-IDTVEC(div)
+- TRAP(T_DIVIDE)
+-IDTVEC(ofl)
+- TRAP(T_OFLOW)
+-IDTVEC(bnd)
+- TRAP(T_BOUND)
+-IDTVEC(ill)
+- TRAP(T_PRIVINFLT)
+-IDTVEC(dna)
+- TRAP(T_DNA)
+-IDTVEC(fpusegm)
+- TRAP(T_FPOPFLT)
+-IDTVEC(mchk)
+- TRAP(T_MCHK)
+-IDTVEC(rsvd)
+- TRAP(T_RESERVED)
+-IDTVEC(fpu)
+- TRAP(T_ARITHTRAP)
+-IDTVEC(xmm)
+- TRAP(T_XMMFLT)
++ .endm
+
+-/* This group of traps have tf_err already pushed by the cpu */
+-#define TRAP_ERR(a) \
+- subq $TF_ERR,%rsp; \
+- movl $(a),TF_TRAPNO(%rsp) ; \
+- movq $0,TF_ADDR(%rsp) ; \
++ TRAP div, T_DIVIDE
++ TRAP ofl, T_OFLOW
++ TRAP bnd, T_BOUND
++ TRAP ill, T_PRIVINFLT
++ TRAP dna, T_DNA
++ TRAP fpusegm, T_FPOPFLT
++ TRAP rsvd, T_RESERVED
++ TRAP fpu, T_ARITHTRAP
++ TRAP xmm, T_XMMFLT
++
++/* This group of traps have tf_err already pushed by the cpu. */
++ .macro TRAP_ERR l, trapno
++ PTI_ENTRY \l,X\l,has_err=1
++ .globl X\l
++ .type X\l,@function
++X\l:
++ subq $TF_ERR,%rsp
++ movl $\trapno,TF_TRAPNO(%rsp)
++ movq $0,TF_ADDR(%rsp)
+ jmp alltraps
+-IDTVEC(tss)
+- TRAP_ERR(T_TSSFLT)
+-IDTVEC(missing)
+- subq $TF_ERR,%rsp
+- movl $T_SEGNPFLT,TF_TRAPNO(%rsp)
+- jmp prot_addrf
+-IDTVEC(stk)
+- subq $TF_ERR,%rsp
+- movl $T_STKFLT,TF_TRAPNO(%rsp)
+- jmp prot_addrf
+-IDTVEC(align)
+- TRAP_ERR(T_ALIGNFLT)
++ .endm
+
++ TRAP_ERR tss, T_TSSFLT
++ TRAP_ERR align, T_ALIGNFLT
++
+ /*
+ * alltraps entry point. Use swapgs if this is the first time in the
+ * kernel from userland. Reenable interrupts if they were enabled
+@@ -174,25 +171,24 @@
+ alltraps:
+ movq %rdi,TF_RDI(%rsp)
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+- jz alltraps_testi /* already running with kernel GS.base */
++ jz 1f /* already running with kernel GS.base */
+ swapgs
+ movq PCPU(CURPCB),%rdi
+ andl $~PCB_FULL_IRET,PCB_FLAGS(%rdi)
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
+-alltraps_testi:
+- testl $PSL_I,TF_RFLAGS(%rsp)
+- jz alltraps_pushregs_no_rdi
++1: SAVE_SEGS
++ movq %rdx,TF_RDX(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ testb $SEL_RPL_MASK,TF_CS(%rsp)
++ jz 2f
++ call handle_ibrs_entry
++2: testl $PSL_I,TF_RFLAGS(%rsp)
++ jz alltraps_pushregs_no_rax
+ sti
+-alltraps_pushregs_no_rdi:
++alltraps_pushregs_no_rax:
+ movq %rsi,TF_RSI(%rsp)
+- movq %rdx,TF_RDX(%rsp)
+- movq %rcx,TF_RCX(%rsp)
+ movq %r8,TF_R8(%rsp)
+ movq %r9,TF_R9(%rsp)
+- movq %rax,TF_RAX(%rsp)
+ movq %rbx,TF_RBX(%rsp)
+ movq %rbp,TF_RBP(%rsp)
+ movq %r10,TF_R10(%rsp)
+@@ -248,15 +244,18 @@
+ alltraps_noen:
+ movq %rdi,TF_RDI(%rsp)
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+- jz 1f /* already running with kernel GS.base */
++ jz 1f /* already running with kernel GS.base */
+ swapgs
+ movq PCPU(CURPCB),%rdi
+ andl $~PCB_FULL_IRET,PCB_FLAGS(%rdi)
+-1: movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
+- jmp alltraps_pushregs_no_rdi
++1: SAVE_SEGS
++ movq %rdx,TF_RDX(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ testb $SEL_RPL_MASK,TF_CS(%rsp)
++ jz alltraps_pushregs_no_rax
++ call handle_ibrs_entry
++ jmp alltraps_pushregs_no_rax
+
+ IDTVEC(dblfault)
+ subq $TF_ERR,%rsp
+@@ -278,10 +277,7 @@
+ movq %r13,TF_R13(%rsp)
+ movq %r14,TF_R14(%rsp)
+ movq %r15,TF_R15(%rsp)
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ SAVE_SEGS
+ movl $TF_HASSEGS,TF_FLAGS(%rsp)
+ cld
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+@@ -288,31 +284,54 @@
+ jz 1f /* already running with kernel GS.base */
+ swapgs
+ 1:
+- movq %rsp,%rdi
++ movq PCPU(KCR3),%rax
++ cmpq $~0,%rax
++ je 2f
++ movq %rax,%cr3
++2: movq %rsp,%rdi
+ call dblfault_handler
+-2:
+- hlt
+- jmp 2b
++3: hlt
++ jmp 3b
+
++ ALIGN_TEXT
++IDTVEC(page_pti)
++ testb $SEL_RPL_MASK,PTI_CS-2*8(%rsp)
++ jz Xpage
++ swapgs
++ pushq %rax
++ pushq %rdx
++ movq %cr3,%rax
++ movq %rax,PCPU(SAVED_UCR3)
++ PTI_UUENTRY has_err=1
++ subq $TF_ERR,%rsp
++ movq %rdi,TF_RDI(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ jmp page_u
+ IDTVEC(page)
+ subq $TF_ERR,%rsp
+- movl $T_PAGEFLT,TF_TRAPNO(%rsp)
+- movq %rdi,TF_RDI(%rsp) /* free up a GP register */
++ movq %rdi,TF_RDI(%rsp) /* free up GP registers */
++ movq %rax,TF_RAX(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+- jz 1f /* already running with kernel GS.base */
++ jz page_cr2 /* already running with kernel GS.base */
+ swapgs
+- movq PCPU(CURPCB),%rdi
++page_u: movq PCPU(CURPCB),%rdi
+ andl $~PCB_FULL_IRET,PCB_FLAGS(%rdi)
+-1: movq %cr2,%rdi /* preserve %cr2 before .. */
++ movq PCPU(SAVED_UCR3),%rax
++ movq %rax,PCB_SAVED_UCR3(%rdi)
++ call handle_ibrs_entry
++page_cr2:
++ movq %cr2,%rdi /* preserve %cr2 before .. */
+ movq %rdi,TF_ADDR(%rsp) /* enabling interrupts. */
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ SAVE_SEGS
++ movl $T_PAGEFLT,TF_TRAPNO(%rsp)
+ testl $PSL_I,TF_RFLAGS(%rsp)
+- jz alltraps_pushregs_no_rdi
++ jz alltraps_pushregs_no_rax
+ sti
+- jmp alltraps_pushregs_no_rdi
++ jmp alltraps_pushregs_no_rax
+
+ /*
+ * We have to special-case this one. If we get a trap in doreti() at
+@@ -319,30 +338,71 @@
+ * the iretq stage, we'll reenter with the wrong gs state. We'll have
+ * to do a special the swapgs in this case even coming from the kernel.
+ * XXX linux has a trap handler for their equivalent of load_gs().
++ *
++ * On the stack, we have the hardware interrupt frame to return
++ * to usermode (faulted) and another frame with error code, for
++ * fault. For PTI, copy both frames to the main thread stack.
+ */
+-IDTVEC(prot)
++ .macro PROTF_ENTRY name,trapno
++\name\()_pti_doreti:
++ pushq %rax
++ pushq %rdx
++ swapgs
++ movq PCPU(KCR3),%rax
++ movq %rax,%cr3
++ movq PCPU(RSP0),%rax
++ subq $2*PTI_SIZE-3*8,%rax /* no err, %rax, %rdx in faulted frame */
++ MOVE_STACKS (PTI_SIZE / 4 - 3)
++ movq %rax,%rsp
++ popq %rdx
++ popq %rax
++ swapgs
++ jmp X\name
++IDTVEC(\name\()_pti)
++ cmpq $doreti_iret,PTI_RIP-2*8(%rsp)
++ je \name\()_pti_doreti
++ testb $SEL_RPL_MASK,PTI_CS-2*8(%rsp) /* %rax, %rdx not yet pushed */
++ jz X\name
++ PTI_UENTRY has_err=1
++ swapgs
++IDTVEC(\name)
+ subq $TF_ERR,%rsp
+- movl $T_PROTFLT,TF_TRAPNO(%rsp)
++ movl $\trapno,TF_TRAPNO(%rsp)
++ jmp prot_addrf
++ .endm
++
++ PROTF_ENTRY missing, T_SEGNPFLT
++ PROTF_ENTRY stk, T_STKFLT
++ PROTF_ENTRY prot, T_PROTFLT
++
+ prot_addrf:
+ movq $0,TF_ADDR(%rsp)
+ movq %rdi,TF_RDI(%rsp) /* free up a GP register */
++ movq %rax,TF_RAX(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ movw %fs,TF_FS(%rsp)
++ movw %gs,TF_GS(%rsp)
+ leaq doreti_iret(%rip),%rdi
+ cmpq %rdi,TF_RIP(%rsp)
+- je 1f /* kernel but with user gsbase!! */
++ je 5f /* kernel but with user gsbase!! */
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+- jz 2f /* already running with kernel GS.base */
+-1: swapgs
+-2: movq PCPU(CURPCB),%rdi
++ jz 6f /* already running with kernel GS.base */
++ swapgs
++ movq PCPU(CURPCB),%rdi
++4: call handle_ibrs_entry
+ orl $PCB_FULL_IRET,PCB_FLAGS(%rdi) /* always full iret from GPF */
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+ movw %es,TF_ES(%rsp)
+ movw %ds,TF_DS(%rsp)
+ testl $PSL_I,TF_RFLAGS(%rsp)
+- jz alltraps_pushregs_no_rdi
++ jz alltraps_pushregs_no_rax
+ sti
+- jmp alltraps_pushregs_no_rdi
++ jmp alltraps_pushregs_no_rax
+
++5: swapgs
++6: movq PCPU(CURPCB),%rdi
++ jmp 4b
++
+ /*
+ * Fast syscall entry point. We enter here with just our new %cs/%ss set,
+ * and the new privilige level. We are still running on the old user stack
+@@ -352,8 +412,18 @@
+ * We do not support invoking this from a custom %cs or %ss (e.g. using
+ * entries from an LDT).
+ */
++ SUPERALIGN_TEXT
++IDTVEC(fast_syscall_pti)
++ swapgs
++ movq %rax,PCPU(SCRATCH_RAX)
++ movq PCPU(KCR3),%rax
++ movq %rax,%cr3
++ jmp fast_syscall_common
++ SUPERALIGN_TEXT
+ IDTVEC(fast_syscall)
+ swapgs
++ movq %rax,PCPU(SCRATCH_RAX)
++fast_syscall_common:
+ movq %rsp,PCPU(SCRATCH_RSP)
+ movq PCPU(RSP0),%rsp
+ /* Now emulate a trapframe. Make the 8 byte alignment odd for call. */
+@@ -363,10 +433,11 @@
+ movq %rcx,TF_RIP(%rsp) /* %rcx original value is in %r10 */
+ movq PCPU(SCRATCH_RSP),%r11 /* %r11 already saved */
+ movq %r11,TF_RSP(%rsp) /* user stack pointer */
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ movq PCPU(SCRATCH_RAX),%rax
++ movq %rax,TF_RAX(%rsp) /* syscall number */
++ movq %rdx,TF_RDX(%rsp) /* arg 3 */
++ SAVE_SEGS
++ call handle_ibrs_entry
+ movq PCPU(CURPCB),%r11
+ andl $~PCB_FULL_IRET,PCB_FLAGS(%r11)
+ sti
+@@ -375,11 +446,9 @@
+ movq $2,TF_ERR(%rsp)
+ movq %rdi,TF_RDI(%rsp) /* arg 1 */
+ movq %rsi,TF_RSI(%rsp) /* arg 2 */
+- movq %rdx,TF_RDX(%rsp) /* arg 3 */
+ movq %r10,TF_RCX(%rsp) /* arg 4 */
+ movq %r8,TF_R8(%rsp) /* arg 5 */
+ movq %r9,TF_R9(%rsp) /* arg 6 */
+- movq %rax,TF_RAX(%rsp) /* syscall number */
+ movq %rbx,TF_RBX(%rsp) /* C preserved */
+ movq %rbp,TF_RBP(%rsp) /* C preserved */
+ movq %r12,TF_R12(%rsp) /* C preserved */
+@@ -398,11 +467,12 @@
+ /* Disable interrupts before testing PCB_FULL_IRET. */
+ cli
+ testl $PCB_FULL_IRET,PCB_FLAGS(%rax)
+- jnz 3f
++ jnz 4f
+ /* Check for and handle AST's on return to userland. */
+ movq PCPU(CURTHREAD),%rax
+ testl $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
+- jne 2f
++ jne 3f
++ call handle_ibrs_exit
+ /* Restore preserved registers. */
+ MEXITCOUNT
+ movq TF_RDI(%rsp),%rdi /* bonus; preserve arg 1 */
+@@ -412,16 +482,21 @@
+ movq TF_RFLAGS(%rsp),%r11 /* original %rflags */
+ movq TF_RIP(%rsp),%rcx /* original %rip */
+ movq TF_RSP(%rsp),%rsp /* user stack pointer */
+- swapgs
++ cmpb $0,pti
++ je 2f
++ movq PCPU(UCR3),%r9
++ movq %r9,%cr3
++ xorl %r9d,%r9d
++2: swapgs
+ sysretq
+
+-2: /* AST scheduled. */
++3: /* AST scheduled. */
+ sti
+ movq %rsp,%rdi
+ call ast
+ jmp 1b
+
+-3: /* Requested full context restore, use doreti for that. */
++4: /* Requested full context restore, use doreti for that. */
+ MEXITCOUNT
+ jmp doreti
+
+@@ -477,10 +552,7 @@
+ movq %r13,TF_R13(%rsp)
+ movq %r14,TF_R14(%rsp)
+ movq %r15,TF_R15(%rsp)
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ SAVE_SEGS
+ movl $TF_HASSEGS,TF_FLAGS(%rsp)
+ cld
+ xorl %ebx,%ebx
+@@ -487,7 +559,8 @@
+ testb $SEL_RPL_MASK,TF_CS(%rsp)
+ jnz nmi_fromuserspace
+ /*
+- * We've interrupted the kernel. Preserve GS.base in %r12.
++ * We've interrupted the kernel. Preserve GS.base in %r12,
++ * %cr3 in %r13, and possibly lower half of MSR_IA32_SPEC_CTL in %r14d.
+ */
+ movl $MSR_GSBASE,%ecx
+ rdmsr
+@@ -499,10 +572,32 @@
+ movl %edx,%eax
+ shrq $32,%rdx
+ wrmsr
++ movq %cr3,%r13
++ movq PCPU(KCR3),%rax
++ cmpq $~0,%rax
++ je 1f
++ movq %rax,%cr3
++1: testl $CPUID_STDEXT3_IBPB,cpu_stdext_feature3(%rip)
++ je nmi_calltrap
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ rdmsr
++ movl %eax,%r14d
++ call handle_ibrs_entry
+ jmp nmi_calltrap
+ nmi_fromuserspace:
+ incl %ebx
+ swapgs
++ movq %cr3,%r13
++ movq PCPU(KCR3),%rax
++ cmpq $~0,%rax
++ je 1f
++ movq %rax,%cr3
++1: call handle_ibrs_entry
++ movq PCPU(CURPCB),%rdi
++ testq %rdi,%rdi
++ jz 3f
++ orl $PCB_FULL_IRET,PCB_FLAGS(%rdi)
++3:
+ /* Note: this label is also used by ddb and gdb: */
+ nmi_calltrap:
+ FAKE_MCOUNT(TF_RIP(%rsp))
+@@ -525,14 +620,9 @@
+ movq PCPU(CURTHREAD),%rax
+ orq %rax,%rax /* curthread present? */
+ jz nocallchain
+- testl $TDP_CALLCHAIN,TD_PFLAGS(%rax) /* flagged for capture? */
+- jz nocallchain
+ /*
+- * A user callchain is to be captured, so:
+- * - Move execution to the regular kernel stack, to allow for
+- * nested NMI interrupts.
+- * - Take the processor out of "NMI" mode by faking an "iret".
+- * - Enable interrupts, so that copyin() can work.
++ * Move execution to the regular kernel stack, because we
++ * committed to return through doreti.
+ */
+ movq %rsp,%rsi /* source stack pointer */
+ movq $TF_SIZE,%rcx
+@@ -539,12 +629,20 @@
+ movq PCPU(RSP0),%rdx
+ subq %rcx,%rdx
+ movq %rdx,%rdi /* destination stack pointer */
+-
+ shrq $3,%rcx /* trap frame size in long words */
+ cld
+ rep
+ movsq /* copy trapframe */
++ movq %rdx,%rsp /* we are on the regular kstack */
+
++ testl $TDP_CALLCHAIN,TD_PFLAGS(%rax) /* flagged for capture? */
++ jz nocallchain
++ /*
++ * A user callchain is to be captured, so:
++ * - Take the processor out of "NMI" mode by faking an "iret",
++ * to allow for nested NMI interrupts.
++ * - Enable interrupts, so that copyin() can work.
++ */
+ movl %ss,%eax
+ pushq %rax /* tf_ss */
+ pushq %rdx /* tf_rsp (on kernel stack) */
+@@ -574,33 +672,139 @@
+ cli
+ nocallchain:
+ #endif
+- testl %ebx,%ebx
++ testl %ebx,%ebx /* %ebx == 0 => return to userland */
+ jnz doreti_exit
+-nmi_kernelexit:
+ /*
++ * Restore speculation control MSR, if preserved.
++ */
++ testl $CPUID_STDEXT3_IBPB,cpu_stdext_feature3(%rip)
++ je 1f
++ movl %r14d,%eax
++ xorl %edx,%edx
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ wrmsr
++ /*
+ * Put back the preserved MSR_GSBASE value.
+ */
++1: movl $MSR_GSBASE,%ecx
++ movq %r12,%rdx
++ movl %edx,%eax
++ shrq $32,%rdx
++ wrmsr
++ movq %r13,%cr3
++ RESTORE_REGS
++ addq $TF_RIP,%rsp
++ jmp doreti_iret
++
++/*
++ * MC# handling is similar to NMI.
++ *
++ * As with NMIs, machine check exceptions do not respect RFLAGS.IF and
++ * can occur at any time with a GS.base value that does not correspond
++ * to the privilege level in CS.
++ *
++ * Machine checks are not unblocked by iretq, but it is best to run
++ * the handler with interrupts disabled since the exception may have
++ * interrupted a critical section.
++ *
++ * The MC# handler runs on its own stack (tss_ist3). The canonical
++ * GS.base value for the processor is stored just above the bottom of
++ * its MC# stack. For exceptions taken from kernel mode, the current
++ * value in the processor's GS.base is saved at entry to C-preserved
++ * register %r12, the canonical value for GS.base is then loaded into
++ * the processor, and the saved value is restored at exit time. For
++ * exceptions taken from user mode, the cheaper 'SWAPGS' instructions
++ * are used for swapping GS.base.
++ */
++
++IDTVEC(mchk)
++ subq $TF_RIP,%rsp
++ movl $(T_MCHK),TF_TRAPNO(%rsp)
++ movq $0,TF_ADDR(%rsp)
++ movq $0,TF_ERR(%rsp)
++ movq %rdi,TF_RDI(%rsp)
++ movq %rsi,TF_RSI(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ movq %r8,TF_R8(%rsp)
++ movq %r9,TF_R9(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rbx,TF_RBX(%rsp)
++ movq %rbp,TF_RBP(%rsp)
++ movq %r10,TF_R10(%rsp)
++ movq %r11,TF_R11(%rsp)
++ movq %r12,TF_R12(%rsp)
++ movq %r13,TF_R13(%rsp)
++ movq %r14,TF_R14(%rsp)
++ movq %r15,TF_R15(%rsp)
++ SAVE_SEGS
++ movl $TF_HASSEGS,TF_FLAGS(%rsp)
++ cld
++ xorl %ebx,%ebx
++ testb $SEL_RPL_MASK,TF_CS(%rsp)
++ jnz mchk_fromuserspace
++ /*
++ * We've interrupted the kernel. Preserve GS.base in %r12,
++ * %cr3 in %r13, and possibly lower half of MSR_IA32_SPEC_CTL in %r14d.
++ */
+ movl $MSR_GSBASE,%ecx
++ rdmsr
++ movq %rax,%r12
++ shlq $32,%rdx
++ orq %rdx,%r12
++ /* Retrieve and load the canonical value for GS.base. */
++ movq TF_SIZE(%rsp),%rdx
++ movl %edx,%eax
++ shrq $32,%rdx
++ wrmsr
++ movq %cr3,%r13
++ movq PCPU(KCR3),%rax
++ cmpq $~0,%rax
++ je 1f
++ movq %rax,%cr3
++1: testl $CPUID_STDEXT3_IBPB,cpu_stdext_feature3(%rip)
++ je mchk_calltrap
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ rdmsr
++ movl %eax,%r14d
++ call handle_ibrs_entry
++ jmp mchk_calltrap
++mchk_fromuserspace:
++ incl %ebx
++ swapgs
++ movq %cr3,%r13
++ movq PCPU(KCR3),%rax
++ cmpq $~0,%rax
++ je 1f
++ movq %rax,%cr3
++1: call handle_ibrs_entry
++/* Note: this label is also used by ddb and gdb: */
++mchk_calltrap:
++ FAKE_MCOUNT(TF_RIP(%rsp))
++ movq %rsp,%rdi
++ call mca_intr
++ MEXITCOUNT
++ testl %ebx,%ebx /* %ebx == 0 => return to userland */
++ jnz doreti_exit
++ /*
++ * Restore speculation control MSR, if preserved.
++ */
++ testl $CPUID_STDEXT3_IBPB,cpu_stdext_feature3(%rip)
++ je 1f
++ movl %r14d,%eax
++ xorl %edx,%edx
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ wrmsr
++ /*
++ * Put back the preserved MSR_GSBASE value.
++ */
++1: movl $MSR_GSBASE,%ecx
+ movq %r12,%rdx
+ movl %edx,%eax
+ shrq $32,%rdx
+ wrmsr
+-nmi_restoreregs:
+- movq TF_RDI(%rsp),%rdi
+- movq TF_RSI(%rsp),%rsi
+- movq TF_RDX(%rsp),%rdx
+- movq TF_RCX(%rsp),%rcx
+- movq TF_R8(%rsp),%r8
+- movq TF_R9(%rsp),%r9
+- movq TF_RAX(%rsp),%rax
+- movq TF_RBX(%rsp),%rbx
+- movq TF_RBP(%rsp),%rbp
+- movq TF_R10(%rsp),%r10
+- movq TF_R11(%rsp),%r11
+- movq TF_R12(%rsp),%r12
+- movq TF_R13(%rsp),%r13
+- movq TF_R14(%rsp),%r14
+- movq TF_R15(%rsp),%r15
++ movq %r13,%cr3
++ RESTORE_REGS
+ addq $TF_RIP,%rsp
+ jmp doreti_iret
+
+@@ -767,27 +971,39 @@
+ ld_ds:
+ movw TF_DS(%rsp),%ds
+ ld_regs:
+- movq TF_RDI(%rsp),%rdi
+- movq TF_RSI(%rsp),%rsi
+- movq TF_RDX(%rsp),%rdx
+- movq TF_RCX(%rsp),%rcx
+- movq TF_R8(%rsp),%r8
+- movq TF_R9(%rsp),%r9
+- movq TF_RAX(%rsp),%rax
+- movq TF_RBX(%rsp),%rbx
+- movq TF_RBP(%rsp),%rbp
+- movq TF_R10(%rsp),%r10
+- movq TF_R11(%rsp),%r11
+- movq TF_R12(%rsp),%r12
+- movq TF_R13(%rsp),%r13
+- movq TF_R14(%rsp),%r14
+- movq TF_R15(%rsp),%r15
++ RESTORE_REGS
+ testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
+- jz 1f /* keep running with kernel GS.base */
++ jz 2f /* keep running with kernel GS.base */
+ cli
++ call handle_ibrs_exit_rs
++ cmpb $0,pti
++ je 1f
++ pushq %rdx
++ movq PCPU(PRVSPACE),%rdx
++ addq $PC_PTI_STACK+PC_PTI_STACK_SZ*8-PTI_SIZE,%rdx
++ movq %rax,PTI_RAX(%rdx)
++ popq %rax
++ movq %rax,PTI_RDX(%rdx)
++ movq TF_RIP(%rsp),%rax
++ movq %rax,PTI_RIP(%rdx)
++ movq TF_CS(%rsp),%rax
++ movq %rax,PTI_CS(%rdx)
++ movq TF_RFLAGS(%rsp),%rax
++ movq %rax,PTI_RFLAGS(%rdx)
++ movq TF_RSP(%rsp),%rax
++ movq %rax,PTI_RSP(%rdx)
++ movq TF_SS(%rsp),%rax
++ movq %rax,PTI_SS(%rdx)
++ movq PCPU(UCR3),%rax
+ swapgs
+-1:
+- addq $TF_RIP,%rsp /* skip over tf_err, tf_trapno */
++ movq %rdx,%rsp
++ movq %rax,%cr3
++ popq %rdx
++ popq %rax
++ addq $8,%rsp
++ jmp doreti_iret
++1: swapgs
++2: addq $TF_RIP,%rsp
+ .globl doreti_iret
+ doreti_iret:
+ iretq
+@@ -811,22 +1027,20 @@
+ .globl doreti_iret_fault
+ doreti_iret_fault:
+ subq $TF_RIP,%rsp /* space including tf_err, tf_trapno */
+- testl $PSL_I,TF_RFLAGS(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ call handle_ibrs_entry
++ testb $SEL_RPL_MASK,TF_CS(%rsp)
+ jz 1f
+ sti
+ 1:
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ SAVE_SEGS
+ movl $TF_HASSEGS,TF_FLAGS(%rsp)
+ movq %rdi,TF_RDI(%rsp)
+ movq %rsi,TF_RSI(%rsp)
+- movq %rdx,TF_RDX(%rsp)
+- movq %rcx,TF_RCX(%rsp)
+ movq %r8,TF_R8(%rsp)
+ movq %r9,TF_R9(%rsp)
+- movq %rax,TF_RAX(%rsp)
+ movq %rbx,TF_RBX(%rsp)
+ movq %rbp,TF_RBP(%rsp)
+ movq %r10,TF_R10(%rsp)
+@@ -845,7 +1059,7 @@
+ .globl ds_load_fault
+ ds_load_fault:
+ movl $T_PROTFLT,TF_TRAPNO(%rsp)
+- testl $PSL_I,TF_RFLAGS(%rsp)
++ testb $SEL_RPL_MASK,TF_CS(%rsp)
+ jz 1f
+ sti
+ 1:
+--- sys/amd64/amd64/genassym.c.orig
++++ sys/amd64/amd64/genassym.c
+@@ -145,6 +145,7 @@
+ ASSYM(PCB_TR, offsetof(struct pcb, pcb_tr));
+ ASSYM(PCB_FLAGS, offsetof(struct pcb, pcb_flags));
+ ASSYM(PCB_ONFAULT, offsetof(struct pcb, pcb_onfault));
++ASSYM(PCB_SAVED_UCR3, offsetof(struct pcb, pcb_saved_ucr3));
+ ASSYM(PCB_TSSP, offsetof(struct pcb, pcb_tssp));
+ ASSYM(PCB_SAVEFPU, offsetof(struct pcb, pcb_save));
+ ASSYM(PCB_EFER, offsetof(struct pcb, pcb_efer));
+@@ -190,6 +191,16 @@
+ ASSYM(TF_SIZE, sizeof(struct trapframe));
+ ASSYM(TF_HASSEGS, TF_HASSEGS);
+
++ASSYM(PTI_RDX, offsetof(struct pti_frame, pti_rdx));
++ASSYM(PTI_RAX, offsetof(struct pti_frame, pti_rax));
++ASSYM(PTI_ERR, offsetof(struct pti_frame, pti_err));
++ASSYM(PTI_RIP, offsetof(struct pti_frame, pti_rip));
++ASSYM(PTI_CS, offsetof(struct pti_frame, pti_cs));
++ASSYM(PTI_RFLAGS, offsetof(struct pti_frame, pti_rflags));
++ASSYM(PTI_RSP, offsetof(struct pti_frame, pti_rsp));
++ASSYM(PTI_SS, offsetof(struct pti_frame, pti_ss));
++ASSYM(PTI_SIZE, sizeof(struct pti_frame));
++
+ ASSYM(SIGF_HANDLER, offsetof(struct sigframe, sf_ahu.sf_handler));
+ ASSYM(SIGF_UC, offsetof(struct sigframe, sf_uc));
+ ASSYM(UC_EFLAGS, offsetof(ucontext_t, uc_mcontext.mc_rflags));
+@@ -206,6 +217,7 @@
+ ASSYM(PC_CURPCB, offsetof(struct pcpu, pc_curpcb));
+ ASSYM(PC_CPUID, offsetof(struct pcpu, pc_cpuid));
+ ASSYM(PC_SCRATCH_RSP, offsetof(struct pcpu, pc_scratch_rsp));
++ASSYM(PC_SCRATCH_RAX, offsetof(struct pcpu, pc_scratch_rax));
+ ASSYM(PC_CURPMAP, offsetof(struct pcpu, pc_curpmap));
+ ASSYM(PC_TSSP, offsetof(struct pcpu, pc_tssp));
+ ASSYM(PC_RSP0, offsetof(struct pcpu, pc_rsp0));
+@@ -215,6 +227,12 @@
+ ASSYM(PC_COMMONTSSP, offsetof(struct pcpu, pc_commontssp));
+ ASSYM(PC_TSS, offsetof(struct pcpu, pc_tss));
+ ASSYM(PC_PM_SAVE_CNT, offsetof(struct pcpu, pc_pm_save_cnt));
++ASSYM(PC_KCR3, offsetof(struct pcpu, pc_kcr3));
++ASSYM(PC_UCR3, offsetof(struct pcpu, pc_ucr3));
++ASSYM(PC_SAVED_UCR3, offsetof(struct pcpu, pc_saved_ucr3));
++ASSYM(PC_PTI_STACK, offsetof(struct pcpu, pc_pti_stack));
++ASSYM(PC_PTI_STACK_SZ, PC_PTI_STACK_SZ);
++ASSYM(PC_IBPB_SET, offsetof(struct pcpu, pc_ibpb_set));
+
+ ASSYM(LA_EOI, LAPIC_EOI * LAPIC_MEM_MUL);
+ ASSYM(LA_ISR, LAPIC_ISR0 * LAPIC_MEM_MUL);
+--- sys/amd64/amd64/initcpu.c.orig
++++ sys/amd64/amd64/initcpu.c
+@@ -194,6 +194,7 @@
+ wrmsr(MSR_EFER, msr);
+ pg_nx = PG_NX;
+ }
++ hw_ibrs_recalculate();
+ switch (cpu_vendor_id) {
+ case CPU_VENDOR_AMD:
+ init_amd();
+--- sys/amd64/amd64/machdep.c.orig
++++ sys/amd64/amd64/machdep.c
+@@ -114,6 +114,7 @@
+ #include
+ #include
+ #include
++#include
+ #include
+ #include
+ #include
+@@ -149,6 +150,14 @@
+ /* Sanity check for __curthread() */
+ CTASSERT(offsetof(struct pcpu, pc_curthread) == 0);
+
++/*
++ * The PTI trampoline stack needs enough space for a hardware trapframe and a
++ * couple of scratch registers, as well as the trapframe left behind after an
++ * iret fault.
++ */
++CTASSERT(PC_PTI_STACK_SZ * sizeof(register_t) >= 2 * sizeof(struct pti_frame) -
++ offsetof(struct pti_frame, pti_rip));
++
+ extern u_int64_t hammer_time(u_int64_t, u_int64_t);
+
+ #define CS_SECURE(cs) (ISPL(cs) == SEL_UPL)
+@@ -180,12 +189,6 @@
+ .msi_init = msi_init,
+ };
+
+-/*
+- * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is
+- * the physical address at which the kernel is loaded.
+- */
+-extern char kernphys[];
+-
+ struct msgbuf *msgbufp;
+
+ /*
+@@ -670,7 +673,7 @@
+ struct gate_descriptor *idt = &idt0[0]; /* interrupt descriptor table */
+
+ static char dblfault_stack[PAGE_SIZE] __aligned(16);
+-
++static char mce0_stack[PAGE_SIZE] __aligned(16);
+ static char nmi0_stack[PAGE_SIZE] __aligned(16);
+ CTASSERT(sizeof(struct nmi_pcpu) == 16);
+
+@@ -824,13 +827,20 @@
+ IDTVEC(tss), IDTVEC(missing), IDTVEC(stk), IDTVEC(prot),
+ IDTVEC(page), IDTVEC(mchk), IDTVEC(rsvd), IDTVEC(fpu), IDTVEC(align),
+ IDTVEC(xmm), IDTVEC(dblfault),
++ IDTVEC(div_pti), IDTVEC(dbg_pti), IDTVEC(bpt_pti),
++ IDTVEC(ofl_pti), IDTVEC(bnd_pti), IDTVEC(ill_pti), IDTVEC(dna_pti),
++ IDTVEC(fpusegm_pti), IDTVEC(tss_pti), IDTVEC(missing_pti),
++ IDTVEC(stk_pti), IDTVEC(prot_pti), IDTVEC(page_pti),
++ IDTVEC(rsvd_pti), IDTVEC(fpu_pti), IDTVEC(align_pti),
++ IDTVEC(xmm_pti),
+ #ifdef KDTRACE_HOOKS
+- IDTVEC(dtrace_ret),
++ IDTVEC(dtrace_ret), IDTVEC(dtrace_ret_pti),
+ #endif
+ #ifdef XENHVM
+- IDTVEC(xen_intr_upcall),
++ IDTVEC(xen_intr_upcall), IDTVEC(xen_intr_upcall_pti),
+ #endif
+- IDTVEC(fast_syscall), IDTVEC(fast_syscall32);
++ IDTVEC(fast_syscall), IDTVEC(fast_syscall32),
++ IDTVEC(fast_syscall_pti);
+
+ #ifdef DDB
+ /*
+@@ -1523,6 +1533,23 @@
+ #endif
+ }
+
++/* Set up the fast syscall stuff */
++void
++amd64_conf_fast_syscall(void)
++{
++ uint64_t msr;
++
++ msr = rdmsr(MSR_EFER) | EFER_SCE;
++ wrmsr(MSR_EFER, msr);
++ wrmsr(MSR_LSTAR, pti ? (u_int64_t)IDTVEC(fast_syscall_pti) :
++ (u_int64_t)IDTVEC(fast_syscall));
++ wrmsr(MSR_CSTAR, (u_int64_t)IDTVEC(fast_syscall32));
++ msr = ((u_int64_t)GSEL(GCODE_SEL, SEL_KPL) << 32) |
++ ((u_int64_t)GSEL(GUCODE32_SEL, SEL_UPL) << 48);
++ wrmsr(MSR_STAR, msr);
++ wrmsr(MSR_SF_MASK, PSL_NT | PSL_T | PSL_I | PSL_C | PSL_D);
++}
++
+ u_int64_t
+ hammer_time(u_int64_t modulep, u_int64_t physfree)
+ {
+@@ -1531,7 +1558,7 @@
+ struct pcpu *pc;
+ struct nmi_pcpu *np;
+ struct xstate_hdr *xhdr;
+- u_int64_t msr;
++ u_int64_t rsp0;
+ char *env;
+ size_t kstack0_sz;
+ int late_console;
+@@ -1544,6 +1571,8 @@
+
+ kmdp = init_ops.parse_preload_data(modulep);
+
++ identify_cpu1();
++
+ /* Init basic tunables, hz etc */
+ init_param1();
+
+@@ -1600,34 +1629,55 @@
+ mtx_init(&dt_lock, "descriptor tables", NULL, MTX_DEF);
+
+ /* exceptions */
++ pti = pti_get_default();
++ TUNABLE_INT_FETCH("vm.pmap.pti", &pti);
++
+ for (x = 0; x < NIDT; x++)
+- setidt(x, &IDTVEC(rsvd), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_DE, &IDTVEC(div), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_DB, &IDTVEC(dbg), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(x, pti ? &IDTVEC(rsvd_pti) : &IDTVEC(rsvd), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_DE, pti ? &IDTVEC(div_pti) : &IDTVEC(div), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_DB, pti ? &IDTVEC(dbg_pti) : &IDTVEC(dbg), SDT_SYSIGT,
++ SEL_KPL, 0);
+ setidt(IDT_NMI, &IDTVEC(nmi), SDT_SYSIGT, SEL_KPL, 2);
+- setidt(IDT_BP, &IDTVEC(bpt), SDT_SYSIGT, SEL_UPL, 0);
+- setidt(IDT_OF, &IDTVEC(ofl), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_BR, &IDTVEC(bnd), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_UD, &IDTVEC(ill), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_NM, &IDTVEC(dna), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IDT_BP, pti ? &IDTVEC(bpt_pti) : &IDTVEC(bpt), SDT_SYSIGT,
++ SEL_UPL, 0);
++ setidt(IDT_OF, pti ? &IDTVEC(ofl_pti) : &IDTVEC(ofl), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_BR, pti ? &IDTVEC(bnd_pti) : &IDTVEC(bnd), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_UD, pti ? &IDTVEC(ill_pti) : &IDTVEC(ill), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_NM, pti ? &IDTVEC(dna_pti) : &IDTVEC(dna), SDT_SYSIGT,
++ SEL_KPL, 0);
+ setidt(IDT_DF, &IDTVEC(dblfault), SDT_SYSIGT, SEL_KPL, 1);
+- setidt(IDT_FPUGP, &IDTVEC(fpusegm), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_TS, &IDTVEC(tss), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_NP, &IDTVEC(missing), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_SS, &IDTVEC(stk), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_GP, &IDTVEC(prot), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_PF, &IDTVEC(page), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_MF, &IDTVEC(fpu), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_AC, &IDTVEC(align), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_MC, &IDTVEC(mchk), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IDT_XF, &IDTVEC(xmm), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IDT_FPUGP, pti ? &IDTVEC(fpusegm_pti) : &IDTVEC(fpusegm),
++ SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IDT_TS, pti ? &IDTVEC(tss_pti) : &IDTVEC(tss), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_NP, pti ? &IDTVEC(missing_pti) : &IDTVEC(missing),
++ SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IDT_SS, pti ? &IDTVEC(stk_pti) : &IDTVEC(stk), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_GP, pti ? &IDTVEC(prot_pti) : &IDTVEC(prot), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_PF, pti ? &IDTVEC(page_pti) : &IDTVEC(page), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_MF, pti ? &IDTVEC(fpu_pti) : &IDTVEC(fpu), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_AC, pti ? &IDTVEC(align_pti) : &IDTVEC(align), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IDT_MC, &IDTVEC(mchk), SDT_SYSIGT, SEL_KPL, 3);
++ setidt(IDT_XF, pti ? &IDTVEC(xmm_pti) : &IDTVEC(xmm), SDT_SYSIGT,
++ SEL_KPL, 0);
+ #ifdef KDTRACE_HOOKS
+- setidt(IDT_DTRACE_RET, &IDTVEC(dtrace_ret), SDT_SYSIGT, SEL_UPL, 0);
++ setidt(IDT_DTRACE_RET, pti ? &IDTVEC(dtrace_ret_pti) :
++ &IDTVEC(dtrace_ret), SDT_SYSIGT, SEL_UPL, 0);
+ #endif
+ #ifdef XENHVM
+- setidt(IDT_EVTCHN, &IDTVEC(xen_intr_upcall), SDT_SYSIGT, SEL_UPL, 0);
++ setidt(IDT_EVTCHN, pti ? &IDTVEC(xen_intr_upcall_pti) :
++ &IDTVEC(xen_intr_upcall), SDT_SYSIGT, SEL_KPL, 0);
+ #endif
+-
+ r_idt.rd_limit = sizeof(idt0) - 1;
+ r_idt.rd_base = (long) idt;
+ lidt(&r_idt);
+@@ -1648,7 +1698,7 @@
+ != NULL)
+ vty_set_preferred(VTY_VT);
+
+- identify_cpu(); /* Final stage of CPU initialization */
++ finishidentcpu(); /* Final stage of CPU initialization */
+ initializecpu(); /* Initialize CPU registers */
+ initializecpucache();
+
+@@ -1663,6 +1713,14 @@
+ np->np_pcpu = (register_t) pc;
+ common_tss[0].tss_ist2 = (long) np;
+
++ /*
++ * MC# stack, runs on ist3. The pcpu pointer is stored just
++ * above the start of the ist3 stack.
++ */
++ np = ((struct nmi_pcpu *) &mce0_stack[sizeof(mce0_stack)]) - 1;
++ np->np_pcpu = (register_t) pc;
++ common_tss[0].tss_ist3 = (long) np;
++
+ /* Set the IO permission bitmap (empty due to tss seg limit) */
+ common_tss[0].tss_iobase = sizeof(struct amd64tss) + IOPERM_BITMAP_SIZE;
+
+@@ -1669,15 +1727,7 @@
+ gsel_tss = GSEL(GPROC0_SEL, SEL_KPL);
+ ltr(gsel_tss);
+
+- /* Set up the fast syscall stuff */
+- msr = rdmsr(MSR_EFER) | EFER_SCE;
+- wrmsr(MSR_EFER, msr);
+- wrmsr(MSR_LSTAR, (u_int64_t)IDTVEC(fast_syscall));
+- wrmsr(MSR_CSTAR, (u_int64_t)IDTVEC(fast_syscall32));
+- msr = ((u_int64_t)GSEL(GCODE_SEL, SEL_KPL) << 32) |
+- ((u_int64_t)GSEL(GUCODE32_SEL, SEL_UPL) << 48);
+- wrmsr(MSR_STAR, msr);
+- wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
++ amd64_conf_fast_syscall();
+
+ /*
+ * Temporary forge some valid pointer to PCB, for exception
+@@ -1749,10 +1799,12 @@
+ xhdr->xstate_bv = xsave_mask;
+ }
+ /* make an initial tss so cpu can get interrupt stack on syscall! */
+- common_tss[0].tss_rsp0 = (vm_offset_t)thread0.td_pcb;
++ rsp0 = (vm_offset_t)thread0.td_pcb;
+ /* Ensure the stack is aligned to 16 bytes */
+- common_tss[0].tss_rsp0 &= ~0xFul;
+- PCPU_SET(rsp0, common_tss[0].tss_rsp0);
++ rsp0 &= ~0xFul;
++ common_tss[0].tss_rsp0 = pti ? ((vm_offset_t)PCPU_PTR(pti_stack) +
++ PC_PTI_STACK_SZ * sizeof(uint64_t)) & ~0xful : rsp0;
++ PCPU_SET(rsp0, rsp0);
+ PCPU_SET(curpcb, thread0.td_pcb);
+
+ /* transfer to user mode */
+@@ -1782,6 +1834,8 @@
+ #endif
+ thread0.td_critnest = 0;
+
++ TUNABLE_INT_FETCH("hw.ibrs_disable", &hw_ibrs_disable);
++
+ /* Location of kernel stack for locore */
+ return ((u_int64_t)thread0.td_pcb);
+ }
+--- sys/amd64/amd64/mp_machdep.c.orig
++++ sys/amd64/amd64/mp_machdep.c
+@@ -85,10 +85,9 @@
+
+ /* Temporary variables for init_secondary() */
+ char *doublefault_stack;
++char *mce_stack;
+ char *nmi_stack;
+
+-extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32);
+-
+ /*
+ * Local data and functions.
+ */
+@@ -132,33 +131,50 @@
+ /* Install an inter-CPU IPI for TLB invalidation */
+ if (pmap_pcid_enabled) {
+ if (invpcid_works) {
+- setidt(IPI_INVLTLB, IDTVEC(invltlb_invpcid),
+- SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLTLB, pti ?
++ IDTVEC(invltlb_invpcid_pti_pti) :
++ IDTVEC(invltlb_invpcid_nopti), SDT_SYSIGT,
++ SEL_KPL, 0);
++ setidt(IPI_INVLPG, pti ? IDTVEC(invlpg_invpcid_pti) :
++ IDTVEC(invlpg_invpcid), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLRNG, pti ? IDTVEC(invlrng_invpcid_pti) :
++ IDTVEC(invlrng_invpcid), SDT_SYSIGT, SEL_KPL, 0);
+ } else {
+- setidt(IPI_INVLTLB, IDTVEC(invltlb_pcid), SDT_SYSIGT,
+- SEL_KPL, 0);
++ setidt(IPI_INVLTLB, pti ? IDTVEC(invltlb_pcid_pti) :
++ IDTVEC(invltlb_pcid), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLPG, pti ? IDTVEC(invlpg_pcid_pti) :
++ IDTVEC(invlpg_pcid), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLRNG, pti ? IDTVEC(invlrng_pcid_pti) :
++ IDTVEC(invlrng_pcid), SDT_SYSIGT, SEL_KPL, 0);
+ }
+ } else {
+- setidt(IPI_INVLTLB, IDTVEC(invltlb), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLTLB, pti ? IDTVEC(invltlb_pti) : IDTVEC(invltlb),
++ SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLPG, pti ? IDTVEC(invlpg_pti) : IDTVEC(invlpg),
++ SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLRNG, pti ? IDTVEC(invlrng_pti) : IDTVEC(invlrng),
++ SDT_SYSIGT, SEL_KPL, 0);
+ }
+- setidt(IPI_INVLPG, IDTVEC(invlpg), SDT_SYSIGT, SEL_KPL, 0);
+- setidt(IPI_INVLRNG, IDTVEC(invlrng), SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Install an inter-CPU IPI for cache invalidation. */
+- setidt(IPI_INVLCACHE, IDTVEC(invlcache), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_INVLCACHE, pti ? IDTVEC(invlcache_pti) : IDTVEC(invlcache),
++ SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Install an inter-CPU IPI for all-CPU rendezvous */
+- setidt(IPI_RENDEZVOUS, IDTVEC(rendezvous), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_RENDEZVOUS, pti ? IDTVEC(rendezvous_pti) :
++ IDTVEC(rendezvous), SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Install generic inter-CPU IPI handler */
+- setidt(IPI_BITMAP_VECTOR, IDTVEC(ipi_intr_bitmap_handler),
+- SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_BITMAP_VECTOR, pti ? IDTVEC(ipi_intr_bitmap_handler_pti) :
++ IDTVEC(ipi_intr_bitmap_handler), SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Install an inter-CPU IPI for CPU stop/restart */
+- setidt(IPI_STOP, IDTVEC(cpustop), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_STOP, pti ? IDTVEC(cpustop_pti) : IDTVEC(cpustop),
++ SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Install an inter-CPU IPI for CPU suspend/resume */
+- setidt(IPI_SUSPEND, IDTVEC(cpususpend), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IPI_SUSPEND, pti ? IDTVEC(cpususpend_pti) : IDTVEC(cpususpend),
++ SDT_SYSIGT, SEL_KPL, 0);
+
+ /* Set boot_cpu_id if needed. */
+ if (boot_cpu_id == -1) {
+@@ -188,7 +204,7 @@
+ {
+ struct pcpu *pc;
+ struct nmi_pcpu *np;
+- u_int64_t msr, cr0;
++ u_int64_t cr0;
+ int cpu, gsel_tss, x;
+ struct region_descriptor ap_gdt;
+
+@@ -197,7 +213,6 @@
+
+ /* Init tss */
+ common_tss[cpu] = common_tss[0];
+- common_tss[cpu].tss_rsp0 = 0; /* not used until after switch */
+ common_tss[cpu].tss_iobase = sizeof(struct amd64tss) +
+ IOPERM_BITMAP_SIZE;
+ common_tss[cpu].tss_ist1 = (long)&doublefault_stack[PAGE_SIZE];
+@@ -206,6 +221,10 @@
+ np = ((struct nmi_pcpu *) &nmi_stack[PAGE_SIZE]) - 1;
+ common_tss[cpu].tss_ist2 = (long) np;
+
++ /* The MC# stack runs on IST3. */
++ np = ((struct nmi_pcpu *) &mce_stack[PAGE_SIZE]) - 1;
++ common_tss[cpu].tss_ist3 = (long) np;
++
+ /* Prepare private GDT */
+ gdt_segs[GPROC0_SEL].ssd_base = (long) &common_tss[cpu];
+ for (x = 0; x < NGDT; x++) {
+@@ -240,10 +259,17 @@
+ pc->pc_curpmap = kernel_pmap;
+ pc->pc_pcid_gen = 1;
+ pc->pc_pcid_next = PMAP_PCID_KERN + 1;
++ common_tss[cpu].tss_rsp0 = pti ? ((vm_offset_t)&pc->pc_pti_stack +
++ PC_PTI_STACK_SZ * sizeof(uint64_t)) & ~0xful : 0;
+
+ /* Save the per-cpu pointer for use by the NMI handler. */
++ np = ((struct nmi_pcpu *) &nmi_stack[PAGE_SIZE]) - 1;
+ np->np_pcpu = (register_t) pc;
+
++ /* Save the per-cpu pointer for use by the MC# handler. */
++ np = ((struct nmi_pcpu *) &mce_stack[PAGE_SIZE]) - 1;
++ np->np_pcpu = (register_t) pc;
++
+ wrmsr(MSR_FSBASE, 0); /* User value */
+ wrmsr(MSR_GSBASE, (u_int64_t)pc);
+ wrmsr(MSR_KGSBASE, (u_int64_t)pc); /* XXX User value while we're in the kernel */
+@@ -263,15 +289,7 @@
+ cr0 &= ~(CR0_CD | CR0_NW | CR0_EM);
+ load_cr0(cr0);
+
+- /* Set up the fast syscall stuff */
+- msr = rdmsr(MSR_EFER) | EFER_SCE;
+- wrmsr(MSR_EFER, msr);
+- wrmsr(MSR_LSTAR, (u_int64_t)IDTVEC(fast_syscall));
+- wrmsr(MSR_CSTAR, (u_int64_t)IDTVEC(fast_syscall32));
+- msr = ((u_int64_t)GSEL(GCODE_SEL, SEL_KPL) << 32) |
+- ((u_int64_t)GSEL(GUCODE32_SEL, SEL_UPL) << 48);
+- wrmsr(MSR_STAR, msr);
+- wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
++ amd64_conf_fast_syscall();
+
+ /* signal our startup to the BSP. */
+ mp_naps++;
+@@ -346,6 +364,8 @@
+ kstack_pages * PAGE_SIZE, M_WAITOK | M_ZERO);
+ doublefault_stack = (char *)kmem_malloc(kernel_arena,
+ PAGE_SIZE, M_WAITOK | M_ZERO);
++ mce_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE,
++ M_WAITOK | M_ZERO);
+ nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE,
+ M_WAITOK | M_ZERO);
+ dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE,
+@@ -428,9 +448,43 @@
+ }
+
+ void
++invltlb_invpcid_pti_handler(void)
++{
++ struct invpcid_descr d;
++ uint32_t generation;
++
++#ifdef COUNT_XINVLTLB_HITS
++ xhits_gbl[PCPU_GET(cpuid)]++;
++#endif /* COUNT_XINVLTLB_HITS */
++#ifdef COUNT_IPIS
++ (*ipi_invltlb_counts[PCPU_GET(cpuid)])++;
++#endif /* COUNT_IPIS */
++
++ generation = smp_tlb_generation;
++ d.pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid;
++ d.pad = 0;
++ d.addr = 0;
++ if (smp_tlb_pmap == kernel_pmap) {
++ /*
++ * This invalidation actually needs to clear kernel
++ * mappings from the TLB in the current pmap, but
++ * since we were asked for the flush in the kernel
++ * pmap, achieve it by performing global flush.
++ */
++ invpcid(&d, INVPCID_CTXGLOB);
++ } else {
++ invpcid(&d, INVPCID_CTX);
++ d.pcid |= PMAP_PCID_USER_PT;
++ invpcid(&d, INVPCID_CTX);
++ }
++ PCPU_SET(smp_tlb_done, generation);
++}
++
++void
+ invltlb_pcid_handler(void)
+ {
+- uint32_t generation;
++ uint64_t kcr3, ucr3;
++ uint32_t generation, pcid;
+
+ #ifdef COUNT_XINVLTLB_HITS
+ xhits_gbl[PCPU_GET(cpuid)]++;
+@@ -451,9 +505,132 @@
+ * CPU.
+ */
+ if (PCPU_GET(curpmap) == smp_tlb_pmap) {
+- load_cr3(smp_tlb_pmap->pm_cr3 |
+- smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid);
++ pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid;
++ kcr3 = smp_tlb_pmap->pm_cr3 | pcid;
++ ucr3 = smp_tlb_pmap->pm_ucr3;
++ if (ucr3 != PMAP_NO_CR3) {
++ ucr3 |= PMAP_PCID_USER_PT | pcid;
++ pmap_pti_pcid_invalidate(ucr3, kcr3);
++ } else
++ load_cr3(kcr3);
+ }
+ }
+ PCPU_SET(smp_tlb_done, generation);
+ }
++
++void
++invlpg_invpcid_handler(void)
++{
++ struct invpcid_descr d;
++ uint32_t generation;
++
++#ifdef COUNT_XINVLTLB_HITS
++ xhits_pg[PCPU_GET(cpuid)]++;
++#endif /* COUNT_XINVLTLB_HITS */
++#ifdef COUNT_IPIS
++ (*ipi_invlpg_counts[PCPU_GET(cpuid)])++;
++#endif /* COUNT_IPIS */
++
++ generation = smp_tlb_generation; /* Overlap with serialization */
++ invlpg(smp_tlb_addr1);
++ if (smp_tlb_pmap->pm_ucr3 != PMAP_NO_CR3) {
++ d.pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid |
++ PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = smp_tlb_addr1;
++ invpcid(&d, INVPCID_ADDR);
++ }
++ PCPU_SET(smp_tlb_done, generation);
++}
++
++void
++invlpg_pcid_handler(void)
++{
++ uint64_t kcr3, ucr3;
++ uint32_t generation;
++ uint32_t pcid;
++
++#ifdef COUNT_XINVLTLB_HITS
++ xhits_pg[PCPU_GET(cpuid)]++;
++#endif /* COUNT_XINVLTLB_HITS */
++#ifdef COUNT_IPIS
++ (*ipi_invlpg_counts[PCPU_GET(cpuid)])++;
++#endif /* COUNT_IPIS */
++
++ generation = smp_tlb_generation; /* Overlap with serialization */
++ invlpg(smp_tlb_addr1);
++ if (smp_tlb_pmap == PCPU_GET(curpmap) &&
++ (ucr3 = smp_tlb_pmap->pm_ucr3) != PMAP_NO_CR3) {
++ pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid;
++ kcr3 = smp_tlb_pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
++ ucr3 |= pcid | PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlpg(ucr3, kcr3, smp_tlb_addr1);
++ }
++ PCPU_SET(smp_tlb_done, generation);
++}
++
++void
++invlrng_invpcid_handler(void)
++{
++ struct invpcid_descr d;
++ vm_offset_t addr, addr2;
++ uint32_t generation;
++
++#ifdef COUNT_XINVLTLB_HITS
++ xhits_rng[PCPU_GET(cpuid)]++;
++#endif /* COUNT_XINVLTLB_HITS */
++#ifdef COUNT_IPIS
++ (*ipi_invlrng_counts[PCPU_GET(cpuid)])++;
++#endif /* COUNT_IPIS */
++
++ addr = smp_tlb_addr1;
++ addr2 = smp_tlb_addr2;
++ generation = smp_tlb_generation; /* Overlap with serialization */
++ do {
++ invlpg(addr);
++ addr += PAGE_SIZE;
++ } while (addr < addr2);
++ if (smp_tlb_pmap->pm_ucr3 != PMAP_NO_CR3) {
++ d.pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid |
++ PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = smp_tlb_addr1;
++ do {
++ invpcid(&d, INVPCID_ADDR);
++ d.addr += PAGE_SIZE;
++ } while (d.addr < addr2);
++ }
++ PCPU_SET(smp_tlb_done, generation);
++}
++
++void
++invlrng_pcid_handler(void)
++{
++ vm_offset_t addr, addr2;
++ uint64_t kcr3, ucr3;
++ uint32_t generation;
++ uint32_t pcid;
++
++#ifdef COUNT_XINVLTLB_HITS
++ xhits_rng[PCPU_GET(cpuid)]++;
++#endif /* COUNT_XINVLTLB_HITS */
++#ifdef COUNT_IPIS
++ (*ipi_invlrng_counts[PCPU_GET(cpuid)])++;
++#endif /* COUNT_IPIS */
++
++ addr = smp_tlb_addr1;
++ addr2 = smp_tlb_addr2;
++ generation = smp_tlb_generation; /* Overlap with serialization */
++ do {
++ invlpg(addr);
++ addr += PAGE_SIZE;
++ } while (addr < addr2);
++ if (smp_tlb_pmap == PCPU_GET(curpmap) &&
++ (ucr3 = smp_tlb_pmap->pm_ucr3) != PMAP_NO_CR3) {
++ pcid = smp_tlb_pmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid;
++ kcr3 = smp_tlb_pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
++ ucr3 |= pcid | PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlrng(ucr3, kcr3, smp_tlb_addr1, addr2);
++ }
++ PCPU_SET(smp_tlb_done, generation);
++}
+--- sys/amd64/amd64/pmap.c.orig
++++ sys/amd64/amd64/pmap.c
+@@ -9,11 +9,17 @@
+ * All rights reserved.
+ * Copyright (c) 2005-2010 Alan L. Cox
+ * All rights reserved.
++ * Copyright (c) 2014-2018 The FreeBSD Foundation
++ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * the Systems Programming Group of the University of Utah Computer
+ * Science Department and William Jolitz of UUNET Technologies Inc.
+ *
++ * Portions of this software were developed by
++ * Konstantin Belousov under sponsorship from
++ * the FreeBSD Foundation.
++ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+@@ -147,6 +153,7 @@
+ #ifdef SMP
+ #include <machine/smp.h>
+ #endif
++#include
+
+ static __inline boolean_t
+ pmap_type_guest(pmap_t pmap)
+@@ -208,6 +215,8 @@
+ return (mask);
+ }
+
++static pt_entry_t pg_g;
++
+ static __inline pt_entry_t
+ pmap_global_bit(pmap_t pmap)
+ {
+@@ -215,7 +224,7 @@
+
+ switch (pmap->pm_type) {
+ case PT_X86:
+- mask = X86_PG_G;
++ mask = pg_g;
+ break;
+ case PT_RVI:
+ case PT_EPT:
+@@ -405,6 +414,15 @@
+ SYSCTL_INT(_vm_pmap, OID_AUTO, invpcid_works, CTLFLAG_RD, &invpcid_works, 0,
+ "Is the invpcid instruction available ?");
+
++int pti = 0;
++SYSCTL_INT(_vm_pmap, OID_AUTO, pti, CTLFLAG_RDTUN | CTLFLAG_NOFETCH,
++ &pti, 0,
++ "Page Table Isolation enabled");
++static vm_object_t pti_obj;
++static pml4_entry_t *pti_pml4;
++static vm_pindex_t pti_pg_idx;
++static bool pti_finalized;
++
+ static int
+ pmap_pcid_save_cnt_proc(SYSCTL_HANDLER_ARGS)
+ {
+@@ -622,6 +640,11 @@
+ static boolean_t pmap_protect_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t sva,
+ vm_prot_t prot);
+ static void pmap_pte_attr(pt_entry_t *pte, int cache_bits, int mask);
++static void pmap_pti_add_kva_locked(vm_offset_t sva, vm_offset_t eva,
++ bool exec);
++static pdp_entry_t *pmap_pti_pdpe(vm_offset_t va);
++static pd_entry_t *pmap_pti_pde(vm_offset_t va);
++static void pmap_pti_wire_pte(void *pte);
+ static int pmap_remove_pde(pmap_t pmap, pd_entry_t *pdq, vm_offset_t sva,
+ struct spglist *free, struct rwlock **lockp);
+ static int pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t sva,
+@@ -901,7 +924,7 @@
+ /* XXX not fully used, underneath 2M pages */
+ pt_p = (pt_entry_t *)KPTphys;
+ for (i = 0; ptoa(i) < *firstaddr; i++)
+- pt_p[i] = ptoa(i) | X86_PG_RW | X86_PG_V | X86_PG_G;
++ pt_p[i] = ptoa(i) | X86_PG_RW | X86_PG_V | pg_g;
+
+ /* Now map the page tables at their location within PTmap */
+ pd_p = (pd_entry_t *)KPDphys;
+@@ -912,7 +935,7 @@
+ /* This replaces some of the KPTphys entries above */
+ for (i = 0; (i << PDRSHIFT) < *firstaddr; i++)
+ pd_p[i] = (i << PDRSHIFT) | X86_PG_RW | X86_PG_V | PG_PS |
+- X86_PG_G;
++ pg_g;
+
+ /* And connect up the PD to the PDP (leaving room for L4 pages) */
+ pdp_p = (pdp_entry_t *)(KPDPphys + ptoa(KPML4I - KPML4BASE));
+@@ -932,7 +955,7 @@
+ for (i = NPDEPG * ndm1g, j = 0; i < NPDEPG * ndmpdp; i++, j++) {
+ pd_p[j] = (vm_paddr_t)i << PDRSHIFT;
+ /* Preset PG_M and PG_A because demotion expects it. */
+- pd_p[j] |= X86_PG_RW | X86_PG_V | PG_PS | X86_PG_G |
++ pd_p[j] |= X86_PG_RW | X86_PG_V | PG_PS | pg_g |
+ X86_PG_M | X86_PG_A;
+ }
+ pdp_p = (pdp_entry_t *)DMPDPphys;
+@@ -939,7 +962,7 @@
+ for (i = 0; i < ndm1g; i++) {
+ pdp_p[i] = (vm_paddr_t)i << PDPSHIFT;
+ /* Preset PG_M and PG_A because demotion expects it. */
+- pdp_p[i] |= X86_PG_RW | X86_PG_V | PG_PS | X86_PG_G |
++ pdp_p[i] |= X86_PG_RW | X86_PG_V | PG_PS | pg_g |
+ X86_PG_M | X86_PG_A;
+ }
+ for (j = 0; i < ndmpdp; i++, j++) {
+@@ -982,6 +1005,9 @@
+ pt_entry_t *pte;
+ int i;
+
++ if (!pti)
++ pg_g = X86_PG_G;
++
+ /*
+ * Create an initial set of page tables to run the kernel in.
+ */
+@@ -1014,6 +1040,7 @@
+ PMAP_LOCK_INIT(kernel_pmap);
+ kernel_pmap->pm_pml4 = (pdp_entry_t *)PHYS_TO_DMAP(KPML4phys);
+ kernel_pmap->pm_cr3 = KPML4phys;
++ kernel_pmap->pm_ucr3 = PMAP_NO_CR3;
+ CPU_FILL(&kernel_pmap->pm_active); /* don't allow deactivation */
+ TAILQ_INIT(&kernel_pmap->pm_pvchunk);
+ kernel_pmap->pm_flags = pmap_flags;
+@@ -1528,6 +1555,9 @@
+ pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
+ {
+ cpuset_t *mask;
++ struct invpcid_descr d;
++ uint64_t kcr3, ucr3;
++ uint32_t pcid;
+ u_int cpuid, i;
+
+ if (pmap_type_guest(pmap)) {
+@@ -1544,9 +1574,32 @@
+ mask = &all_cpus;
+ } else {
+ cpuid = PCPU_GET(cpuid);
+- if (pmap == PCPU_GET(curpmap))
++ if (pmap == PCPU_GET(curpmap)) {
+ invlpg(va);
+- else if (pmap_pcid_enabled)
++ if (pmap_pcid_enabled && pmap->pm_ucr3 != PMAP_NO_CR3) {
++ /*
++ * Disable context switching. pm_pcid
++ * is recalculated on switch, which
++ * might make us use wrong pcid below.
++ */
++ critical_enter();
++ pcid = pmap->pm_pcids[cpuid].pm_pcid;
++
++ if (invpcid_works) {
++ d.pcid = pcid | PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = va;
++ invpcid(&d, INVPCID_ADDR);
++ } else {
++ kcr3 = pmap->pm_cr3 | pcid |
++ CR3_PCID_SAVE;
++ ucr3 = pmap->pm_ucr3 | pcid |
++ PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlpg(ucr3, kcr3, va);
++ }
++ critical_exit();
++ }
++ } else if (pmap_pcid_enabled)
+ pmap->pm_pcids[cpuid].pm_gen = 0;
+ if (pmap_pcid_enabled) {
+ CPU_FOREACH(i) {
+@@ -1556,7 +1609,7 @@
+ }
+ mask = &pmap->pm_active;
+ }
+- smp_masked_invlpg(*mask, va);
++ smp_masked_invlpg(*mask, va, pmap);
+ sched_unpin();
+ }
+
+@@ -1567,7 +1620,10 @@
+ pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
+ {
+ cpuset_t *mask;
++ struct invpcid_descr d;
+ vm_offset_t addr;
++ uint64_t kcr3, ucr3;
++ uint32_t pcid;
+ u_int cpuid, i;
+
+ if (eva - sva >= PMAP_INVLPG_THRESHOLD) {
+@@ -1593,6 +1649,26 @@
+ if (pmap == PCPU_GET(curpmap)) {
+ for (addr = sva; addr < eva; addr += PAGE_SIZE)
+ invlpg(addr);
++ if (pmap_pcid_enabled && pmap->pm_ucr3 != PMAP_NO_CR3) {
++ critical_enter();
++ pcid = pmap->pm_pcids[cpuid].pm_pcid;
++ if (invpcid_works) {
++ d.pcid = pcid | PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = sva;
++ for (; d.addr < eva; d.addr +=
++ PAGE_SIZE)
++ invpcid(&d, INVPCID_ADDR);
++ } else {
++ kcr3 = pmap->pm_cr3 | pcid |
++ CR3_PCID_SAVE;
++ ucr3 = pmap->pm_ucr3 | pcid |
++ PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlrng(ucr3, kcr3, sva,
++ eva);
++ }
++ critical_exit();
++ }
+ } else if (pmap_pcid_enabled) {
+ pmap->pm_pcids[cpuid].pm_gen = 0;
+ }
+@@ -1604,7 +1680,7 @@
+ }
+ mask = &pmap->pm_active;
+ }
+- smp_masked_invlpg_range(*mask, sva, eva);
++ smp_masked_invlpg_range(*mask, sva, eva, pmap);
+ sched_unpin();
+ }
+
+@@ -1613,6 +1689,8 @@
+ {
+ cpuset_t *mask;
+ struct invpcid_descr d;
++ uint64_t kcr3, ucr3;
++ uint32_t pcid;
+ u_int cpuid, i;
+
+ if (pmap_type_guest(pmap)) {
+@@ -1636,15 +1714,29 @@
+ cpuid = PCPU_GET(cpuid);
+ if (pmap == PCPU_GET(curpmap)) {
+ if (pmap_pcid_enabled) {
++ critical_enter();
++ pcid = pmap->pm_pcids[cpuid].pm_pcid;
+ if (invpcid_works) {
+- d.pcid = pmap->pm_pcids[cpuid].pm_pcid;
++ d.pcid = pcid;
+ d.pad = 0;
+ d.addr = 0;
+ invpcid(&d, INVPCID_CTX);
++ if (pmap->pm_ucr3 != PMAP_NO_CR3) {
++ d.pcid |= PMAP_PCID_USER_PT;
++ invpcid(&d, INVPCID_CTX);
++ }
+ } else {
+- load_cr3(pmap->pm_cr3 | pmap->pm_pcids
+- [PCPU_GET(cpuid)].pm_pcid);
++ kcr3 = pmap->pm_cr3 | pcid;
++ ucr3 = pmap->pm_ucr3;
++ if (ucr3 != PMAP_NO_CR3) {
++ ucr3 |= pcid | PMAP_PCID_USER_PT;
++ pmap_pti_pcid_invalidate(ucr3,
++ kcr3);
++ } else {
++ load_cr3(kcr3);
++ }
+ }
++ critical_exit();
+ } else {
+ invltlb();
+ }
+@@ -1749,6 +1841,9 @@
+ void
+ pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
+ {
++ struct invpcid_descr d;
++ uint64_t kcr3, ucr3;
++ uint32_t pcid;
+
+ if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
+ pmap->pm_eptgen++;
+@@ -1757,9 +1852,26 @@
+ KASSERT(pmap->pm_type == PT_X86,
+ ("pmap_invalidate_range: unknown type %d", pmap->pm_type));
+
+- if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap))
++ if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap)) {
+ invlpg(va);
+- else if (pmap_pcid_enabled)
++ if (pmap == PCPU_GET(curpmap) && pmap_pcid_enabled &&
++ pmap->pm_ucr3 != PMAP_NO_CR3) {
++ critical_enter();
++ pcid = pmap->pm_pcids[0].pm_pcid;
++ if (invpcid_works) {
++ d.pcid = pcid | PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = va;
++ invpcid(&d, INVPCID_ADDR);
++ } else {
++ kcr3 = pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
++ ucr3 = pmap->pm_ucr3 | pcid |
++ PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlpg(ucr3, kcr3, va);
++ }
++ critical_exit();
++ }
++ } else if (pmap_pcid_enabled)
+ pmap->pm_pcids[0].pm_gen = 0;
+ }
+
+@@ -1766,7 +1878,9 @@
+ void
+ pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
+ {
++ struct invpcid_descr d;
+ vm_offset_t addr;
++ uint64_t kcr3, ucr3;
+
+ if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
+ pmap->pm_eptgen++;
+@@ -1778,6 +1892,25 @@
+ if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap)) {
+ for (addr = sva; addr < eva; addr += PAGE_SIZE)
+ invlpg(addr);
++ if (pmap == PCPU_GET(curpmap) && pmap_pcid_enabled &&
++ pmap->pm_ucr3 != PMAP_NO_CR3) {
++ critical_enter();
++ if (invpcid_works) {
++ d.pcid = pmap->pm_pcids[0].pm_pcid |
++ PMAP_PCID_USER_PT;
++ d.pad = 0;
++ d.addr = sva;
++ for (; d.addr < eva; d.addr += PAGE_SIZE)
++ invpcid(&d, INVPCID_ADDR);
++ } else {
++ kcr3 = pmap->pm_cr3 | pmap->pm_pcids[0].
++ pm_pcid | CR3_PCID_SAVE;
++ ucr3 = pmap->pm_ucr3 | pmap->pm_pcids[0].
++ pm_pcid | PMAP_PCID_USER_PT | CR3_PCID_SAVE;
++ pmap_pti_pcid_invlrng(ucr3, kcr3, sva, eva);
++ }
++ critical_exit();
++ }
+ } else if (pmap_pcid_enabled) {
+ pmap->pm_pcids[0].pm_gen = 0;
+ }
+@@ -1787,6 +1920,7 @@
+ pmap_invalidate_all(pmap_t pmap)
+ {
+ struct invpcid_descr d;
++ uint64_t kcr3, ucr3;
+
+ if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
+ pmap->pm_eptgen++;
+@@ -1804,15 +1938,26 @@
+ }
+ } else if (pmap == PCPU_GET(curpmap)) {
+ if (pmap_pcid_enabled) {
++ critical_enter();
+ if (invpcid_works) {
+ d.pcid = pmap->pm_pcids[0].pm_pcid;
+ d.pad = 0;
+ d.addr = 0;
+ invpcid(&d, INVPCID_CTX);
++ if (pmap->pm_ucr3 != PMAP_NO_CR3) {
++ d.pcid |= PMAP_PCID_USER_PT;
++ invpcid(&d, INVPCID_CTX);
++ }
+ } else {
+- load_cr3(pmap->pm_cr3 | pmap->pm_pcids[0].
+- pm_pcid);
++ kcr3 = pmap->pm_cr3 | pmap->pm_pcids[0].pm_pcid;
++ if (pmap->pm_ucr3 != PMAP_NO_CR3) {
++ ucr3 = pmap->pm_ucr3 | pmap->pm_pcids[
++ 0].pm_pcid | PMAP_PCID_USER_PT;
++ pmap_pti_pcid_invalidate(ucr3, kcr3);
++ } else
++ load_cr3(kcr3);
+ }
++ critical_exit();
+ } else {
+ invltlb();
+ }
+@@ -2094,7 +2239,7 @@
+ pt_entry_t *pte;
+
+ pte = vtopte(va);
+- pte_store(pte, pa | X86_PG_RW | X86_PG_V | X86_PG_G);
++ pte_store(pte, pa | X86_PG_RW | X86_PG_V | pg_g);
+ }
+
+ static __inline void
+@@ -2105,7 +2250,7 @@
+
+ pte = vtopte(va);
+ cache_bits = pmap_cache_bits(kernel_pmap, mode, 0);
+- pte_store(pte, pa | X86_PG_RW | X86_PG_V | X86_PG_G | cache_bits);
++ pte_store(pte, pa | X86_PG_RW | X86_PG_V | pg_g | cache_bits);
+ }
+
+ /*
+@@ -2165,7 +2310,7 @@
+ pa = VM_PAGE_TO_PHYS(m) | cache_bits;
+ if ((*pte & (PG_FRAME | X86_PG_PTE_CACHE)) != pa) {
+ oldpte |= *pte;
+- pte_store(pte, pa | X86_PG_G | X86_PG_RW | X86_PG_V);
++ pte_store(pte, pa | pg_g | X86_PG_RW | X86_PG_V);
+ }
+ pte++;
+ }
+@@ -2284,6 +2429,10 @@
+ pml4_entry_t *pml4;
+ pml4 = pmap_pml4e(pmap, va);
+ *pml4 = 0;
++ if (pmap->pm_pml4u != NULL && va <= VM_MAXUSER_ADDRESS) {
++ pml4 = &pmap->pm_pml4u[pmap_pml4e_index(va)];
++ *pml4 = 0;
++ }
+ } else if (m->pindex >= NUPDE) {
+ /* PD page */
+ pdp_entry_t *pdp;
+@@ -2349,7 +2498,10 @@
+
+ PMAP_LOCK_INIT(pmap);
+ pmap->pm_pml4 = (pml4_entry_t *)PHYS_TO_DMAP(KPML4phys);
++ pmap->pm_pml4u = NULL;
+ pmap->pm_cr3 = KPML4phys;
++ /* hack to keep pmap_pti_pcid_invalidate() alive */
++ pmap->pm_ucr3 = PMAP_NO_CR3;
+ pmap->pm_root.rt_root = 0;
+ CPU_ZERO(&pmap->pm_active);
+ TAILQ_INIT(&pmap->pm_pvchunk);
+@@ -2358,6 +2510,8 @@
+ CPU_FOREACH(i) {
+ pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE;
+ pmap->pm_pcids[i].pm_gen = 0;
++ if (!pti)
++ __pcpu[i].pc_kcr3 = PMAP_NO_CR3;
+ }
+ PCPU_SET(curpmap, kernel_pmap);
+ pmap_activate(curthread);
+@@ -2387,6 +2541,17 @@
+ X86_PG_A | X86_PG_M;
+ }
+
++static void
++pmap_pinit_pml4_pti(vm_page_t pml4pg)
++{
++ pml4_entry_t *pm_pml4;
++ int i;
++
++ pm_pml4 = (pml4_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(pml4pg));
++ for (i = 0; i < NPML4EPG; i++)
++ pm_pml4[i] = pti_pml4[i];
++}
++
+ /*
+ * Initialize a preallocated and zeroed pmap structure,
+ * such as one in a vmspace structure.
+@@ -2394,7 +2559,7 @@
+ int
+ pmap_pinit_type(pmap_t pmap, enum pmap_type pm_type, int flags)
+ {
+- vm_page_t pml4pg;
++ vm_page_t pml4pg, pml4pgu;
+ vm_paddr_t pml4phys;
+ int i;
+
+@@ -2411,8 +2576,11 @@
+ pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE;
+ pmap->pm_pcids[i].pm_gen = 0;
+ }
+- pmap->pm_cr3 = ~0; /* initialize to an invalid value */
++ pmap->pm_cr3 = PMAP_NO_CR3; /* initialize to an invalid value */
++ pmap->pm_ucr3 = PMAP_NO_CR3;
++ pmap->pm_pml4u = NULL;
+
++ pmap->pm_type = pm_type;
+ if ((pml4pg->flags & PG_ZERO) == 0)
+ pagezero(pmap->pm_pml4);
+
+@@ -2420,10 +2588,21 @@
+ * Do not install the host kernel mappings in the nested page
+ * tables. These mappings are meaningless in the guest physical
+ * address space.
++ * Install minimal kernel mappings in PTI case.
+ */
+- if ((pmap->pm_type = pm_type) == PT_X86) {
++ if (pm_type == PT_X86) {
+ pmap->pm_cr3 = pml4phys;
+ pmap_pinit_pml4(pml4pg);
++ if (pti) {
++ while ((pml4pgu = vm_page_alloc(NULL, 0,
++ VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED))
++ == NULL)
++ VM_WAIT;
++ pmap->pm_pml4u = (pml4_entry_t *)PHYS_TO_DMAP(
++ VM_PAGE_TO_PHYS(pml4pgu));
++ pmap_pinit_pml4_pti(pml4pgu);
++ pmap->pm_ucr3 = VM_PAGE_TO_PHYS(pml4pgu);
++ }
+ }
+
+ pmap->pm_root.rt_root = 0;
+@@ -2495,7 +2674,7 @@
+ */
+
+ if (ptepindex >= (NUPDE + NUPDPE)) {
+- pml4_entry_t *pml4;
++ pml4_entry_t *pml4, *pml4u;
+ vm_pindex_t pml4index;
+
+ /* Wire up a new PDPE page */
+@@ -2502,7 +2681,21 @@
+ pml4index = ptepindex - (NUPDE + NUPDPE);
+ pml4 = &pmap->pm_pml4[pml4index];
+ *pml4 = VM_PAGE_TO_PHYS(m) | PG_U | PG_RW | PG_V | PG_A | PG_M;
++ if (pmap->pm_pml4u != NULL && pml4index < NUPML4E) {
++ /*
++ * PTI: Make all user-space mappings in the
++ * kernel-mode page table no-execute so that
++ * we detect any programming errors that leave
++ * the kernel-mode page table active on return
++ * to user space.
++ */
++ *pml4 |= pg_nx;
+
++ pml4u = &pmap->pm_pml4u[pml4index];
++ *pml4u = VM_PAGE_TO_PHYS(m) | PG_U | PG_RW | PG_V |
++ PG_A | PG_M;
++ }
++
+ } else if (ptepindex >= NUPDE) {
+ vm_pindex_t pml4index;
+ vm_pindex_t pdpindex;
+@@ -2702,6 +2895,13 @@
+ m->wire_count--;
+ atomic_subtract_int(&vm_cnt.v_wire_count, 1);
+ vm_page_free_zero(m);
++
++ if (pmap->pm_pml4u != NULL) {
++ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pmap->pm_pml4u));
++ m->wire_count--;
++ atomic_subtract_int(&vm_cnt.v_wire_count, 1);
++ vm_page_free(m);
++ }
+ }
+
+ static int
+@@ -6867,13 +7067,15 @@
+
+ CRITICAL_ASSERT(curthread);
+ gen = PCPU_GET(pcid_gen);
+- if (pmap->pm_pcids[cpuid].pm_pcid == PMAP_PCID_KERN ||
+- pmap->pm_pcids[cpuid].pm_gen == gen)
++ if (!pti && (pmap->pm_pcids[cpuid].pm_pcid == PMAP_PCID_KERN ||
++ pmap->pm_pcids[cpuid].pm_gen == gen))
+ return (CR3_PCID_SAVE);
+ pcid_next = PCPU_GET(pcid_next);
+- KASSERT(pcid_next <= PMAP_PCID_OVERMAX, ("cpu %d pcid_next %#x",
+- cpuid, pcid_next));
+- if (pcid_next == PMAP_PCID_OVERMAX) {
++ KASSERT((!pti && pcid_next <= PMAP_PCID_OVERMAX) ||
++ (pti && pcid_next <= PMAP_PCID_OVERMAX_KERN),
++ ("cpu %d pcid_next %#x", cpuid, pcid_next));
++ if ((!pti && pcid_next == PMAP_PCID_OVERMAX) ||
++ (pti && pcid_next == PMAP_PCID_OVERMAX_KERN)) {
+ new_gen = gen + 1;
+ if (new_gen == 0)
+ new_gen = 1;
+@@ -6892,7 +7094,8 @@
+ pmap_activate_sw(struct thread *td)
+ {
+ pmap_t oldpmap, pmap;
+- uint64_t cached, cr3;
++ struct invpcid_descr d;
++ uint64_t cached, cr3, kcr3, ucr3;
+ register_t rflags;
+ u_int cpuid;
+
+@@ -6948,11 +7151,41 @@
+ PCPU_INC(pm_save_cnt);
+ }
+ PCPU_SET(curpmap, pmap);
++ if (pti) {
++ kcr3 = pmap->pm_cr3 | pmap->pm_pcids[cpuid].pm_pcid;
++ ucr3 = pmap->pm_ucr3 | pmap->pm_pcids[cpuid].pm_pcid |
++ PMAP_PCID_USER_PT;
++
++ /*
++ * Manually invalidate translations cached
++ * from the user page table, which are not
++ * flushed by reload of cr3 with the kernel
++ * page table pointer above.
++ */
++ if (pmap->pm_ucr3 != PMAP_NO_CR3) {
++ if (invpcid_works) {
++ d.pcid = PMAP_PCID_USER_PT |
++ pmap->pm_pcids[cpuid].pm_pcid;
++ d.pad = 0;
++ d.addr = 0;
++ invpcid(&d, INVPCID_CTX);
++ } else {
++ pmap_pti_pcid_invalidate(ucr3, kcr3);
++ }
++ }
++
++ PCPU_SET(kcr3, kcr3 | CR3_PCID_SAVE);
++ PCPU_SET(ucr3, ucr3 | CR3_PCID_SAVE);
++ }
+ if (!invpcid_works)
+ intr_restore(rflags);
+ } else if (cr3 != pmap->pm_cr3) {
+ load_cr3(pmap->pm_cr3);
+ PCPU_SET(curpmap, pmap);
++ if (pti) {
++ PCPU_SET(kcr3, pmap->pm_cr3);
++ PCPU_SET(ucr3, pmap->pm_ucr3);
++ }
+ }
+ #ifdef SMP
+ CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active);
+@@ -7271,6 +7504,291 @@
+ mtx_unlock_spin(&qframe_mtx);
+ }
+
++static vm_page_t
++pmap_pti_alloc_page(void)
++{
++ vm_page_t m;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++ m = vm_page_grab(pti_obj, pti_pg_idx++, VM_ALLOC_NOBUSY |
++ VM_ALLOC_WIRED | VM_ALLOC_ZERO);
++ return (m);
++}
++
++static bool
++pmap_pti_free_page(vm_page_t m)
++{
++
++ KASSERT(m->wire_count > 0, ("page %p not wired", m));
++ m->wire_count--;
++ if (m->wire_count != 0)
++ return (false);
++ atomic_subtract_int(&vm_cnt.v_wire_count, 1);
++ vm_page_free_zero(m);
++ return (true);
++}
++
++static void
++pmap_pti_init(void)
++{
++ vm_page_t pml4_pg;
++ pdp_entry_t *pdpe;
++ vm_offset_t va;
++ int i;
++
++ if (!pti)
++ return;
++ pti_obj = vm_pager_allocate(OBJT_PHYS, NULL, 0, VM_PROT_ALL, 0, NULL);
++ VM_OBJECT_WLOCK(pti_obj);
++ pml4_pg = pmap_pti_alloc_page();
++ pti_pml4 = (pml4_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(pml4_pg));
++ for (va = VM_MIN_KERNEL_ADDRESS; va <= VM_MAX_KERNEL_ADDRESS &&
++ va >= VM_MIN_KERNEL_ADDRESS && va > NBPML4; va += NBPML4) {
++ pdpe = pmap_pti_pdpe(va);
++ pmap_pti_wire_pte(pdpe);
++ }
++ pmap_pti_add_kva_locked((vm_offset_t)&__pcpu[0],
++ (vm_offset_t)&__pcpu[0] + sizeof(__pcpu[0]) * MAXCPU, false);
++ pmap_pti_add_kva_locked((vm_offset_t)gdt, (vm_offset_t)gdt +
++ sizeof(struct user_segment_descriptor) * NGDT * MAXCPU, false);
++ pmap_pti_add_kva_locked((vm_offset_t)idt, (vm_offset_t)idt +
++ sizeof(struct gate_descriptor) * NIDT, false);
++ pmap_pti_add_kva_locked((vm_offset_t)common_tss,
++ (vm_offset_t)common_tss + sizeof(struct amd64tss) * MAXCPU, false);
++ CPU_FOREACH(i) {
++ /* Doublefault stack IST 1 */
++ va = common_tss[i].tss_ist1;
++ pmap_pti_add_kva_locked(va - PAGE_SIZE, va, false);
++ /* NMI stack IST 2 */
++ va = common_tss[i].tss_ist2 + sizeof(struct nmi_pcpu);
++ pmap_pti_add_kva_locked(va - PAGE_SIZE, va, false);
++ /* MC# stack IST 3 */
++ va = common_tss[i].tss_ist3 + sizeof(struct nmi_pcpu);
++ pmap_pti_add_kva_locked(va - PAGE_SIZE, va, false);
++ }
++ pmap_pti_add_kva_locked((vm_offset_t)kernphys + KERNBASE,
++ (vm_offset_t)etext, true);
++ pti_finalized = true;
++ VM_OBJECT_WUNLOCK(pti_obj);
++}
++SYSINIT(pmap_pti, SI_SUB_CPU + 1, SI_ORDER_ANY, pmap_pti_init, NULL);
++
++static pdp_entry_t *
++pmap_pti_pdpe(vm_offset_t va)
++{
++ pml4_entry_t *pml4e;
++ pdp_entry_t *pdpe;
++ vm_page_t m;
++ vm_pindex_t pml4_idx;
++ vm_paddr_t mphys;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++
++ pml4_idx = pmap_pml4e_index(va);
++ pml4e = &pti_pml4[pml4_idx];
++ m = NULL;
++ if (*pml4e == 0) {
++ if (pti_finalized)
++ panic("pml4 alloc after finalization\n");
++ m = pmap_pti_alloc_page();
++ if (*pml4e != 0) {
++ pmap_pti_free_page(m);
++ mphys = *pml4e & ~PAGE_MASK;
++ } else {
++ mphys = VM_PAGE_TO_PHYS(m);
++ *pml4e = mphys | X86_PG_RW | X86_PG_V;
++ }
++ } else {
++ mphys = *pml4e & ~PAGE_MASK;
++ }
++ pdpe = (pdp_entry_t *)PHYS_TO_DMAP(mphys) + pmap_pdpe_index(va);
++ return (pdpe);
++}
++
++static void
++pmap_pti_wire_pte(void *pte)
++{
++ vm_page_t m;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((uintptr_t)pte));
++ m->wire_count++;
++}
++
++static void
++pmap_pti_unwire_pde(void *pde, bool only_ref)
++{
++ vm_page_t m;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((uintptr_t)pde));
++ MPASS(m->wire_count > 0);
++ MPASS(only_ref || m->wire_count > 1);
++ pmap_pti_free_page(m);
++}
++
++static void
++pmap_pti_unwire_pte(void *pte, vm_offset_t va)
++{
++ vm_page_t m;
++ pd_entry_t *pde;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((uintptr_t)pte));
++ MPASS(m->wire_count > 0);
++ if (pmap_pti_free_page(m)) {
++ pde = pmap_pti_pde(va);
++ MPASS((*pde & (X86_PG_PS | X86_PG_V)) == X86_PG_V);
++ *pde = 0;
++ pmap_pti_unwire_pde(pde, false);
++ }
++}
++
++static pd_entry_t *
++pmap_pti_pde(vm_offset_t va)
++{
++ pdp_entry_t *pdpe;
++ pd_entry_t *pde;
++ vm_page_t m;
++ vm_pindex_t pd_idx;
++ vm_paddr_t mphys;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++
++ pdpe = pmap_pti_pdpe(va);
++ if (*pdpe == 0) {
++ m = pmap_pti_alloc_page();
++ if (*pdpe != 0) {
++ pmap_pti_free_page(m);
++ MPASS((*pdpe & X86_PG_PS) == 0);
++ mphys = *pdpe & ~PAGE_MASK;
++ } else {
++ mphys = VM_PAGE_TO_PHYS(m);
++ *pdpe = mphys | X86_PG_RW | X86_PG_V;
++ }
++ } else {
++ MPASS((*pdpe & X86_PG_PS) == 0);
++ mphys = *pdpe & ~PAGE_MASK;
++ }
++
++ pde = (pd_entry_t *)PHYS_TO_DMAP(mphys);
++ pd_idx = pmap_pde_index(va);
++ pde += pd_idx;
++ return (pde);
++}
++
++static pt_entry_t *
++pmap_pti_pte(vm_offset_t va, bool *unwire_pde)
++{
++ pd_entry_t *pde;
++ pt_entry_t *pte;
++ vm_page_t m;
++ vm_paddr_t mphys;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++
++ pde = pmap_pti_pde(va);
++ if (unwire_pde != NULL) {
++ *unwire_pde = true;
++ pmap_pti_wire_pte(pde);
++ }
++ if (*pde == 0) {
++ m = pmap_pti_alloc_page();
++ if (*pde != 0) {
++ pmap_pti_free_page(m);
++ MPASS((*pde & X86_PG_PS) == 0);
++ mphys = *pde & ~(PAGE_MASK | pg_nx);
++ } else {
++ mphys = VM_PAGE_TO_PHYS(m);
++ *pde = mphys | X86_PG_RW | X86_PG_V;
++ if (unwire_pde != NULL)
++ *unwire_pde = false;
++ }
++ } else {
++ MPASS((*pde & X86_PG_PS) == 0);
++ mphys = *pde & ~(PAGE_MASK | pg_nx);
++ }
++
++ pte = (pt_entry_t *)PHYS_TO_DMAP(mphys);
++ pte += pmap_pte_index(va);
++
++ return (pte);
++}
++
++static void
++pmap_pti_add_kva_locked(vm_offset_t sva, vm_offset_t eva, bool exec)
++{
++ vm_paddr_t pa;
++ pd_entry_t *pde;
++ pt_entry_t *pte, ptev;
++ bool unwire_pde;
++
++ VM_OBJECT_ASSERT_WLOCKED(pti_obj);
++
++ sva = trunc_page(sva);
++ MPASS(sva > VM_MAXUSER_ADDRESS);
++ eva = round_page(eva);
++ MPASS(sva < eva);
++ for (; sva < eva; sva += PAGE_SIZE) {
++ pte = pmap_pti_pte(sva, &unwire_pde);
++ pa = pmap_kextract(sva);
++ ptev = pa | X86_PG_RW | X86_PG_V | X86_PG_A |
++ (exec ? 0 : pg_nx) | pmap_cache_bits(kernel_pmap,
++ VM_MEMATTR_DEFAULT, FALSE);
++ if (*pte == 0) {
++ pte_store(pte, ptev);
++ pmap_pti_wire_pte(pte);
++ } else {
++ KASSERT(!pti_finalized,
++ ("pti overlap after fin %#lx %#lx %#lx",
++ sva, *pte, ptev));
++ KASSERT(*pte == ptev,
++ ("pti non-identical pte after fin %#lx %#lx %#lx",
++ sva, *pte, ptev));
++ }
++ if (unwire_pde) {
++ pde = pmap_pti_pde(sva);
++ pmap_pti_unwire_pde(pde, true);
++ }
++ }
++}
++
++void
++pmap_pti_add_kva(vm_offset_t sva, vm_offset_t eva, bool exec)
++{
++
++ if (!pti)
++ return;
++ VM_OBJECT_WLOCK(pti_obj);
++ pmap_pti_add_kva_locked(sva, eva, exec);
++ VM_OBJECT_WUNLOCK(pti_obj);
++}
++
++void
++pmap_pti_remove_kva(vm_offset_t sva, vm_offset_t eva)
++{
++ pt_entry_t *pte;
++ vm_offset_t va;
++
++ if (!pti)
++ return;
++ sva = rounddown2(sva, PAGE_SIZE);
++ MPASS(sva > VM_MAXUSER_ADDRESS);
++ eva = roundup2(eva, PAGE_SIZE);
++ MPASS(sva < eva);
++ VM_OBJECT_WLOCK(pti_obj);
++ for (va = sva; va < eva; va += PAGE_SIZE) {
++ pte = pmap_pti_pte(va, NULL);
++ KASSERT((*pte & X86_PG_V) != 0,
++ ("invalid pte va %#lx pte %#lx pt %#lx", va,
++ (u_long)pte, *pte));
++ pte_clear(pte);
++ pmap_pti_unwire_pte(pte, va);
++ }
++ pmap_invalidate_range(kernel_pmap, sva, eva);
++ VM_OBJECT_WUNLOCK(pti_obj);
++}
++
+ #include "opt_ddb.h"
+ #ifdef DDB
+ #include <ddb/ddb.h>
+--- sys/amd64/amd64/support.S.orig
++++ sys/amd64/amd64/support.S
+@@ -33,6 +33,7 @@
+ #include "opt_ddb.h"
+
+ #include
++#include
+ #include
+
+ #include "assym.s"
+@@ -787,3 +788,115 @@
+ movl $EFAULT,%eax
+ POP_FRAME_POINTER
+ ret
++
++/*
++ * void pmap_pti_pcid_invalidate(uint64_t ucr3, uint64_t kcr3);
++ * Invalidates address space addressed by ucr3, then returns to kcr3.
++ * Done in assembler to ensure no other memory accesses happen while
++ * on ucr3.
++ */
++ ALIGN_TEXT
++ENTRY(pmap_pti_pcid_invalidate)
++ pushfq
++ cli
++ movq %rdi,%cr3 /* to user page table */
++ movq %rsi,%cr3 /* back to kernel */
++ popfq
++ retq
++
++/*
++ * void pmap_pti_pcid_invlpg(uint64_t ucr3, uint64_t kcr3, vm_offset_t va);
++ * Invalidates virtual address va in address space ucr3, then returns to kcr3.
++ */
++ ALIGN_TEXT
++ENTRY(pmap_pti_pcid_invlpg)
++ pushfq
++ cli
++ movq %rdi,%cr3 /* to user page table */
++ invlpg (%rdx)
++ movq %rsi,%cr3 /* back to kernel */
++ popfq
++ retq
++
++/*
++ * void pmap_pti_pcid_invlrng(uint64_t ucr3, uint64_t kcr3, vm_offset_t sva,
++ * vm_offset_t eva);
++ * Invalidates virtual addresses between sva and eva in address space ucr3,
++ * then returns to kcr3.
++ */
++ ALIGN_TEXT
++ENTRY(pmap_pti_pcid_invlrng)
++ pushfq
++ cli
++ movq %rdi,%cr3 /* to user page table */
++1: invlpg (%rdx)
++ addq $PAGE_SIZE,%rdx
++ cmpq %rdx,%rcx
++ ja 1b
++ movq %rsi,%cr3 /* back to kernel */
++ popfq
++ retq
++
++ .altmacro
++ .macro ibrs_seq_label l
++handle_ibrs_\l:
++ .endm
++ .macro ibrs_call_label l
++ call handle_ibrs_\l
++ .endm
++ .macro ibrs_seq count
++ ll=1
++ .rept \count
++ ibrs_call_label %(ll)
++ nop
++ ibrs_seq_label %(ll)
++ addq $8,%rsp
++ ll=ll+1
++ .endr
++ .endm
++
++/* all callers already saved %rax, %rdx, and %rcx */
++ENTRY(handle_ibrs_entry)
++ cmpb $0,hw_ibrs_active(%rip)
++ je 1f
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ movl $(IA32_SPEC_CTRL_IBRS|IA32_SPEC_CTRL_STIBP),%eax
++ movl $(IA32_SPEC_CTRL_IBRS|IA32_SPEC_CTRL_STIBP)>>32,%edx
++ wrmsr
++ movb $1,PCPU(IBPB_SET)
++ testl $CPUID_STDEXT_SMEP,cpu_stdext_feature(%rip)
++ jne 1f
++ ibrs_seq 32
++1: ret
++END(handle_ibrs_entry)
++
++ENTRY(handle_ibrs_exit)
++ cmpb $0,PCPU(IBPB_SET)
++ je 1f
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ xorl %eax,%eax
++ xorl %edx,%edx
++ wrmsr
++ movb $0,PCPU(IBPB_SET)
++1: ret
++END(handle_ibrs_exit)
++
++/* registers-neutral version, but needs stack */
++ENTRY(handle_ibrs_exit_rs)
++ cmpb $0,PCPU(IBPB_SET)
++ je 1f
++ pushq %rax
++ pushq %rdx
++ pushq %rcx
++ movl $MSR_IA32_SPEC_CTRL,%ecx
++ xorl %eax,%eax
++ xorl %edx,%edx
++ wrmsr
++ popq %rcx
++ popq %rdx
++ popq %rax
++ movb $0,PCPU(IBPB_SET)
++1: ret
++END(handle_ibrs_exit_rs)
++
++ .noaltmacro
+--- sys/amd64/amd64/sys_machdep.c.orig
++++ sys/amd64/amd64/sys_machdep.c
+@@ -357,7 +357,9 @@
+ pcb = td->td_pcb;
+ if (pcb->pcb_tssp == NULL) {
+ tssp = (struct amd64tss *)kmem_malloc(kernel_arena,
+- ctob(IOPAGES+1), M_WAITOK);
++ ctob(IOPAGES + 1), M_WAITOK);
++ pmap_pti_add_kva((vm_offset_t)tssp, (vm_offset_t)tssp +
++ ctob(IOPAGES + 1), false);
+ iomap = (char *)&tssp[1];
+ memset(iomap, 0xff, IOPERM_BITMAP_SIZE);
+ critical_enter();
+@@ -452,6 +454,8 @@
+ struct proc_ldt *pldt, *new_ldt;
+ struct mdproc *mdp;
+ struct soft_segment_descriptor sldt;
++ vm_offset_t sva;
++ vm_size_t sz;
+
+ mtx_assert(&dt_lock, MA_OWNED);
+ mdp = &p->p_md;
+@@ -459,13 +463,13 @@
+ return (mdp->md_ldt);
+ mtx_unlock(&dt_lock);
+ new_ldt = malloc(sizeof(struct proc_ldt), M_SUBPROC, M_WAITOK);
+- new_ldt->ldt_base = (caddr_t)kmem_malloc(kernel_arena,
+- max_ldt_segment * sizeof(struct user_segment_descriptor),
+- M_WAITOK | M_ZERO);
++ sz = max_ldt_segment * sizeof(struct user_segment_descriptor);
++ sva = kmem_malloc(kernel_arena, sz, M_WAITOK | M_ZERO);
++ new_ldt->ldt_base = (caddr_t)sva;
++ pmap_pti_add_kva(sva, sva + sz, false);
+ new_ldt->ldt_refcnt = 1;
+- sldt.ssd_base = (uint64_t)new_ldt->ldt_base;
+- sldt.ssd_limit = max_ldt_segment *
+- sizeof(struct user_segment_descriptor) - 1;
++ sldt.ssd_base = sva;
++ sldt.ssd_limit = sz - 1;
+ sldt.ssd_type = SDT_SYSLDT;
+ sldt.ssd_dpl = SEL_KPL;
+ sldt.ssd_p = 1;
+@@ -475,8 +479,8 @@
+ mtx_lock(&dt_lock);
+ pldt = mdp->md_ldt;
+ if (pldt != NULL && !force) {
+- kmem_free(kernel_arena, (vm_offset_t)new_ldt->ldt_base,
+- max_ldt_segment * sizeof(struct user_segment_descriptor));
++ pmap_pti_remove_kva(sva, sva + sz);
++ kmem_free(kernel_arena, sva, sz);
+ free(new_ldt, M_SUBPROC);
+ return (pldt);
+ }
+@@ -518,10 +522,14 @@
+ static void
+ user_ldt_derefl(struct proc_ldt *pldt)
+ {
++ vm_offset_t sva;
++ vm_size_t sz;
+
+ if (--pldt->ldt_refcnt == 0) {
+- kmem_free(kernel_arena, (vm_offset_t)pldt->ldt_base,
+- max_ldt_segment * sizeof(struct user_segment_descriptor));
++ sva = (vm_offset_t)pldt->ldt_base;
++ sz = max_ldt_segment * sizeof(struct user_segment_descriptor);
++ pmap_pti_remove_kva(sva, sva + sz);
++ kmem_free(kernel_arena, sva, sz);
+ free(pldt, M_SUBPROC);
+ }
+ }
+--- sys/amd64/amd64/trap.c.orig
++++ sys/amd64/amd64/trap.c
+@@ -218,11 +218,6 @@
+ #endif
+ }
+
+- if (type == T_MCHK) {
+- mca_intr();
+- goto out;
+- }
+-
+ if ((frame->tf_rflags & PSL_I) == 0) {
+ /*
+ * Buggy application or kernel code has disabled
+@@ -452,9 +447,28 @@
+ * problem here and not have to check all the
+ * selectors and pointers when the user changes
+ * them.
++ *
++ * In case of PTI, the IRETQ faulted while the
++ * kernel used the pti stack, and exception
++ * frame records %rsp value pointing to that
++ * stack. If we return normally to
++ * doreti_iret_fault, the trapframe is
++ * reconstructed on pti stack, and calltrap()
++ * called on it as well. Due to the very
++ * limited pti stack size, kernel does not
++ * survive for too long. Switch to the normal
++ * thread stack for the trap handling.
++ *
++ * Magic '5' is the number of qwords occupied by
++ * the hardware trap frame.
+ */
+ if (frame->tf_rip == (long)doreti_iret) {
+ frame->tf_rip = (long)doreti_iret_fault;
++ if (pti && frame->tf_rsp == (uintptr_t)PCPU_PTR(
++ pti_stack) + (PC_PTI_STACK_SZ - 5) *
++ sizeof(register_t))
++ frame->tf_rsp = PCPU_GET(rsp0) - 5 *
++ sizeof(register_t);
+ goto out;
+ }
+ if (frame->tf_rip == (long)ld_ds) {
+@@ -694,6 +708,17 @@
+ }
+
+ /*
++ * If nx protection of the usermode portion of kernel page
++ * tables caused trap, panic.
++ */
++ if (pti && usermode && pg_nx != 0 && (frame->tf_err & (PGEX_P | PGEX_W |
++ PGEX_U | PGEX_I)) == (PGEX_P | PGEX_U | PGEX_I) &&
++ (curpcb->pcb_saved_ucr3 & ~CR3_PCID_MASK)==
++ (PCPU_GET(curpmap)->pm_cr3 & ~CR3_PCID_MASK))
++ panic("PTI: pid %d comm %s tf_err %#lx\n", p->p_pid,
++ p->p_comm, frame->tf_err);
++
++ /*
+ * PGEX_I is defined only if the execute disable bit capability is
+ * supported and enabled.
+ */
+--- sys/amd64/amd64/vm_machdep.c.orig
++++ sys/amd64/amd64/vm_machdep.c
+@@ -339,6 +339,8 @@
+ * Clean TSS/iomap
+ */
+ if (pcb->pcb_tssp != NULL) {
++ pmap_pti_remove_kva((vm_offset_t)pcb->pcb_tssp,
++ (vm_offset_t)pcb->pcb_tssp + ctob(IOPAGES + 1));
+ kmem_free(kernel_arena, (vm_offset_t)pcb->pcb_tssp,
+ ctob(IOPAGES + 1));
+ pcb->pcb_tssp = NULL;
+--- sys/amd64/ia32/ia32_exception.S.orig
++++ sys/amd64/ia32/ia32_exception.S
+@@ -40,24 +40,27 @@
+ * that it originated in supervisor mode and skip the swapgs.
+ */
+ SUPERALIGN_TEXT
++IDTVEC(int0x80_syscall_pti)
++ PTI_UENTRY has_err=0
++ jmp int0x80_syscall_common
++ SUPERALIGN_TEXT
+ IDTVEC(int0x80_syscall)
+ swapgs
++int0x80_syscall_common:
+ pushq $2 /* sizeof "int 0x80" */
+ subq $TF_ERR,%rsp /* skip over tf_trapno */
+ movq %rdi,TF_RDI(%rsp)
+ movq PCPU(CURPCB),%rdi
+ andl $~PCB_FULL_IRET,PCB_FLAGS(%rdi)
+- movw %fs,TF_FS(%rsp)
+- movw %gs,TF_GS(%rsp)
+- movw %es,TF_ES(%rsp)
+- movw %ds,TF_DS(%rsp)
++ SAVE_SEGS
++ movq %rax,TF_RAX(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ call handle_ibrs_entry
+ sti
+ movq %rsi,TF_RSI(%rsp)
+- movq %rdx,TF_RDX(%rsp)
+- movq %rcx,TF_RCX(%rsp)
+ movq %r8,TF_R8(%rsp)
+ movq %r9,TF_R9(%rsp)
+- movq %rax,TF_RAX(%rsp)
+ movq %rbx,TF_RBX(%rsp)
+ movq %rbp,TF_RBP(%rsp)
+ movq %r10,TF_R10(%rsp)
+--- sys/amd64/ia32/ia32_syscall.c.orig
++++ sys/amd64/ia32/ia32_syscall.c
+@@ -93,7 +93,8 @@
+
+ #define IDTVEC(name) __CONCAT(X,name)
+
+-extern inthand_t IDTVEC(int0x80_syscall), IDTVEC(rsvd);
++extern inthand_t IDTVEC(int0x80_syscall), IDTVEC(int0x80_syscall_pti),
++ IDTVEC(rsvd), IDTVEC(rsvd_pti);
+
+ void ia32_syscall(struct trapframe *frame); /* Called from asm code */
+
+@@ -205,7 +206,8 @@
+ ia32_syscall_enable(void *dummy)
+ {
+
+- setidt(IDT_SYSCALL, &IDTVEC(int0x80_syscall), SDT_SYSIGT, SEL_UPL, 0);
++ setidt(IDT_SYSCALL, pti ? &IDTVEC(int0x80_syscall_pti) :
++ &IDTVEC(int0x80_syscall), SDT_SYSIGT, SEL_UPL, 0);
+ }
+
+ static void
+@@ -212,7 +214,8 @@
+ ia32_syscall_disable(void *dummy)
+ {
+
+- setidt(IDT_SYSCALL, &IDTVEC(rsvd), SDT_SYSIGT, SEL_KPL, 0);
++ setidt(IDT_SYSCALL, pti ? &IDTVEC(rsvd_pti) : &IDTVEC(rsvd),
++ SDT_SYSIGT, SEL_KPL, 0);
+ }
+
+ SYSINIT(ia32_syscall, SI_SUB_EXEC, SI_ORDER_ANY, ia32_syscall_enable, NULL);
+--- sys/amd64/include/asmacros.h.orig
++++ sys/amd64/include/asmacros.h
+@@ -1,7 +1,15 @@
++/* -*- mode: asm -*- */
+ /*-
+ * Copyright (c) 1993 The Regents of the University of California.
+ * All rights reserved.
+ *
++ * Copyright (c) 2018 The FreeBSD Foundation
++ * All rights reserved.
++ *
++ * Portions of this software were developed by
++ * Konstantin Belousov under sponsorship from
++ * the FreeBSD Foundation.
++ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+@@ -144,70 +152,135 @@
+
+ #ifdef LOCORE
+ /*
++ * Access per-CPU data.
++ */
++#define PCPU(member) %gs:PC_ ## member
++#define PCPU_ADDR(member, reg) \
++ movq %gs:PC_PRVSPACE, reg ; \
++ addq $PC_ ## member, reg
++
++/*
+ * Convenience macro for declaring interrupt entry points.
+ */
+ #define IDTVEC(name) ALIGN_TEXT; .globl __CONCAT(X,name); \
+ .type __CONCAT(X,name),@function; __CONCAT(X,name):
+
+-/*
+- * Macros to create and destroy a trap frame.
+- */
+-#define PUSH_FRAME \
+- subq $TF_RIP,%rsp ; /* skip dummy tf_err and tf_trapno */ \
+- testb $SEL_RPL_MASK,TF_CS(%rsp) ; /* come from kernel? */ \
+- jz 1f ; /* Yes, dont swapgs again */ \
+- swapgs ; \
+-1: movq %rdi,TF_RDI(%rsp) ; \
+- movq %rsi,TF_RSI(%rsp) ; \
+- movq %rdx,TF_RDX(%rsp) ; \
+- movq %rcx,TF_RCX(%rsp) ; \
+- movq %r8,TF_R8(%rsp) ; \
+- movq %r9,TF_R9(%rsp) ; \
+- movq %rax,TF_RAX(%rsp) ; \
+- movq %rbx,TF_RBX(%rsp) ; \
+- movq %rbp,TF_RBP(%rsp) ; \
+- movq %r10,TF_R10(%rsp) ; \
+- movq %r11,TF_R11(%rsp) ; \
+- movq %r12,TF_R12(%rsp) ; \
+- movq %r13,TF_R13(%rsp) ; \
+- movq %r14,TF_R14(%rsp) ; \
+- movq %r15,TF_R15(%rsp) ; \
+- movw %fs,TF_FS(%rsp) ; \
+- movw %gs,TF_GS(%rsp) ; \
+- movw %es,TF_ES(%rsp) ; \
+- movw %ds,TF_DS(%rsp) ; \
+- movl $TF_HASSEGS,TF_FLAGS(%rsp) ; \
++ .macro SAVE_SEGS
++ movw %fs,TF_FS(%rsp)
++ movw %gs,TF_GS(%rsp)
++ movw %es,TF_ES(%rsp)
++ movw %ds,TF_DS(%rsp)
++ .endm
++
++ .macro MOVE_STACKS qw
++ .L.offset=0
++ .rept \qw
++ movq .L.offset(%rsp),%rdx
++ movq %rdx,.L.offset(%rax)
++ .L.offset=.L.offset+8
++ .endr
++ .endm
++
++ .macro PTI_UUENTRY has_err
++ movq PCPU(KCR3),%rax
++ movq %rax,%cr3
++ movq PCPU(RSP0),%rax
++ subq $PTI_SIZE,%rax
++ MOVE_STACKS ((PTI_SIZE / 8) - 1 + \has_err)
++ movq %rax,%rsp
++ popq %rdx
++ popq %rax
++ .endm
++
++ .macro PTI_UENTRY has_err
++ swapgs
++ pushq %rax
++ pushq %rdx
++ PTI_UUENTRY \has_err
++ .endm
++
++ .macro PTI_ENTRY name, cont, has_err=0
++ ALIGN_TEXT
++ .globl X\name\()_pti
++ .type X\name\()_pti,@function
++X\name\()_pti:
++ /* %rax, %rdx and possibly err not yet pushed */
++ testb $SEL_RPL_MASK,PTI_CS-(2+1-\has_err)*8(%rsp)
++ jz \cont
++ PTI_UENTRY \has_err
++ swapgs
++ jmp \cont
++ .endm
++
++ .macro PTI_INTRENTRY vec_name
++ SUPERALIGN_TEXT
++ .globl X\vec_name\()_pti
++ .type X\vec_name\()_pti,@function
++X\vec_name\()_pti:
++ testb $SEL_RPL_MASK,PTI_CS-3*8(%rsp) /* err, %rax, %rdx not pushed */
++ jz \vec_name\()_u
++ PTI_UENTRY has_err=0
++ jmp \vec_name\()_u
++ .endm
++
++ .macro INTR_PUSH_FRAME vec_name
++ SUPERALIGN_TEXT
++ .globl X\vec_name
++ .type X\vec_name,@function
++X\vec_name:
++ testb $SEL_RPL_MASK,PTI_CS-3*8(%rsp) /* come from kernel? */
++ jz \vec_name\()_u /* Yes, dont swapgs again */
++ swapgs
++\vec_name\()_u:
++ subq $TF_RIP,%rsp /* skip dummy tf_err and tf_trapno */
++ movq %rdi,TF_RDI(%rsp)
++ movq %rsi,TF_RSI(%rsp)
++ movq %rdx,TF_RDX(%rsp)
++ movq %rcx,TF_RCX(%rsp)
++ movq %r8,TF_R8(%rsp)
++ movq %r9,TF_R9(%rsp)
++ movq %rax,TF_RAX(%rsp)
++ movq %rbx,TF_RBX(%rsp)
++ movq %rbp,TF_RBP(%rsp)
++ movq %r10,TF_R10(%rsp)
++ movq %r11,TF_R11(%rsp)
++ movq %r12,TF_R12(%rsp)
++ movq %r13,TF_R13(%rsp)
++ movq %r14,TF_R14(%rsp)
++ movq %r15,TF_R15(%rsp)
++ SAVE_SEGS
++ movl $TF_HASSEGS,TF_FLAGS(%rsp)
+ cld
++ testb $SEL_RPL_MASK,TF_CS(%rsp) /* come from kernel ? */
++ jz 1f /* yes, leave PCB_FULL_IRET alone */
++ movq PCPU(CURPCB),%r8
++ andl $~PCB_FULL_IRET,PCB_FLAGS(%r8)
++1:
++ .endm
+
+-#define POP_FRAME \
+- movq TF_RDI(%rsp),%rdi ; \
+- movq TF_RSI(%rsp),%rsi ; \
+- movq TF_RDX(%rsp),%rdx ; \
+- movq TF_RCX(%rsp),%rcx ; \
+- movq TF_R8(%rsp),%r8 ; \
+- movq TF_R9(%rsp),%r9 ; \
+- movq TF_RAX(%rsp),%rax ; \
+- movq TF_RBX(%rsp),%rbx ; \
+- movq TF_RBP(%rsp),%rbp ; \
+- movq TF_R10(%rsp),%r10 ; \
+- movq TF_R11(%rsp),%r11 ; \
+- movq TF_R12(%rsp),%r12 ; \
+- movq TF_R13(%rsp),%r13 ; \
+- movq TF_R14(%rsp),%r14 ; \
+- movq TF_R15(%rsp),%r15 ; \
+- testb $SEL_RPL_MASK,TF_CS(%rsp) ; /* come from kernel? */ \
+- jz 1f ; /* keep kernel GS.base */ \
+- cli ; \
+- swapgs ; \
+-1: addq $TF_RIP,%rsp /* skip over tf_err, tf_trapno */
++ .macro INTR_HANDLER vec_name
++ .text
++ PTI_INTRENTRY \vec_name
++ INTR_PUSH_FRAME \vec_name
++ .endm
+
+-/*
+- * Access per-CPU data.
+- */
+-#define PCPU(member) %gs:PC_ ## member
+-#define PCPU_ADDR(member, reg) \
+- movq %gs:PC_PRVSPACE, reg ; \
+- addq $PC_ ## member, reg
++ .macro RESTORE_REGS
++ movq TF_RDI(%rsp),%rdi
++ movq TF_RSI(%rsp),%rsi
++ movq TF_RDX(%rsp),%rdx
++ movq TF_RCX(%rsp),%rcx
++ movq TF_R8(%rsp),%r8
++ movq TF_R9(%rsp),%r9
++ movq TF_RAX(%rsp),%rax
++ movq TF_RBX(%rsp),%rbx
++ movq TF_RBP(%rsp),%rbp
++ movq TF_R10(%rsp),%r10
++ movq TF_R11(%rsp),%r11
++ movq TF_R12(%rsp),%r12
++ movq TF_R13(%rsp),%r13
++ movq TF_R14(%rsp),%r14
++ movq TF_R15(%rsp),%r15
++ .endm
+
+ #endif /* LOCORE */
+
+--- sys/amd64/include/frame.h.orig
++++ sys/amd64/include/frame.h
+@@ -1,6 +1,50 @@
+ /*-
+- * This file is in the public domain.
++ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
++ *
++ * Copyright (c) 2018 The FreeBSD Foundation
++ * All rights reserved.
++ *
++ * This software was developed by Konstantin Belousov
++ * under sponsorship from the FreeBSD Foundation.
++ *
++ * Redistribution and use in source and binary forms, with or without
++ * modification, are permitted provided that the following conditions
++ * are met:
++ * 1. Redistributions of source code must retain the above copyright
++ * notice, this list of conditions and the following disclaimer.
++ * 2. Redistributions in binary form must reproduce the above copyright
++ * notice, this list of conditions and the following disclaimer in the
++ * documentation and/or other materials provided with the distribution.
++ *
++ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
++ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
++ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
++ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
++ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
++ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
++ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
++ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
++ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
++ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
++ * SUCH DAMAGE.
++ *
++ * $FreeBSD$
+ */
+-/* $FreeBSD: releng/11.1/sys/amd64/include/frame.h 247047 2013-02-20 17:39:52Z kib $ */
+
++#ifndef _AMD64_FRAME_H
++#define _AMD64_FRAME_H
++
+ #include
++
++struct pti_frame {
++ register_t pti_rdx;
++ register_t pti_rax;
++ register_t pti_err;
++ register_t pti_rip;
++ register_t pti_cs;
++ register_t pti_rflags;
++ register_t pti_rsp;
++ register_t pti_ss;
++};
++
++#endif
+--- sys/amd64/include/intr_machdep.h.orig
++++ sys/amd64/include/intr_machdep.h
+@@ -136,7 +136,7 @@
+
+ /*
+ * The following data structure holds per-cpu data, and is placed just
+- * above the top of the space used for the NMI stack.
++ * above the top of the space used for the NMI and MC# stacks.
+ */
+ struct nmi_pcpu {
+ register_t np_pcpu;
+--- sys/amd64/include/md_var.h.orig
++++ sys/amd64/include/md_var.h
+@@ -35,9 +35,17 @@
+ #include
+
+ extern uint64_t *vm_page_dump;
++extern int hw_ibrs_disable;
+
++/*
++ * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its
++ * value is the physical address at which the kernel is loaded.
++ */
++extern char kernphys[];
++
+ struct savefpu;
+
++void amd64_conf_fast_syscall(void);
+ void amd64_db_resume_dbreg(void);
+ void amd64_syscall(struct thread *td, int traced);
+ void doreti_iret(void) __asm(__STRING(doreti_iret));
+--- sys/amd64/include/pcb.h.orig
++++ sys/amd64/include/pcb.h
+@@ -90,7 +90,7 @@
+ /* copyin/out fault recovery */
+ caddr_t pcb_onfault;
+
+- uint64_t pcb_pad0;
++ uint64_t pcb_saved_ucr3;
+
+ /* local tss, with i/o bitmap; NULL for common */
+ struct amd64tss *pcb_tssp;
+--- sys/amd64/include/pcpu.h.orig
++++ sys/amd64/include/pcpu.h
+@@ -33,6 +33,7 @@
+ #error "sys/cdefs.h is a prerequisite for this file"
+ #endif
+
++#define PC_PTI_STACK_SZ 16
+ /*
+ * The SMP parts are setup in pmap.c and locore.s for the BSP, and
+ * mp_machdep.c sets up the data for the AP's to "see" when they awake.
+@@ -46,8 +47,12 @@
+ struct pmap *pc_curpmap; \
+ struct amd64tss *pc_tssp; /* TSS segment active on CPU */ \
+ struct amd64tss *pc_commontssp;/* Common TSS for the CPU */ \
++ uint64_t pc_kcr3; \
++ uint64_t pc_ucr3; \
++ uint64_t pc_saved_ucr3; \
+ register_t pc_rsp0; \
+ register_t pc_scratch_rsp; /* User %rsp in syscall */ \
++ register_t pc_scratch_rax; \
+ u_int pc_apic_id; \
+ u_int pc_acpi_id; /* ACPI CPU id */ \
+ /* Pointer to the CPU %fs descriptor */ \
+@@ -61,12 +66,14 @@
+ uint64_t pc_pm_save_cnt; \
+ u_int pc_cmci_mask; /* MCx banks for CMCI */ \
+ uint64_t pc_dbreg[16]; /* ddb debugging regs */ \
++ uint64_t pc_pti_stack[PC_PTI_STACK_SZ]; \
+ int pc_dbreg_cmd; /* ddb debugging reg cmd */ \
+ u_int pc_vcpu_id; /* Xen vCPU ID */ \
+ uint32_t pc_pcid_next; \
+ uint32_t pc_pcid_gen; \
+ uint32_t pc_smp_tlb_done; /* TLB op acknowledgement */ \
+- char __pad[145] /* be divisor of PAGE_SIZE \
++ uint32_t pc_ibpb_set; \
++ char __pad[96] /* be divisor of PAGE_SIZE \
+ after cache alignment */
+
+ #define PC_DBREG_CMD_NONE 0
+--- sys/amd64/include/pmap.h.orig
++++ sys/amd64/include/pmap.h
+@@ -223,7 +223,11 @@
+ #define PMAP_PCID_NONE 0xffffffff
+ #define PMAP_PCID_KERN 0
+ #define PMAP_PCID_OVERMAX 0x1000
++#define PMAP_PCID_OVERMAX_KERN 0x800
++#define PMAP_PCID_USER_PT 0x800
+
++#define PMAP_NO_CR3 (~0UL)
++
+ #ifndef LOCORE
+
+ #include
+@@ -313,7 +317,9 @@
+ struct pmap {
+ struct mtx pm_mtx;
+ pml4_entry_t *pm_pml4; /* KVA of level 4 page table */
++ pml4_entry_t *pm_pml4u; /* KVA of user l4 page table */
+ uint64_t pm_cr3;
++ uint64_t pm_ucr3;
+ TAILQ_HEAD(,pv_chunk) pm_pvchunk; /* list of mappings in pmap */
+ cpuset_t pm_active; /* active on cpus */
+ enum pmap_type pm_type; /* regular or nested tables */
+@@ -419,6 +425,12 @@
+ void pmap_get_mapping(pmap_t pmap, vm_offset_t va, uint64_t *ptr, int *num);
+ boolean_t pmap_map_io_transient(vm_page_t *, vm_offset_t *, int, boolean_t);
+ void pmap_unmap_io_transient(vm_page_t *, vm_offset_t *, int, boolean_t);
++void pmap_pti_add_kva(vm_offset_t sva, vm_offset_t eva, bool exec);
++void pmap_pti_remove_kva(vm_offset_t sva, vm_offset_t eva);
++void pmap_pti_pcid_invalidate(uint64_t ucr3, uint64_t kcr3);
++void pmap_pti_pcid_invlpg(uint64_t ucr3, uint64_t kcr3, vm_offset_t va);
++void pmap_pti_pcid_invlrng(uint64_t ucr3, uint64_t kcr3, vm_offset_t sva,
++ vm_offset_t eva);
+ #endif /* _KERNEL */
+
+ /* Return various clipped indexes for a given VA */
+--- sys/amd64/include/smp.h.orig
++++ sys/amd64/include/smp.h
+@@ -28,12 +28,36 @@
+
+ /* IPI handlers */
+ inthand_t
++ IDTVEC(justreturn), /* interrupt CPU with minimum overhead */
++ IDTVEC(justreturn1_pti),
++ IDTVEC(invltlb_pti),
++ IDTVEC(invltlb_pcid_pti),
+ IDTVEC(invltlb_pcid), /* TLB shootdowns - global, pcid */
+- IDTVEC(invltlb_invpcid),/* TLB shootdowns - global, invpcid */
+- IDTVEC(justreturn); /* interrupt CPU with minimum overhead */
++ IDTVEC(invltlb_invpcid_pti_pti),
++ IDTVEC(invltlb_invpcid_nopti),
++ IDTVEC(invlpg_pti),
++ IDTVEC(invlpg_invpcid_pti),
++ IDTVEC(invlpg_invpcid),
++ IDTVEC(invlpg_pcid_pti),
++ IDTVEC(invlpg_pcid),
++ IDTVEC(invlrng_pti),
++ IDTVEC(invlrng_invpcid_pti),
++ IDTVEC(invlrng_invpcid),
++ IDTVEC(invlrng_pcid_pti),
++ IDTVEC(invlrng_pcid),
++ IDTVEC(invlcache_pti),
++ IDTVEC(ipi_intr_bitmap_handler_pti),
++ IDTVEC(cpustop_pti),
++ IDTVEC(cpususpend_pti),
++ IDTVEC(rendezvous_pti);
+
+ void invltlb_pcid_handler(void);
+ void invltlb_invpcid_handler(void);
++void invltlb_invpcid_pti_handler(void);
++void invlpg_invpcid_handler(void);
++void invlpg_pcid_handler(void);
++void invlrng_invpcid_handler(void);
++void invlrng_pcid_handler(void);
+ int native_start_all_aps(void);
+
+ #endif /* !LOCORE */
+--- sys/amd64/vmm/intel/vmx.c.orig
++++ sys/amd64/vmm/intel/vmx.c
+@@ -693,7 +693,8 @@
+ MSR_VMX_TRUE_PINBASED_CTLS, PINBASED_POSTED_INTERRUPT, 0,
+ &tmp);
+ if (error == 0) {
+- pirvec = lapic_ipi_alloc(&IDTVEC(justreturn));
++ pirvec = lapic_ipi_alloc(pti ? &IDTVEC(justreturn1_pti) :
++ &IDTVEC(justreturn));
+ if (pirvec < 0) {
+ if (bootverbose) {
+ printf("vmx_init: unable to allocate "
+--- sys/amd64/vmm/vmm.c.orig
++++ sys/amd64/vmm/vmm.c
+@@ -55,6 +55,7 @@
+ #include
+ #include
+ #include
++#include
+ #include
+ #include
+
+@@ -325,7 +326,8 @@
+
+ vmm_host_state_init();
+
+- vmm_ipinum = lapic_ipi_alloc(&IDTVEC(justreturn));
++ vmm_ipinum = lapic_ipi_alloc(pti ? &IDTVEC(justreturn1_pti) :
++ &IDTVEC(justreturn));
+ if (vmm_ipinum < 0)
+ vmm_ipinum = IPI_AST;
+
+--- sys/conf/Makefile.amd64.orig
++++ sys/conf/Makefile.amd64
+@@ -39,6 +39,7 @@
+
+ ASM_CFLAGS.acpi_wakecode.S= ${CLANG_NO_IAS34}
+ ASM_CFLAGS.mpboot.S= ${CLANG_NO_IAS34}
++ASM_CFLAGS.support.S= ${CLANG_NO_IAS}
+
+ %BEFORE_DEPEND
+
+--- sys/dev/cpuctl/cpuctl.c.orig
++++ sys/dev/cpuctl/cpuctl.c
+@@ -71,6 +71,7 @@
+ struct thread *td);
+ static int cpuctl_do_cpuid_count(int cpu, cpuctl_cpuid_count_args_t *data,
+ struct thread *td);
++static int cpuctl_do_eval_cpu_features(int cpu, struct thread *td);
+ static int cpuctl_do_update(int cpu, cpuctl_update_args_t *data,
+ struct thread *td);
+ static int update_intel(int cpu, cpuctl_update_args_t *args,
+@@ -157,7 +158,8 @@
+ }
+ /* Require write flag for "write" requests. */
+ if ((cmd == CPUCTL_MSRCBIT || cmd == CPUCTL_MSRSBIT ||
+- cmd == CPUCTL_UPDATE || cmd == CPUCTL_WRMSR) &&
++ cmd == CPUCTL_UPDATE || cmd == CPUCTL_WRMSR ||
++ cmd == CPUCTL_EVAL_CPU_FEATURES) &&
+ (flags & FWRITE) == 0)
+ return (EPERM);
+ switch (cmd) {
+@@ -185,6 +187,9 @@
+ ret = cpuctl_do_cpuid_count(cpu,
+ (cpuctl_cpuid_count_args_t *)data, td);
+ break;
++ case CPUCTL_EVAL_CPU_FEATURES:
++ ret = cpuctl_do_eval_cpu_features(cpu, td);
++ break;
+ default:
+ ret = EINVAL;
+ break;
+@@ -502,6 +507,30 @@
+ return (ret);
+ }
+
++static int
++cpuctl_do_eval_cpu_features(int cpu, struct thread *td)
++{
++ int is_bound = 0;
++ int oldcpu;
++
++ KASSERT(cpu >= 0 && cpu <= mp_maxid,
++ ("[cpuctl,%d]: bad cpu number %d", __LINE__, cpu));
++
++#ifdef __i386__
++ if (cpu_id == 0)
++ return (ENODEV);
++#endif
++ oldcpu = td->td_oncpu;
++ is_bound = cpu_sched_is_bound(td);
++ set_cpu(cpu, td);
++ identify_cpu1();
++ identify_cpu2();
++ hw_ibrs_recalculate();
++ restore_cpu(oldcpu, is_bound, td);
++ printcpuinfo();
++ return (0);
++}
++
+ int
+ cpuctl_open(struct cdev *dev, int flags, int fmt __unused, struct thread *td)
+ {
+--- sys/dev/hyperv/vmbus/amd64/vmbus_vector.S.orig
++++ sys/dev/hyperv/vmbus/amd64/vmbus_vector.S
+@@ -26,11 +26,11 @@
+ * $FreeBSD$
+ */
+
++#include "assym.s"
++
+ #include
+ #include
+
+-#include "assym.s"
+-
+ /*
+ * This is the Hyper-V vmbus channel direct callback interrupt.
+ * Only used when it is running on Hyper-V.
+@@ -37,8 +37,7 @@
+ */
+ .text
+ SUPERALIGN_TEXT
+-IDTVEC(vmbus_isr)
+- PUSH_FRAME
++ INTR_HANDLER vmbus_isr
+ FAKE_MCOUNT(TF_RIP(%rsp))
+ movq %rsp, %rdi
+ call vmbus_handle_intr
+--- sys/dev/hyperv/vmbus/i386/vmbus_vector.S.orig
++++ sys/dev/hyperv/vmbus/i386/vmbus_vector.S
+@@ -37,6 +37,7 @@
+ */
+ .text
+ SUPERALIGN_TEXT
++IDTVEC(vmbus_isr_pti)
+ IDTVEC(vmbus_isr)
+ PUSH_FRAME
+ SET_KERNEL_SREGS
+--- sys/dev/hyperv/vmbus/vmbus.c.orig
++++ sys/dev/hyperv/vmbus/vmbus.c
+@@ -46,6 +46,7 @@
+
+ #include
+ #include
++#include
+ #include
+ #include
+
+@@ -128,7 +129,7 @@
+
+ static struct vmbus_softc *vmbus_sc;
+
+-extern inthand_t IDTVEC(vmbus_isr);
++extern inthand_t IDTVEC(vmbus_isr), IDTVEC(vmbus_isr_pti);
+
+ static const uint32_t vmbus_version[] = {
+ VMBUS_VERSION_WIN8_1,
+@@ -928,7 +929,8 @@
+ * All Hyper-V ISR required resources are setup, now let's find a
+ * free IDT vector for Hyper-V ISR and set it up.
+ */
+- sc->vmbus_idtvec = lapic_ipi_alloc(IDTVEC(vmbus_isr));
++ sc->vmbus_idtvec = lapic_ipi_alloc(pti ? IDTVEC(vmbus_isr_pti) :
++ IDTVEC(vmbus_isr));
+ if (sc->vmbus_idtvec < 0) {
+ device_printf(sc->vmbus_dev, "cannot find free IDT vector\n");
+ return ENXIO;
+--- sys/i386/i386/apic_vector.s.orig
++++ sys/i386/i386/apic_vector.s
+@@ -70,6 +70,7 @@
+ #define ISR_VEC(index, vec_name) \
+ .text ; \
+ SUPERALIGN_TEXT ; \
++IDTVEC(vec_name ## _pti) ; \
+ IDTVEC(vec_name) ; \
+ PUSH_FRAME ; \
+ SET_KERNEL_SREGS ; \
+@@ -123,6 +124,7 @@
+ */
+ .text
+ SUPERALIGN_TEXT
++IDTVEC(timerint_pti)
+ IDTVEC(timerint)
+ PUSH_FRAME
+ SET_KERNEL_SREGS
+@@ -139,6 +141,7 @@
+ */
+ .text
+ SUPERALIGN_TEXT
++IDTVEC(cmcint_pti)
+ IDTVEC(cmcint)
+ PUSH_FRAME
+ SET_KERNEL_SREGS
+@@ -153,6 +156,7 @@
+ */
+ .text
+ SUPERALIGN_TEXT
++IDTVEC(errorint_pti)
+ IDTVEC(errorint)
+ PUSH_FRAME
+ SET_KERNEL_SREGS
+--- sys/i386/i386/atpic_vector.s.orig
++++ sys/i386/i386/atpic_vector.s
+@@ -46,6 +46,7 @@
+ #define INTR(irq_num, vec_name) \
+ .text ; \
+ SUPERALIGN_TEXT ; \
++IDTVEC(vec_name ##_pti) ; \
+ IDTVEC(vec_name) ; \
+ PUSH_FRAME ; \
+ SET_KERNEL_SREGS ; \
+--- sys/i386/i386/exception.s.orig
++++ sys/i386/i386/exception.s
+@@ -133,6 +133,7 @@
+ TRAP(T_PAGEFLT)
+ IDTVEC(mchk)
+ pushl $0; TRAP(T_MCHK)
++IDTVEC(rsvd_pti)
+ IDTVEC(rsvd)
+ pushl $0; TRAP(T_RESERVED)
+ IDTVEC(fpu)
+--- sys/i386/i386/machdep.c.orig
++++ sys/i386/i386/machdep.c
+@@ -2577,7 +2577,7 @@
+ GSEL(GCODE_SEL, SEL_KPL));
+ #endif
+ #ifdef XENHVM
+- setidt(IDT_EVTCHN, &IDTVEC(xen_intr_upcall), SDT_SYS386IGT, SEL_UPL,
++ setidt(IDT_EVTCHN, &IDTVEC(xen_intr_upcall), SDT_SYS386IGT, SEL_KPL,
+ GSEL(GCODE_SEL, SEL_KPL));
+ #endif
+
+--- sys/i386/i386/pmap.c.orig
++++ sys/i386/i386/pmap.c
+@@ -283,6 +283,8 @@
+ "Number of times pmap_pte_quick didn't change PMAP1");
+ static struct mtx PMAP2mutex;
+
++int pti;
++
+ static void free_pv_chunk(struct pv_chunk *pc);
+ static void free_pv_entry(pmap_t pmap, pv_entry_t pv);
+ static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try);
+@@ -1043,7 +1045,7 @@
+ CPU_AND(&other_cpus, &pmap->pm_active);
+ mask = &other_cpus;
+ }
+- smp_masked_invlpg(*mask, va);
++ smp_masked_invlpg(*mask, va, pmap);
+ sched_unpin();
+ }
+
+@@ -1077,7 +1079,7 @@
+ CPU_AND(&other_cpus, &pmap->pm_active);
+ mask = &other_cpus;
+ }
+- smp_masked_invlpg_range(*mask, sva, eva);
++ smp_masked_invlpg_range(*mask, sva, eva, pmap);
+ sched_unpin();
+ }
+
+--- sys/i386/i386/support.s.orig
++++ sys/i386/i386/support.s
+@@ -830,3 +830,11 @@
+ movl $0,PCB_ONFAULT(%ecx)
+ movl $EFAULT,%eax
+ ret
++
++ENTRY(handle_ibrs_entry)
++ ret
++END(handle_ibrs_entry)
++
++ENTRY(handle_ibrs_exit)
++ ret
++END(handle_ibrs_exit)
+--- sys/i386/i386/vm_machdep.c.orig
++++ sys/i386/i386/vm_machdep.c
+@@ -795,7 +795,7 @@
+ CPU_NAND(&other_cpus, &sf->cpumask);
+ if (!CPU_EMPTY(&other_cpus)) {
+ CPU_OR(&sf->cpumask, &other_cpus);
+- smp_masked_invlpg(other_cpus, sf->kva);
++ smp_masked_invlpg(other_cpus, sf->kva, kernel_pmap);
+ }
+ }
+ sched_unpin();
+--- sys/sys/cpuctl.h.orig
++++ sys/sys/cpuctl.h
+@@ -57,5 +57,6 @@
+ #define CPUCTL_MSRSBIT _IOWR('c', 5, cpuctl_msr_args_t)
+ #define CPUCTL_MSRCBIT _IOWR('c', 6, cpuctl_msr_args_t)
+ #define CPUCTL_CPUID_COUNT _IOWR('c', 7, cpuctl_cpuid_count_args_t)
++#define CPUCTL_EVAL_CPU_FEATURES _IO('c', 8)
+
+ #endif /* _CPUCTL_H_ */
+--- sys/x86/include/apicvar.h.orig
++++ sys/x86/include/apicvar.h
+@@ -179,7 +179,11 @@
+ IDTVEC(apic_isr1), IDTVEC(apic_isr2), IDTVEC(apic_isr3),
+ IDTVEC(apic_isr4), IDTVEC(apic_isr5), IDTVEC(apic_isr6),
+ IDTVEC(apic_isr7), IDTVEC(cmcint), IDTVEC(errorint),
+- IDTVEC(spuriousint), IDTVEC(timerint);
++ IDTVEC(spuriousint), IDTVEC(timerint),
++ IDTVEC(apic_isr1_pti), IDTVEC(apic_isr2_pti), IDTVEC(apic_isr3_pti),
++ IDTVEC(apic_isr4_pti), IDTVEC(apic_isr5_pti), IDTVEC(apic_isr6_pti),
++ IDTVEC(apic_isr7_pti), IDTVEC(cmcint_pti), IDTVEC(errorint_pti),
++ IDTVEC(spuriousint_pti), IDTVEC(timerint_pti);
+
+ extern vm_paddr_t lapic_paddr;
+ extern int apic_cpuids[];
+--- sys/x86/include/specialreg.h.orig
++++ sys/x86/include/specialreg.h
+@@ -374,6 +374,17 @@
+ #define CPUID_STDEXT2_SGXLC 0x40000000
+
+ /*
++ * CPUID instruction 7 Structured Extended Features, leaf 0 edx info
++ */
++#define CPUID_STDEXT3_IBPB 0x04000000
++#define CPUID_STDEXT3_STIBP 0x08000000
++#define CPUID_STDEXT3_ARCH_CAP 0x20000000
++
++/* MSR IA32_ARCH_CAP(ABILITIES) bits */
++#define IA32_ARCH_CAP_RDCL_NO 0x00000001
++#define IA32_ARCH_CAP_IBRS_ALL 0x00000002
++
++/*
+ * CPUID manufacturers identifiers
+ */
+ #define AMD_VENDOR_ID "AuthenticAMD"
+@@ -401,6 +412,8 @@
+ #define MSR_EBL_CR_POWERON 0x02a
+ #define MSR_TEST_CTL 0x033
+ #define MSR_IA32_FEATURE_CONTROL 0x03a
++#define MSR_IA32_SPEC_CTRL 0x048
++#define MSR_IA32_PRED_CMD 0x049
+ #define MSR_BIOS_UPDT_TRIG 0x079
+ #define MSR_BBL_CR_D0 0x088
+ #define MSR_BBL_CR_D1 0x089
+@@ -413,6 +426,7 @@
+ #define MSR_APERF 0x0e8
+ #define MSR_IA32_EXT_CONFIG 0x0ee /* Undocumented. Core Solo/Duo only */
+ #define MSR_MTRRcap 0x0fe
++#define MSR_IA32_ARCH_CAP 0x10a
+ #define MSR_BBL_CR_ADDR 0x116
+ #define MSR_BBL_CR_DECC 0x118
+ #define MSR_BBL_CR_CTL 0x119
+@@ -556,6 +570,17 @@
+ #define IA32_MISC_EN_XDD 0x0000000400000000ULL
+
+ /*
++ * IA32_SPEC_CTRL and IA32_PRED_CMD MSRs are described in the Intel'
++ * document 336996-001 Speculative Execution Side Channel Mitigations.
++ */
++/* MSR IA32_SPEC_CTRL */
++#define IA32_SPEC_CTRL_IBRS 0x00000001
++#define IA32_SPEC_CTRL_STIBP 0x00000002
++
++/* MSR IA32_PRED_CMD */
++#define IA32_PRED_CMD_IBPB_BARRIER 0x0000000000000001ULL
++
++/*
+ * PAT modes.
+ */
+ #define PAT_UNCACHEABLE 0x00
+--- sys/x86/include/x86_smp.h.orig
++++ sys/x86/include/x86_smp.h
+@@ -37,6 +37,7 @@
+ extern int cpu_cores;
+ extern volatile uint32_t smp_tlb_generation;
+ extern struct pmap *smp_tlb_pmap;
++extern vm_offset_t smp_tlb_addr1, smp_tlb_addr2;
+ extern u_int xhits_gbl[];
+ extern u_int xhits_pg[];
+ extern u_int xhits_rng[];
+@@ -95,9 +96,9 @@
+ u_int mp_bootaddress(u_int);
+ void set_interrupt_apic_ids(void);
+ void smp_cache_flush(void);
+-void smp_masked_invlpg(cpuset_t mask, vm_offset_t addr);
++void smp_masked_invlpg(cpuset_t mask, vm_offset_t addr, struct pmap *pmap);
+ void smp_masked_invlpg_range(cpuset_t mask, vm_offset_t startva,
+- vm_offset_t endva);
++ vm_offset_t endva, struct pmap *pmap);
+ void smp_masked_invltlb(cpuset_t mask, struct pmap *pmap);
+ void mem_range_AP_init(void);
+ void topo_probe(void);
+--- sys/x86/include/x86_var.h.orig
++++ sys/x86/include/x86_var.h
+@@ -50,6 +50,8 @@
+ extern u_int cpu_clflush_line_size;
+ extern u_int cpu_stdext_feature;
+ extern u_int cpu_stdext_feature2;
++extern u_int cpu_stdext_feature3;
++extern uint64_t cpu_ia32_arch_caps;
+ extern u_int cpu_fxsr;
+ extern u_int cpu_high;
+ extern u_int cpu_id;
+@@ -78,6 +80,7 @@
+ extern int _ugssel;
+ extern int use_xsave;
+ extern uint64_t xsave_mask;
++extern int pti;
+
+ struct pcb;
+ struct thread;
+@@ -115,7 +118,9 @@
+ void cpu_setregs(void);
+ void dump_add_page(vm_paddr_t);
+ void dump_drop_page(vm_paddr_t);
+-void identify_cpu(void);
++void finishidentcpu(void);
++void identify_cpu1(void);
++void identify_cpu2(void);
+ void initializecpu(void);
+ void initializecpucache(void);
+ bool fix_cpuid(void);
+@@ -122,11 +127,15 @@
+ void fillw(int /*u_short*/ pat, void *base, size_t cnt);
+ int is_physical_memory(vm_paddr_t addr);
+ int isa_nmi(int cd);
++void handle_ibrs_entry(void);
++void handle_ibrs_exit(void);
++void hw_ibrs_recalculate(void);
+ void nmi_call_kdb(u_int cpu, u_int type, struct trapframe *frame);
+ void nmi_call_kdb_smp(u_int type, struct trapframe *frame);
+ void nmi_handle_intr(u_int type, struct trapframe *frame);
+ void pagecopy(void *from, void *to);
+ void printcpuinfo(void);
++int pti_get_default(void);
+ int user_dbreg_trap(void);
+ int minidumpsys(struct dumperinfo *);
+ struct pcb *get_pcb_td(struct thread *td);
+--- sys/x86/isa/atpic.c.orig
++++ sys/x86/isa/atpic.c
+@@ -86,6 +86,16 @@
+ IDTVEC(atpic_intr9), IDTVEC(atpic_intr10), IDTVEC(atpic_intr11),
+ IDTVEC(atpic_intr12), IDTVEC(atpic_intr13), IDTVEC(atpic_intr14),
+ IDTVEC(atpic_intr15);
++/* XXXKIB i386 uses stubs until pti comes */
++inthand_t
++ IDTVEC(atpic_intr0_pti), IDTVEC(atpic_intr1_pti),
++ IDTVEC(atpic_intr2_pti), IDTVEC(atpic_intr3_pti),
++ IDTVEC(atpic_intr4_pti), IDTVEC(atpic_intr5_pti),
++ IDTVEC(atpic_intr6_pti), IDTVEC(atpic_intr7_pti),
++ IDTVEC(atpic_intr8_pti), IDTVEC(atpic_intr9_pti),
++ IDTVEC(atpic_intr10_pti), IDTVEC(atpic_intr11_pti),
++ IDTVEC(atpic_intr12_pti), IDTVEC(atpic_intr13_pti),
++ IDTVEC(atpic_intr14_pti), IDTVEC(atpic_intr15_pti);
+
+ #define IRQ(ap, ai) ((ap)->at_irqbase + (ai)->at_irq)
+
+@@ -98,7 +108,7 @@
+
+ #define INTSRC(irq) \
+ { { &atpics[(irq) / 8].at_pic }, IDTVEC(atpic_intr ## irq ), \
+- (irq) % 8 }
++ IDTVEC(atpic_intr ## irq ## _pti), (irq) % 8 }
+
+ struct atpic {
+ struct pic at_pic;
+@@ -110,7 +120,7 @@
+
+ struct atpic_intsrc {
+ struct intsrc at_intsrc;
+- inthand_t *at_intr;
++ inthand_t *at_intr, *at_intr_pti;
+ int at_irq; /* Relative to PIC base. */
+ enum intr_trigger at_trigger;
+ u_long at_count;
+@@ -435,7 +445,8 @@
+ ai->at_intsrc.is_count = &ai->at_count;
+ ai->at_intsrc.is_straycount = &ai->at_straycount;
+ setidt(((struct atpic *)ai->at_intsrc.is_pic)->at_intbase +
+- ai->at_irq, ai->at_intr, SDT_ATPIC, SEL_KPL, GSEL_ATPIC);
++ ai->at_irq, pti ? ai->at_intr_pti : ai->at_intr, SDT_ATPIC,
++ SEL_KPL, GSEL_ATPIC);
+ }
+
+ #ifdef DEV_MCA
+--- sys/x86/x86/cpu_machdep.c.orig
++++ sys/x86/x86/cpu_machdep.c
+@@ -139,6 +139,12 @@
+ int *state;
+
+ /*
++ * A comment in Linux patch claims that 'CPUs run faster with
++ * speculation protection disabled. All CPU threads in a core
++ * must disable speculation protection for it to be
++ * disabled. Disable it while we are idle so the other
++ * hyperthread can run fast.'
++ *
+ * XXXKIB. Software coordination mode should be supported,
+ * but all Intel CPUs provide hardware coordination.
+ */
+@@ -147,9 +153,11 @@
+ KASSERT(*state == STATE_SLEEPING,
+ ("cpu_mwait_cx: wrong monitorbuf state"));
+ *state = STATE_MWAIT;
++ handle_ibrs_entry();
+ cpu_monitor(state, 0, 0);
+ if (*state == STATE_MWAIT)
+ cpu_mwait(MWAIT_INTRBREAK, mwait_hint);
++ handle_ibrs_exit();
+
+ /*
+ * We should exit on any event that interrupts mwait, because
+@@ -578,3 +586,47 @@
+ nmi_call_kdb(PCPU_GET(cpuid), type, frame);
+ #endif
+ }
++
++int hw_ibrs_active;
++int hw_ibrs_disable = 1;
++
++SYSCTL_INT(_hw, OID_AUTO, ibrs_active, CTLFLAG_RD, &hw_ibrs_active, 0,
++ "Indirect Branch Restricted Speculation active");
++
++void
++hw_ibrs_recalculate(void)
++{
++ uint64_t v;
++
++ if ((cpu_ia32_arch_caps & IA32_ARCH_CAP_IBRS_ALL) != 0) {
++ if (hw_ibrs_disable) {
++ v= rdmsr(MSR_IA32_SPEC_CTRL);
++ v &= ~(uint64_t)IA32_SPEC_CTRL_IBRS;
++ wrmsr(MSR_IA32_SPEC_CTRL, v);
++ } else {
++ v= rdmsr(MSR_IA32_SPEC_CTRL);
++ v |= IA32_SPEC_CTRL_IBRS;
++ wrmsr(MSR_IA32_SPEC_CTRL, v);
++ }
++ return;
++ }
++ hw_ibrs_active = (cpu_stdext_feature3 & CPUID_STDEXT3_IBPB) != 0 &&
++ !hw_ibrs_disable;
++}
++
++static int
++hw_ibrs_disable_handler(SYSCTL_HANDLER_ARGS)
++{
++ int error, val;
++
++ val = hw_ibrs_disable;
++ error = sysctl_handle_int(oidp, &val, 0, req);
++ if (error != 0 || req->newptr == NULL)
++ return (error);
++ hw_ibrs_disable = val != 0;
++ hw_ibrs_recalculate();
++ return (0);
++}
++SYSCTL_PROC(_hw, OID_AUTO, ibrs_disable, CTLTYPE_INT | CTLFLAG_RWTUN |
++ CTLFLAG_NOFETCH | CTLFLAG_MPSAFE, NULL, 0, hw_ibrs_disable_handler, "I",
++ "Disable Indirect Branch Restricted Speculation");
+--- sys/x86/x86/identcpu.c.orig
++++ sys/x86/x86/identcpu.c
+@@ -104,8 +104,10 @@
+ u_int cpu_fxsr; /* SSE enabled */
+ u_int cpu_mxcsr_mask; /* Valid bits in mxcsr */
+ u_int cpu_clflush_line_size = 32;
+-u_int cpu_stdext_feature;
+-u_int cpu_stdext_feature2;
++u_int cpu_stdext_feature; /* %ebx */
++u_int cpu_stdext_feature2; /* %ecx */
++u_int cpu_stdext_feature3; /* %edx */
++uint64_t cpu_ia32_arch_caps;
+ u_int cpu_max_ext_state_size;
+ u_int cpu_mon_mwait_flags; /* MONITOR/MWAIT flags (CPUID.05H.ECX) */
+ u_int cpu_mon_min_size; /* MONITOR minimum range size, bytes */
+@@ -978,6 +980,16 @@
+ );
+ }
+
++ if (cpu_stdext_feature3 != 0) {
++ printf("\n Structured Extended Features3=0x%b",
++ cpu_stdext_feature3,
++ "\020"
++ "\033IBPB"
++ "\034STIBP"
++ "\036ARCH_CAP"
++ );
++ }
++
+ if ((cpu_feature2 & CPUID2_XSAVE) != 0) {
+ cpuid_count(0xd, 0x1, regs);
+ if (regs[0] != 0) {
+@@ -991,6 +1003,15 @@
+ }
+ }
+
++ if (cpu_ia32_arch_caps != 0) {
++ printf("\n IA32_ARCH_CAPS=0x%b",
++ (u_int)cpu_ia32_arch_caps,
++ "\020"
++ "\001RDCL_NO"
++ "\002IBRS_ALL"
++ );
++ }
++
+ if (via_feature_rng != 0 || via_feature_xcrypt != 0)
+ print_via_padlock_info();
+
+@@ -1370,23 +1391,11 @@
+ return (false);
+ }
+
+-/*
+- * Final stage of CPU identification.
+- */
+-#ifdef __i386__
+ void
+-finishidentcpu(void)
+-#else
+-void
+-identify_cpu(void)
+-#endif
++identify_cpu1(void)
+ {
+- u_int regs[4], cpu_stdext_disable;
+-#ifdef __i386__
+- u_char ccr3;
+-#endif
++ u_int regs[4];
+
+-#ifdef __amd64__
+ do_cpuid(0, regs);
+ cpu_high = regs[0];
+ ((u_int *)&cpu_vendor)[0] = regs[1];
+@@ -1399,6 +1408,44 @@
+ cpu_procinfo = regs[1];
+ cpu_feature = regs[3];
+ cpu_feature2 = regs[2];
++}
++
++void
++identify_cpu2(void)
++{
++ u_int regs[4], cpu_stdext_disable;
++
++ if (cpu_high >= 7) {
++ cpuid_count(7, 0, regs);
++ cpu_stdext_feature = regs[1];
++
++ /*
++ * Some hypervisors failed to filter out unsupported
++ * extended features. Allow to disable the
++ * extensions, activation of which requires setting a
++ * bit in CR4, and which VM monitors do not support.
++ */
++ cpu_stdext_disable = 0;
++ TUNABLE_INT_FETCH("hw.cpu_stdext_disable", &cpu_stdext_disable);
++ cpu_stdext_feature &= ~cpu_stdext_disable;
++
++ cpu_stdext_feature2 = regs[2];
++ cpu_stdext_feature3 = regs[3];
++
++ if ((cpu_stdext_feature3 & CPUID_STDEXT3_ARCH_CAP) != 0)
++ cpu_ia32_arch_caps = rdmsr(MSR_IA32_ARCH_CAP);
++ }
++}
++
++/*
++ * Final stage of CPU identification.
++ */
++void
++finishidentcpu(void)
++{
++ u_int regs[4];
++#ifdef __i386__
++ u_char ccr3;
+ #endif
+
+ identify_hypervisor();
+@@ -1416,26 +1463,8 @@
+ cpu_mon_max_size = regs[1] & CPUID5_MON_MAX_SIZE;
+ }
+
+- if (cpu_high >= 7) {
+- cpuid_count(7, 0, regs);
+- cpu_stdext_feature = regs[1];
++ identify_cpu2();
+
+- /*
+- * Some hypervisors fail to filter out unsupported
+- * extended features. For now, disable the
+- * extensions, activation of which requires setting a
+- * bit in CR4, and which VM monitors do not support.
+- */
+- if (cpu_feature2 & CPUID2_HV) {
+- cpu_stdext_disable = CPUID_STDEXT_FSGSBASE |
+- CPUID_STDEXT_SMEP;
+- } else
+- cpu_stdext_disable = 0;
+- TUNABLE_INT_FETCH("hw.cpu_stdext_disable", &cpu_stdext_disable);
+- cpu_stdext_feature &= ~cpu_stdext_disable;
+- cpu_stdext_feature2 = regs[2];
+- }
+-
+ #ifdef __i386__
+ if (cpu_high > 0 &&
+ (cpu_vendor_id == CPU_VENDOR_INTEL ||
+@@ -1563,6 +1592,17 @@
+ #endif
+ }
+
++int
++pti_get_default(void)
++{
++
++ if (strcmp(cpu_vendor, AMD_VENDOR_ID) == 0)
++ return (0);
++ if ((cpu_ia32_arch_caps & IA32_ARCH_CAP_RDCL_NO) != 0)
++ return (0);
++ return (1);
++}
++
+ static u_int
+ find_cpu_vendor_id(void)
+ {
+--- sys/x86/x86/local_apic.c.orig
++++ sys/x86/x86/local_apic.c
+@@ -166,6 +166,16 @@
+ IDTVEC(apic_isr7), /* 224 - 255 */
+ };
+
++static inthand_t *ioint_pti_handlers[] = {
++ NULL, /* 0 - 31 */
++ IDTVEC(apic_isr1_pti), /* 32 - 63 */
++ IDTVEC(apic_isr2_pti), /* 64 - 95 */
++ IDTVEC(apic_isr3_pti), /* 96 - 127 */
++ IDTVEC(apic_isr4_pti), /* 128 - 159 */
++ IDTVEC(apic_isr5_pti), /* 160 - 191 */
++ IDTVEC(apic_isr6_pti), /* 192 - 223 */
++ IDTVEC(apic_isr7_pti), /* 224 - 255 */
++};
+
+ static u_int32_t lapic_timer_divisors[] = {
+ APIC_TDCR_1, APIC_TDCR_2, APIC_TDCR_4, APIC_TDCR_8, APIC_TDCR_16,
+@@ -172,7 +182,7 @@
+ APIC_TDCR_32, APIC_TDCR_64, APIC_TDCR_128
+ };
+
+-extern inthand_t IDTVEC(rsvd);
++extern inthand_t IDTVEC(rsvd_pti), IDTVEC(rsvd);
+
+ volatile char *lapic_map;
+ vm_paddr_t lapic_paddr;
+@@ -489,15 +499,18 @@
+ PCPU_SET(apic_id, lapic_id());
+
+ /* Local APIC timer interrupt. */
+- setidt(APIC_TIMER_INT, IDTVEC(timerint), SDT_APIC, SEL_KPL, GSEL_APIC);
++ setidt(APIC_TIMER_INT, pti ? IDTVEC(timerint_pti) : IDTVEC(timerint),
++ SDT_APIC, SEL_KPL, GSEL_APIC);
+
+ /* Local APIC error interrupt. */
+- setidt(APIC_ERROR_INT, IDTVEC(errorint), SDT_APIC, SEL_KPL, GSEL_APIC);
++ setidt(APIC_ERROR_INT, pti ? IDTVEC(errorint_pti) : IDTVEC(errorint),
++ SDT_APIC, SEL_KPL, GSEL_APIC);
+
+ /* XXX: Thermal interrupt */
+
+ /* Local APIC CMCI. */
+- setidt(APIC_CMC_INT, IDTVEC(cmcint), SDT_APICT, SEL_KPL, GSEL_APIC);
++ setidt(APIC_CMC_INT, pti ? IDTVEC(cmcint_pti) : IDTVEC(cmcint),
++ SDT_APICT, SEL_KPL, GSEL_APIC);
+
+ if ((resource_int_value("apic", 0, "clock", &i) != 0 || i != 0)) {
+ arat = 0;
+@@ -1561,8 +1574,8 @@
+ KASSERT(vector != IDT_DTRACE_RET,
+ ("Attempt to overwrite DTrace entry"));
+ #endif
+- setidt(vector, ioint_handlers[vector / 32], SDT_APIC, SEL_KPL,
+- GSEL_APIC);
++ setidt(vector, (pti ? ioint_pti_handlers : ioint_handlers)[vector / 32],
++ SDT_APIC, SEL_KPL, GSEL_APIC);
+ }
+
+ static void
+@@ -1581,7 +1594,8 @@
+ * We can not currently clear the idt entry because other cpus
+ * may have a valid vector at this offset.
+ */
+- setidt(vector, &IDTVEC(rsvd), SDT_APICT, SEL_KPL, GSEL_APIC);
++ setidt(vector, pti ? &IDTVEC(rsvd_pti) : &IDTVEC(rsvd), SDT_APICT,
++ SEL_KPL, GSEL_APIC);
+ #endif
+ }
+
+@@ -2084,7 +2098,8 @@
+ long func;
+ int idx, vector;
+
+- KASSERT(ipifunc != &IDTVEC(rsvd), ("invalid ipifunc %p", ipifunc));
++ KASSERT(ipifunc != &IDTVEC(rsvd) && ipifunc != &IDTVEC(rsvd_pti),
++ ("invalid ipifunc %p", ipifunc));
+
+ vector = -1;
+ mtx_lock_spin(&icu_lock);
+@@ -2091,7 +2106,8 @@
+ for (idx = IPI_DYN_FIRST; idx <= IPI_DYN_LAST; idx++) {
+ ip = &idt[idx];
+ func = (ip->gd_hioffset << 16) | ip->gd_looffset;
+- if (func == (uintptr_t)&IDTVEC(rsvd)) {
++ if ((!pti && func == (uintptr_t)&IDTVEC(rsvd)) ||
++ (pti && func == (uintptr_t)&IDTVEC(rsvd_pti))) {
+ vector = idx;
+ setidt(vector, ipifunc, SDT_APIC, SEL_KPL, GSEL_APIC);
+ break;
+@@ -2113,8 +2129,10 @@
+ mtx_lock_spin(&icu_lock);
+ ip = &idt[vector];
+ func = (ip->gd_hioffset << 16) | ip->gd_looffset;
+- KASSERT(func != (uintptr_t)&IDTVEC(rsvd),
++ KASSERT(func != (uintptr_t)&IDTVEC(rsvd) &&
++ func != (uintptr_t)&IDTVEC(rsvd_pti),
+ ("invalid idtfunc %#lx", func));
+- setidt(vector, &IDTVEC(rsvd), SDT_APICT, SEL_KPL, GSEL_APIC);
++ setidt(vector, pti ? &IDTVEC(rsvd_pti) : &IDTVEC(rsvd), SDT_APICT,
++ SEL_KPL, GSEL_APIC);
+ mtx_unlock_spin(&icu_lock);
+ }
+--- sys/x86/x86/mp_x86.c.orig
++++ sys/x86/x86/mp_x86.c
+@@ -1436,7 +1436,7 @@
+ */
+
+ /* Variables needed for SMP tlb shootdown. */
+-static vm_offset_t smp_tlb_addr1, smp_tlb_addr2;
++vm_offset_t smp_tlb_addr1, smp_tlb_addr2;
+ pmap_t smp_tlb_pmap;
+ volatile uint32_t smp_tlb_generation;
+
+@@ -1509,11 +1509,11 @@
+ }
+
+ void
+-smp_masked_invlpg(cpuset_t mask, vm_offset_t addr)
++smp_masked_invlpg(cpuset_t mask, vm_offset_t addr, pmap_t pmap)
+ {
+
+ if (smp_started) {
+- smp_targeted_tlb_shootdown(mask, IPI_INVLPG, NULL, addr, 0);
++ smp_targeted_tlb_shootdown(mask, IPI_INVLPG, pmap, addr, 0);
+ #ifdef COUNT_XINVLTLB_HITS
+ ipi_page++;
+ #endif
+@@ -1521,11 +1521,12 @@
+ }
+
+ void
+-smp_masked_invlpg_range(cpuset_t mask, vm_offset_t addr1, vm_offset_t addr2)
++smp_masked_invlpg_range(cpuset_t mask, vm_offset_t addr1, vm_offset_t addr2,
++ pmap_t pmap)
+ {
+
+ if (smp_started) {
+- smp_targeted_tlb_shootdown(mask, IPI_INVLRNG, NULL,
++ smp_targeted_tlb_shootdown(mask, IPI_INVLRNG, pmap,
+ addr1, addr2);
+ #ifdef COUNT_XINVLTLB_HITS
+ ipi_range++;
+--- sys/x86/xen/pv.c.orig
++++ sys/x86/xen/pv.c
+@@ -97,6 +97,7 @@
+ #ifdef SMP
+ /* Variables used by amd64 mp_machdep to start APs */
+ extern char *doublefault_stack;
++extern char *mce_stack;
+ extern char *nmi_stack;
+ #endif
+
+@@ -217,6 +218,8 @@
+ (void *)kmem_malloc(kernel_arena, stacksize, M_WAITOK | M_ZERO);
+ doublefault_stack =
+ (char *)kmem_malloc(kernel_arena, PAGE_SIZE, M_WAITOK | M_ZERO);
++ mce_stack =
++ (char *)kmem_malloc(kernel_arena, PAGE_SIZE, M_WAITOK | M_ZERO);
+ nmi_stack =
+ (char *)kmem_malloc(kernel_arena, PAGE_SIZE, M_WAITOK | M_ZERO);
+ dpcpu =
+--- usr.sbin/cpucontrol/cpucontrol.8.orig
++++ usr.sbin/cpucontrol/cpucontrol.8
+@@ -24,7 +24,7 @@
+ .\"
+ .\" $FreeBSD$
+ .\"
+-.Dd June 30, 2009
++.Dd January 5, 2018
+ .Dt CPUCONTROL 8
+ .Os
+ .Sh NAME
+@@ -36,44 +36,48 @@
+ .Nm
+ .Op Fl vh
+ .Fl m Ar msr
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Fl m Ar msr Ns = Ns Ar value
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Fl m Ar msr Ns &= Ns Ar mask
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Fl m Ar msr Ns |= Ns Ar mask
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Fl i Ar level
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Fl i Ar level,level_type
+-.Bk
+ .Ar device
+ .Ek
++.Bk
+ .Nm
+ .Op Fl vh
+ .Op Fl d Ar datadir
+ .Fl u
++.Ar device
++.Ek
+ .Bk
++.Nm
++.Fl e
+ .Ar device
+ .Ek
+ .Sh DESCRIPTION
+@@ -129,6 +133,20 @@
+ .Nm
+ utility will walk through the configured data directories
+ and apply all firmware updates available for this CPU.
++.It Fl e
++Re-evaluate the kernel flags indicating the present CPU features.
++This command is typically executed after a firmware update was applied
++which changes information reported by the
++.Dv CPUID
++instruction.
++.Pp
++.Bf -symbolic
++Only execute the
++.Fl e
++command after the microcode update was applied to all CPUs in the system.
++The kernel does not operate correctly if the features of processors are
++not identical.
++.Ef
+ .It Fl v
+ Increase the verbosity level.
+ .It Fl h
+--- usr.sbin/cpucontrol/cpucontrol.c.orig
++++ usr.sbin/cpucontrol/cpucontrol.c
+@@ -60,6 +60,7 @@
+ #define FLAG_I 0x01
+ #define FLAG_M 0x02
+ #define FLAG_U 0x04
++#define FLAG_E 0x10
+
+ #define OP_INVAL 0x00
+ #define OP_READ 0x01
+@@ -114,7 +115,7 @@
+ if (name == NULL)
+ name = "cpuctl";
+ fprintf(stderr, "Usage: %s [-vh] [-d datadir] [-m msr[=value] | "
+- "-i level | -i level,level_type | -u] device\n", name);
++ "-i level | -i level,level_type | -e | -u] device\n", name);
+ exit(EX_USAGE);
+ }
+
+@@ -338,6 +339,25 @@
+ }
+
+ static int
++do_eval_cpu_features(const char *dev)
++{
++ int fd, error;
++
++ assert(dev != NULL);
++
++ fd = open(dev, O_RDWR);
++ if (fd < 0) {
++ WARN(0, "error opening %s for writing", dev);
++ return (1);
++ }
++ error = ioctl(fd, CPUCTL_EVAL_CPU_FEATURES, NULL);
++ if (error < 0)
++ WARN(0, "ioctl(%s, CPUCTL_EVAL_CPU_FEATURES)", dev);
++ close(fd);
++ return (error);
++}
++
++static int
+ do_update(const char *dev)
+ {
+ int fd;
+@@ -431,11 +451,14 @@
+ * Add all default data dirs to the list first.
+ */
+ datadir_add(DEFAULT_DATADIR);
+- while ((c = getopt(argc, argv, "d:hi:m:uv")) != -1) {
++ while ((c = getopt(argc, argv, "d:ehi:m:uv")) != -1) {
+ switch (c) {
+ case 'd':
+ datadir_add(optarg);
+ break;
++ case 'e':
++ flags |= FLAG_E;
++ break;
+ case 'i':
+ flags |= FLAG_I;
+ cmdarg = optarg;
+@@ -464,22 +487,25 @@
+ /* NOTREACHED */
+ }
+ dev = argv[0];
+- c = flags & (FLAG_I | FLAG_M | FLAG_U);
++ c = flags & (FLAG_E | FLAG_I | FLAG_M | FLAG_U);
+ switch (c) {
+- case FLAG_I:
+- if (strstr(cmdarg, ",") != NULL)
+- error = do_cpuid_count(cmdarg, dev);
+- else
+- error = do_cpuid(cmdarg, dev);
+- break;
+- case FLAG_M:
+- error = do_msr(cmdarg, dev);
+- break;
+- case FLAG_U:
+- error = do_update(dev);
+- break;
+- default:
+- usage(); /* Only one command can be selected. */
++ case FLAG_I:
++ if (strstr(cmdarg, ",") != NULL)
++ error = do_cpuid_count(cmdarg, dev);
++ else
++ error = do_cpuid(cmdarg, dev);
++ break;
++ case FLAG_M:
++ error = do_msr(cmdarg, dev);
++ break;
++ case FLAG_U:
++ error = do_update(dev);
++ break;
++ case FLAG_E:
++ error = do_eval_cpu_features(dev);
++ break;
++ default:
++ usage(); /* Only one command can be selected. */
+ }
+ SLIST_FREE(&datadirs, next, free);
+ return (error == 0 ? 0 : 1);
diff --git a/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch.asc b/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch.asc
new file mode 100644
index 0000000000..13cc83c15c
--- /dev/null
+++ b/share/security/patches/SA-18:03/speculative_execution-amd64-11.patch.asc
@@ -0,0 +1,18 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQKTBAABCgB9FiEE/A6HiuWv54gCjWNV05eS9J6n5cIFAlqon1hfFIAAAAAALgAo
+aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEZD
+MEU4NzhBRTVBRkU3ODgwMjhENjM1NUQzOTc5MkY0OUVBN0U1QzIACgkQ05eS9J6n
+5cIkxA//e6AvcQRqf5nsbsyfnC35XNp6knt4psnCTTi5ny4MdHPt8r8Jk7Rdwqar
+mkaz2ZSTKY1h1PrITjriWjIB9tjuuqRF27UXoxclkCyYPj1dRGuGRNLrD8+W1pVd
+c3Bbb30c1VOSojJU/d1g9pLpIXmkppM1vuLOlgisjTncrbmT7EHJmjaLCMycEFY5
+SGy2PmHrzh5xyrCMl4gbWwDhBSFA+5osReA2c2kC4QZ1RYIE2XG3Zo3po1daDZYw
+BDOqLhfOq4jl22GbKIjbYcSClrttFWb622k7A9R14zKWOgSdbXFp1yizKR/1bK7v
+5+0WOWR9JXwMObbqmgss/lky9tKSWtendIka+ZdUQtuJETbP/nKcrA6QPZ6MoLGB
+IlQzPzdPmljsqPXaQRKcVNytMzrfekyUbJilileXG8RTMb7HVnWyEjA6FhEoLX/E
+hzJIgaSd+IluLphdCGnBnL05UYCpz+zLDSh2K6aJUX/840ZOwmEUkWOWeTK29od5
+LY2s9hYBw9cTbX9U+Lh/UuZ9Y8wYc0/RH/09VJcIS5ga4Qc17L1RgLAyoFbHm0p7
+9Ly3i5o4GVXa7u5fcjoDWsilUmUdZcNLNfieQ5zusmV3GHeXlXMRwfgMZeWSxv5N
+UfbaQGdMSGY6eMfqDUfCxmDXj1LH818nf2r9lznC3oox69vxtHo=
+=fW1f
+-----END PGP SIGNATURE-----
diff --git a/share/xml/advisories.xml b/share/xml/advisories.xml
index d08e28eeaa..fc8fb35ed7 100644
--- a/share/xml/advisories.xml
+++ b/share/xml/advisories.xml
@@ -10,6 +10,15 @@
     <name>3</name>
 
+    <day>
+      <name>14</name>
+
+      <advisory>
+        <name>FreeBSD-SA-18:03.speculative_execution</name>
+      </advisory>
+    </day>
+
     <day>
       <name>7</name>