INVLPG (M8)
Summary: "Invalidate TLB Entries"
Reference: https://www.felixcloutier.com/x86/INVLPG.html
Extension: BASE
Category: SYSTEM
ISA-Set: I486REAL
CPL: 0
iform: INVLPG_MEMb
iclass: INVLPG
ASM: INVLPG
Operands
Operand 1 (r): Memory
Available performance data
Alder Lake-P
Measurements
Throughput
Computed from the port usage: 7.00 (if an indexed addressing mode is used: 6.67)
Measured (loop): 223.75 (if an indexed addressing mode is used: 223.72)
Measured (unrolled): 223.65 (if an indexed addressing mode is used: 223.69)
Number of μops
Executed: 42
Retire slots: 40
Decoded (MITE): 0
Microcode Sequencer (MS): 42
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 5*p0+3*p0156B+2*p056+7*p06+6*p1+2*p15B+5*p49+7*p5+5*p78 (if an indexed addressing mode is used: 5*p0+8*p056+7*p06+6*p1+5*p49+5*p78)
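The "Computed from the port usage" figures on this page appear consistent with a simple port-pressure bound: for every subset of ports, count the μops that can only execute on ports inside that subset, divide by the subset size, and take the maximum. The sketch below illustrates that reading; it is not the tool's actual implementation. The function names `parse_port_usage` and `throughput_bound` are hypothetical, and the parser assumes every port name is a single character (such as `B` on Alder Lake-P).

```python
from itertools import combinations


def parse_port_usage(expr):
    """Turn '5*p0+2*p056' into [(5, {'0'}), (2, {'0', '5', '6'})].

    Assumption: each port name is exactly one character.
    """
    groups = []
    for term in expr.split("+"):
        count, ports = term.strip().split("*p")
        groups.append((int(count), frozenset(ports)))
    return groups


def throughput_bound(expr):
    """Lower bound on cycles per instruction implied by a port-usage string."""
    groups = parse_port_usage(expr)
    all_ports = sorted(set().union(*(ports for _, ports in groups)))
    best = 0.0
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            chosen = set(subset)
            # Only uops whose allowed ports all lie inside the subset are
            # forced to compete for these ports.
            uops = sum(n for n, ports in groups if ports <= chosen)
            best = max(best, uops / size)
    return best


if __name__ == "__main__":
    plain = "5*p0+3*p0156B+2*p056+7*p06+6*p1+2*p15B+5*p49+7*p5+5*p78"
    indexed = "5*p0+8*p056+7*p06+6*p1+5*p49+5*p78"
    print(f"{throughput_bound(plain):.2f}")    # 7.00
    print(f"{throughput_bound(indexed):.2f}")  # 6.67
```

For the Alder Lake-P strings above, the binding subset is ports 0, 5 and 6: 21 μops over 3 ports in the plain case and 20 μops over 3 ports in the indexed case. The measured values are far larger than this bound, so the bound describes execution-port pressure only, not the instruction's observed cost.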
Alder Lake-E
Measurements
Throughput
Measured (loop): 97.53
Measured (unrolled): 98.00
Number of μops
Executed: 20
Microcode Sequencer (MS): 19
Requires the complex decoder
Rocket Lake
Measurements
Throughput
Computed from the port usage: 9.00
Measured (loop): 260.43
Measured (unrolled): 257.60
Number of μops
Executed: 45
Retire slots: 46
Decoded (MITE): 0
Microcode Sequencer (MS): 48
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 5*p0+4*p0156+13*p06+6*p1+1*p23+5*p49+6*p5+5*p78
Tiger Lake
Measurements
Throughput
Computed from the port usage: 9.00
Measured (loop): 259.98
Measured (unrolled): 257.54
Number of μops
Executed: 45
Retire slots: 46
Decoded (MITE): 0
Microcode Sequencer (MS): 48
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 5*p0+1*p015+3*p0156+13*p06+6*p1+1*p23+5*p49+6*p5+5*p78
Ice Lake
Measurements
Throughput
Computed from the port usage: 8.50
Measured (loop): 217.90
Measured (unrolled): 216.13
Number of μops
Executed: 43
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 6*p0+4*p0156+11*p06+6*p1+5*p49+6*p5+5*p78
Cascade Lake
Measurements
Throughput
Computed from the port usage: 9.50
Measured (loop): 228.92
Measured (unrolled): 230.50
Number of μops
Executed: 47
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+12*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Cannon Lake
Measurements
Throughput
Computed from the port usage: 8.25
Measured (loop): 215.03
Measured (unrolled): 213.12
Number of μops
Executed: 43
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 4*p0+4*p0156+11*p06+8*p1+2*p23+3*p237+5*p4+6*p5
Skylake-X
Measurements
Throughput
Computed from the port usage: 9.50
Measured (loop): 224.90
Measured (unrolled): 231.00
Number of μops
Executed: 47
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+12*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Coffee Lake
Measurements
Throughput
Computed from the port usage: 9.50
Measured (loop): 213.28
Measured (unrolled): 210.75
Number of μops
Executed: 47
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+12*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Kaby Lake
Measurements
Throughput
Computed from the port usage: 9.50
Measured (loop): 213.12
Measured (unrolled): 210.94
Number of μops
Executed: 47
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+12*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Skylake
Measurements
Throughput
Computed from the port usage: 9.50
Measured (loop): 213.17
Measured (unrolled): 211.13
Number of μops
Executed: 47
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+12*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Broadwell
Measurements
Throughput
Computed from the port usage: 11.00
Measured (loop): 209.02
Measured (unrolled): 206.75
Number of μops
Executed: 49
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+15*p06+6*p1+2*p23+3*p237+5*p4+8*p5
Haswell
Measurements
Throughput
Computed from the port usage: 10.50
Measured (loop): 210.02
Measured (unrolled): 207.84
Number of μops
Executed: 49
Retire slots: 42
Decoded (MITE): 0
Microcode Sequencer (MS): 44
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 7*p0+3*p0156+14*p06+7*p1+2*p23+3*p237+5*p4+8*p5
Ivy Bridge
Measurements
Throughput
Computed from the port usage: 19.00
Measured (loop): 238.70
Measured (unrolled): 236.56
Number of μops
Executed: 49
Retire slots: 45
Decoded (MITE): 1
Microcode Sequencer (MS): 46
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 10*p0+1*p015+10*p1+5*p23+4*p4+19*p5
Sandy Bridge
Measurements
Throughput
Computed from the port usage: 18.00
Measured (loop): 235.82
Measured (unrolled): 233.63
Number of μops
Executed: 49
Retire slots: 45
Decoded (MITE): 1
Microcode Sequencer (MS): 46
Requires the complex decoder (no other instruction can be decoded with simple decoders in the same cycle)
Port usage: 11*p0+1*p015+10*p1+5*p23+4*p4+18*p5
Westmere
Measurements
Throughput
Computed from the port usage: 16.00
Measured (loop): 203.80
Measured (unrolled): 203.00
Number of μops
Executed: 48
Retire slots: 45
Microcode Sequencer (MS): 133
Requires the complex decoder
Port usage: 11*p0+5*p015+8*p1+2*p2+3*p3+3*p4+16*p5
Nehalem
Measurements
Throughput
Computed from the port usage: 16.00
Measured (loop): 211.80
Measured (unrolled): 211.00
Number of μops
Executed: 49
Retire slots: 45
Microcode Sequencer (MS): 141
Requires the complex decoder
Port usage: 8*p0+4*p015+11*p1+2*p2+4*p3+4*p4+16*p5
Wolfdale
Measurements
Throughput
Computed from the port usage: 33.00
Measured (loop): 250.17
Measured (unrolled): 249.09
Number of μops
Executed: 57
Port usage: 14*p0+11*p1+7*p2+6*p3+6*p4+33*p5
Conroe
Measurements
Throughput
Computed from the port usage: 33.00
Measured (loop): 248.70
Measured (unrolled): 246.48
Number of μops
Executed: 60
Port usage: 13*p0+15*p1+7*p2+6*p3+6*p4+33*p5
Tremont
Measurements
Throughput
Measured (loop): 72.78
Measured (unrolled): 72.00
Number of μops
Executed: 16
Microcode Sequencer (MS): 15
Requires the complex decoder
Goldmont Plus
Measurements
Throughput
Measured (loop): 60.03
Measured (unrolled): 59.75
Number of μops
Executed: 18
Microcode Sequencer (MS): 18
Requires the complex decoder
Goldmont
Measurements
Throughput
Measured (loop): 61.00
Measured (unrolled): 61.00
Number of μops
Executed: 21
Microcode Sequencer (MS): 21
Requires the complex decoder
Airmont
Measurements
Throughput
Measured (loop): 71.44
Measured (unrolled): 70.87
Number of μops
Executed: 21
Microcode Sequencer (MS): 21
Requires the complex decoder
Bonnell
Measurements
Throughput
Measured (loop): 59.04
Measured (unrolled): 59.00
Number of μops
Executed: 20
Microcode Sequencer (MS): 20
Requires the complex decoder
AMD Zen 4
Measurements
Throughput
Measured (loop): 121.67
Measured (unrolled): 118.00
Number of μops
Executed: 26
AMD Zen 3
Measurements
Throughput
Measured (loop): 118.90
Measured (unrolled): 117.97
Number of μops
Executed: 26
Documentation
Number of μops: ucode
AMD Zen 2
Measurements
Throughput
Measured (loop): 126.80
Measured (unrolled): 126.00
Number of μops
Executed: 24
Documentation
Number of μops: ucode
AMD Zen+
Measurements
Throughput
Measured (loop): 126.90
Measured (unrolled): 126.00
Number of μops
Executed: 24
Documentation
Number of μops: ucode