VPSHUFB_EVEX (YMM, YMM, YMM)
Summary: "Packed Shuffle Bytes"
Reference: https://www.felixcloutier.com/x86/PSHUFB.html
Extension: AVX512EVEX
Category: AVX512
ISA-Set: AVX512BW_256
CPL: 3
iform: VPSHUFB_YMMu8_MASKmskw_YMMu8_YMMu8_AVX512
iclass: VPSHUFB
ASM: {evex} VPSHUFB
Operands
Operand 1 (w): Register (YMM0-YMM31)
Operand 2 (r): Register (YMM0-YMM31)
Operand 3 (r): Register (YMM0-YMM31)
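For orientation, the byte shuffle performed by this register form can be modelled in a few lines of Python. This is a minimal sketch of the unmasked 256-bit behaviour only, not a reference implementation: the function name `vpshufb_ymm` and the `bytes`-based register model are illustrative assumptions, and write-masking is not modelled. Operand 2 supplies the data bytes and operand 3 supplies the per-byte control; each 128-bit lane is shuffled independently.

```python
def vpshufb_ymm(src: bytes, ctrl: bytes) -> bytes:
    """Model unmasked 256-bit VPSHUFB (hypothetical helper, not a real API).

    src  models operand 2 (data bytes), ctrl models operand 3 (shuffle control).
    Returns the 32 bytes that would be written to operand 1.
    """
    if len(src) != 32 or len(ctrl) != 32:
        raise ValueError("both inputs must be exactly 32 bytes (one YMM register)")
    out = bytearray(32)
    for i in range(32):
        lane_base = i & 0x10          # 0 for bytes 0-15, 16 for bytes 16-31
        c = ctrl[i]
        if c & 0x80:                  # control bit 7 set: result byte is zero
            out[i] = 0
        else:                         # low 4 bits select a byte in the same lane
            out[i] = src[lane_base + (c & 0x0F)]
    return bytes(out)


# Example: reverse the byte order inside each 128-bit lane.
data = bytes(range(32))
reverse_ctrl = bytes(list(range(15, -1, -1)) * 2)
result = vpshufb_ymm(data, reverse_ctrl)
```

Because the index is taken from only the low four bits of each control byte, a byte can never be pulled across the 128-bit lane boundary with this instruction.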
Available performance data
Alder Lake-P
Rocket Lake
Tiger Lake
Ice Lake
Cascade Lake
Cannon Lake
Skylake-X
AMD Zen 4
Alder Lake-P
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p15
Rocket Lake
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p15
Tiger Lake
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p15
Ice Lake
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p15
Documentation
Latency: 1.0
Throughput: 0.5
Cascade Lake
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Cannon Lake
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Skylake-X
Measurements
Latencies
Latency operand 2 → 1: 1
Latency operand 3 → 1: 1
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
IACA 2.3
Throughput
Computed from the port usage: 1.00
IACA: 1.00
Number of μops: 1
Port usage: 1*p5
IACA 3.0
Throughput
Computed from the port usage: 1.00
IACA: 0.98
Number of μops: 1
Port usage: 1*p5
AMD Zen 4
Measurements
Latencies
Latency operand 2 → 1: 2
Latency operand 3 → 1: 2
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Port usage: 1*FP12
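The "Computed from the port usage" figures listed above are consistent with a simple ratio: a single μop that can issue on n ports sustains at best one instruction every 1/n cycles. The sketch below illustrates that arithmetic under stated assumptions. The function name `throughput_from_port_usage` is hypothetical, the parser assumes each digit after the prefix names one port (so `p15` means ports 1 and 5, and `FP12` means two FP pipes), and taking the maximum over groups ignores contention between groups that share ports. The page does not state how its own figure is derived, so treat this as an approximation rather than the tool's method.

```python
import re


def throughput_from_port_usage(usage: str) -> float:
    """Estimate cycles per instruction from a port-usage string (illustrative only).

    Each group looks like "<count>*<prefix><port digits>", e.g. "1*p15".
    Assumption: every digit is one port, and groups do not compete for ports.
    """
    worst = 0.0
    for count, ports in re.findall(r"(\d+)\*[A-Za-z]+(\d+)", usage):
        # count uops spread evenly over len(ports) ports
        worst = max(worst, int(count) / len(ports))
    return worst


# Port-usage strings taken from the sections above.
for usage in ("1*p15", "1*p5", "1*FP12"):
    print(usage, throughput_from_port_usage(usage))
```

With the strings from this page, `1*p15` and `1*FP12` give 0.5 and `1*p5` gives 1.0, matching the listed values.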