VPERMB (ZMM, ZMM, ZMM)
Summary: "Permute Packed Bytes Elements"
Reference: https://www.felixcloutier.com/x86/VPERMB.html
Extension: AVX512EVEX
Category: AVX512_VBMI
ISA-Set: AVX512_VBMI_512
CPL: 3
iform: VPERMB_ZMMu8_MASKmskw_ZMMu8_ZMMu8_AVX512
iclass: VPERMB
ASM: VPERMB
Operands
Operand 1 (w): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
Operand 2 (r): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
Operand 3 (r): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
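As a rough illustration of what the three operands do, the unmasked 512-bit form can be modeled in scalar code: each byte of operand 1 is taken from operand 3, at the position given by the low 6 bits of the corresponding byte in operand 2. The sketch below is not taken from the reference page; the function name `vpermb_zmm` and the sample data are hypothetical, and write-masking is not modeled.

```python
def vpermb_zmm(indices: bytes, table: bytes) -> bytes:
    """Scalar model of unmasked VPERMB zmm1, zmm2, zmm3.

    indices -> operand 2 (byte indices), table -> operand 3 (source bytes).
    Returns the 64 bytes that would be written to operand 1.
    """
    if len(indices) != 64 or len(table) != 64:
        raise ValueError("both operands must be 64 bytes (one ZMM register)")
    # Only the low 6 bits of each index byte select a source byte (0..63).
    return bytes(table[i & 0x3F] for i in indices)


# Hypothetical sample data: reverse the byte order of a 64-byte vector.
table = bytes(range(100, 164))
reverse = bytes(63 - i for i in range(64))
reversed_table = vpermb_zmm(reverse, table)
```

Because the upper two bits of each index byte are ignored in this model, an index of 69 selects the same source byte as an index of 5.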
Available performance data
Alder Lake-P
Rocket Lake
Tiger Lake
Ice Lake
Cannon Lake
AMD Zen 4
Alder Lake-P
Measurements
Latencies (cycles)
Latency operand 2 → 1: 3
Latency operand 3 → 1: 3
Throughput (cycles per instruction)
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Rocket Lake
Measurements
Latencies (cycles)
Latency operand 2 → 1: 3
Latency operand 3 → 1: 3
Throughput (cycles per instruction)
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Tiger Lake
Measurements
Latencies (cycles)
Latency operand 2 → 1: 3
Latency operand 3 → 1: 3
Throughput (cycles per instruction)
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Ice Lake
Measurements
Latencies (cycles)
Latency operand 2 → 1: 3
Latency operand 3 → 1: 3
Throughput (cycles per instruction)
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
Documentation
Latency: 3.0
Throughput: 1.0
Cannon Lake
Measurements
Latencies (cycles)
Latency operand 2 → 1: 3
Latency operand 3 → 1: 3
Throughput (cycles per instruction)
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p5
AMD Zen 4
Measurements
Latencies (cycles)
Latency operand 2 → 1: 6
Latency operand 3 → 1: 6
Throughput (cycles per instruction)
Computed from the port usage: 0.50
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Port usage: 1*FP12
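The "Computed from the port usage" values listed for each microarchitecture can be approximated with a small calculation: a μop that may issue on any of n ports needs at least 1/n cycles on average. The sketch below is an assumption about how such a figure can be derived, not the method used to produce the data above. The function name `throughput_bound` is hypothetical, each character after the `p` or `FP` prefix is assumed to name one port, and terms joined by `+` are assumed to use disjoint ports.

```python
import re

# One term of a port-usage string, e.g. "1*p5" or "1*FP12".
# Assumption: every character after the "p"/"FP" prefix is one port.
_TERM = re.compile(r"(\d+)\*(?:FP|p)(\w+)")


def throughput_bound(port_usage: str) -> float:
    """Lower bound on cycles per instruction implied by a port-usage string."""
    worst = 0.0
    for term in port_usage.split("+"):
        match = _TERM.fullmatch(term.strip())
        if match is None:
            raise ValueError(f"unrecognized port-usage term: {term!r}")
        uops, ports = int(match.group(1)), match.group(2)
        # uops spread evenly over len(ports) ports.
        worst = max(worst, uops / len(ports))
    return worst


intel_bound = throughput_bound("1*p5")    # single port
zen4_bound = throughput_bound("1*FP12")   # two candidate ports
```

Under these assumptions the Intel entries give 1.0 and the Zen 4 entry gives 0.5, matching the computed values listed. The result is only a lower bound: the measured Zen 4 throughput of 1.00 is higher than the port-usage figure.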