VPADDQ (ZMM, K, ZMM, ZMM)
Summary:
"Add Packed Integers"
Reference:
https://www.felixcloutier.com/x86/PADDB:PADDW:PADDD:PADDQ.html
Extension:
AVX512EVEX
Category:
AVX512
ISA-Set:
AVX512F_512
CPL:
3
iform:
VPADDQ_ZMMu64_MASKmskw_ZMMu64_ZMMu64_AVX512
iclass:
VPADDQ
ASM:
VPADDQ
Operands
Operand 1 (r/w): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
Operand 2 (r): Register (K1, K2, K3, K4, K5, K6, K7)
Operand 3 (r): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
Operand 4 (r): Register (ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31)
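The operand list above can be read as a merge-masked add: operand 1 is marked r/w, which suggests that lanes whose mask bit is clear keep the destination's old value. The following is a minimal scalar sketch of that reading, not the vendor's pseudocode. The function name `vpaddq_merge` and the list-of-integers register representation are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # each lane is an unsigned 64-bit quadword
LANES = 8               # a 512-bit ZMM register holds eight 64-bit lanes


def vpaddq_merge(dst, k, src1, src2):
    """Hypothetical scalar model of VPADDQ zmm {k}, zmm, zmm with merge-masking.

    dst, src1, src2 are lists of 8 integers (lane 0 first); k is the mask
    register value, where bit i controls lane i.
    """
    assert len(dst) == len(src1) == len(src2) == LANES
    return [
        # Mask bit set: wrapping 64-bit add. Mask bit clear: keep old lane.
        (a + b) & MASK64 if (k >> i) & 1 else d
        for i, (d, a, b) in enumerate(zip(dst, src1, src2))
    ]


dst = [0xDEAD] * LANES
src1 = [MASK64, 1, 2, 3, 4, 5, 6, 7]
src2 = [1] * LANES

# Only lanes 0 and 2 are enabled by the mask 0b00000101.
result = vpaddq_merge(dst, 0b00000101, src1, src2)
print([hex(x) for x in result])
```

Lane 0 wraps around to 0 because the addition is modulo 2^64, lane 2 becomes 3, and every other lane keeps 0xDEAD from the destination.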
Available performance data
Alder Lake-P
Rocket Lake
Tiger Lake
Ice Lake
Cascade Lake
Cannon Lake
Skylake-X
AMD Zen 4
Alder Lake-P
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Rocket Lake
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Tiger Lake
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Ice Lake
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Documentation
Latency: 1.0
Throughput: 0.5
Cascade Lake
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Cannon Lake
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
Skylake-X
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
1
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.50
Measured (loop):
0.50
Measured (unrolled):
0.50
Number of μops
Executed: 1
Retire slots: 1
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage:
1*p05
IACA 2.3
Throughput
Computed from the port usage: 0.50
IACA:
0.50
Number of μops:
1
Port usage:
1*p05
IACA 3.0
Throughput
Computed from the port usage: 0.50
IACA:
0.50
Number of μops:
1
Port usage:
1*p05
AMD Zen 4
Measurements
Latencies
Latency operand 1 → 1:
1
Latency operand 2 → 1:
2
Latency operand 3 → 1:
1
Latency operand 4 → 1:
1
Throughput
Computed from the port usage: 0.25
Measured (loop):
0.60
Measured (unrolled):
0.56
Number of μops
Executed: 1
Port usage:
1*FP0123
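The "Computed from the port usage" figures above follow from simple arithmetic when the port usage has a single term: a count of μops spread evenly over the listed ports. The sketch below reproduces that arithmetic under stated assumptions. The helper name `single_term_throughput` is hypothetical, the parser only handles one term of the form count*prefix+digits, and it assumes each port is written as a single decimal digit.

```python
import re


def single_term_throughput(port_usage):
    """Cycles per instruction implied by a single-term port usage string.

    Illustrative helper: handles strings such as "1*p05" or "1*FP0123",
    where every digit after the alphabetic prefix names one port.
    """
    m = re.fullmatch(r"(\d+)\*([A-Za-z]+)(\d+)", port_usage)
    if m is None:
        raise ValueError(f"unsupported port usage string: {port_usage!r}")
    uops = int(m.group(1))
    ports = len(set(m.group(3)))  # number of distinct ports the uops can use
    return uops / ports


print(single_term_throughput("1*p05"))     # 0.5
print(single_term_throughput("1*FP0123"))  # 0.25
```

The result is only the bound that port count alone would allow. On the Intel cores listed it matches the measured 0.50, while on AMD Zen 4 the measured values (0.60 loop, 0.56 unrolled) are above the computed 0.25, so something other than port availability limits throughput there.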