VPOPCNTQ (ZMM, M512)
Extension: AVX512EVEX
Category: AVX512
ISA-Set: AVX512_VPOPCNTDQ_512
CPL: 3
iform: VPOPCNTQ_ZMMu64_MASKmskw_MEMu64_AVX512
iclass: VPOPCNTQ
ASM: VPOPCNTQ
Operands
Operand 1 (w): Register (ZMM0 to ZMM31)
Operand 2 (r): Memory
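As an illustration of what this form computes, the following is a minimal reference model in Python, not the hardware implementation. It assumes the unmasked form (the write mask named in the iform is ignored), treats the 64-byte memory operand as eight little-endian 64-bit lanes, and returns the population count of each lane as it would be written to the destination register. The function name `vpopcntq_m512` is hypothetical.

```python
import struct
from typing import List


def vpopcntq_m512(mem: bytes) -> List[int]:
    """Model VPOPCNTQ zmm, m512 without masking (illustrative only).

    mem is the 64-byte memory operand; the result holds the number of
    set bits in each of its eight 64-bit lanes, lowest lane first.
    """
    if len(mem) != 64:
        raise ValueError("memory operand must be exactly 64 bytes")
    # x86 is little-endian, so unpack eight unsigned 64-bit lanes as "<8Q".
    lanes = struct.unpack("<8Q", mem)
    # Count the set bits of each lane independently.
    return [bin(lane).count("1") for lane in lanes]


if __name__ == "__main__":
    source = struct.pack("<8Q", 0, 1, 3, 0xFF, 2**64 - 1, 1 << 63, 0xF0F0, 5)
    print(vpopcntq_m512(source))
```

Each result lane is at most 64, so it always fits in the 64-bit destination element.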
Available performance data
Emerald Rapids
Alder Lake-P
Rocket Lake
Tiger Lake
Ice Lake
AMD Zen 5
AMD Zen 4
Emerald Rapids
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤8
Throughput
Computed from the port usage: 1.00 (if an indexed addressing mode is used: 0.33)
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 1 (if an indexed addressing mode is used: 2)
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p23A+1*p5 (if an indexed addressing mode is used: 1*p23A)
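The "Computed from the port usage" figures on this page can be reproduced with a simple bound: each μop group contributes its μop count divided by the number of ports it may use, and the largest such ratio is the reciprocal throughput in cycles per instruction. The sketch below is an assumption about how to read the port strings, not the tool's actual algorithm: it takes every character after the `p` or `FP` prefix to be one port, and it assumes the port groups of different terms do not overlap, which holds for the combinations listed here. The function name `reciprocal_throughput` is hypothetical.

```python
def reciprocal_throughput(port_usage: str) -> float:
    """Lower bound on cycles per instruction from a port-usage string.

    Accepts strings such as "1*p23A+1*p5" or "1*FP03". Assumes the
    port groups are disjoint, so the busiest group is the bottleneck.
    """
    worst = 0.0
    for term in port_usage.split("+"):
        count, group = term.strip().split("*")
        # Strip the "FP" (AMD) or "p" (Intel) prefix; each remaining
        # character is assumed to name one execution port.
        ports = group[2:] if group.startswith("FP") else group[1:]
        worst = max(worst, int(count) / len(ports))
    return worst


if __name__ == "__main__":
    for usage in ("1*p23A+1*p5", "1*p23A", "1*p23+1*p5", "1*FP03", "1*FP01"):
        print(f"{usage}: {reciprocal_throughput(usage):.2f}")
```

For `1*p23A+1*p5`, the load μop can use three ports (1/3 cycle) while the other μop is restricted to port 5 (1 cycle), so the bound is 1.00.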
Alder Lake-P
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤8
Throughput
Computed from the port usage: 1.00 (if an indexed addressing mode is used: 0.33)
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 1 (if an indexed addressing mode is used: 2)
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p23A+1*p5 (if an indexed addressing mode is used: 1*p23A)
Rocket Lake
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤9
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 1 (if an indexed addressing mode is used: 2)
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p23+1*p5
Tiger Lake
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤9
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 1 (if an indexed addressing mode is used: 2)
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p23+1*p5
Ice Lake
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤9
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 1 (if an indexed addressing mode is used: 2)
Decoded (MITE): 1
Microcode Sequencer (MS): 0
Port usage: 1*p23+1*p5
AMD Zen 5
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤10
Latency operand 2 → 1 (address, index register): ≤10
Latency operand 2 → 1 (memory): ≤11
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 1
Port usage: 1*FP03
Documentation
Latency: 2
Throughput: 0.50
Number of μops: 1
Port usage: FP0/3
AMD Zen 4
Measurements
Latencies
Latency operand 2 → 1 (address, base register): ≤11
Latency operand 2 → 1 (address, index register): ≤11
Latency operand 2 → 1 (memory): ≤13
Throughput
Computed from the port usage: 0.50
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 1
Port usage: 1*FP01
Documentation
Latency: 2
Throughput: 1.00
Number of μops: 1
Port usage: FP0/1