ADD (M64, R64)

Summary:	"Add"
Reference:	https://www.felixcloutier.com/x86/add
Extension:	BASE
Category:	BINARY
ISA-Set:	I86
CPL:	3
iform:	ADD_MEMv_GPRv
iclass:	ADD
ASM:	ADD

Operands

Operand 1 (r/w): Memory
Operand 2 (r): Register (RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15)
Operand 3 (w, suppressed): Flags (AF: w, CF: w, OF: w, PF: w, SF: w, ZF: w)
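
The operand list above can be illustrated with a small model. The following is a minimal Python sketch, not taken from the reference page, of what ADD with a 64-bit memory destination and a 64-bit register source computes: the sum written back to operand 1 and the six status flags written to the suppressed flags operand. The name `add64` and the dictionary layout are hypothetical, and the flag definitions assume the usual x86 conventions.

```python
MASK64 = (1 << 64) - 1


def add64(dest: int, src: int) -> tuple[int, dict[str, int]]:
    """Model ADD m64, r64: return (new destination value, status flags)."""
    dest &= MASK64
    src &= MASK64
    full = dest + src          # unbounded Python integer sum
    result = full & MASK64     # value written back to the memory operand
    flags = {
        # CF: unsigned carry out of bit 63
        "CF": int(full > MASK64),
        # OF: signed overflow (result sign differs from both input signs)
        "OF": (((dest ^ result) & (src ^ result)) >> 63) & 1,
        # AF: carry out of bit 3 into bit 4
        "AF": ((dest ^ src ^ result) >> 4) & 1,
        # PF: set when the low byte of the result has an even number of 1 bits
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        # SF: copy of the result's most significant bit
        "SF": (result >> 63) & 1,
        # ZF: set when the result is zero
        "ZF": int(result == 0),
    }
    return result, flags
```

For example, `add64(MASK64, 1)` wraps around to 0 and sets CF and ZF, while `add64(0x7FFFFFFFFFFFFFFF, 1)` sets OF and SF without setting CF.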

Available performance data

Arrow Lake-P
Arrow Lake-E
Meteor Lake-P
Meteor Lake-E
Emerald Rapids
Alder Lake-P
Alder Lake-E
Rocket Lake
Tiger Lake
Ice Lake
Cascade Lake
Cannon Lake
Skylake-X
Coffee Lake
Kaby Lake
Skylake
Broadwell
Haswell
Ivy Bridge
Sandy Bridge
Westmere
Nehalem
Wolfdale
Conroe
Tremont
Goldmont Plus
Goldmont
Airmont
Bonnell
AMD Zen 5
AMD Zen 4
AMD Zen 3
AMD Zen 2
AMD Zen+

Arrow Lake-P

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.57 (if an indexed addressing mode is used: 0.58)
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2 (if an indexed addressing mode is used: 3)
- Microcode Sequencer (MS): 0
- Requires the complex decoder (6 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*ALU+1*LD+1*STA+1*STD

Arrow Lake-E

Measurements

Latencies
Throughput
- Measured (loop): 0.50
- Measured (unrolled): 0.57
Number of μops
- Executed: 1
- Microcode Sequencer (MS): 0

Meteor Lake-P

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.75 (if an indexed addressing mode is used: 0.78)
- Measured (unrolled): 0.95 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (4 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156B+1*p23A+1*p49+1*p78
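
The "Computed from the port usage" values can be reproduced with a simple bound. The following is a minimal Python sketch, assuming that each character after `p` names one execution port (so `A` and `B` count as single ports) and that throughput is limited by the most heavily loaded subset of ports. The function names are hypothetical, and this is not necessarily how the listed values were produced. It handles only numbered port strings, not named groups such as `1*ALU+1*LD+1*STA+1*STD`.

```python
from itertools import combinations


def parse_port_usage(usage: str) -> list[tuple[int, frozenset[str]]]:
    """Parse '1*p0156+1*p23' into [(1, {'0','1','5','6'}), (1, {'2','3'})]."""
    uops = []
    for term in usage.split("+"):
        count, ports = term.split("*p")
        uops.append((int(count), frozenset(ports)))
    return uops


def throughput_bound(usage: str) -> float:
    """Lower bound on cycles per instruction implied by a port-usage string."""
    uops = parse_port_usage(usage)
    all_ports = sorted(set().union(*(ports for _, ports in uops)))
    best = 0.0
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            chosen = set(subset)
            # Count the μops that can only execute on ports inside this subset.
            confined = sum(n for n, ports in uops if ports <= chosen)
            # Those μops need at least confined / size cycles on these ports.
            best = max(best, confined / size)
    return best


print(throughput_bound("1*p0156B+1*p23A+1*p49+1*p78"))  # 0.5
print(throughput_bound("1*p0156+1*p23+1*p237+1*p4"))    # 1.0
```

In the first string the store μops on p49 and p78 each have two ports to choose from, giving 0.50 cycles per instruction. In the second, the single store-data μop confined to p4 gives 1.00.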

Meteor Lake-E

Measurements

Latencies
Throughput
- Measured (loop): 0.52
- Measured (unrolled): 0.54
Number of μops
- Executed: 1
- Microcode Sequencer (MS): 0

Emerald Rapids

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.62 (if an indexed addressing mode is used: 0.58)
- Measured (unrolled): 0.96 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (4 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156B+1*p23A+1*p49+1*p78

Alder Lake-P

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.54 (if an indexed addressing mode is used: 0.53)
- Measured (unrolled): 0.97 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (4 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156B+1*p23A+1*p49+1*p78

Alder Lake-E

Measurements

Latencies
Throughput
- Measured (loop): 0.50
- Measured (unrolled): 0.52
Number of μops
- Executed: 1
- Microcode Sequencer (MS): 0

Rocket Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.60 (if an indexed addressing mode is used: 0.67)
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p49+1*p78

Tiger Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.53 (if an indexed addressing mode is used: 0.67)
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p49+1*p78

Ice Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 0.50
- Measured (loop): 0.53 (if an indexed addressing mode is used: 0.67)
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p49+1*p78

Cascade Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

Cannon Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

Skylake-X

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 3.0

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

Coffee Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

Kaby Lake

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

Skylake

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 3.0

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

Broadwell

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 3.0

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

Haswell

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.09 (if an indexed addressing mode is used: 1.00)
- Measured (unrolled): 1.09 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 3)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p0156+1*p23+1*p237+1*p4 (if an indexed addressing mode is used: 1*p0156+2*p23+1*p4)

IACA 2.1

Latency: 7
Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

IACA 3.0

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p0156+1*p23+1*p237+1*p4

Ivy Bridge

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.47 (if an indexed addressing mode is used: 1.37)
- Measured (unrolled): 1.45 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 4)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p015+2*p23+1*p4

IACA 2.1

Latency: 7
Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

Sandy Bridge

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.37 (if an indexed addressing mode is used: 1.25)
- Measured (unrolled): 1.33 (if an indexed addressing mode is used: 1.00)
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 4)
- Decoded (MITE): 2
- Microcode Sequencer (MS): 0
- Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p015+2*p23+1*p4

IACA 2.1

Latency: 7
Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

IACA 2.3

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00
Number of μops: 4
Port usage: 1*p015+2*p23+1*p4

Westmere

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 4)
- Microcode Sequencer (MS): 0
- Requires the complex decoder
Port usage: 1*p015+1*p2+1*p3+1*p4

IACA 2.1

Latency: 6
Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+1*p2+1*p3+1*p4

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+1*p2+1*p3+1*p4

Nehalem

Measurements

Latencies
Throughput
- Computed from the port usage: 1.00
- Measured (loop): 1.00
- Measured (unrolled): 1.00
Number of μops
- Executed: 4
- Retire slots: 2 (if an indexed addressing mode is used: 4)
- Microcode Sequencer (MS): 0
- Requires the complex decoder
Port usage: 1*p015+1*p2+1*p3+1*p4

IACA 2.1

Latency: 6
Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+1*p2+1*p3+1*p4

IACA 2.2

Throughput
- Computed from the port usage: 1.00
- IACA: 1.00 (with the -no_interiteration flag: 1.00)
Number of μops: 4
Port usage: 1*p015+1*p2+1*p3+1*p4