ADD (AL, I8) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 04 02 add al,0x2
Results:
Instructions retired: 1.0
Core cycles: 1.0
Reference cycles: 0.99
RS_UOPS_DISPATCHED: 1.0
UOPS_PORT_0: 0.34
UOPS_PORT_1: 0.34
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.33
With loop_count=1000 and unroll_count=10
Code:
0: 04 02 add al,0x2
Results:
Instructions retired: 1.2
Core cycles: 1.05
Reference cycles: 1.05
RS_UOPS_DISPATCHED: 1.2
UOPS_PORT_0: 0.41
UOPS_PORT_1: 0.34
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.44
With loop_count=100 and unroll_count=100
Code:
0: 04 02 add al,0x2
Results:
Instructions retired: 1.02
Core cycles: 1.0
Reference cycles: 1.0
RS_UOPS_DISPATCHED: 1.02
UOPS_PORT_0: 0.34
UOPS_PORT_1: 0.34
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.34
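The fractional instruction counts in the looped runs (1.2 and 1.02) are consistent with a fixed per-iteration loop overhead being spread across the unrolled copies. Below is a minimal sketch in Python, assuming two extra instructions per loop iteration. That figure is inferred from the numbers above and is not a documented nanoBench value; the function name and its parameters are hypothetical.

```python
def per_copy_instructions(unroll_count, instrs_per_copy=1, loop_overhead=2):
    """Retired instructions per unrolled copy of the benchmark body.

    loop_overhead is an ASSUMPTION (extra instructions per loop iteration),
    chosen because it reproduces the measured values; it is not taken from
    the nanoBench documentation.
    """
    total = unroll_count * instrs_per_copy + loop_overhead
    return total / unroll_count


for unroll in (10, 100):
    print(unroll, round(per_copy_instructions(unroll), 2))
# 10 1.2
# 100 1.02
```

With `instrs_per_copy=2` the same assumption gives 2.2 and 2.02 for a two-instruction body. The run without an inner loop has no such overhead, which matches its reported 1.0.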
With additional dependency-breaking instructions
With unroll_count=500 and no inner loop
Code:
0: 48 31 c0 xor rax,rax
3: 04 02 add al,0x2
Results:
Instructions retired: 2.0
Core cycles: 0.66
Reference cycles: 0.67
RS_UOPS_DISPATCHED: 2.0
UOPS_PORT_0: 0.66
UOPS_PORT_1: 0.67
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.67
With loop_count=1000 and unroll_count=10
Code:
0: 48 31 c0 xor rax,rax
3: 04 02 add al,0x2
Results:
Instructions retired: 2.2
Core cycles: 0.73
Reference cycles: 0.73
RS_UOPS_DISPATCHED: 2.2
UOPS_PORT_0: 0.73
UOPS_PORT_1: 0.73
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.73
With loop_count=100 and unroll_count=100
Code:
0: 48 31 c0 xor rax,rax
3: 04 02 add al,0x2
Results:
Instructions retired: 2.02
Core cycles: 0.68
Reference cycles: 0.68
RS_UOPS_DISPATCHED: 2.02
UOPS_PORT_0: 0.67
UOPS_PORT_1: 0.67
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.67
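The dependency-broken results can be cross-checked against a simple port-pressure bound. The sketch below is in Python and assumes that each of the two instructions issues one uop and that both uops can go to any of ports 0, 1 and 5 with equal likelihood. The measured port distribution suggests this, but it is an assumption rather than a documented model, and the function name is hypothetical.

```python
def port_bound_cycles(uops_per_iteration, usable_ports):
    """Lower bound on cycles per iteration if uops spread evenly over the ports.

    ASSUMPTION: every uop may execute on any port in usable_ports and each
    port accepts at most one uop per cycle.
    """
    return uops_per_iteration / len(usable_ports)


ports = ("0", "1", "5")              # ports with non-zero counts in the results
bound = port_bound_cycles(2, ports)  # xor rax,rax + add al,0x2 -> 2 uops
print(round(bound, 2))
# 0.67
```

The bound of about 0.67 cycles per iteration is close to the measured 0.66 core cycles for the run without an inner loop. The 1.0 cycle measured without the `xor` is consistent with each `add al,0x2` waiting on the previous one rather than on port availability.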