MUL (R32) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 41 f7 e0 mul r8d
Results:
Instructions retired: 1.0
Core cycles: 3.09
Reference cycles: 2.59
UOPS_RETIRED.ALL: 2.0
UOPS_MS: 0.0
With loop_count=1000 and unroll_count=10
Code:
0: 41 f7 e0 mul r8d
Results:
Instructions retired: 1.2
Core cycles: 3.09
Reference cycles: 2.6
UOPS_RETIRED.ALL: 2.2
UOPS_MS: 0.0
With loop_count=100 and unroll_count=100
Code:
0: 41 f7 e0 mul r8d
Results:
Instructions retired: 1.02
Core cycles: 3.09
Reference cycles: 2.61
UOPS_RETIRED.ALL: 2.02
UOPS_MS: 0.0
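The fractional counts in the two looped runs are consistent with a fixed per-iteration loop overhead spread over the unrolled copies. The following is a minimal sketch of that arithmetic, assuming (this page does not state it) that the loop adds two instructions and two uops per iteration; `per_copy` and the two constants are hypothetical names used only for illustration.

```python
def per_copy(base, overhead_per_iteration, unroll_count):
    """Counter value per unrolled copy when a fixed loop overhead
    is spread evenly over unroll_count copies of the code."""
    return base + overhead_per_iteration / unroll_count

# Assumed, not stated on this page: 2 extra instructions and 2 extra uops
# per loop iteration. The base values (1.0 and 2.0) come from the
# loop-free run with unroll_count=500.
LOOP_INSTRUCTIONS = 2
LOOP_UOPS = 2

for unroll_count in (10, 100):
    instructions = per_copy(1.0, LOOP_INSTRUCTIONS, unroll_count)
    uops = per_copy(2.0, LOOP_UOPS, unroll_count)
    print(f"unroll_count={unroll_count}: "
          f"instructions={instructions:.2f}, uops={uops:.2f}")
```

Under that assumption the sketch reproduces the reported 1.2 / 2.2 (unroll_count=10) and 1.02 / 2.02 (unroll_count=100), which suggests the loop-free run is the cleanest source for the per-instruction counts.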
With additional dependency-breaking instructions
With unroll_count=500 and no inner loop
Code:
0: 48 31 c0 xor rax,rax
3: 41 f7 e0 mul r8d
Results:
Instructions retired: 2.0
Core cycles: 1.07
Reference cycles: 0.88
UOPS_RETIRED.ALL: 3.0
UOPS_MS: 0.0
With loop_count=1000 and unroll_count=10
Code:
0: 48 31 c0 xor rax,rax
3: 41 f7 e0 mul r8d
Results:
Instructions retired: 2.2
Core cycles: 1.1
Reference cycles: 0.92
UOPS_RETIRED.ALL: 3.2
UOPS_MS: 0.0
With loop_count=100 and unroll_count=100
Code:
0: 48 31 c0 xor rax,rax
3: 41 f7 e0 mul r8d
Results:
Instructions retired: 2.02
Core cycles: 1.05
Reference cycles: 0.89
UOPS_RETIRED.ALL: 3.02
UOPS_MS: 0.0
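`mul r8d` reads EAX and writes EDX:EAX, so without the `xor` each copy depends on the previous one through EAX, and the roughly 3.09 cycles in the first group reflect that chain. With the chain broken, the cost drops to about 1.05 to 1.1 cycles per iteration. The sketch below summarizes the reported numbers under two assumptions that are not stated on this page: the lowest value across configurations is taken as the estimate, and `xor rax,rax` accounts for exactly one retired uop. All variable names are hypothetical.

```python
# Core cycles per iteration, copied from the results above.
chained = {
    "unroll=500": 3.09,
    "loop=1000,unroll=10": 3.09,
    "loop=100,unroll=100": 3.09,
}
independent = {
    "unroll=500": 1.07,
    "loop=1000,unroll=10": 1.1,
    "loop=100,unroll=100": 1.05,
}

# Assumption: take the lowest measurement across configurations.
chain_cycles = min(chained.values())
throughput_cycles = min(independent.values())

# Assumption: the xor contributes one retired uop, so subtract it from
# the loop-free dependency-breaking run (UOPS_RETIRED.ALL = 3.0).
mul_uops = 3.0 - 1.0

print(f"cycles per mul with EAX chain:    {chain_cycles}")
print(f"cycles per mul with chain broken: {throughput_cycles}")
print(f"uops attributed to mul:           {mul_uops}")
```

The uop estimate agrees with the 2.0 reported by the loop-free run without the `xor`.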