PCMPESTRM (XMM, XMM, I8) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.0
Core cycles: 6.0
Reference cycles: 5.52
UOPS_RETIRED.ANY: 9.0
RETIRE_SLOTS: 9.0
UOPS_MS: 31.0
UOPS_PORT_0: 1.5
UOPS_PORT_1: 3.25
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 4.26
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=1000 and unroll_count=10
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.2
Core cycles: 6.1
Reference cycles: 5.63
UOPS_RETIRED.ANY: 9.2
RETIRE_SLOTS: 9.2
UOPS_MS: 31.0
UOPS_PORT_0: 1.5
UOPS_PORT_1: 3.2
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 4.5
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=100 and unroll_count=100
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.02
Core cycles: 6.01
Reference cycles: 5.55
UOPS_RETIRED.ANY: 9.02
RETIRE_SLOTS: 9.02
UOPS_MS: 31.0
UOPS_PORT_0: 1.41
UOPS_PORT_1: 3.2
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 4.41
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
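The three result blocks can be cross-checked against each other with a little arithmetic. The sketch below is not part of the measurement tooling; it only re-uses the numbers listed above, and the names `RUNS`, `port_total`, and `loop_overhead_instructions` are made up for illustration. It checks two things: that the per-port uop counts add up to roughly the retired uop count in each run, and how many extra instructions each loop iteration adds in the looped runs. Reading that extra as loop-control overhead is an assumption, not something the results state.

```python
# Per-instruction counter values copied from the three result blocks above.
RUNS = [
    {"name": "unroll_count=500, no inner loop", "unroll": 500,
     "instructions": 1.0, "uops_retired": 9.0,
     "ports": {0: 1.5, 1: 3.25, 2: 0.0, 3: 0.0, 4: 0.0, 5: 4.26}},
    {"name": "loop_count=1000, unroll_count=10", "unroll": 10,
     "instructions": 1.2, "uops_retired": 9.2,
     "ports": {0: 1.5, 1: 3.2, 2: 0.0, 3: 0.0, 4: 0.0, 5: 4.5}},
    {"name": "loop_count=100, unroll_count=100", "unroll": 100,
     "instructions": 1.02, "uops_retired": 9.02,
     "ports": {0: 1.41, 1: 3.2, 2: 0.0, 3: 0.0, 4: 0.0, 5: 4.41}},
]


def port_total(run):
    """Sum of UOPS_PORT_0 through UOPS_PORT_5 for one run."""
    return sum(run["ports"].values())


def loop_overhead_instructions(run, baseline=1.0):
    """Extra retired instructions per loop iteration.

    Assumption: anything above one retired instruction per unrolled copy
    is loop-control overhead, spread evenly over `unroll` copies.
    """
    return (run["instructions"] - baseline) * run["unroll"]


if __name__ == "__main__":
    for run in RUNS:
        print(run["name"])
        print("  port total:    %.2f" % port_total(run))
        print("  uops retired:  %.2f" % run["uops_retired"])
        print("  loop overhead: %.2f instructions/iteration"
              % loop_overhead_instructions(run))
```

The port totals come to about 9.01, 9.2, and 9.02, which track the UOPS_RETIRED.ANY values of 9.0, 9.2, and 9.02. In both looped runs the surplus works out to about two instructions per loop iteration, which would explain why the fully unrolled run reports exactly 1.0 instructions retired.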