PCMPESTRM (XMM, XMM, I8) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.0
Core cycles: 6.0
Reference cycles: 5.0
UOPS_RETIRED.ANY: 9.0
RETIRE_SLOTS: 9.0
UOPS_MS: 25.0
UOPS_PORT_0: 3.66
UOPS_PORT_1: 3.0
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 2.34
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
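The per-port counters in this run can be cross-checked against the retired-uop count. The following is a minimal Python sketch of that check. It uses only the values listed above; the variable names are illustrative and not part of nanoBench.

```python
# Per-instruction counter values copied from the unroll_count=500 run.
port_uops = {0: 3.66, 1: 3.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 2.34}
uops_retired = 9.0
core_cycles = 6.0

# Sum of uops dispatched across all measured ports (rounded to the
# two-decimal precision of the listed values).
port_total = round(sum(port_uops.values()), 2)

# Port with the highest average uop count per instruction.
busiest_port = max(port_uops, key=port_uops.get)

# Fraction of the measured core cycles in which the busiest port
# received a uop, on average.
busiest_share = round(port_uops[busiest_port] / core_cycles, 2)

print(port_total, uops_retired, busiest_port, busiest_share)
```

In this run the uops on ports 0, 1 and 5 add up to the 9.0 retired uops, and no single port is occupied for all 6.0 core cycles.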
With loop_count=1000 and unroll_count=10
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.2
Core cycles: 6.1
Reference cycles: 5.08
UOPS_RETIRED.ANY: 9.2
RETIRE_SLOTS: 9.2
UOPS_MS: 25.0
UOPS_PORT_0: 3.27
UOPS_PORT_1: 3.0
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 2.93
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=100 and unroll_count=100
Code:
0: 66 0f 3a 60 ca 02 pcmpestrm xmm1,xmm2,0x2
Init:
PXOR XMM1, XMM1; PXOR XMM2, XMM2
Results:
Instructions retired: 1.02
Core cycles: 6.01
Reference cycles: 5.01
UOPS_RETIRED.ANY: 9.02
RETIRE_SLOTS: 9.02
UOPS_MS: 25.0
UOPS_PORT_0: 3.63
UOPS_PORT_1: 3.0
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 2.39
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
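The two looped runs report slightly higher per-instruction values than the run without an inner loop. As a rough sketch, the difference can be scaled by unroll_count to estimate the cost added per loop iteration. This assumes the no-loop run is a clean baseline and that the whole difference comes from loop control; the results themselves do not say which instructions make up that overhead. The helper name is hypothetical.

```python
def per_iteration_overhead(measured, baseline, unroll_count):
    """Estimate the extra count per loop iteration.

    measured and baseline are per-instruction values, so the difference
    is multiplied by the number of unrolled copies in one iteration.
    """
    return round((measured - baseline) * unroll_count, 2)

# Baseline: unroll_count=500, no inner loop.
baseline = {"instructions": 1.0, "core_cycles": 6.0, "uops_retired": 9.0}

# Looped runs, keyed by unroll_count.
looped = {
    10: {"instructions": 1.2, "core_cycles": 6.1, "uops_retired": 9.2},
    100: {"instructions": 1.02, "core_cycles": 6.01, "uops_retired": 9.02},
}

overhead = {
    unroll: {
        name: per_iteration_overhead(values[name], baseline[name], unroll)
        for name in baseline
    }
    for unroll, values in looped.items()
}

for unroll, result in overhead.items():
    print(unroll, result)
```

Both looped configurations give the same estimate: about 2 extra instructions, 2 extra uops and 1 extra core cycle per loop iteration.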