BSR (R64, M64) - Throughput and Uops


With a non-indexed addressing mode

With 1 independent instruction

With unroll_count=500 and no inner loop

With loop_count=1000 and unroll_count=10

With loop_count=100 and unroll_count=100

With 4 independent instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=100 and unroll_count=20

With 8 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=100 and unroll_count=10


With an indexed addressing mode

With 1 independent instruction

With unroll_count=500 and no inner loop

With loop_count=1000 and unroll_count=10

With loop_count=100 and unroll_count=100

With 4 independent instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=100 and unroll_count=20

With 8 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=100 and unroll_count=10
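
The throughput configurations above vary three parameters: the number of independent instructions, `unroll_count`, and `loop_count`. The following is a minimal sketch of how these might combine. It assumes that `unroll_count` is the number of back-to-back copies of the instruction group and that `loop_count` is the number of iterations of an inner loop around those copies. The helper names, the register choices, and the exact memory operands are hypothetical and are not taken from the benchmark tool.

```python
# Hypothetical illustration; none of these names come from the benchmark tool.
# Eight distinct destination registers, so up to 8 instructions can be independent.
DEST_REGS = ["RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "R8", "R9"]


def bsr_body(n_independent, indexed):
    """Return n_independent BSR lines that write distinct destination registers."""
    # Assumed operands: base register only vs. base plus scaled index register.
    mem = "qword ptr [R14+R13*1]" if indexed else "qword ptr [R14]"
    return [f"BSR {reg}, {mem}" for reg in DEST_REGS[:n_independent]]


def executed_instructions(n_independent, unroll_count, loop_count=None):
    """Count executed BSR instances; loop_count=None means no inner loop."""
    per_pass = n_independent * unroll_count
    return per_pass if loop_count is None else per_pass * loop_count


if __name__ == "__main__":
    # The nine configurations listed for each addressing mode.
    configs = [
        (1, 500, None), (1, 10, 1000), (1, 100, 100),
        (4, 200, None), (4, 2, 1000), (4, 20, 100),
        (8, 100, None), (8, 1, 1000), (8, 10, 100),
    ]
    for n, unroll, loop in configs:
        print(n, unroll, loop, executed_instructions(n, unroll, loop))
    print("\n".join(bsr_body(4, indexed=True)))
```

Under these assumptions, the variants without an inner loop produce a long straight-line sequence, while the `loop_count` variants produce a short body that is executed repeatedly. Comparing the variants can indicate whether a result depends on code size or on loop overhead.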


Tests for macro-fusion

With JB (Rel32)

With JB (Rel8)

With JBE (Rel32)

With JBE (Rel8)

With JL (Rel32)

With JL (Rel8)

With JLE (Rel32)

With JLE (Rel8)

With JNB (Rel32)

With JNB (Rel8)

With JNBE (Rel32)

With JNBE (Rel8)

With JNL (Rel32)

With JNL (Rel8)

With JNLE (Rel32)

With JNLE (Rel8)

With JNO (Rel32)

With JNO (Rel8)

With JNP (Rel32)

With JNP (Rel8)

With JNS (Rel32)

With JNS (Rel8)

With JNZ (Rel32)

With JNZ (Rel8)

With JO (Rel32)

With JO (Rel8)

With JP (Rel32)

With JP (Rel8)

With JS (Rel32)

With JS (Rel8)

With JZ (Rel32)

With JZ (Rel8)
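
The macro-fusion tests above pair the instruction with each of 16 conditional jumps, each in a Rel32 and a Rel8 encoding. A minimal sketch that enumerates these cases follows. It assumes that each test places the jump directly after the BSR, so that the measured uop count can show whether the pair is decoded as one fused uop. The function names, the label name, and the operands are hypothetical.

```python
# Hypothetical illustration; names and operands are assumptions, not tool output.
CONDITIONS = ["B", "BE", "L", "LE", "NB", "NBE", "NL", "NLE",
              "NO", "NP", "NS", "NZ", "O", "P", "S", "Z"]
ENCODINGS = ["Rel32", "Rel8"]


def fusion_cases():
    """Return (mnemonic, encoding) pairs in the order the tests are listed."""
    return [(f"J{cc}", enc) for cc in CONDITIONS for enc in ENCODINGS]


def fusion_snippet(jcc):
    """Return an assumed test body: BSR immediately followed by the jump."""
    return ["BSR RAX, qword ptr [R14]", f"{jcc} target", "target:"]


if __name__ == "__main__":
    for mnemonic, encoding in fusion_cases():
        print(f"With {mnemonic} ({encoding})")
    print("\n".join(fusion_snippet("JZ")))
```

The jump in this sketch targets the label that immediately follows it, so control flow is the same whether or not the branch is taken; this is an illustrative choice, not a documented property of the tests.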