IMUL (M32) - Throughput and Uops
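
The headings below vary three things: the number of independent instructions, unroll_count, and loop_count. As a rough sketch of how these might combine, the snippet below assumes that the group of independent instructions is copied unroll_count times and that the copies are executed loop_count times (once when there is no inner loop). The function name and this model are illustrative assumptions, not taken from the measurement tool.

```python
def instruction_instances(unroll_count, loop_count=None, independent=1):
    """Estimate how many instruction instances one run executes.

    Assumed model (hypothetical, for illustration only): the group of
    `independent` instructions is copied `unroll_count` times, and those
    copies run `loop_count` times, or once if there is no inner loop.
    """
    iterations = 1 if loop_count is None else loop_count
    return independent * unroll_count * iterations


# The three loop/unroll configurations listed for 1 independent instruction.
configs = [
    ("unroll_count=500, no inner loop", 500, None),
    ("loop_count=1000, unroll_count=10", 10, 1000),
    ("loop_count=100, unroll_count=100", 100, 100),
]

for label, unroll, loop in configs:
    print(f"{label}: {instruction_instances(unroll, loop)} instances")
```

Under this assumed model, the two looped configurations execute the same number of instances with very different code sizes, which may be why both appear for each group.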


With a non-indexed addressing mode

With 1 independent instruction

With unroll_count=500 and no inner loop

With unroll_count=500, no inner loop, and 1 NOP

With unroll_count=500, no inner loop, and 2 NOPs

With unroll_count=500, no inner loop, and 3 NOPs

With loop_count=1000 and unroll_count=10

With loop_count=1000, unroll_count=10, and padding (redundant prefixes)

With loop_count=100 and unroll_count=100

With loop_count=100, unroll_count=100, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=500 and no inner loop

With loop_count=1000 and unroll_count=10

With loop_count=1000, unroll_count=10, and padding (redundant prefixes)

With loop_count=100 and unroll_count=100

With loop_count=100, unroll_count=100, and padding (redundant prefixes)

With 4 independent instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=1000, unroll_count=2, and padding (redundant prefixes)

With loop_count=100 and unroll_count=20

With loop_count=100, unroll_count=20, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=1000, unroll_count=2, and padding (redundant prefixes)

With loop_count=100 and unroll_count=20

With loop_count=100, unroll_count=20, and padding (redundant prefixes)

With 8 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With 16 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)


With an indexed addressing mode

With 1 independent instruction

With unroll_count=500 and no inner loop

With unroll_count=500, no inner loop, and 1 NOP

With unroll_count=500, no inner loop, and 2 NOPs

With unroll_count=500, no inner loop, and 3 NOPs

With loop_count=1000 and unroll_count=10

With loop_count=1000, unroll_count=10, and padding (redundant prefixes)

With loop_count=100 and unroll_count=100

With loop_count=100, unroll_count=100, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=500 and no inner loop

With loop_count=1000 and unroll_count=10

With loop_count=1000, unroll_count=10, and padding (redundant prefixes)

With loop_count=100 and unroll_count=100

With loop_count=100, unroll_count=100, and padding (redundant prefixes)

With 4 independent instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=1000, unroll_count=2, and padding (redundant prefixes)

With loop_count=100 and unroll_count=20

With loop_count=100, unroll_count=20, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=200 and no inner loop

With loop_count=1000 and unroll_count=2

With loop_count=1000, unroll_count=2, and padding (redundant prefixes)

With loop_count=100 and unroll_count=20

With loop_count=100, unroll_count=20, and padding (redundant prefixes)

With 8 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With 16 independent instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)

With additional dependency-breaking instructions

With unroll_count=100 and no inner loop

With loop_count=1000 and unroll_count=1

With loop_count=1000, unroll_count=1, and padding (redundant prefixes)

With loop_count=100 and unroll_count=10

With loop_count=100, unroll_count=10, and padding (redundant prefixes)