AMD com not
Both com and not are bound by ld/st to 1c/w, which we can achieve for
not. For com we are also bound by macro-op retirement, so the best we
can get is 1+ε c/w
com runs at 1.25c/w with a 4-way unroll
not runs at 1.0c/w with a 2-way unroll
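
For reference, a rough C sketch of the two operations and the unroll
factors quoted above. It assumes com is the out-of-place complement
(read one limb array, write another) and not is the in-place one, with
64-bit limbs. The names limb_t, com_sketch and not_sketch are made up
for illustration; the real routines are hand-written assembly, and a C
compiler will not necessarily reproduce the cycle counts above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: one word = one 64-bit limb */

/* com (assumed out of place): rp[i] = ~sp[i], 4-way unrolled.
   Each limb costs a load, a not and a store. */
static void com_sketch(limb_t *rp, const limb_t *sp, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        rp[i]     = ~sp[i];
        rp[i + 1] = ~sp[i + 1];
        rp[i + 2] = ~sp[i + 2];
        rp[i + 3] = ~sp[i + 3];
    }
    for (; i < n; i++)      /* 0 to 3 leftover limbs */
        rp[i] = ~sp[i];
}

/* not (assumed in place): rp[i] = ~rp[i], 2-way unrolled. */
static void not_sketch(limb_t *rp, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        rp[i]     = ~rp[i];
        rp[i + 1] = ~rp[i + 1];
    }
    if (i < n)              /* odd leftover limb */
        rp[i] = ~rp[i];
}
```

Applying not_sketch to the output of com_sketch gives back the original
limbs, which is a cheap sanity check for either routine.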
For the K10 we can use SSE to speed things up; the ld/st bound there
is 0.75c/w
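
A possible shape for the SSE version, sketched in C with SSE2
intrinsics rather than assembly. It handles two 64-bit limbs per
128-bit load/store and complements them by XOR with all ones. The name
com_sse2_sketch is hypothetical, unaligned loads and stores are used
only to keep the sketch simple, and the scalar fallback is there so
the code still builds where SSE2 is not available. Nothing here is
tuned to reach the 0.75c/w bound.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Out-of-place complement, two 64-bit limbs per SSE2 register. */
static void com_sse2_sketch(uint64_t *rp, const uint64_t *sp, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi32(-1);   /* all bits set */
    for (; i + 2 <= n; i += 2) {
        __m128i v = _mm_loadu_si128((const __m128i *)(sp + i));
        _mm_storeu_si128((__m128i *)(rp + i), _mm_xor_si128(v, ones));
    }
#endif
    for (; i < n; i++)      /* odd tail, or everything without SSE2 */
        rp[i] = ~sp[i];
}
```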