Core2 logic

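All the Intel chips are limited to one load (read) and one store (write) per cycle. A logical op reads one limb from each source operand and writes one limb, so the two loads alone cost 2 cycles per limb. The best we can achieve is therefore 2.0 c/w for non-SSE code and 1.0 c/w for SSE code, where each 128-bit load fetches two limbs at once.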

For non-SSE we just use the K8 versions, which all run at the best possible speed of 2.0 c/w.

These are and, or, xor, andn, orn, xorn, nand and nor, all taken from the K8 code.
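For reference, the meaning of each of these operations on a pair of n-limb operands can be sketched in plain C. The names `limb_t`, `logic_op` and `ref_logic_n` are hypothetical, 64-bit limbs are assumed, and andn/orn are assumed to complement the second operand, as GMP's mpn_andn_n and mpn_iorn_n do.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* assumption: 64-bit limbs, as on x86_64 */

/* Hypothetical selector for the eight logical operations. */
typedef enum { OP_AND, OP_OR, OP_XOR, OP_ANDN, OP_ORN, OP_XORN, OP_NAND, OP_NOR } logic_op;

/* Plain C reference: r[i] = a[i] OP b[i] for i in [0, n).
 * Each limb costs 2 loads and 1 store, which is what limits
 * the scalar versions to 2 cycles per limb. */
static void ref_logic_n(limb_t *r, const limb_t *a, const limb_t *b,
                        size_t n, logic_op op)
{
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i], y = b[i], v;
        switch (op) {
        case OP_AND:  v = x & y;    break;
        case OP_OR:   v = x | y;    break;
        case OP_XOR:  v = x ^ y;    break;
        case OP_ANDN: v = x & ~y;   break; /* complement of the second operand */
        case OP_ORN:  v = x | ~y;   break;
        case OP_XORN: v = ~(x ^ y); break; /* same as x ^ ~y */
        case OP_NAND: v = ~(x & y); break;
        default:      v = ~(x | y); break; /* OP_NOR */
        }
        r[i] = v;
    }
}
```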

The SSE versions still need to be written.
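As a starting point, the SSE2 idea can be sketched with C intrinsics rather than assembly: each 128-bit load fetches two limbs, so one iteration handles two limbs for two loads and one store. This is a hypothetical sketch (the name `sse2_andn_n` is made up), not the eventual assembly. It uses unaligned loads for simplicity, whereas a tuned Core2 version would most likely need to handle alignment and use aligned loads. The `__SSE2__` guard keeps a scalar fallback so the sketch still compiles on other targets.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical SSE2 sketch of andn: r[i] = a[i] & ~b[i] for i in [0, n). */
static void sse2_andn_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Two 64-bit limbs per 128-bit load/store. */
    for (; i + 2 <= n; i += 2) {
        __m128i x = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_andnot_si128(p, q) computes ~p & q, so b goes first. */
        _mm_storeu_si128((__m128i *)(r + i), _mm_andnot_si128(y, x));
    }
#endif
    /* Odd trailing limb, or the whole operation without SSE2. */
    for (; i < n; i++)
        r[i] = a[i] & ~b[i];
}
```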