There is a ppro flag in cast-586 that turns
generation of Pentium Pro/II-friendly code
on or off.
This flag makes the inner loop one cycle longer, but the generated
code runs 30% faster on the Pentium Pro/II and is only 7% slower
on the Pentium. By default, this flag is on.