Hello!
The while(buf != end) loop got worse on ppc. gcc 4.3.4 did even worse than gcc 3.4.6. I think it is safe to say that gcc 4.3.4 is busted when it comes to optimization, even on x86. I have seen -O1 do better than -O2 on x86 with gcc 3.4.3.
BTW have you tried unrolling the loops or using __attribute__((hot))?
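For reference, a minimal sketch of what the two suggestions could look like together. The function name sum32_unrolled and the body of add32 (a 32-bit add with end-around carry) are assumptions made for illustration, not code taken from BIRD. Whether this actually beats the plain loop depends on the compiler and target, so it would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit add with end-around carry, as used in
 * one's complement checksums. */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    return z + (z < sum);   /* add the carry back in if the sum wrapped */
}

/* Hypothetical name. The hot attribute (GCC 4.3 and later) asks the
 * compiler to optimize this function more aggressively. */
__attribute__((hot))
uint32_t sum32_unrolled(const uint32_t *buf, size_t len)
{
    uint32_t sum = 0;
    const uint32_t *end4 = buf + (len & ~(size_t)3); /* end of the 4-word blocks */
    const uint32_t *end = buf + len;

    /* Main loop, manually unrolled four times. */
    while (buf != end4) {
        sum = add32(sum, buf[0]);
        sum = add32(sum, buf[1]);
        sum = add32(sum, buf[2]);
        sum = add32(sum, buf[3]);
        buf += 4;
    }

    /* Tail: the remaining 0 to 3 words. */
    while (buf != end)
        sum = add32(sum, *buf++);

    return sum;
}
```

Here len counts 32-bit words, not bytes. With gcc, -funroll-loops is another way to get unrolling without changing the source.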
Since gcc in general isn't very good at optimization, I think the best bet is to have different loops for different archs. I have seen people do that based on endianness:

    #ifdef CPU_BIG_ENDIAN
      for(buf--; len; --len)
        sum = add32(sum, *++buf);
    #else
      while(buf != end)
        sum = add32(sum, *buf++);
    #endif
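To make the comparison concrete, here is a small sketch of the two loop shapes as standalone functions. The names sum32_postinc and sum32_preinc and the body of add32 are assumptions for illustration only. Note that the buf-- start in the snippet above forms a pointer before the beginning of the array, which ISO C does not define, so the pre-increment variant below handles the first word separately instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit add with end-around carry. */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    return z + (z < sum);
}

/* Post-increment form: while(buf != end) ... *buf++ */
uint32_t sum32_postinc(const uint32_t *buf, size_t len)
{
    const uint32_t *end = buf + len;
    uint32_t sum = 0;

    while (buf != end)
        sum = add32(sum, *buf++);
    return sum;
}

/* Pre-increment form: ... *++buf, with the first word peeled off so
 * that no pointer before the start of the array is ever formed. */
uint32_t sum32_preinc(const uint32_t *buf, size_t len)
{
    uint32_t sum = 0;

    if (!len)
        return 0;
    sum = add32(sum, *buf);
    while (--len)
        sum = add32(sum, *++buf);
    return sum;
}
```

Both functions read the same words in the same order, so they return the same sum on any byte order. Any difference between them is in the generated code (for example, load-with-update addressing on ppc), which is a property of the CPU architecture and the compiler rather than of endianness.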
Huh, what does endianness have to do with the choice of pre- or post-increment?

However, I would still like to see a full profile of running BIRD, so that we know the real hot spots. (If it turns out that the checksum function is a hot spot, it would be interesting to write an SSE version.)

				Have a nice fortnight
-- 
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Not all rumors are as misleading as this one.