Re: [PATCH 0/5] IP checksum improvements

26 Apr 2010


      Martin Mares <mj@ucw.cz> wrote on 2010/04/26 10:31:31:
...
Hello!
...
while(buf != end) got worse on ppc. gcc 4.3.4 did even worse
than gcc 3.4.6. I think it is safe to say that gcc 4.3.4 is busted when
it comes to optimization, even on x86. I have seen -O1 do better than -O2
on x86 with gcc 3.4.3.
BTW have you tried unrolling the loops or using __attribute__((hot))?
Not ATM, it probably won't do much on ppc. Its branch prediction
makes the loop "free" when you do a for(;len;--len) loop.
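For reference, a minimal sketch of what the unrolling suggestion could look like. Nothing here is from the patch series: sum_unrolled() is a hypothetical name, and add32() is assumed to be a 32-bit add with end-around carry, since its definition is not shown in this thread. __attribute__((hot)) is a GCC extension (GCC 4.3 and later); whether either change helps on a given arch has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit add with end-around carry (one's complement add). */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    sum += x;
    return sum + (sum < x);   /* sum < x means the add wrapped: fold carry in */
}

/* Hypothetical 4x-unrolled checksum loop, marked hot for GCC. */
__attribute__((hot))
static uint32_t sum_unrolled(const uint32_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL);

    /* Main body: four words per iteration. */
    for (; len >= 4; len -= 4, buf += 4) {
        sum = add32(sum, buf[0]);
        sum = add32(sum, buf[1]);
        sum = add32(sum, buf[2]);
        sum = add32(sum, buf[3]);
    }

    /* Tail: the remaining 0..3 words, counting len down to zero. */
    for (; len; --len)
        sum = add32(sum, *buf++);

    return sum;
}
```

The tail keeps the for(;len;--len) shape, so only the main body differs from the plain loop.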
...
...
Since gcc in general isn't very good at optimization, I think the best bet
is to have different loops for different archs. I have seen people do that
based on endianness:
#ifdef CPU_BIG_ENDIAN
  for(buf--; len; --len)
    sum = add32(sum, *++buf);
#else
  while(buf != end)
    sum = add32(sum, *buf++);
#endif
Huh, what does endianness have to do with the choice of pre-/post-increment?
Because most archs that can deal with pre-increment are big endian. The
for loop is important too: decrement and test for zero is basically free.
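A self-contained sketch of the two loop shapes, so they can be compiled and compared per arch. The function names are made up for illustration, and add32() is assumed to be a 32-bit add with end-around carry, since its definition is not shown in this thread. Unlike the snippet above, the pre-increment variant loads the first word up front instead of doing buf--, so it never forms a pointer before the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit add with end-around carry (one's complement add). */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    sum += x;
    return sum + (sum < x);   /* sum < x means the add wrapped: fold carry in */
}

/* Pre-increment loads with a count-down loop (the big-endian branch). */
static uint32_t sum_preinc(const uint32_t *buf, size_t len)
{
    uint32_t sum;

    assert(buf != NULL);
    if (!len)
        return 0;

    sum = *buf;                      /* first word, no pointer update yet */
    for (--len; len; --len)          /* decrement and test for zero */
        sum = add32(sum, *++buf);    /* pointer update before the load */
    return sum;
}

/* Post-increment loads with an end-pointer compare (the other branch). */
static uint32_t sum_postinc(const uint32_t *buf, size_t len)
{
    const uint32_t *end;
    uint32_t sum = 0;

    assert(buf != NULL);
    end = buf + len;
    while (buf != end)
        sum = add32(sum, *buf++);
    return sum;
}
```

Both functions return the same sum for the same input, so any difference is in the generated code only; comparing the output of gcc -S for each is the way to see what a given arch gets.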

Joakim Tjernlund