Re: [PATCH] ipsum_calc_block: Optimize size and speed

23 Apr 2010

      Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 21:39:06:
...
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote:
...
Martin Mares <mj@ucw.cz> wrote on 2010/04/23 19:23:18:
...
Hello!
...
So there isn't really a difference in performance between the two
implementations. Even on a slow embedded AMD Geode CPU, it gives
~180 MB/s.
No difference? What does 1.2 mean? To me this means 20%, which is a lot.
Yes, but according to Santiago's benchmarks, your code is sometimes 20%
faster, sometimes 20% slower. It does not seem like a reason for change.
Uhh, 20% slower? Ahh, now I see: the MIPS. That is really strange. Santiago, are
you sure that is not a typo?
FYI, the code z = sum + x, z + (z < sum) was compiled to:
addu    $2,$3,$2
sltu    $3,$2,$3
addu    $3,$2,$3
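
For reference, the expression quoted above can be written out as a small C function. This is only a minimal sketch, not the patch itself: the name add32 is borrowed from later in this mail, and uint32_t is an assumption (the patch may well use a different type).

```c
#include <stdint.h>

/*
 * Add x to sum with end-around carry, as used in ones'-complement
 * checksums.  (z < sum) is 1 exactly when the 32-bit addition wrapped,
 * so the lost carry is added back into the low end of the result.
 */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
	uint32_t z = sum + x;   /* wraps modulo 2^32 */
	return z + (z < sum);   /* fold the carry back in */
}
```

The three MIPS instructions map onto this directly: the first addu is the wrapping add, sltu computes the comparison, and the second addu folds the carry back in.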
OK, MIPS has always been a strange platform to me.
So I had to test myself again:
x86 Core 2 Duo, 3.1 GHz:
New code:
   64 byte buffer:   5899 +/-2.3%
  128 byte buffer:   5570 +/-3.1%
  256 byte buffer:   5797 +/-0.3%
  512 byte buffer:   5501 +/-1.1%
 1024 byte buffer:   5357 +/-1.5%
 2048 byte buffer:   5277 +/-0.6%
 4096 byte buffer:   5249 +/-1.2%
 8192 byte buffer:   5245 +/-2.1%
16384 byte buffer:   5221 +/-1.6%

Old code:
   64 byte buffer:   7237 +/-0.4%
  128 byte buffer:   6505 +/-1.7%
  256 byte buffer:   6075 +/-1.6%
  512 byte buffer:   6120 +/-1.6%
 1024 byte buffer:   5773 +/-8.2%
 2048 byte buffer:   5790 +/-2.0%
 4096 byte buffer:   5474 +/-0.7%
 8192 byte buffer:   5679 +/-47.1%
16384 byte buffer:   5339 +/-1.3%

PowerPC MPC 8321, 266 MHz

New Code:
   64 byte buffer:  68349 +/-8.0%
  128 byte buffer:  58271 +/-8.7%
  256 byte buffer:  52945 +/-8.4%
  512 byte buffer:  50535 +/-8.6%
 1024 byte buffer:  49288 +/-9.6%
 2048 byte buffer:  48984 +/-10.3%
 4096 byte buffer:  48345 +/-8.6%
 8192 byte buffer:  48127 +/-8.4%

Old Code:
   64 byte buffer:  68349 +/-8.0%
  128 byte buffer:  58271 +/-8.7%
  256 byte buffer:  52945 +/-8.4%
  512 byte buffer:  50535 +/-8.6%
 1024 byte buffer:  49288 +/-9.6%
 2048 byte buffer:  48984 +/-10.3%
 4096 byte buffer:  48345 +/-8.6%
 8192 byte buffer:  48127 +/-8.4%

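Just for fun, replace add32 with:
static inline
unsigned long
add32(unsigned long sum, unsigned long x)
{
	/* "+r": sum is both read and written by the instruction.
	 * addc also records the carry-out in XER[CA]. */
	asm ("addc %0, %0, %1" : "+r" (sum) : "r" (x));
	return sum;
}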
MPC 8321 with asm addc:
   64 byte buffer:  52007 +/-8.7%
  128 byte buffer:  41986 +/-9.9%
  256 byte buffer:  37160 +/-11.4%
  512 byte buffer:  34593 +/-10.3%
 1024 byte buffer:  33265 +/-10.4%
 2048 byte buffer:  32648 +/-11.4%
 4096 byte buffer:  32843 +/-14.1%
 8192 byte buffer:  32223 +/-12.5%
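
A side note on what the addc variant computes, since it matters when comparing these numbers with the C version. On PowerPC, addc produces the wrapped sum and records the carry-out in XER[CA]; adding that carry back into the result takes a second instruction such as addze. The sketch below models the two steps in plain C. All names in it (addc_result, model_addc, model_addze, add32_folded) are hypothetical, and uint32_t is assumed as the register width.

```c
#include <stdint.h>

/* Hypothetical model of what addc leaves behind: sum plus carry-out. */
struct addc_result {
	uint32_t value;   /* wrapped 32-bit sum */
	unsigned carry;   /* models XER[CA]: 1 if the add overflowed */
};

/* Model of "addc rD,rA,rB": rD = rA + rB, CA = carry-out. */
static inline struct addc_result model_addc(uint32_t a, uint32_t b)
{
	struct addc_result r;
	r.value = a + b;
	r.carry = r.value < a;
	return r;
}

/* Model of "addze rD,rA": rD = rA + CA. */
static inline uint32_t model_addze(uint32_t a, unsigned carry)
{
	return a + carry;
}

/* addc followed by addze gives the same result as the C add32. */
static inline uint32_t add32_folded(uint32_t sum, uint32_t x)
{
	struct addc_result r = model_addc(sum, x);
	return model_addze(r.value, r.carry);
}
```

Under this model, a lone addc matches the C add32 only when no carry occurs, so the two versions are worth checking against each other on the same input before comparing timings.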

So the new code is better on both platforms and the asm addc on ppc is
very fast.
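
For readers without the patch at hand, here is a rough sketch of how a block checksum loop can be built on add32. It is not the code from the patch or from the attached test program: sum_block and fold16 are hypothetical names, the input is assumed to be an array of whole 32-bit words, and uint32_t stands in for the unsigned long used above.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit add with end-around carry (same idea as add32 in this mail). */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
	uint32_t z = sum + x;
	return z + (z < sum);
}

/* Hypothetical helper: sum n_words 32-bit words with end-around carry. */
static uint32_t sum_block(const uint32_t *buf, size_t n_words)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n_words; i++)
		sum = add32(sum, buf[i]);
	return sum;
}

/* Hypothetical helper: fold a 32-bit partial sum down to 16 bits. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);   /* at most 0x1fffe */
	sum = (sum & 0xffff) + (sum >> 16);   /* absorb the last carry */
	return (uint16_t) sum;
}
```

Summing 32 bits at a time and folding once at the end keeps the inner loop down to one add32 per word, which is why the cost of add32 dominates the measurements.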

Test prog attached.

 Jocke

(See attached file: crc32test.c)