Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 21:39:06:
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote:
Martin Mares <mj@ucw.cz> wrote on 2010/04/23 19:23:18:
Hello!
So there isn't really a difference in performance between the two implementations. Even on a slow embedded AMD Geode CPU, it gives ~180 MB/s.
No difference? What does 1.2 mean? To me this means 20%, which is a lot.
Yes, but according to Santiago's benchmarks, your code is sometimes 20% faster, sometimes 20% slower. It does not seem like a reason for change.
Uhh, 20% slower? Ahh, now I see: the MIPS. That is really strange. Santiago, are you sure that is not a typo?
FYI, code z = sum + x, z + (z < sum) was compiled to:
    addu $2,$3,$2
    sltu $3,$2,$3
    addu $3,$2,$3
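For reference, the expression quoted above can be written out as a small C function. This is only a sketch: the fixed-width uint32_t type and the sum_words() helper are assumptions made for illustration and are not taken from the attached test program. The comparison (z < sum) is true exactly when the addition wrapped, so it recovers the carry bit, which matches the addu / sltu / addu sequence in the MIPS output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add x to sum; if the 32-bit addition wraps around, add the lost
 * carry back into the result (end-around carry). */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;   /* may wrap modulo 2^32 */
    return z + (z < sum);   /* (z < sum) is 1 exactly when it wrapped */
}

/* Hypothetical helper: accumulate n 32-bit words using add32(). */
static uint32_t sum_words(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum = add32(sum, words[i]);
    return sum;
}
```

Whether a compiler turns the comparison into a single carry-using instruction depends on the target, which may be part of why the results differ between platforms.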
OK, MIPS has always been a strange platform to me. So I had to test myself again:

x86 Core 2 Duo, 3.1 GHz:

New code:
    64 byte buffer: 5899 +/-2.3%
    128 byte buffer: 5570 +/-3.1%
    256 byte buffer: 5797 +/-0.3%
    512 byte buffer: 5501 +/-1.1%
    1024 byte buffer: 5357 +/-1.5%
    2048 byte buffer: 5277 +/-0.6%
    4096 byte buffer: 5249 +/-1.2%
    8192 byte buffer: 5245 +/-2.1%
    16384 byte buffer: 5221 +/-1.6%

Old code:
    64 byte buffer: 7237 +/-0.4%
    128 byte buffer: 6505 +/-1.7%
    256 byte buffer: 6075 +/-1.6%
    512 byte buffer: 6120 +/-1.6%
    1024 byte buffer: 5773 +/-8.2%
    2048 byte buffer: 5790 +/-2.0%
    4096 byte buffer: 5474 +/-0.7%
    8192 byte buffer: 5679 +/-47.1%
    16384 byte buffer: 5339 +/-1.3%

PowerPC MPC 8321, 266 MHz:

New code:
    64 byte buffer: 68349 +/-8.0%
    128 byte buffer: 58271 +/-8.7%
    256 byte buffer: 52945 +/-8.4%
    512 byte buffer: 50535 +/-8.6%
    1024 byte buffer: 49288 +/-9.6%
    2048 byte buffer: 48984 +/-10.3%
    4096 byte buffer: 48345 +/-8.6%
    8192 byte buffer: 48127 +/-8.4%

Old code:
    64 byte buffer: 68349 +/-8.0%
    128 byte buffer: 58271 +/-8.7%
    256 byte buffer: 52945 +/-8.4%
    512 byte buffer: 50535 +/-8.6%
    1024 byte buffer: 49288 +/-9.6%
    2048 byte buffer: 48984 +/-10.3%
    4096 byte buffer: 48345 +/-8.6%
    8192 byte buffer: 48127 +/-8.4%

Just for fun, replace add32 with

    static inline unsigned long add32(unsigned long sum, unsigned long x)
    {
        asm ("addc %0, %0, %1": "=r"(sum) : "r" (x));
        return sum;
    }

MPC 8321 with asm addc:
    64 byte buffer: 52007 +/-8.7%
    128 byte buffer: 41986 +/-9.9%
    256 byte buffer: 37160 +/-11.4%
    512 byte buffer: 34593 +/-10.3%
    1024 byte buffer: 33265 +/-10.4%
    2048 byte buffer: 32648 +/-11.4%
    4096 byte buffer: 32843 +/-14.1%
    8192 byte buffer: 32223 +/-12.5%

So the new code is better on both platforms, and the asm addc on PPC is very fast.
Test prog attached.

 Jocke

(See attached file: crc32test.c)