The team will post updates and news about our project here

Markus (mfro) Posts 99 22 Oct 2016 12:25
Just for completeness: with (nearly) the exact same code (I just replaced the dbf instruction and adapted the movem), the FireBee scores 194 MIPS. If you allow me an additional comment and (hopefully received as constructive) criticism: the code as provided doesn't really show the quality of branch prediction very well. It repeatedly calls the assembler loops as subroutines and uses movem to save and restore registers. This makes it a perfect candidate for inlining, as the expensive register save and restore operations effectively "hide" the branch prediction quality from the result. I just did a quick test inlining the assembler subroutine into the calling code and ended up with a score pretty close to the FireBee's clock rate, which indicates a branch prediction hit rate near 100%.

Gunnar von Boehn (Apollo Team Member) Posts 6254 22 Oct 2016 12:39
Markus (mfro) wrote:
> Just for completeness: with (nearly) the exact same code (I just replaced the dbf instruction and adapted the movem), the FireBee scores 194 MIPS.
Thanks for the test. :-) We have some more CPU tests which show much more detailed numbers. Would you like to compile, for example, our minibench? Cheers

Gunnar von Boehn (Apollo Team Member) Posts 6254 22 Oct 2016 13:43
Markus (mfro) wrote:
> Just for completeness: with (nearly) the exact same code (I just replaced the dbf instruction
Can you show us the new code? Cheers Gunnar

Markus (mfro) Posts 99 22 Oct 2016 19:51
Gunnar von Boehn wrote:
> Can you show us the new code?
Sure. The only changes to the original sources are:

    diff bench/loop1.S bench_fb/loop1.S
    5c5,6
    < movem.l d0-a6,-(sp)
    ---
    > lea -15 * 4(sp),sp
    > movem.l d0-a6,(sp)
    9,10c10,11
    < moveq #1,D6
    < addq #1,D5
    ---
    > moveq #2,D6
    > addq.l #1,D5
    22c23,25
    < dbra D6,L2
    ---
    > subq.l #1,d6
    > bne.s L2
    > // dbra D6,L2
    25c28
    < bne L1
    ---
    > bne.s L1
    27c30,31
    < movem.l (sp)+,d0-a6
    ---
    > movem.l (sp),d0-a6
    > lea 15 * 4(sp),sp

Markus (mfro) Posts 99 22 Oct 2016 20:31
Just tried to compile minibench on the FireBee. Are the sources complete? I had to comment out the _64x calls, since they only seem to exist in the x86 assembly sources.

Philippe Flype (Apollo Team Member) Posts 299 26 Oct 2016 00:05
Hi, I took the time to sort the files; here is an archive of the sources: EXTERNAL LINK

Markus (mfro) Posts 99 26 Oct 2016 12:52
Philippe Flype wrote:
> Hi, I took the time to sort the files; here is an archive of the sources: EXTERNAL LINK
Thank you. Will probably look into it during the weekend.

Vincent Rivière Posts 87 29 Oct 2016 12:43
Gunnar von Boehn wrote:
> Coldfire 266 Firebee 8
For precision:
- The "ColdFire" processor is spelled with a capital F in the middle; this is how Freescale spells it. The "Atari Coldfire Project" (ACP) deliberately chose to drop that capital F from its name; that's their own choice.
- Similarly, "FireBee" is spelled with a capital B in the middle.
- The FireBee's ColdFire clock is exactly 264 MHz (contrary to what we sometimes read).

Markus (mfro) Posts 99 30 Oct 2016 13:20
Got the sources to compile on the FireBee. I had to run them through PortAsm (the Micro-APL EXTERNAL LINK tool for converting 68k code to the ColdFire instruction set). The code spits out a lot of bogus values, while others seem reasonable, even where PortAsm didn't change anything. So it's probably not the tool to blame for that.

Before I start searching the haystack: we do use the same calling conventions on Atari and Amiga, don't we? Arguments on the stack, left to right, d0-d2/a0 = scratch, return values in d0, all other registers to be preserved by the called function?

    ------------------------------------------------
    Processor & Memory Performance Benchmark.
    $VER: Minibench 8.06 (04.07.16) Apollo Team
    ------------------------------------------------
    ------------------------------------------------
    CPU - Math 512KB
    ------------------------------------------------
    NOP                   209.7
    ADD.L REG             1048576.0
    ADD.W Im16            1048576.0
    ADD.L Im32            1048576.0
    SHIFT REG             1048576.0
    SHIFT Imm             1048576.0
    AND.L REG             1048576.0
    ANDI.L Im32           1048576.0
    MULU.L                1048576.0
    DIV.L                 52.4
    ROL.L Dn,Dm           209.7
    BFFFO Dn{},Dm         5.8
    BFEXTU (a0){},Dn      17.4
    ------------------------------------------------
    CPU - Special 512KB
    ------------------------------------------------
    fuse_ma_x16_1 Reg     1048576.0
    fuse_ma_x16_2 Imm8    1048576.0
    fuse_ma_x16_3 Imm32   1048576.0
    fuse_ma_x16_4 And     1048576.0
    bond_ma_x16_1 Reg     1048576.0
    bond_ma_x16_2 Imm     1048576.0
    bond_ma_x16_3 Mem     1048576.0
    ea_latencyx16         209.7
    alu_latency1x16       1048576.0
    cache_latency1x16     104.8
    ------------------------------------------------
    CPU - EA 512KB
    ------------------------------------------------
    R (d16,An)            209.7
    R (d32,An)            209.7
    R (An)+               209.7
    R (An) ; ADDQ #,An    209.7
    R (An,Dn)             104.8
    R (d32,An,Dn)         209.7
    W (d16,An)            1048576.0
    W (d32,An)            104.8
    W (An)+               209.7
    W (An) ; ADDQ #,An    209.7
    W (An,Dn)             209.7
    W (d32,An,Dn)         69.9
    U (d16,An)            209.7
    U (d32,An)            52.4
    U (An)+               209.7
    U (An) ; ADDQ #,An    104.8
    U (An,Dn)             104.8
    U (d32,An,Dn)         69.9
    ------------------------------------------------
    CPU - Loop 512KB
    ------------------------------------------------
    loopx2                209.8
    loopx4                209.7
    loopx6                209.7
    loopx8                1048576.0
    loopx16               1048576.0
    loopx32               1048576.0
    loopx64               1048576.0
    loopx128              1048576.0
    loopix2               209.8
    loopix4               209.7
    loopix6               209.7
    loopix8               1048576.0
    loopix16              1048576.0
    loopix32              1048576.0
    loopix64              1048576.0
    loopix128             1048576.0
    ------------------------------------------------
    CPU - Goto 512KB
    ------------------------------------------------
    goto_x16              104.8
    goto2_x16             209.7
    goto4_x16             209.7
    gotoCC                104.8
    gotoCCTRUE            209.7
    gotoCCFALSE           209.7
    gosup_chainx1         69.9
    gosup_chainx2         69.9
    gosup_chainx4         104.8
    ------------------------------------------------
    CPU - Workload 512KB
    ------------------------------------------------
    workload_AAAA         210.5
    workload_LA           1048576.0
    workload_LAA          1048576.0
    workload_LAAA         1048576.0
    workload_LAAAA        1048576.0
    workload_LLA          1048576.0
    workload_LLAA         1048576.0
    workload_LLAAA        1048576.0
    workload_LLAAAA       1048576.0
    workload_LAALA        1048576.0
    ------------------------------------------------
    Measuring memory throughput:
    Results are in MB/sec. Higher value is faster.

    Memory 2 Memory   Alignment 0-0
                  512KB      16KB       4KB
    ------------------------------------------------
    libc memcpy   52.4       52.4       52.4
    read 8        52.4       52.4       69.9
    read 8x4      69.9       69.9       69.9
    read 32       69.9       69.9       69.9
    read 32x4     104.8      104.8      104.8
    read 32x8     104.8      104.8      104.8
    write 8       41.9       41.9       41.9
    write 8x4     52.4       52.4       52.4
    write 32      41.9       52.4       41.9
    write 32x4    52.4       41.9       52.4
    write 32x8    41.9       52.4       52.4
    copy 8        58.9       52.4       52.4
    copy 8x4      52.4       52.4       52.4
    copy 32       58.9       58.9       58.9
    copy 32x4     52.4       58.9       58.9
    copy 32x8     58.9       58.9       52.4
    ------------------------------------------------
    Cache 2 Cache   Alignment 0-0
                  512KB      16KB       4KB
    ------------------------------------------------
    libc memcpy   52.4       418.6      2097152.0
    read 8        52.4       104.8      69.9
    read 8x4      69.9       104.8      209.7
    read 32       69.9       1048576.0  209.7
    read 32x4     69.9       1048576.0  1048576.0
    read 32x8     104.8      1048576.0  1048576.0
    write 8       41.9       104.8      104.9
    write 8x4     52.4       209.7      209.7
    write 32      41.9       1048576.0  209.7
    write 32x4    52.4       1048576.0  1048576.0
    write 32x8    52.4       1048576.0  1048576.0
    copy 8        58.9       208.7      138.8
    copy 8x4      58.9       208.7      208.7
    copy 32       58.9       418.5      2097152.0
    copy 32x4     52.4       2097152.0  418.6
    copy 32x8     58.9       2097152.0  2097152.0
    ------------------------------------------------
    MIPS: 37754778 / 75 = 503397
    MEMORY: 6992967 / 32 = 218530
    TOTAL: 418203
    ------------------------------------------------

Vincent Rivière Posts 87 30 Oct 2016 17:32
Markus (mfro) wrote:
> Before I start searching the haystack: we do use the same calling conventions on Atari and Amiga, don't we? Arguments on the stack, left to right, d0-d2/a0 = scratch, return values in d0, all other registers to be preserved by the called function?
Beware, calling conventions may differ depending on the API. For Atari GCC functions, the scratch registers are d0-d1/a0-a1. I believe this is the standard for almost all GCC 680x0 targets. For Atari BIOS/XBIOS/GEMDOS system calls, the scratch registers are d0-d2/a0-a2. We must be extremely careful about that when mixing C functions and system calls.

Gunnar von Boehn (Apollo Team Member) Posts 6254 30 Oct 2016 17:47
Markus (mfro) wrote:
> The code spits out a lot of bogus values,
Did you run it with the "strongest" parameter? Cheers Gunnar

Markus (mfro) Posts 99 30 Oct 2016 19:00
Gunnar von Boehn wrote:
> Markus (mfro) wrote:
> > The code spits out a lot of bogus values,
>
> Did you run it with the "strongest" parameter? Cheers Gunnar
Not until just now. "strongest" gives slightly different values, but there are still these 7-digit numbers in places.

Gunnar von Boehn (Apollo Team Member) Posts 6254 30 Oct 2016 19:41
Markus (mfro) wrote:
> Not until just now. "strongest" gives slightly different values, but there are still these 7-digit numbers in places.
The values look to me like the run time is too short, or the timer resolution is not high enough. If you increase the run time, this should be fixed. Can you try? Cheers Gunnar

Markus (mfro) Posts 99 30 Oct 2016 20:21
Gunnar von Boehn wrote:
> Markus (mfro) wrote:
> > Not until just now. "strongest" gives slightly different values, but there are still these 7-digit numbers in places.
Just had a look into main.c: "strongest" does not seem to be a valid parameter. The only one that is recognized appears to be "68000". Do I have the latest sources?

Gunnar von Boehn wrote:
> The values look to me like the run time is too short, or the timer resolution is not high enough. If you increase the run time, this should be fixed. Can you try?
Thanks. I set LOOPS from 2 to 32 (I didn't really check what it does in detail, but it seemed the most straightforward thing to do ;) ). At least it made the strange values vanish. As the memory throughput numbers now seem reasonable (about the same, pretty disappointing, values I got from my own measurements), I guess it is indeed caused by the timer granularity and we're getting closer. Can you show 68080 values? This is what I have now:
    firebee:~# ./benchv6_68k
    ------------------------------------------------
    Processor & Memory Performance Benchmark.
    $VER: Minibench 8.06 (04.07.16) Apollo Team
    ------------------------------------------------
    ------------------------------------------------
    CPU - Math 512KB
    ------------------------------------------------
    NOP                   167.4
    ADD.L REG             838.4
    ADD.W Im16            838.4
    ADD.L Im32            838.4
    SHIFT REG             1677.8
    SHIFT Imm             1677.8
    AND.L REG             1118.8
    ANDI.L Im32           671.8
    MULU.L                258.0
    DIV.L                 50.0
    ROL.L Dn,Dm           139.6
    BFFFO Dn{},Dm         5.2
    BFEXTU (a0){},Dn      15.8
    ------------------------------------------------
    CPU - Special 512KB
    ------------------------------------------------
    fuse_ma_x16_1 Reg     1118.8
    fuse_ma_x16_2 Imm8    838.4
    fuse_ma_x16_3 Imm32   1118.8
    fuse_ma_x16_4 And     1677.8
    bond_ma_x16_1 Reg     1118.8
    bond_ma_x16_2 Imm     671.8
    bond_ma_x16_3 Mem     335.8
    ea_latencyx16         124.6
    alu_latency1x16       258.0
    cache_latency1x16     93.8
    ------------------------------------------------
    CPU - EA 512KB
    ------------------------------------------------
    R (d16,An)            258.0
    R (d32,An)            119.4
    R (An)+               258.0
    R (An) ; ADDQ #,An    129.0
    R (An,Dn)             129.0
    R (d32,An,Dn)         124.6
    W (d16,An)            258.0
    W (d32,An)            124.6
    W (An)+               239.8
    W (An) ; ADDQ #,An    129.0
    W (An,Dn)             124.6
    W (d32,An,Dn)         64.0
    U (d16,An)            134.2
    U (d32,An)            55.4
    U (An)+               134.2
    U (An) ; ADDQ #,An    101.2
    U (An,Dn)             104.4
    U (d32,An,Dn)         52.2
    ------------------------------------------------
    CPU - Loop 512KB
    ------------------------------------------------
    loopx2                176.0
    loopx4                223.8
    loopx6                223.8
    loopx8                671.8
    loopx16               838.4
    loopx32               838.4
    loopx64               1118.8
    loopx128              1118.8
    loopix2               129.0
    loopix4               152.4
    loopix6               159.8
    loopix8               479.8
    loopix16              671.8
    loopix32              671.8
    loopix64              671.8
    loopix128             838.4
    ------------------------------------------------
    CPU - Goto 512KB
    ------------------------------------------------
    goto_x16              76.6
    goto2_x16             176.0
    goto4_x16             239.8
    gotoCC                134.2
    gotoCCTRUE            223.8
    gotoCCFALSE           167.4
    gosup_chainx1         71.4
    gosup_chainx2         79.8
    gosup_chainx4         81.0
    ------------------------------------------------
    CPU - Workload 512KB
    ------------------------------------------------
    workload_AAAA         258.0
    workload_LA           9.4
    workload_LAA          9.4
    workload_LAAA         9.4
    workload_LAAAA        9.4
    workload_LLA          9.4
    workload_LLAA         9.4
    workload_LLAAA        9.4
    workload_LLAAAA       9.4
    workload_LAALA        9.4
    ------------------------------------------------
    Measuring memory throughput:
    Results are in MB/sec. Higher value is faster.

    Memory 2 Memory   Alignment 0-0
                  512KB      16KB       4KB
    ------------------------------------------------
    libc memcpy   52.2       52.2       52.2
    read 8        56.4       55.4       56.4
    read 8x4      62.8       61.8       61.8
    read 32       79.8       78.8       76.6
    read 32x4     78.8       76.6       78.8
    read 32x8     79.8       78.8       78.8
    write 8       45.8       45.8       45.8
    write 8x4     45.8       45.8       45.8
    write 32      45.8       45.8       45.8
    write 32x4    45.8       44.6       45.8
    write 32x8    45.8       45.8       45.8
    copy 8        56.4       52.2       50.2
    copy 8x4      56.4       54.4       52.2
    copy 32       56.4       54.4       54.4
    copy 32x4     56.4       54.4       54.4
    copy 32x8     56.4       54.4       54.4
    ------------------------------------------------
    Cache 2 Cache   Alignment 0-0
                  512KB      16KB       4KB
    ------------------------------------------------
    libc memcpy   54.2       744.6      958.8
    read 8        56.4       88.4       86.2
    read 8x4      62.8       115.2      119.4
    read 32       78.8       419.2      419.2
    read 32x4     78.8       838.4      838.4
    read 32x8     79.8       838.4      1118.8
    write 8       45.8       108.6      104.4
    write 8x4     45.8       209.0      209.0
    write 32      45.8       479.8      559.8
    write 32x4    45.8       838.4      838.4
    write 32x8    45.8       1118.8     838.4
    copy 8        56.4       172.6      176.0
    copy 8x4      56.4       208.0      208.0
    copy 32       56.4       670.8      744.6
    copy 32x4     56.4       744.6      838.4
    copy 32x8     56.4       838.4      958.8
    ------------------------------------------------
    MIPS: 28569 / 75 = 380
    MEMORY: 7013 / 32 = 219
    TOTAL: 332
    ------------------------------------------------
    firebee:~#

Gunnar von Boehn (Apollo Team Member) Posts 6254 30 Oct 2016 20:45
Hi Markus, sorry, I tried to ping you on IRC with the info. I was confused by the code versions. The scores you have are still impossible values. There must be a config variable called CONFIG_TEST_SIZE; please set it to 64 MB. I hope this will fix it. Thanks

Markus (mfro) Posts 99 30 Oct 2016 21:52
Gunnar von Boehn wrote:
> Hi Markus, sorry, I tried to ping you on IRC with the info. I was confused by the code versions. The scores you have are still impossible values. There must be a config variable called CONFIG_TEST_SIZE; please set it to 64 MB. I hope this will fix it. Thanks
O.k., done, next try: this results in a waaaay longer run time (yawn... I additionally had to set LOOPS to 8 because I got odd numbers for the CPU workload benchmark again), but then pretty much the same values as posted above (less than 5% off), so I decided not to clutter the forum with them. Maybe I have to inspect what PortAsm did to the code. Is there anything in particular you'd consider way off, so we can look into that first?

Markus (mfro) Posts 99 31 Oct 2016 11:55
I guess I've found most of the problematic parts, at least. The first one was rather trivial: the original code uses preprocessor macros for instruction sequences, e.g.
#define NOP4 nop; nop; nop; nop
which looks innocent but doesn't work with PortAsm. PortAsm interprets the semicolon as the start of a comment (although it has been told we are using the GNU assembler, where this is valid syntax), so only the very first instruction was executed. Fixed, but still no go.

The second one was a little trickier and not so obvious (at least not for an Atarian like me). As I just had to learn the hard way, Amiga code appears to use register A5 as the frame pointer, while the rest of the world uses A6. This isn't a problem as long as you consistently use either %a5 or %fp. Unfortunately, this wasn't the case: the routines in tests_WORKLOAD_68k.S were using both (%fp in the LINK instruction, %a5 for the UNLK), which obviously corrupted registers of the calling routine and caused the code to fail. Fixing that gets us here:
    firebee:~# ./benchv6_68k
    ------------------------------------------------
    Processor & Memory Performance Benchmark.
    $VER: Minibench 8.06 (04.07.16) Apollo Team
    ------------------------------------------------
    ------------------------------------------------
    CPU - Math 64MB
    ------------------------------------------------
    NOP                   43.4
    ADD.L REG             255.6
    ADD.W Im16            86.8
    ADD.L Im32            255.6
    SHIFT REG             255.6
    SHIFT Imm             255.6
    AND.L REG             255.6
    ANDI.L Im32           172.0
    MULU.L                65.2
    DIV.L                 9.0
    ROL.L Dn,Dm           36.8
    BFFFO Dn{},Dm         1.4
    BFEXTU (a0){},Dn      3.8
    ------------------------------------------------
    CPU - Special 64MB
    ------------------------------------------------
    fuse_ma_x16_1 Reg     479.2
    fuse_ma_x16_2 Imm8    479.2
    fuse_ma_x16_3 Imm32   260.6
    fuse_ma_x16_4 And     479.2
    bond_ma_x16_1 Reg     255.6
    bond_ma_x16_2 Imm     255.6
    bond_ma_x16_3 Mem     104.0
    ea_latencyx16         123.6
    alu_latency1x16       253.2
    cache_latency1x16     93.4
    ------------------------------------------------
    CPU - EA 64MB
    ------------------------------------------------
    R (d16,An)            255.6
    R (d32,An)            120.2
    R (An)+               248.4
    R (An) ; ADDQ #,An    127.8
    R (An,Dn)             127.8
    R (d32,An,Dn)         124.2
    W (d16,An)            253.2
    W (d32,An)            124.2
    W (An)+               239.6
    W (An) ; ADDQ #,An    127.8
    W (An,Dn)             125.4
    W (d32,An,Dn)         64.2
    U (d16,An)            132.8
    U (d32,An)            55.8
    U (An)+               132.2
    U (An) ; ADDQ #,An    102.8
    U (An,Dn)             103.6
    U (d32,An,Dn)         52.6
    ------------------------------------------------
    CPU - Loop 64MB
    ------------------------------------------------
    loopx2                175.4
    loopx4                211.2
    loopx6                225.4
    loopx8                235.4
    loopx16               248.4
    loopx32               255.6
    loopx64               258.0
    loopx128              263.0
    loopix2               131.4
    loopix4               150.8
    loopix6               157.8
    loopix8               162.6
    loopix16              168.8
    loopix32              172.0
    loopix64              174.2
    loopix128             175.4
    ------------------------------------------------
    CPU - Goto 64MB
    ------------------------------------------------
    goto_x16              75.0
    goto2_x16             175.4
    goto4_x16             239.6
    gotoCC                126.6
    gotoCCTRUE            229.4
    gotoCCFALSE           170.8
    gosup_chainx1         71.2
    gosup_chainx2         78.4
    gosup_chainx4         82.4
    ------------------------------------------------
    CPU - Workload 64MB
    ------------------------------------------------
    workload_AAAA         258.0
    workload_LA           258.0
    workload_LAA          258.0
    workload_LAAA         258.0
    workload_LAAAA        258.0
    workload_LLA          258.0
    workload_LLAA         258.0
    workload_LLAAA        258.0
    workload_LLAAAA       258.0
    workload_LAALA        258.0
    ------------------------------------------------
    Measuring memory throughput:
    Results are in MB/sec. Higher value is faster.

    Memory 2 Memory   Alignment 0-0
                  64MB       16KB       4KB
    ------------------------------------------------
    libc memcpy   54.2       54.6       54.2
    read 8        56.2       56.0       56.0
    read 8x4      61.0       60.8       60.6
    read 32       76.8       76.6       76.4
    read 32x4     76.8       76.4       76.2
    read 32x8     78.2       77.8       77.8
    write 8       45.0       44.8       44.8
    write 8x4     45.0       44.8       44.8
    write 32      45.0       44.8       44.8
    write 32x4    45.0       45.0       44.8
    write 32x8    45.0       44.8       44.8
    copy 8        58.0       54.2       52.4
    copy 8x4      58.2       56.0       52.2
    copy 32       58.4       56.8       56.4
    copy 32x4     58.4       56.2       54.8
    copy 32x8     58.4       56.8       56.4
    ------------------------------------------------
    Cache 2 Cache   Alignment 0-0
                  64MB       16KB       4KB
    ------------------------------------------------
    libc memcpy   54.2       800.2      852.0
    read 8        56.2       87.6       87.4
    read 8x4      61.0       116.6      116.2
    read 32       76.8       419.4      412.8
    read 32x4     76.8       838.8      813.4
    read 32x8     78.2       925.6      894.6
    write 8       45.0       105.2      104.8
    write 8x4     45.0       209.6      208.0
    write 32      45.0       526.2      516.2
    write 32x4    45.0       838.8      813.4
    write 32x8    45.0       925.6      894.6
    copy 8        58.0       172.6      174.8
    copy 8x4      58.2       206.4      208.6
    copy 32       58.4       678.4      688.2
    copy 32x4     58.4       800.2      824.8
    copy 32x8     58.4       838.8      908.8
    ------------------------------------------------
    MIPS: 13920 / 75 = 185
    MEMORY: 6870 / 32 = 214
    TOTAL: 194
    ------------------------------------------------
    firebee:~#
These results are probably a little disappointing for us FireBee users, but they are not as bad as they look. Nobody in the Atari world would ever come up with the strange idea of using bitset instructions. PortAsm generates code that loops 32 times through the register, with 16 instructions per iteration... If we do not count them, we reach a score of:

    ------------------------------------------------
    MIPS: 13840 / 71 = 194
    MEMORY: 6880 / 32 = 215
    TOTAL: 201
    ------------------------------------------------
That's at least more than 200 ;) On the other hand, minibench is mostly kind to the FireBee in that it uses word-sized instructions very sparingly. Those are what really hurt ColdFire performance on TOS in the real world. All in all: well done, Apollians!

Gunnar von Boehn (Apollo Team Member) Posts 6254 31 Oct 2016 15:16
Thanks Markus. Very interesting result. For comparison, here are the current Vampire scores:

    -----------------------------------------------------------
    MiniBench 8.07h (MC68K) MEMORY USED: 2MB
    -----------------------------------------------------------
    CPU - Math 512KB
    -----------------------------------------------------------
    NOP                   173.1
    ADD.L Reg             173.1
    ADD.W Imm16           163.5
    ADD.L Imm32           163.5
    SHIFT Reg             173.0
    SHIFT Imm16           173.1
    AND.L Reg             173.0
    ANDI.L Imm32          162.1
    MULU.L                30.0
    DIV.L                 2.6
    ROL.L Dn,Dm           173.1
    BFFFO Dn,Dm           86.6
    BFEXTU (a0),Dn        85.8
    -----------------------------------------------------------
    CPU - Special 512KB
    -----------------------------------------------------------
    Fuse1 x16 Reg         325.8
    Fuse2 x16 Imm8        326.4
    Fuse3 x16 Imm32       326.1
    Fuse4 x16 And         326.2
    Bond1 x16 Reg         173.1
    Bond2 x16 Imm         163.5
    Bond3 x16 Mem         173.1
    EA Latency x16        86.6
    ALU Latency x16       92.0
    Cache Latency x16     86.6
    -----------------------------------------------------------
    CPU - EA 512KB
    -----------------------------------------------------------
    R (d16,An)            89.2
    R (d32,An)            88.8
    R (An)+               89.3
    R (An); ADDQ #,An     86.5
    R (An,Dn)             86.2
    R (d32,An,Dn)         86.6
    W (d16,An)            87.4
    W (d32,An)            87.4
    W (An)+               87.8
    W (An); ADDQ #,An     84.8
    W (An,Dn)             85.2
    W (d32,An,Dn)         84.7
    U (d16,An)            87.4
    U (d32,An)            87.8
    U (An)+               87.4
    U (An); ADDQ #,An     85.2
    U (An,Dn)             84.8
    U (d32,An,Dn)         84.8
    -----------------------------------------------------------
    CPU - Loop 512KB
    -----------------------------------------------------------
    Loop1 x2              92.0
    Loop1 x4              121.8
    Loop1 x6              138.0
    Loop1 x8              147.2
    Loop1 x16             162.1
    Loop1 x32             173.0
    Loop1 x64             173.1
    Loop1 x128            176.7
    Loop2 x2              92.0
    Loop2 x4              121.9
    Loop2 x6              138.0
    Loop2 x8              147.2
    Loop2 x16             162.1
    Loop2 x32             163.5
    Loop2 x64             173.1
    Loop2 x128            176.5
    -----------------------------------------------------------
    CPU - Goto 512KB
    -----------------------------------------------------------
    Goto1                 81.8
    Goto2                 129.9
    Goto4                 147.1
    Gosup1                39.2
    Gosup2                39.0
    Gosup4                36.4
    GotoCC                132.5
    GotoCC0               132.5
    GotoCC1               132.5
    -----------------------------------------------------------
    CPU - Workload 512KB
    -----------------------------------------------------------
    WorkLoad AAAA         176.4
    WorkLoad LA           169.7
    WorkLoad LAA          169.7
    WorkLoad LAAA         175.9
    WorkLoad LAAAA        176.5
    WorkLoad LLA          129.9
    WorkLoad LLAA         169.7
    WorkLoad LLAAA        142.0
    WorkLoad LLAAAA       169.7
    WorkLoad LAALA        169.7
    -----------------------------------------------------------
    Memory to Memory (MB/sec)   Alignment 0-0
                  512KB      16KB       4KB
    -----------------------------------------------------------
    Libc Memcpy   220.3      216.6      210.5
    Read 8        92.0       91.1       90.9
    Read 8x4      92.0       91.5       90.1
    Read 32       240.4      238.1      234.0
    Read 32x4     240.4      237.4      232.9
    Read 32x8     240.3      236.8      230.6
    Write 8       90.5       90.2       89.6
    Write 8x4     90.0       90.2       88.9
    Write 32      359.5      356.2      345.3
    Write 32x4    359.5      355.5      344.2
    Write 32x8    359.4      354.6      340.1
    Copy 8        54.1       54.1       54.8
    Copy 8x4      70.0       70.8       70.5
    Copy 32       170.9      170.1      168.9
    Copy 32x4     274.6      272.5      268.1
    Copy 32x8     274.2      272.3      268.4
    -----------------------------------------------------------
    Cache to Cache (MB/sec)   Alignment 0-0
                  512KB      16KB       4KB
    -----------------------------------------------------------
    Libc Memcpy   220.5      306.5      298.2
    Read 8        91.5       91.8       91.2
    Read 8x4      92.0       91.3       90.9
    Read 32       240.4      363.0      353.1
    Read 32x4     240.4      362.7      351.2
    Read 32x8     240.4      361.2      346.1
    Write 8       90.3       89.8       89.4
    Write 8x4     90.5       90.2       89.3
    Write 32      359.4      356.0      345.8
    Write 32x4    359.5      355.6      344.4
    Write 32x8    359.5      354.7      340.4
    Copy 8        54.2       60.9       60.8
    Copy 8x4      70.9       80.9       80.6
    Copy 32       170.6      242.7      240.2
    Copy 32x4     274.8      576.4      558.1
    Copy 32x8     274.6      636.8      614.5
    -----------------------------------------------------------
    CPU: 10048 / 75 = 133 MIPS.
    MEM: 7152 / 32 = 223 MB/Sec.
    ALL: 160 Points.
    -----------------------------------------------------------

Markus (mfro) Posts 99 31 Oct 2016 17:56
Maybe someone would volunteer to throw this (EXTERNAL LINK) at an Amiga compiler and post the outcome? Yes, it's old and probably far from being the best benchmark, but since Motorola originally claimed a score of 401 VAX MIPS with it on the ColdFire V4, we simply _had to_ test it (and missed the goal miserably; so much for marketing). We have collected some numbers for different Atari machines here: http://firebee.org/~firebee/pictures/files/dhrystone.pdf It would be nice if we could add another 68k to it...

Vincent Rivière Posts 87 31 Oct 2016 21:02
Excellent investigation, Markus :-D