
68080

Markus (mfro)

Posts 89
22 Oct 2016 12:25


Just for completeness: with (nearly) the exact same code (I just replaced the dbf instruction and adapted the movem), the FireBee scores 194 MIPS.

If you allow me an additional comment and criticism, hopefully received as constructive: the code as provided doesn't show the quality of branch prediction very well.

It repeatedly calls the assembler loops as subroutines and uses movem to save and restore registers. This makes it a perfect candidate for inlining, as the expensive register save and restore operations effectively "hide" the branch prediction quality from the result.

I just did a quick test inlining the assembler subroutine into the calling code and ended up with a score pretty close to the FireBee's clock rate, which indicates a near-100% hit rate of the branch prediction.



Gunnar von Boehn
(Apollo Team Member)
Posts 3357
22 Oct 2016 12:39


Markus (mfro) wrote:

Just for completeness: with (nearly) the exact same code (I just replaced the dbf instruction and adapted the movem), the FireBee scores 194 MIPS.

Thanks for the test. :-)

We have some more CPU tests which show a lot more detailed numbers.
Would you like to compile, for example, our minibench?

Cheers


Gunnar von Boehn
(Apollo Team Member)
Posts 3357
22 Oct 2016 13:43


Markus (mfro) wrote:

Just for completeness: with the (nearly) exact same code (just replaced the dbf instruction
 

Can you show us the new code?

Cheers
Gunnar



Markus (mfro)

Posts 89
22 Oct 2016 19:51


Gunnar von Boehn wrote:

  Can you show us the new code?

Sure. The only changes to the original sources are:


diff bench/loop1.S bench_fb/loop1.S
5c5,6
<  movem.l d0-a6,-(sp)
---
>  lea  -15 * 4(sp),sp
>  movem.l d0-a6,(sp)
9,10c10,11
<  moveq  #1,D6
<  addq    #1,D5
---
>  moveq  #2,D6
>  addq.l    #1,D5
22c23,25
<  dbra    D6,L2
---
>  subq.l #1,d6
>  bne.s  L2
>  // dbra    D6,L2
25c28
<  bne    L1
---
>  bne.s    L1
27c30,31
<  movem.l (sp)+,d0-a6
---
>  movem.l (sp),d0-a6
>  lea  15 * 4(sp),sp
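
For readers wondering about the moveq #1 to moveq #2 change in the diff: ColdFire has no dbra/dbf, and the replacement loop counts differently. dbra decrements the low word of the counter and keeps branching until it reaches -1, so a start value of N runs the body N+1 times, while subq.l #1 / bne runs it N times. A minimal C model of the two loop forms (the function names are made up for this sketch):

```c
#include <stdint.h>

/* Model of:  loop: <body> ; dbra Dn,loop
 * DBRA decrements the low 16 bits of Dn and branches
 * unless the result is -1. */
static unsigned iterations_dbra(int16_t count)
{
    unsigned runs = 0;
    do {
        runs++;             /* loop body */
        count--;            /* dbra: decrement ... */
    } while (count != -1);  /* ... and branch unless -1 */
    return runs;
}

/* Model of:  loop: <body> ; subq.l #1,Dn ; bne.s loop */
static unsigned iterations_subq_bne(int32_t count)
{
    unsigned runs = 0;
    do {
        runs++;             /* loop body */
        count--;            /* subq.l #1,Dn */
    } while (count != 0);   /* bne.s loop */
    return runs;
}
```

In this model iterations_dbra(1) and iterations_subq_bne(2) both run the body twice, which is why the start value has to be bumped by one to keep the iteration count unchanged.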




Markus (mfro)

Posts 89
22 Oct 2016 20:31


Just tried to compile minibench on the FireBee.

Are the sources complete? I had to comment out the _64x calls since they only seem to appear in the x86 assembly sources.


Philippe Flype

Posts 190
26 Oct 2016 00:05


Hi, I took the time to sort the files,

here is an archive of the sources: EXTERNAL LINK


Markus (mfro)

Posts 89
26 Oct 2016 12:52


Philippe Flype wrote:

Hi, i took time to sort the files,
 
  here is archive of sources EXTERNAL LINK 

Thank you. Will probably look into it during the weekend.


Vincent Rivière

Posts 83
29 Oct 2016 12:43


Gunnar von Boehn wrote:

        Coldfire 266    Firebee              8

 
For precision:

- The "ColdFire" processor is spelled with a capital F in the middle; this is how Freescale spells it. The "Atari Coldfire Project" (ACP) deliberately chose to remove that capital F in their name; that's their own choice.

- Similarly, "FireBee" is spelled with a capital B in the middle.

- The FireBee ColdFire clock is exactly 264 MHz (contrary to what we sometimes read).



Markus (mfro)

Posts 89
30 Oct 2016 13:20


Got the sources to compile on the FireBee. I had to mill them through PortAsm (the Micro-APL EXTERNAL LINK tool to convert 68k code to the ColdFire instruction set).
 
The code spits out a lot of bogus values, however, while others seem reasonable, even where PortAsm didn't change anything. So it's probably not the tool that is to blame.
 
Before I start searching the haystack: we do use the same calling conventions on Atari and Amiga, don't we?
 
  Arguments on the stack, left to right, d0-d2/a0 = scratch, return values in d0, all other registers to be preserved by called function?
 


  ------------------------------------------------
  Processor & Memory Performance Benchmark.
  $VER: Minibench 8.06 (04.07.16) Apollo Team
  ------------------------------------------------
  ------------------------------------------------
  CPU - Math                512KB
  ------------------------------------------------
  NOP                      209.7
  ADD.L REG            1048576.0
  ADD.W Im16            1048576.0
  ADD.L Im32            1048576.0
  SHIFT REG            1048576.0
  SHIFT Imm            1048576.0
  AND.L REG            1048576.0
  ANDI.L Im32          1048576.0
  MULU.L                1048576.0
  DIV.L                      52.4
  ROL.L Dn,Dm              209.7
  BFFFO  Dn{},Dm              5.8
  BFEXTU (a0){},Dn          17.4
  ------------------------------------------------
  CPU - Special            512KB
  ------------------------------------------------
  fuse_ma_x16_1 Reg    1048576.0
  fuse_ma_x16_2 Imm8    1048576.0
  fuse_ma_x16_3 Imm32  1048576.0
  fuse_ma_x16_4 And    1048576.0
  bond_ma_x16_1 Reg    1048576.0
  bond_ma_x16_2 Imm    1048576.0
  bond_ma_x16_3 Mem    1048576.0
  ea_latencyx16            209.7
  alu_latency1x16      1048576.0
  cache_latency1x16        104.8
  ------------------------------------------------
  CPU - EA                  512KB
  ------------------------------------------------
  R (d16,An)                209.7
  R (d32,An)                209.7
  R (An)+                  209.7
  R (An) ; ADDQ #,An        209.7
  R (An,Dn)                104.8
  R (d32,An,Dn)            209.7
  W (d16,An)            1048576.0
  W (d32,An)                104.8
  W (An)+                  209.7
  W (An) ; ADDQ #,An        209.7
  W (An,Dn)                209.7
  W (d32,An,Dn)              69.9
  U (d16,An)                209.7
  U (d32,An)                52.4
  U (An)+                  209.7
  U (An) ; ADDQ #,An        104.8
  U (An,Dn)                104.8
  U (d32,An,Dn)              69.9
  ------------------------------------------------
  CPU - Loop                512KB
  ------------------------------------------------
  loopx2                    209.8
  loopx4                    209.7
  loopx6                    209.7
  loopx8                1048576.0
  loopx16              1048576.0
  loopx32              1048576.0
  loopx64              1048576.0
  loopx128              1048576.0
  loopix2                  209.8
  loopix4                  209.7
  loopix6                  209.7
  loopix8              1048576.0
  loopix16              1048576.0
  loopix32              1048576.0
  loopix64              1048576.0
  loopix128            1048576.0
  ------------------------------------------------
  CPU - Goto                512KB
  ------------------------------------------------
  goto_x16                  104.8
  goto2_x16                209.7
  goto4_x16                209.7
  gotoCC                    104.8
  gotoCCTRUE                209.7
  gotoCCFALSE              209.7
  gosup_chainx1              69.9
  gosup_chainx2              69.9
  gosup_chainx4            104.8
  ------------------------------------------------
  CPU - Workload            512KB
  ------------------------------------------------
  workload_AAAA            210.5
  workload_LA          1048576.0
  workload_LAA          1048576.0
  workload_LAAA        1048576.0
  workload_LAAAA        1048576.0
  workload_LLA          1048576.0
  workload_LLAA        1048576.0
  workload_LLAAA        1048576.0
  workload_LLAAAA      1048576.0
  workload_LAALA        1048576.0
  ------------------------------------------------
  Measuring memory throughput:
  Results are in MB/sec. Higher value is faster.
  Memory 2 Memory
  Alignment 0-0      512KB      16KB        4KB
  ------------------------------------------------
  libc memcpy          52.4      52.4      52.4
  read 8                52.4      52.4      69.9
  read 8x4              69.9      69.9      69.9
  read 32              69.9      69.9      69.9
  read 32x4            104.8      104.8      104.8
  read 32x8            104.8      104.8      104.8
  write 8              41.9      41.9      41.9
  write 8x4            52.4      52.4      52.4
  write 32              41.9      52.4      41.9
  write 32x4            52.4      41.9      52.4
  write 32x8            41.9      52.4      52.4
  copy 8                58.9      52.4      52.4
  copy 8x4              52.4      52.4      52.4
  copy 32              58.9      58.9      58.9
  copy 32x4            52.4      58.9      58.9
  copy 32x8            58.9      58.9      52.4
  ------------------------------------------------
  Cache 2 Cache
  Alignment 0-0      512KB      16KB        4KB
  ------------------------------------------------
  libc memcpy          52.4      418.6  2097152.0
  read 8                52.4      104.8      69.9
  read 8x4              69.9      104.8      209.7
  read 32              69.9  1048576.0      209.7
  read 32x4            69.9  1048576.0  1048576.0
  read 32x8            104.8  1048576.0  1048576.0
  write 8              41.9      104.8      104.9
  write 8x4            52.4      209.7      209.7
  write 32              41.9  1048576.0      209.7
  write 32x4            52.4  1048576.0  1048576.0
  write 32x8            52.4  1048576.0  1048576.0
  copy 8                58.9      208.7      138.8
  copy 8x4              58.9      208.7      208.7
  copy 32              58.9      418.5  2097152.0
  copy 32x4            52.4  2097152.0      418.6
  copy 32x8            58.9  2097152.0  2097152.0
  ------------------------------------------------
  MIPS:  37754778 / 75 = 503397
  MEMORY: 6992967 / 32 = 218530
  TOTAL:  418203
  ------------------------------------------------
 



Vincent Rivière

Posts 83
30 Oct 2016 17:32


Markus (mfro) wrote:
 
  Before I start searching the haystack: we do use the same calling conventions on Atari and Amiga, don't we?
 
  Arguments on the stack, left to right, d0-d2/a0 = scratch, return values in d0, all other registers to be preserved by called function?

Beware, calling conventions may be different depending on the API.

For Atari GCC functions, the scratch registers are d0-d1/a0-a1. I believe this is the standard for almost all GCC 680x0 targets.

For Atari BIOS/XBIOS/GEMDOS system calls, scratch registers are d0-d2/a0-a2. We must be extremely careful about that when mixing C functions and system calls.


Gunnar von Boehn
(Apollo Team Member)
Posts 3357
30 Oct 2016 17:47


Markus (mfro) wrote:

The code spits out a lot of bogus values,

Did you run it with the "strongest" parameter?

Cheers
Gunnar


Markus (mfro)

Posts 89
30 Oct 2016 19:00


Gunnar von Boehn wrote:

Markus (mfro) wrote:

  The code spits out a lot of bogus values,
 

 
  Did you run it with the "strongest" parameter?
 
  Cheers
  Gunnar

Not until just now. "strongest" results in slightly different values, but there are still these 7-digit numbers in places.


Gunnar von Boehn
(Apollo Team Member)
Posts 3357
30 Oct 2016 19:41


Markus (mfro) wrote:

  Not until just now. "strongest" results in slightly different values, but there are still these 7-digit numbers in places.

The values look to me as if the run time is too short
or the resolution of the timer value is not high enough.

If you increase the runtime, this should be fixed.
Can you try?
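
To illustrate the timer-resolution effect: if elapsed time is only available in coarse ticks, short runs can produce just a handful of distinct results, and a run shorter than one tick produces nonsense. The sketch below is not minibench code. The 5 ms tick (the period of a 200 Hz timer), the clamp of a zero reading to 1 us, and the function names are all assumptions made for illustration.

```c
#include <stdint.h>

#define TICK_US 5000u   /* assumed timer period: 5 ms (200 Hz) */

/* Elapsed time as a coarse timer would report it:
 * rounded down to whole ticks. */
static uint32_t measured_us(uint32_t true_us)
{
    return (true_us / TICK_US) * TICK_US;
}

/* Rate in work units per microsecond (= millions per second).
 * A zero reading is clamped to 1 us to avoid dividing by zero
 * (assumed behaviour, not taken from the minibench sources). */
static double rate(uint32_t work_units, uint32_t true_us)
{
    uint32_t t = measured_us(true_us);
    if (t == 0)
        t = 1;
    return (double)work_units / (double)t;
}
```

Under these assumptions, 1048576 units of work done in 3 ms report 1048576.0, while runs of 7 ms and 9.9 ms both report about 209.7. Many of the suspicious values in the listing (1048576.0, 209.7, 69.9, 52.4) are consistent with this pattern of one, three or four ticks. A longer run makes each tick a smaller fraction of the total.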

Cheers
Gunnar


Markus (mfro)

Posts 89
30 Oct 2016 20:21


Gunnar von Boehn wrote:

 
Markus (mfro) wrote:

    Not until just now. "strongest" results in slightly different values, but there are still these 7-digit numbers in places.
 

 

 
  Just had a look into main.c - "strongest" is not a valid parameter, it seems. The only one that is recognized appears to be "68000".
 
  Do I have the latest sources?
 
 
Gunnar von Boehn wrote:

  The values look to me as if the run time is too short
  or the resolution of the timer value is not high enough.
 
  If you increase the runtime, this should be fixed.
  Can you try?
 

 
  Thanks.
 
  Set LOOPS (from 2) to 32 (didn't really check what it does in detail, but it appeared to me the most straightforward thing to do ;) ).
 
  At least it made the strange values vanish.
 
  As the memory throughput numbers seem to be reasonable now (about the same, pretty disappointing, values I got from my own measurements), I guess it's indeed caused by the timer granularity and we're getting closer. Can you show 68080 values? This is what I have now:
 

  firebee:~#./benchv6_68k
  ------------------------------------------------
  Processor & Memory Performance Benchmark.
  $VER: Minibench 8.06 (04.07.16) Apollo Team
  ------------------------------------------------
  ------------------------------------------------
  CPU - Math                512KB
  ------------------------------------------------
  NOP                      167.4
  ADD.L REG                838.4
  ADD.W Im16                838.4
  ADD.L Im32                838.4
  SHIFT REG                1677.8
  SHIFT Imm                1677.8
  AND.L REG                1118.8
  ANDI.L Im32              671.8
  MULU.L                    258.0
  DIV.L                      50.0
  ROL.L Dn,Dm              139.6
  BFFFO  Dn{},Dm              5.2
  BFEXTU (a0){},Dn          15.8
  ------------------------------------------------
  CPU - Special            512KB
  ------------------------------------------------
  fuse_ma_x16_1 Reg        1118.8
  fuse_ma_x16_2 Imm8        838.4
  fuse_ma_x16_3 Imm32      1118.8
  fuse_ma_x16_4 And        1677.8
  bond_ma_x16_1 Reg        1118.8
  bond_ma_x16_2 Imm        671.8
  bond_ma_x16_3 Mem        335.8
  ea_latencyx16            124.6
  alu_latency1x16          258.0
  cache_latency1x16          93.8
  ------------------------------------------------
  CPU - EA                  512KB
  ------------------------------------------------
  R (d16,An)                258.0
  R (d32,An)                119.4
  R (An)+                  258.0
  R (An) ; ADDQ #,An        129.0
  R (An,Dn)                129.0
  R (d32,An,Dn)            124.6
  W (d16,An)                258.0
  W (d32,An)                124.6
  W (An)+                  239.8
  W (An) ; ADDQ #,An        129.0
  W (An,Dn)                124.6
  W (d32,An,Dn)              64.0
  U (d16,An)                134.2
  U (d32,An)                55.4
  U (An)+                  134.2
  U (An) ; ADDQ #,An        101.2
  U (An,Dn)                104.4
  U (d32,An,Dn)              52.2
  ------------------------------------------------
  CPU - Loop                512KB
  ------------------------------------------------
  loopx2                    176.0
  loopx4                    223.8
  loopx6                    223.8
  loopx8                    671.8
  loopx16                  838.4
  loopx32                  838.4
  loopx64                  1118.8
  loopx128                1118.8
  loopix2                  129.0
  loopix4                  152.4
  loopix6                  159.8
  loopix8                  479.8
  loopix16                  671.8
  loopix32                  671.8
  loopix64                  671.8
  loopix128                838.4
  ------------------------------------------------
  CPU - Goto                512KB
  ------------------------------------------------
  goto_x16                  76.6
  goto2_x16                176.0
  goto4_x16                239.8
  gotoCC                    134.2
  gotoCCTRUE                223.8
  gotoCCFALSE              167.4
  gosup_chainx1              71.4
  gosup_chainx2              79.8
  gosup_chainx4              81.0
  ------------------------------------------------
  CPU - Workload            512KB
  ------------------------------------------------
  workload_AAAA            258.0
  workload_LA                9.4
  workload_LAA                9.4
  workload_LAAA              9.4
  workload_LAAAA              9.4
  workload_LLA                9.4
  workload_LLAA              9.4
  workload_LLAAA              9.4
  workload_LLAAAA            9.4
  workload_LAALA              9.4
  ------------------------------------------------
  Measuring memory throughput:
  Results are in MB/sec. Higher value is faster.
  Memory 2 Memory
  Alignment 0-0      512KB      16KB        4KB
  ------------------------------------------------
  libc memcpy          52.2      52.2      52.2
  read 8                56.4      55.4      56.4
  read 8x4              62.8      61.8      61.8
  read 32              79.8      78.8      76.6
  read 32x4            78.8      76.6      78.8
  read 32x8            79.8      78.8      78.8
  write 8              45.8      45.8      45.8
  write 8x4            45.8      45.8      45.8
  write 32              45.8      45.8      45.8
  write 32x4            45.8      44.6      45.8
  write 32x8            45.8      45.8      45.8
  copy 8                56.4      52.2      50.2
  copy 8x4              56.4      54.4      52.2
  copy 32              56.4      54.4      54.4
  copy 32x4            56.4      54.4      54.4
  copy 32x8            56.4      54.4      54.4
  ------------------------------------------------
  Cache 2 Cache
  Alignment 0-0      512KB      16KB        4KB
  ------------------------------------------------
  libc memcpy          54.2      744.6      958.8
  read 8                56.4      88.4      86.2
  read 8x4              62.8      115.2      119.4
  read 32              78.8      419.2      419.2
  read 32x4            78.8      838.4      838.4
  read 32x8            79.8      838.4    1118.8
  write 8              45.8      108.6      104.4
  write 8x4            45.8      209.0      209.0
  write 32              45.8      479.8      559.8
  write 32x4            45.8      838.4      838.4
  write 32x8            45.8    1118.8      838.4
  copy 8                56.4      172.6      176.0
  copy 8x4              56.4      208.0      208.0
  copy 32              56.4      670.8      744.6
  copy 32x4            56.4      744.6      838.4
  copy 32x8            56.4      838.4      958.8
  ------------------------------------------------
  MIPS:  28569 / 75 = 380
  MEMORY: 7013 / 32 = 219
  TOTAL:  332
  ------------------------------------------------
 
  firebee:~#
 



Gunnar von Boehn
(Apollo Team Member)
Posts 3357
30 Oct 2016 20:45


Hi Markus,
 
Sorry, I tried to ping you on IRC with info.
I was confused by the code versions.
The scores that you have are still impossible values.
There must be a config variable called
CONFIG_TEST_SIZE.
Please set it to 64 MB.
 
I hope this will fix it
Thanks


Markus (mfro)

Posts 89
30 Oct 2016 21:52


Gunnar von Boehn wrote:

Hi Markus,
 
  Sorry, I tried to ping you on IRC with info.
  I was confused by the code versions.
  The scores that you have are still impossible values.
  There must be a config variable called
  CONFIG_TEST_SIZE.
  Please set it to 64 MB.
 
  I hope this will fix it
  Thanks

O.k., done, next try:

Results in waaaay longer runtime (yawn ... I had to set LOOPS to 8 additionally because I got odd numbers for the CPU workload benchmark again), but then pretty much the same values (like less than 5% off) as posted above, so I decided not to clutter the forum with it.

Maybe I have to inspect what PortAsm did to the code. Is there anything particular you'd consider way off so we'd look into that first?


Markus (mfro)

Posts 89
31 Oct 2016 11:55


I guess I've found at least most of the problematic parts.
 
The first thing was rather trivial: the original code uses preprocessor macros for instruction sequences, e.g.
 

#define NOP4    nop; nop; nop; nop

 
which looks innocent, but doesn't work with PortAsm.
 
PortAsm interprets the semicolon as the start of a comment (although it has been told that we are using the GNU assembler, where this is valid syntax), so only the very first instruction was executed.
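
The effect can be reproduced with a small C model that counts the instructions an assembler would see on the expanded macro line, depending on whether ';' separates statements or starts a comment. This is a hypothetical helper for illustration only, not part of PortAsm or minibench.

```c
/* Count the instructions on one source line.
 * If semicolon_is_comment is non-zero, everything from the first ';'
 * onward is ignored; otherwise ';' separates statements. */
static unsigned count_instructions(const char *line, int semicolon_is_comment)
{
    unsigned count = 0;
    int in_statement = 0;

    for (const char *p = line; *p != '\0'; p++) {
        if (*p == ';') {
            if (semicolon_is_comment)
                break;              /* rest of the line is a comment */
            in_statement = 0;       /* a new statement may start here */
        } else if (*p != ' ' && *p != '\t') {
            if (!in_statement) {    /* first non-blank char of a statement */
                in_statement = 1;
                count++;
            }
        }
    }
    return count;
}
```

For the line "nop; nop; nop; nop" this yields 4 when the semicolon is a separator and 1 when it starts a comment, so a test meant to time four instructions ends up timing one.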
 
Fixed, but still no go.
 
Second was a little trickier and not so obvious (at least not for an Atarian like me).
 
As I just had to learn the hard way, Amiga code appears to use register A5 as frame pointer while the rest of the world uses A6.
 
This isn't going to be a problem as long as you consistently use either %a5 or %fp.
 
Unfortunately, this wasn't the case. The routines in tests_WORKLOAD_68k.S were using both (%fp in the LINK instruction, %a5 for the UNLK), which obviously corrupted registers of the calling routine and caused the code to fail.
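
Why the mismatch is fatal can be seen from what the two instructions do: LINK An pushes An, copies SP into An and then moves SP by the displacement; UNLK An loads SP from An and pops An. A minimal C model of just that behaviour (register values, the tiny stack and all names are made up for this sketch):

```c
#include <stdint.h>

/* Tiny model of the 68k state touched by LINK/UNLK.
 * a[5] = a5, a[6] = a6 (%fp), a[7] = a7 (%sp). */
typedef struct {
    uint32_t a[8];
    uint32_t stack[64];     /* longword stack, index = address / 4 */
} Cpu;

#define SP 7

static void push(Cpu *c, uint32_t v)
{
    c->a[SP] -= 4;
    c->stack[c->a[SP] / 4] = v;
}

static uint32_t pop(Cpu *c)
{
    uint32_t v = c->stack[c->a[SP] / 4];
    c->a[SP] += 4;
    return v;
}

/* link An,#disp : push An; An = SP; SP += disp (disp usually negative) */
static void do_link(Cpu *c, int n, int32_t disp)
{
    push(c, c->a[n]);
    c->a[n] = c->a[SP];
    c->a[SP] += (uint32_t)disp;
}

/* unlk An : SP = An; An = pop */
static void do_unlk(Cpu *c, int n)
{
    c->a[SP] = c->a[n];
    c->a[n] = pop(c);
}

/* Run "link <link_reg>,#-8 ... unlk <unlk_reg>" and report whether
 * the caller's a5, a6 and sp all survive. */
static int frame_survives(int link_reg, int unlk_reg)
{
    Cpu c = {0};
    c.a[5] = 0x80;          /* caller's a5 (arbitrary model values) */
    c.a[6] = 0xC0;          /* caller's a6 */
    c.a[SP] = 0xF0;         /* caller's sp */
    Cpu before = c;

    do_link(&c, link_reg, -8);
    do_unlk(&c, unlk_reg);

    return c.a[5] == before.a[5] && c.a[6] == before.a[6]
        && c.a[SP] == before.a[SP];
}
```

In this model frame_survives(6, 6) and frame_survives(5, 5) hold, while frame_survives(6, 5) does not: the stack pointer is reloaded from whatever the caller had in a5, a5 is overwritten with an unrelated stack word, and a6 is never restored.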
 
Fixing that takes us there:
 

firebee:~#./benchv6_68k
  ------------------------------------------------
  Processor & Memory Performance Benchmark.
  $VER: Minibench 8.06 (04.07.16) Apollo Team
  ------------------------------------------------
  ------------------------------------------------
  CPU - Math                64MB
  ------------------------------------------------
  NOP                        43.4
  ADD.L REG                255.6
  ADD.W Im16                86.8
  ADD.L Im32                255.6
  SHIFT REG                255.6
  SHIFT Imm                255.6
  AND.L REG                255.6
  ANDI.L Im32              172.0
  MULU.L                    65.2
  DIV.L                      9.0
  ROL.L Dn,Dm                36.8
  BFFFO  Dn{},Dm              1.4
  BFEXTU (a0){},Dn            3.8
  ------------------------------------------------
  CPU - Special              64MB
  ------------------------------------------------
  fuse_ma_x16_1 Reg        479.2
  fuse_ma_x16_2 Imm8        479.2
  fuse_ma_x16_3 Imm32      260.6
  fuse_ma_x16_4 And        479.2
  bond_ma_x16_1 Reg        255.6
  bond_ma_x16_2 Imm        255.6
  bond_ma_x16_3 Mem        104.0
  ea_latencyx16            123.6
  alu_latency1x16          253.2
  cache_latency1x16          93.4
  ------------------------------------------------
  CPU - EA                  64MB
  ------------------------------------------------
  R (d16,An)                255.6
  R (d32,An)                120.2
  R (An)+                  248.4
  R (An) ; ADDQ #,An        127.8
  R (An,Dn)                127.8
  R (d32,An,Dn)            124.2
  W (d16,An)                253.2
  W (d32,An)                124.2
  W (An)+                  239.6
  W (An) ; ADDQ #,An        127.8
  W (An,Dn)                125.4
  W (d32,An,Dn)              64.2
  U (d16,An)                132.8
  U (d32,An)                55.8
  U (An)+                  132.2
  U (An) ; ADDQ #,An        102.8
  U (An,Dn)                103.6
  U (d32,An,Dn)              52.6
  ------------------------------------------------
  CPU - Loop                64MB
  ------------------------------------------------
  loopx2                    175.4
  loopx4                    211.2
  loopx6                    225.4
  loopx8                    235.4
  loopx16                  248.4
  loopx32                  255.6
  loopx64                  258.0
  loopx128                  263.0
  loopix2                  131.4
  loopix4                  150.8
  loopix6                  157.8
  loopix8                  162.6
  loopix16                  168.8
  loopix32                  172.0
  loopix64                  174.2
  loopix128                175.4
  ------------------------------------------------
  CPU - Goto                64MB
  ------------------------------------------------
  goto_x16                  75.0
  goto2_x16                175.4
  goto4_x16                239.6
  gotoCC                    126.6
  gotoCCTRUE                229.4
  gotoCCFALSE              170.8
  gosup_chainx1              71.2
  gosup_chainx2              78.4
  gosup_chainx4              82.4
  ------------------------------------------------
  CPU - Workload            64MB
  ------------------------------------------------
  workload_AAAA            258.0
  workload_LA              258.0
  workload_LAA              258.0
  workload_LAAA            258.0
  workload_LAAAA            258.0
  workload_LLA              258.0
  workload_LLAA            258.0
  workload_LLAAA            258.0
  workload_LLAAAA          258.0
  workload_LAALA            258.0
  ------------------------------------------------
  Measuring memory throughput:
  Results are in MB/sec. Higher value is faster.
  Memory 2 Memory
  Alignment 0-0      64MB      16KB        4KB
  ------------------------------------------------
  libc memcpy          54.2      54.6      54.2
  read 8                56.2      56.0      56.0
  read 8x4              61.0      60.8      60.6
  read 32              76.8      76.6      76.4
  read 32x4            76.8      76.4      76.2
  read 32x8            78.2      77.8      77.8
  write 8              45.0      44.8      44.8
  write 8x4            45.0      44.8      44.8
  write 32              45.0      44.8      44.8
  write 32x4            45.0      45.0      44.8
  write 32x8            45.0      44.8      44.8
  copy 8                58.0      54.2      52.4
  copy 8x4              58.2      56.0      52.2
  copy 32              58.4      56.8      56.4
  copy 32x4            58.4      56.2      54.8
  copy 32x8            58.4      56.8      56.4
  ------------------------------------------------
  Cache 2 Cache
  Alignment 0-0      64MB      16KB        4KB
  ------------------------------------------------
  libc memcpy          54.2      800.2      852.0
  read 8                56.2      87.6      87.4
  read 8x4              61.0      116.6      116.2
  read 32              76.8      419.4      412.8
  read 32x4            76.8      838.8      813.4
  read 32x8            78.2      925.6      894.6
  write 8              45.0      105.2      104.8
  write 8x4            45.0      209.6      208.0
  write 32              45.0      526.2      516.2
  write 32x4            45.0      838.8      813.4
  write 32x8            45.0      925.6      894.6
  copy 8                58.0      172.6      174.8
  copy 8x4              58.2      206.4      208.6
  copy 32              58.4      678.4      688.2
  copy 32x4            58.4      800.2      824.8
  copy 32x8            58.4      838.8      908.8
  ------------------------------------------------
  MIPS:  13920 / 75 = 185
  MEMORY: 6870 / 32 = 214
  TOTAL:  194
  ------------------------------------------------
 
firebee:~#

 
Probably a little disappointing for us FireBee users, but it's not as bad as it looks. Nobody in the Atari world would ever come up with the strange idea of using bitfield instructions. PortAsm generates code that loops through the register 32 times with 16 instructions ...
 
If we do not count them, we reach a score of
 
 

  ------------------------------------------------
  MIPS:  13840 / 71 = 194
  MEMORY: 6880 / 32 = 215
  TOTAL:  201
  ------------------------------------------------
 

- at least more than 200 ;)
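
For anyone checking the arithmetic: the output doesn't say how TOTAL is formed, but the printed numbers are consistent with truncating integer averages, with TOTAL taken over the CPU and memory tests together. A small sketch under that assumption (struct and function names are made up here, not taken from the minibench sources):

```c
/* Summary values as they appear in the last lines of the output. */
typedef struct {
    unsigned mips;
    unsigned memory;
    unsigned total;
} Score;

static Score summarize(unsigned cpu_sum, unsigned cpu_tests,
                       unsigned mem_sum, unsigned mem_tests)
{
    Score s;
    s.mips   = cpu_sum / cpu_tests;   /* truncating integer division */
    s.memory = mem_sum / mem_tests;
    /* assumed: one average over all tests, CPU and memory combined */
    s.total  = (cpu_sum + mem_sum) / (cpu_tests + mem_tests);
    return s;
}
```

With the sums from the two listings, summarize(13920, 75, 6870, 32) gives 185 / 214 / 194 and summarize(13840, 71, 6880, 32) gives 194 / 215 / 201, matching the printed lines.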
 
On the other hand, minibench is mostly nice to the FireBee in that it uses word-sized instructions very sparingly. This is what really hurts ColdFire performance on TOS in the real world.
 
All in all: well done, Apollians!
 


Gunnar von Boehn
(Apollo Team Member)
Posts 3357
31 Oct 2016 15:16


Thanks Markus.

Very interesting result.
For comparison here are current Vampire scores.


-----------------------------------------------------------
MiniBench 8.07h (MC68K)
MEMORY USED: 2MB
-----------------------------------------------------------
CPU - Math                512KB
-----------------------------------------------------------
NOP                      173.1
ADD.L Reg                173.1
ADD.W Imm16              163.5
ADD.L Imm32              163.5
SHIFT Reg                173.0
SHIFT Imm16              173.1
AND.L Reg                173.0
ANDI.L Imm32              162.1
MULU.L                    30.0
DIV.L                      2.6
ROL.L Dn,Dm              173.1
BFFFO Dn,Dm                86.6
BFEXTU (a0),Dn            85.8
-----------------------------------------------------------
CPU - Special            512KB
-----------------------------------------------------------
Fuse1 x16 Reg            325.8
Fuse2 x16 Imm8            326.4
Fuse3 x16 Imm32          326.1
Fuse4 x16 And            326.2
Bond1 x16 Reg            173.1
Bond2 x16 Imm            163.5
Bond3 x16 Mem            173.1
EA Latency x16            86.6
ALU Latency x16            92.0
Cache Latency x16          86.6
-----------------------------------------------------------
CPU - EA                  512KB
-----------------------------------------------------------
R (d16,An)                89.2
R (d32,An)                88.8
R (An)+                    89.3
R (An); ADDQ #,An          86.5
R (An,Dn)                  86.2
R (d32,An,Dn)              86.6
W (d16,An)                87.4
W (d32,An)                87.4
W (An)+                    87.8
W (An); ADDQ #,An          84.8
W (An,Dn)                  85.2
W (d32,An,Dn)              84.7
U (d16,An)                87.4
U (d32,An)                87.8
U (An)+                    87.4
U (An); ADDQ #,An          85.2
U (An,Dn)                  84.8
U (d32,An,Dn)              84.8
-----------------------------------------------------------
CPU - Loop                512KB
-----------------------------------------------------------
Loop1 x2                  92.0
Loop1 x4                  121.8
Loop1 x6                  138.0
Loop1 x8                  147.2
Loop1 x16                162.1
Loop1 x32                173.0
Loop1 x64                173.1
Loop1 x128                176.7
Loop2 x2                  92.0
Loop2 x4                  121.9
Loop2 x6                  138.0
Loop2 x8                  147.2
Loop2 x16                162.1
Loop2 x32                163.5
Loop2 x64                173.1
Loop2 x128                176.5
-----------------------------------------------------------
CPU - Goto                512KB
-----------------------------------------------------------
Goto1                      81.8
Goto2                    129.9
Goto4                    147.1
Gosup1                    39.2
Gosup2                    39.0
Gosup4                    36.4
GotoCC                    132.5
GotoCC0                  132.5
GotoCC1                  132.5
-----------------------------------------------------------
CPU - Workload            512KB
-----------------------------------------------------------
WorkLoad AAAA            176.4
WorkLoad LA              169.7
WorkLoad LAA              169.7
WorkLoad LAAA            175.9
WorkLoad LAAAA            176.5
WorkLoad LLA              129.9
WorkLoad LLAA            169.7
WorkLoad LLAAA            142.0
WorkLoad LLAAAA          169.7
WorkLoad LAALA            169.7
-----------------------------------------------------------
Memory to Memory (MB/sec)
Alignment 0-0      512KB      16KB      4KB
-----------------------------------------------------------
Libc Memcpy          220.3      216.6      210.5
Read 8                92.0      91.1      90.9
Read 8x4              92.0      91.5      90.1
Read 32              240.4      238.1      234.0
Read 32x4            240.4      237.4      232.9
Read 32x8            240.3      236.8      230.6
Write 8              90.5      90.2      89.6
Write 8x4            90.0      90.2      88.9
Write 32            359.5      356.2      345.3
Write 32x4          359.5      355.5      344.2
Write 32x8          359.4      354.6      340.1
Copy 8                54.1      54.1      54.8
Copy 8x4              70.0      70.8      70.5
Copy 32              170.9      170.1      168.9
Copy 32x4            274.6      272.5      268.1
Copy 32x8            274.2      272.3      268.4
-----------------------------------------------------------
Cache to Cache (MB/sec)
Alignment 0-0      512KB      16KB      4KB
-----------------------------------------------------------
Libc Memcpy          220.5      306.5      298.2
Read 8                91.5      91.8      91.2
Read 8x4              92.0      91.3      90.9
Read 32              240.4      363.0      353.1
Read 32x4            240.4      362.7      351.2
Read 32x8            240.4      361.2      346.1
Write 8              90.3      89.8      89.4
Write 8x4            90.5      90.2      89.3
Write 32            359.4      356.0      345.8
Write 32x4          359.5      355.6      344.4
Write 32x8          359.5      354.7      340.4
Copy 8                54.2      60.9      60.8
Copy 8x4              70.9      80.9      80.6
Copy 32              170.6      242.7      240.2
Copy 32x4            274.8      576.4      558.1
Copy 32x8            274.6      636.8      614.5
-----------------------------------------------------------
CPU: 10048 / 75 = 133 MIPS.
MEM: 7152 / 32 = 223 MB/Sec.
ALL: 160 Points.
-----------------------------------------------------------



Markus (mfro)

Posts 89
31 Oct 2016 17:56


Maybe someone would volunteer to throw this (EXTERNAL LINK) at an Amiga compiler and post the outcome?
   
  Yes, it's aged and probably far from being the best benchmark, but since Motorola originally claimed to score 401 VAX MIPS with this on the ColdFire V4, we simply _had to_ test it (and missed the goal miserably, so much for marketing).
 
  We have collected some numbers for different Atari machines here:
  http://firebee.org/~firebee/pictures/files/dhrystone.pdf
   
    Would be nice if we could add another 68k to it ...
   


Vincent Rivière

Posts 83
31 Oct 2016 21:02


Excellent investigation, Markus :-D
