




68080

Alex Kemmler

Posts 26
09 Oct 2016 23:14


The 68080 CPU has few mips, mhz??


Martin Soerensen

Posts 227
10 Oct 2016 10:09


Yes, it has both MIPS and MHz. I have no idea what your question is?..


Lord Aga
(Apollo Team Member)
Posts 87
10 Oct 2016 10:18


"Few" like 2 or 3.
I'm sure our CPU has more than 2 or 3 MIPS and MHz :)
We have at least 5 :D :D :D


Daniel Sevo

Posts 267
10 Oct 2016 10:22


Please remember this long thread before feeding the troll ;-)

CLICK HERE 


Lord Aga
(Apollo Team Member)
Posts 87
10 Oct 2016 10:38


Oh I do, hence the appropriate response :)


Alex Kemmler

Posts 26
10 Oct 2016 17:42


No troll. Thanks for listening. My language is bad, sorry, I am not a native speaker.


Gunnar von Boehn
(Apollo Team Member)
Posts 3944
10 Oct 2016 18:08


Different Benchmarks score different numbers

This is one benchmark:


      CPU       MHz  CARD               MIPS
      68020      14  A1200                 2
      TG68           Mist                  4
      68020      28  Blizzard 1220         4
      68030      26  ACA1233             4.2
      68030      50                        6
      Coldfire  266  Firebee               8
      68040      28                       19
      68040      40                       27
      68060      50                       32
      68060      80  A1200+Apollo1260     46
      68080      78  Vampire GOLD         86
      68080      92  Vampire GOLD2       165
 
 

In other benchmarks, other numbers will be returned.
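
One way to read a table like this is to normalise each score by the clock. The following is only a small Python sketch using the rows above that list a MHz value (the TG68 row has none). The names `rows`, `mips_per_mhz` and `per_clock` are illustrative, and the figures describe this one benchmark only.

```python
# Rows copied from the table above: (CPU, MHz, card, MIPS).
# The TG68 row is left out because the table gives no clock for it.
rows = [
    ("68020", 14, "A1200", 2),
    ("68020", 28, "Blizzard 1220", 4),
    ("68030", 26, "ACA1233", 4.2),
    ("68030", 50, "", 6),
    ("Coldfire", 266, "Firebee", 8),
    ("68040", 28, "", 19),
    ("68040", 40, "", 27),
    ("68060", 50, "", 32),
    ("68060", 80, "A1200+Apollo1260", 46),
    ("68080", 78, "Vampire GOLD", 86),
    ("68080", 92, "Vampire GOLD2", 165),
]


def mips_per_mhz(mhz, mips):
    """Benchmark score normalised by clock frequency."""
    return mips / mhz


# Keyed by (CPU, MHz) so the two entries per CPU family stay distinct.
per_clock = {(cpu, mhz): mips_per_mhz(mhz, mips) for cpu, mhz, _card, mips in rows}

for (cpu, mhz), value in per_clock.items():
    print(f"{cpu:9s} {mhz:4d} MHz  {value:.2f} MIPS/MHz")
```

Normalising this way separates "runs at a higher clock" from "does more per clock" for this particular loop.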




Szyk Cech

Posts 190
12 Oct 2016 17:53


Gunnar von Boehn wrote:

 

      Coldfire 266    Firebee              8
   
 
 

How is this possible? Is this a crap CPU???
Gunnar von Boehn wrote:

 

      68080    78    Vampire GOLD        86
      68080    92    Vampire GOLD2      165
   
 
 

How is this speedup possible?!? Is this a kind of "Only Amiga makes it possible"?!?


Ian Parsons

Posts 185
12 Oct 2016 18:28


Coldfire has to emulate missing 680x0 instructions/modes. If the benchmark uses lots of those, the Coldfire will score poorly (if it contained none, Coldfire would score very high).

The benchmark benefits greatly from the improvement(s) made to the core (possibly related to predicting branches from what has been said on chat IIRC).

It all goes to show that single benchmarks are generally not a good measure of overall performance.


Wawa T

Posts 536
12 Oct 2016 20:10


This example, though, is probably part of the explanation of why Coldfire had to be dropped as a 68k replacement.


John William

Posts 420
12 Oct 2016 22:08


Gunnar von Boehn wrote:

Different Benchmarks score different numbers
 
  This is one benchmark:
 

      CPU       MHz  CARD               MIPS
      68020      14  A1200                 2
      TG68           Mist                  4
      68020      28  Blizzard 1220         4
      68030      26  ACA1233             4.2
      68030      50                        6
      Coldfire  266  Firebee               8
      68040      28                       19
      68040      40                       27
      68060      50                       32
      68060      80  A1200+Apollo1260     46
      68080      78  Vampire GOLD         86
      68080      92  Vampire GOLD2       165
   
 
 
 
 
  In other benchmarks, other numbers will be returned.
 
 

Please, please, can you release vampire gold 2 for v500 and v600?



Gunnar von Boehn
(Apollo Team Member)
Posts 3944
12 Oct 2016 22:37


Ian Parsons wrote:

Coldfire has to emulate missing 680x0 instructions/modes, if the benchmark uses lots of those the Coldfire will score poorly (if it contained none Coldfire would score very high).

The core loop contains 10 arithmetic long instructions and 1 DBRA.
The arithmetic long instructions run natively on Coldfire.
DBRA is very common in 68k code, but missing on Coldfire.

So 1 instruction needed to be trapped and emulated on Coldfire.
Hence the low score.

Having only 1 missing instruction is actually a very Coldfire-friendly case. In a normal 68k program the percentage of instructions that have to be emulated on Coldfire could be much higher.
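
A rough back-of-the-envelope sketch in Python of why a single trapped instruction can dominate the score. It assumes, purely for illustration, that every native instruction costs 1 clock and that all the remaining time goes into trapping and emulating DBRA. The 266 MHz and 8 MIPS figures are the Firebee row of the table. The resulting trap cost is an estimate under those assumptions, not a measured value, and `implied_trap_clocks` is a made-up helper name.

```python
def implied_trap_clocks(mhz, mips, native_per_loop, trapped_per_loop,
                        native_clocks=1.0):
    """Estimate clocks spent per trapped instruction from a MIPS score.

    Assumption (not from the thread): each native instruction costs
    `native_clocks` clocks and everything else is trap/emulation overhead.
    """
    instr_per_loop = native_per_loop + trapped_per_loop
    clocks_per_instr = mhz / mips                    # average over the loop
    clocks_per_loop = clocks_per_instr * instr_per_loop
    native_total = native_per_loop * native_clocks
    return (clocks_per_loop - native_total) / trapped_per_loop


# Firebee row: 266 MHz, 8 MIPS; loop of 10 arithmetic instructions + 1 DBRA.
trap = implied_trap_clocks(mhz=266, mips=8, native_per_loop=10, trapped_per_loop=1)
print(f"implied cost of one emulated DBRA: about {trap:.0f} clocks")
```

Under these assumptions the one emulated instruction accounts for several hundred clocks per loop iteration, against about ten for everything else.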


Adam Whittaker

Posts 209
13 Oct 2016 03:12


John William wrote:

Gunnar von Boehn wrote:

  Different Benchmarks score different numbers
 
  This is one benchmark:
 

        CPU       MHz  CARD               MIPS
        68020      14  A1200                 2
        TG68           Mist                  4
        68020      28  Blizzard 1220         4
        68030      26  ACA1233             4.2
        68030      50                        6
        Coldfire  266  Firebee               8
        68040      28                       19
        68040      40                       27
        68060      50                       32
        68060      80  A1200+Apollo1260     46
        68080      78  Vampire GOLD         86
        68080      92  Vampire GOLD2       165
   
 
 
 
 
  In other benchmarks, other numbers will be returned.
 
 
 

 
  Please, please, can you release vampire gold 2 for v500 and v600?
 

I am with this guy on this lol - Gold 2 please, because it's got to be said you have made my Amiga my usable daily driver... and it works so smoothly, a really good product, and the FPGA CPU core you have created is simply masterful!!!



Martin Soerensen

Posts 227
13 Oct 2016 10:19


I'm also wondering how it is possible to move from 86 to 165 MIPS from GOLD1 to GOLD2? The MHz increase alone is not enough to warrant that increase. Any clues or will we have to wait in anticipation? :)
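
The arithmetic behind this question, as a short Python sketch. It only uses the two 68080 rows of the table. Splitting the gain into a clock part and a per-clock part is just one way of looking at it, and the variable names are illustrative.

```python
# Values taken from the two 68080 rows of the benchmark table.
gold_mhz, gold_mips = 78, 86
gold2_mhz, gold2_mips = 92, 165

clock_gain = gold2_mhz / gold_mhz          # what the higher clock alone would give
total_gain = gold2_mips / gold_mips        # what the benchmark actually reports
per_clock_gain = total_gain / clock_gain   # remainder: more work done per clock

print(f"clock gain     x{clock_gain:.2f}")
print(f"total gain     x{total_gain:.2f}")
print(f"per-clock gain x{per_clock_gain:.2f}")
```

The clock accounts for a factor of roughly 1.18 out of a total of roughly 1.92, so a factor of about 1.63 has to come from the core doing more per clock in this benchmark.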


Mr-Z EdgeOfPanic

Posts 179
13 Oct 2016 12:01


The 165 MIPS came from the short loops improvement benchmark.
According to BigGun this should for example speed up image decoding a lot, nice when surfing the web for instance.


OneSTone O2o

Posts 159
21 Oct 2016 23:53


Szyk Cech wrote:

   
Gunnar von Boehn wrote:

     

            Coldfire 266    Firebee              8
       
 
     

      How is this possible? Is this a crap CPU???
     
Gunnar von Boehn wrote:

     

            68080    78    Vampire GOLD        86
            68080    92    Vampire GOLD2      165
       
 
     

      How is this speedup possible?!? Is this a kind of "Only Amiga makes it possible"?!?
   

   
Hello, a colleague at atari-home.de (user mfro) analyzed this benchmark. This is what actually happens in the benchmark loop: it does essentially nothing except count the number of instructions that are executed.
   
   
    addil #16,%d0
    addil #16,%d1
    addil #16,%d2
    addil #16,%d3
    addal #16,%a0
    addal #16,%a1
    addal #16,%a2
    dbf %d6,0x486

   
Coldfire fails not only because of the dbf instruction, which needs to be emulated, but also because adda.l does not exist and must be emulated. So Coldfire loses a lot of time emulating the whole loop. In fact, this benchmark tests the efficiency of the emulation lib. This test loop is almost the worst thing you can feed a Coldfire processor; unfortunately these addressing modes are missing there, a big limitation. mfro was so nasty and modified the benchmark loop into something that runs natively on Coldfire:
   
   
            move.l  %[loops],d6
            add.l   #1,d6
    1:      addi.l  #16,d0
            addi.l  #16,d1
            addi.l  #16,d2
            addi.l  #16,d3
            lea     16(a0),a0
            lea     16(a1),a1
            lea     16(a2),a2
            subq.l  #1,d6
            bne.s   1b

It is effectively the same, but still senseless, benchmark code. It just counts the MIPS. Result:
   
ran 7.595000 seconds, MIPS: 210.664917
   
That looks much better. It's not 264 because he did not count 9 instructions per iteration but again just 8, to stay comparable, as the dbf equivalent (subq.l plus bne.s) takes two clocks. But mfro was even more nasty and did this, which still does something senseless (mac.l performs a multiplication and an addition in one instruction, and the core is able to run 2 instructions in parallel):
   
            move.l  %[loops],d6
            add.l   #1,d6
    1:      addi.l  #16,d0
            mac.l   d0,d1,acc0
            addi.l  #16,d1
            mac.l   d0,d1,acc0
            addi.l  #16,d2
            mac.l   d0,d1,acc0
            addi.l  #16,d3
            mac.l   d0,d1,acc0
            lea     16(a0),a0
            mac.l   d0,d1,acc0
            lea     16(a0),a1
            mac.l   d0,d1,acc0
            lea     16(a0),a2
            mac.l   d0,d1,acc0

The difference is that here Coldfire also executes some of the instructions in parallel. Result:
   
ran 12.935000 seconds, MIPS: 262.852717
   
That is very close to one instruction per clock at 264 MHz. There may still be something that can be optimized, but mfro doesn't yet fully understand what happens here with the EMAC unit's register utilisation, which seems to slow it down a bit when using the "wrong" registers.
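
The two reported results can be cross-checked with a little Python. Multiplying each MIPS figure by its run time gives the number of instructions the benchmark counted, and the 8-versus-9 correction follows the remark about the dbf equivalent. The function names are illustrative, and the instruction totals are inferred from the printed numbers, not taken from mfro's source.

```python
def counted_instructions(seconds, mips):
    """Instructions the benchmark counted, inferred from its printed output."""
    return seconds * mips * 1e6


def rescale_mips(mips, counted_per_iter, executed_per_iter):
    """MIPS figure if every executed instruction had been counted."""
    return mips * executed_per_iter / counted_per_iter


first = counted_instructions(7.595, 210.664917)     # Coldfire-native loop
second = counted_instructions(12.935, 262.852717)   # loop with mac.l added

# The native loop executes 9 instructions per iteration but counts only 8.
actual_first = rescale_mips(210.664917, counted_per_iter=8, executed_per_iter=9)

print(f"first run counted  {first:.3e} instructions")
print(f"second run counted {second:.3e} instructions")
print(f"first run, counting all 9 instructions: {actual_first:.1f} MIPS")
```

Both totals come out as round numbers (about 1.6 and 3.4 billion), which suggests the printed MIPS value is simply a fixed instruction count divided by the measured time. With all 9 instructions counted, the first loop would be at about 237 MIPS.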
   
He also analyzed another benchmark and found out that it runs the NOP instruction in the loop. This is also bad for Coldfire, as it needs 6 clocks to execute it and after that another 6 clocks to restart the CPU with a pipeline flush, so one NOP takes 12 clocks in total. Replacing NOP with TPF (which still does nothing) would accelerate such a loop by a factor of 12, as it only needs one clock and no pipeline flush.
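
The factor of 12 applies to the NOP itself; how much a whole loop gains depends on what else is in it. A small Python sketch, assuming the clock counts quoted above (12 clocks per NOP including the pipeline flush, 1 per TPF) and a hypothetical `other_clocks` value for the rest of the loop body:

```python
NOP_CLOCKS = 12   # 6 to execute + 6 to restart the pipeline, as quoted above
TPF_CLOCKS = 1    # single clock, no pipeline flush, as quoted above


def loop_speedup(nops_per_iter, other_clocks):
    """Speedup of one loop iteration when every NOP is replaced by TPF.

    `other_clocks` is a hypothetical figure for everything else in the loop
    (counter update, branch, ...); it is not a number from the thread.
    """
    before = nops_per_iter * NOP_CLOCKS + other_clocks
    after = nops_per_iter * TPF_CLOCKS + other_clocks
    return before / after


print(loop_speedup(nops_per_iter=8, other_clocks=0))   # loop body is NOPs only
print(loop_speedup(nops_per_iter=8, other_clocks=2))   # with some loop overhead
```

Only a loop consisting of nothing but NOPs reaches the full factor of 12; any loop overhead pulls the figure down.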
   
This is just to show that the Coldfire is not as bad as in the benchmark result list above, if it gets the right instructions. mfro is very amazed by what you guys created with the Apollo 68080 core; it's impressive and a real enrichment.
   
Additionally, I want to add that new ATARI ST software (or old software where the source code is available) can be compiled with a new version of gcc and the GFA-Basic compiler into Coldfire-native code, which is quite optimized and gives very good results. (At least this version of gcc should also be useful on the Amiga and Mac68K platforms.) The problem for Coldfire is the legacy software, and there is so much of it. But in reality such software does not contain only Coldfire-unaware instructions, so legacy software is not as slow as the 8 MIPS loop, if it can run at all. Calamus DTP, for example, does not, and that is the advantage of Apollo: I expect that it runs on your core, and I am looking forward to testing it. I think Apollo will be faster with legacy software, while Coldfire still runs better with new Coldfire-compiled software.
   
If possible, it would be interesting to see the result of the two Coldfire optimized codes on the Apollo core. The 2nd code might not run on Apollo as it is.
   
Link to the neighbour forum discussion EXTERNAL LINK 


Gunnar von Boehn
(Apollo Team Member)
Posts 3944
22 Oct 2016 09:04


Hi

oneSTone o2o wrote:

  Hello, a colleague at atari-home.de (user mfro) analyzed this benchmark. This is what actually happens in the benchmark loop: it does essentially nothing except count the number of instructions that are executed.
     
     
      addil #16,%d0
      addil #16,%d1
      addil #16,%d2
      addil #16,%d3
      addal #16,%a0
      addal #16,%a1
      addal #16,%a2
      dbf %d6,0x486

     
  Coldfire fails not only because of the dbf instruction, which needs to be emulated, but also because adda.l does not exist and must be emulated. So Coldfire loses a lot of time emulating the whole loop.

I'm sorry, but what you say is NOT correct.
ADD.L #imm,An - is a basic 32bit instruction.
ADD.L #imm,An is fully supported by Coldfire.
Please check this yourself in Coldfire Programmer Reference Manual Page 4-4

oneSTone o2o wrote:

In fact, this benchmark tests the efficiency of the emulation lib. This test loop is almost the worst thing you can feed a Coldfire processor,

In this loop only 1 instruction needs to be emulated.
You certainly could not call 1 missing instruction "a worst case". :-)

     

oneSTone o2o wrote:

     
            move.l  %[loops],d6
            add.l   #1,d6
      1:    addi.l  #16,d0
            addi.l  #16,d1
            addi.l  #16,d2
            addi.l  #16,d3
            lea     16(a0),a0
            lea     16(a1),a1
            lea     16(a2),a2
            subq.l  #1,d6
            bne.s   1b

  It is effectively the same, but still senseless, benchmark code. It just counts the MIPS. Result:

The main purpose of this benchmark was to check DBF execution and DBF branch prediction.
For this purpose the test is pretty sensible. :-)

And for this purpose the test had 2 loops nested inside each other.
Does your "new" test measure this too?
To me it looks like the inner loop, which had a 50% misprediction rate, was removed?
So it looks like your version of the test does not test branch prediction anymore. Can this be?
If you remove the branch prediction from the test, then the whole test is not comparable!

     
     
oneSTone o2o wrote:

  He also analyzed another benchmark and found out that it runs the NOP instruction in the loop. This is also bad for Coldfire, as it needs 6 clocks to execute it and after that another 6 clocks to restart the CPU with a pipeline flush, so one NOP takes 12 clocks in total. Replacing NOP with TPF (which still does nothing)

Yes, of course NOPs take several clocks.
This is also true for 68k CPUs.
But the fact is that the NOP test in Minibench is run for information only; it is of course _NOT_ counted for the MIPS result numbers.
So changing this to TPF would have no influence on the printed MIPS score.

oneSTone o2o wrote:
     
  The problem for Coldfire is the legacy software, and there is so much of it. But in reality such software does not contain only Coldfire-unaware instructions, so legacy software is not as slow as the 8 MIPS loop.

Also, this test had 90% Coldfire-native instructions.
Only 1 instruction needed to be emulated.
In fact, real-world AMIGA legacy applications or even the AMIGA OS (Kickstart) have a much higher percentage of instructions which need to be emulated.

oneSTone o2o wrote:
     
I think Apollo will be faster with legacy software while Coldfire runs still better with new Coldfire compiled software.

I think your wording could be a little more precise. ;-)

First of all, "Coldfire" does not equal "Coldfire".
There are different Coldfire cores: V1/V2/V3/V4/V4e.
Clock by clock, these different Coldfire cores also have very different performance.
And clock by clock, the Coldfire V4e is certainly the fastest Coldfire core.

But clock by clock the Coldfire V4e is a lot slower than the Apollo 68080.

Now, if you say that a Coldfire V4 at 260 MHz is in some areas faster than an APOLLO @ 80 MHz?

This is of course true.
But you have to keep in mind that you can get a MUCH higher clocked APOLLO by simply using a more expensive FPGA chip.

The Vampire card uses a relatively affordable FPGA, to be able to offer the accelerator for an affordable price.
If Igor had put the Vampire into the price segment of the Firebee, for example, then we could also have offered his customers an FPGA twice as fast.

We did tests on more expensive FPGA models and reached over 200 MHz with APOLLO in these FPGAs.

You have to keep in mind that right now you are comparing two systems, of which the Coldfire system costs twice the money.
So if you are willing to spend the same amount of money on an Apollo system as on the Coldfire Atari, then you can also get a 200 MHz Apollo core.
A 200 MHz Apollo 68080 will of course be much faster than Coldfire in all cases.

APOLLO is very fast clock by clock.
Right now we run MPEG video with APOLLO @ 80 MHz in an AMIGA 500.
In fact, some of these videos play more smoothly on APOLLO than on my 600 MHz PowerPC Neo-AMIGA system.

With APOLLO we have the option to create very affordable FPGA systems running at 80-90 MHz.
If customer demand is there, we can also create a performance system running at around 200 MHz.
Such a performance system is something we could create within a few months.

In theory, if the market were big enough,
e.g. if the AMIGA and ATARI communities joined together on this,
then one could in theory also have an ASIC version of APOLLO produced.

And then reaching 1000 MHz is possible with APOLLO too.
Having said this, an ASIC would really be a very long-term option.



Markus (mfro)

Posts 91
22 Oct 2016 10:01


I'm the one that caused this discussion.

I guess it doesn't make sense to talk through a middle-man, so I decided to register in this forum as well.

Hello, everybody.

Gunnar von Boehn wrote:

Hi
 
 
oneSTone o2o wrote:

     
      addil #16,%d0
      addil #16,%d1
      addil #16,%d2
      addil #16,%d3
      addal #16,%a0
      addal #16,%a1
      addal #16,%a2
      dbf %d6,0x486

 

 
  I'm sorry, but what you say is NOT correct.
  ADD.L #imm,An - is a basic 32bit instruction.
  ADD.L #imm,An is fully supported by Coldfire.
  Please check this yourself in Coldfire Programmer Reference Manual Page 4-4

You are correct. I have obviously been fooled by the gas assembler.

My apologies for that. I have corrected this in my code with no apparent change to the results.
 
Gunnar von Boehn wrote:
 
  In this loop only 1 instruction needs to be emulated.
  You certainly could not call 1 missing instruction "a worst case". :-)

Certainly not the worst case, but nevertheless a "bad case". dbf triggers an illegal instruction exception on the ColdFire that needs to be handled within the cf68klib exception handler, effectively replacing the instruction with an analogous ColdFire instruction sequence.

Very bad in a loop intended to measure instructions per second and (at least in my opinion) not really a fair comparison.

Gunnar von Boehn wrote:
 
  And for this purpose the test had 2 loops nested inside each other.
  Does your "new" test measure this too?
  To me it looks like the inner loop, which had a 50% misprediction rate, was removed?
  So it looks like your version of the test does not test branch prediction anymore. Can this be?

Correct analysis as well. As the sources for the original benchmark weren't posted (at least I didn't find them anywhere), I had to analyze the disassembly. To me, the nested loops weren't obvious from there; my only excuse is that disassembling Amiga executables with only Atari tools available doesn't really give very obvious results.

If somebody would be so kind to provide the original sources, I'm more than willing to repeat the test for an exact comparison.

I'd be more than curious about what that does to the ColdFire results as well, since until now I was always of the opinion that ColdFire branch prediction isn't bad at all.
 


Gunnar von Boehn
(Apollo Team Member)
Posts 3944
22 Oct 2016 11:01


Hallo Markus,

Nice to see you here. :-)
 
 
Let me explain one thing. Our intention is NOT to bad-mouth Coldfire or to put it down. We ourselves have a Coldfire development system, which we used a long time ago to evaluate Coldfire as a CPU option for AMIGA.
 
The test case which you looked at was done only to measure the effect of branch prediction. So the ADDs are just some filler in it.
The motivation for this test was that we saw that during video decoding there are often short loops with 4 or 8 iterations.
In the video codecs which we analyzed there were sometimes only a few instructions in such a loop.

So this benchmark is a simplified version of this behavior.
We try to measure a small loop with a low number of iterations, and to see if the CPU is able to run it well.
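
To get a feeling for why short loops are hard on branch prediction, here is a small Python sketch. It models a textbook 2-bit saturating counter on the loop-closing branch. That predictor is an assumption chosen for illustration only, not a description of the predictor in APOLLO, the 68060 or Coldfire, and `mispredict_rate` is a made-up name.

```python
def mispredict_rate(iterations, outer_loops=1000):
    """Misprediction rate of a 2-bit saturating counter on a loop branch.

    The branch is taken (iterations - 1) times, then falls through once,
    and the whole inner loop is re-entered `outer_loops` times.
    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    counter = 2            # start at "weakly taken"
    wrong = total = 0
    for _ in range(outer_loops):
        for i in range(iterations):
            taken = i < iterations - 1
            predicted_taken = counter >= 2
            if predicted_taken != taken:
                wrong += 1
            total += 1
            # Saturating update towards the actual outcome.
            counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return wrong / total


for n in (2, 4, 8, 64):
    print(f"{n:3d} iterations per loop: {mispredict_rate(n):.1%} mispredicted")
```

In this simple model the loop exit is mispredicted every time, so the shorter the loop, the larger the share of mispredicted branches. A real core can do better or worse depending on its predictor.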
 
The benchmark was done to compare 68K cores.
That someone ran it on a Coldfire was just by accident so to speak.

If you want, the source is available for download here.
CLICK HERE 




Markus (mfro)

Posts 91
22 Oct 2016 11:17


Gunnar von Boehn wrote:

Hallo Markus,
 
  Nice to see you here. :-)

   
Thank you.

Gunnar von Boehn wrote:

  Let me explain one thing. Our intention is NOT to bad mouth Coldfire or to put it down.

That was always my understanding and I really appreciate the mostly constructive mood in this forum (which is/was - as we probably both know all too well - not always the case when it comes to "Amiga vs. Atari" ;) ).

Gunnar von Boehn wrote:
 
If you want, the source is available for download here.
  CLICK HERE 

Thank you very much. I'll have a look into it.

posts 45