Performance and Benchmark Results!

Performance and Software for 2018

Gunnar von Boehn
(Apollo Team Member)
Posts 6207
30 Nov 2017 09:31


I can understand that for a non-programmer the situation of AMIGA OS and different CPUs might look confusing. Maybe it makes sense to give a real-world example.

The AMIGA OS is modular.
For example, on the AMIGA you have datatypes.
Datatypes are small programs providing decoding/processing functions for different data formats.
For example, there is:
- 1 datatype program to decode IFF images,
- 1 datatype program to decode BMP images,
- 1 datatype program to decode GIF images,
- 1 datatype program to decode PNG images,
- 1 datatype program to decode JPEG images...

The different 680x0 CPUs provide different instructions.
The 68020 CPU for example supports more instructions than the 68000 CPU.

When you install a "datatype" on the system, you can pick the program matching your CPU.

This means that when you install the JPEG datatype, the installer can choose between the 68000 version and the 68020 version, for example.

If your AMIGA has a 68020 CPU, then both datatype versions will work for you, but the 020 version will run faster.
If you have an AMIGA with only a 68000 CPU, then you need to pick the 68000 file, as the 68020 program will not work on the 68000.

In earlier posts someone discussed the option of running the 020 binary but emulating the missing instructions on the 68000.
This is technically possible in theory, but it makes no sense.

Let's work through a simplified performance example.
Let's give the files realistic performance numbers, where higher means better.

Let's say:
- The 68000 program has a speed of 100%.
- The 68020 program might have a speed of 120%.
This means picking the optimal datatype file on your 68020 CPU will improve your AMIGA's speed by 20%.

There is no emulator to run 020 binaries on a 68000, but suppose there were one: running the 68020 file through an emulator would eat a lot of performance.
This means that if you ran the 68020 file on a 68000 CPU in emulation, you might reach only 10% of the baseline performance.

So for the 68000 owner this is not a sensible choice at all.
For the 68000 owner, simply selecting the binary matching his CPU is the best choice.

The 68080 now offers AMMX instructions, which can accelerate a number of operations a lot. This means the happy owner of a Vampire has the option to pick a datatype.68080 file:
- The 68080 program might have a speed of 300%.

So selecting this will give him a nice boost.

Again, running the 68080 datatype in an emulator on a 68000 is not a sensible option.
If a 68000 CPU were to emulate the 68080 instructions, then the datatype would maybe reach only 5% speed.

This means picking the normal 68000 datatype file is, again, the only sensible decision.

In other words, nothing has changed for AMIGA users.
During installation of software, you pick, as always, the binary matching your CPU model.


Saladriel Amrael

Posts 166
30 Nov 2017 10:31


Exactly. And also, this is called hardware evolution, and it doesn't always go well with backward compatibility.

Think about what happened with AGA: some games were written with ECS/OCS in mind but got an AGA-enhanced version (Shadow Fighters, Brian the Lion, etc.).
Some other games were written with AGA/68020+ in mind and could not get an ECS version because it would have castrated the game (Banshee, Breathless, Capital Punishment, T-Zer0, etc.).
That was not only considered completely normal but was also welcomed at the time, because the later games were an evolution compared to the former ones.
The same happened with other systems as well when PC games started requiring a 386+ processor.

I think what is happening here with the Vampire is the same thing:
some software will make sense to recompile with backward compatibility in mind, some will not.
 
 
 


Renee Cousins
(Apollo Team Member)
Posts 142
01 Dec 2017 05:39


Steve Ferrell wrote:
You're getting to be as absurd and off the rails as Asaf.  You're stating that emulating an FPU is the same as what Asaf is suggesting,  and it isn't....not even close.

Didn't say that though, did I. And I take offence to that -- I've been doing firmware development since the mid-'90s, from the 68332 to ColdFires to STM32s with more number-crunching power than I really know what to do with. I've developed for everything from 8-bitters to 64-bitters, and I think I know what I'm talking about.
Steve Ferrell wrote:
Emulating a small set of floating point instructions on a CPU not equipped with an FPU is NOT the same as taking a CPU such as an 8086 and having it emulate the full instruction set of an Intel Core i-7 CPU "AND" its integrated FPU and memory manager.  In the preceding sentence, replace the 8086 and i-7 with the architectures of your choice for full effect.

Running ARM code with an MMU on an 8-bit, 8MHz AVR is perfectly doable. I never claimed it was "generally" practical, but it has its corner uses. Your whole argument is just a giant straw man.
Steve Ferrell wrote:
And no, this ISN'T surprisingly common.  If it were, people would still be using 8086 processors and trying to decode x.265 video streams, but they aren't and for good reason.

Of course not -- emulators are optimistically ten times as slow as the host system they're running on -- more often it's by a factor of 100 to 1000. Using an 8086 to decode x.265 streams is nonsense -- just like Gunnar's comment about running a database. Of course you never get the performance. An 8086 emulating 286 code is already abysmally slow -- but it's possible, and as I said, it has its use cases, because not everything is about performance.
Steve Ferrell wrote:
There would also be translation layers allowing people to run CUDA apps on 80286 PC's with EGA graphics boards too....and there are none.

CUDA and OpenCL emulators exist too. But that whole reach there, yeah, just keep going...
Steve Ferrell wrote:
Or wait, what about a commercial product that lets you run Windows 10 on an 8088 CPU?  Again, there isn't such a product nor a need for it. So common for whom? Researchers who have a lot of time on their hands and a budget provided by the state?

That's a weird jab at socialism, by the way.

Just because you're unfamiliar with these use cases doesn't mean they don't exist. No, no one will run such a set-up to use tools that need performance, but you don't always need performance.

Sometimes, being accurate and being able to debug things is more important. Running OpenCL in an emulator is useful because speed isn't important when you're single-stepping through your buggy code. Microcontrollers are even tougher to work on when you're juggling the states of dozens of timers and interrupt latencies -- stepping through a simulator can be far more useful than trying to run it on real hardware.

More to the point, these tools are essential when you do not have access to the hardware you're developing on or when it doesn't exist yet. The ARM didn't pop out of someone's ass. They had to run simulations of it before burning the ASIC to ensure it worked. Since there wasn't a processor MORE powerful at the time, they clearly had to run the simulations on a lesser processor -- and they did -- they ran them on a 6502 using BBC BASIC.

A great many processors ran in simulations LONG before they were real silicon. Just like the Vampire.

Even today, we usually run FPGAs through simulations before wasting cycles running on real chips, because the simulations can tell us a lot more. If a chip fails, it just doesn't work -- a simulation can tell you why. And I'll tell you what -- there's no computer on earth that can run an FPGA simulation at the full speed the FPGA runs. It's not even close.

But I wasn't "defending" Asaf either -- if you read my other posts you'd see that.

None of this was even what I **THINK** Asaf was asking for, which was providing backwards compatibility -- and that we already have. We already have 3D code for the 68000, we already have a JPEG decoder for the 68000. And no one is making him replace his 68000 JPEG datatypes with AMMX-optimized JPEG datatypes. And writing an AMMX "emulator" for the 68000, while possible, would be MUCH slower than just using the regular 68000 JPEG datatype. And obviously, if a developer is targeting a Vampire system, there's little-to-no hope of running it on anything else, simply because of the expectation of performance.

But maybe I have more than one computer and only enough cash for one Vampire. Maybe I want to be able to develop Apollo code on the machine in my living room when the Vamp is in the 1000 upstairs. I don't care if it's 1000 times slower -- I only want to know if what I just wrote is going to work. And for that, having this kind of emulation is actually really handy.


Renee Cousins
(Apollo Team Member)
Posts 142
01 Dec 2017 05:46


Saladriel Amrael wrote:
I think what is happening here with Vampire is the same thing: some software will make sense to be recompiled with backward compatibility in mind, some others not.

Agreed. So far, all AMMX "optimized" programs have non-AMMX versions (either developed in parallel or simply as a matter of legacy). Aside from a few demos and highly specific utilities for the beta testers, I know of nothing that has been made exclusively for the Vampire, and honestly, it wouldn't make a whole lot of sense to do that right now. We're still in the "optimized for AGA" stage of this, with a handful of apps getting a considerable boost if they leverage the new features while still running fine on 68040 and 68060 systems.

Post-V4, with a few hundred or maybe thousand sales behind us, we'll start seeing stuff that has zero consideration for non-Apollo-Core compatibility.


Steve Ferrell

Posts 424
01 Dec 2017 05:58


Renee Cousins wrote:

  Didn't say that though, did I. And I take offence to that -- I've been doing firmware development since the mid 90's. From the 68332 to ColdFires to STM32's with more number crunching power than I really know what to do with. I've developed for 8-bitters to 64-bitters and I think I know what I'm talking about.
     

     
Uh, yes you did say that, and now that you've been called out on it you're attempting to back-pedal, and I'm having none of it.

I don't care if you're offended or not. No one in their right mind tries to emulate a contemporary CPU on a 25-year-old CPU, which is exactly what you and Asaf both suggested, and you went so far as to say that it's quite common. It is not common, and it isn't even useful, as evidenced by the example that YOU provided, which stated that it took 2 hours to boot and open a shell session. I also don't care that you've been doing firmware development since the '90s. That has nothing to do with CPU design. I'm also not interested in comparing your work experience, educational background, nor the size of your genitalia with mine either. And no, you don't know what you're talking about, as evidenced by your exchange with Gunnar over the ColdFire and its missing address modes and how to code for it. You really should know more about a given CPU before you decide to lecture Gunnar on the best ways to develop for it. For someone who supposedly works with the ColdFire you're found quite lacking, and here's Gunnar's response to you in case you forgot:
   
Gunnar von Boehn wrote:

    No, you misunderstand this.
    The Coldfire misses not just some rare EA modes
    but fundamental, basic address modes - ones which programmers and compilers used all the time.
    Not only can the FPU not use float immediates, it also cannot address memory pointed to by absolute 32-bit addresses.

    C/C++ compilers do of course support floating-point constants and will happily use them.
    Just look at GCC internals and at code created by GCC.

    Code like this is normal:
    FMUL #0.5,Fp0
    FMUL #Pi,Fp0

    Of course the Coldfire cannot support this, and it has to create the same result using several instructions.
    And this is just one example of where the Coldfire needs to execute more instructions to do the same work.
    The Coldfire often needs more instructions to do the same work.
    There are many 68K instructions, like MOVEM xxx,-(SP) or DBRA,
    which the Coldfire cannot support and for which it needs several instructions to get the same result.

    On average, the Coldfire V4 is significantly slower than a 68060 at the same clock rate. The V4 chips can make up for this by having a higher clock rate.
    You can see these results in the ATARI benchmarks as well.
    In many KRONOS tests, the V4 Coldfire is a lot slower than a 68060 running at the same clock as the Coldfire would be.
    The Coldfire has the potential to be faster for some algorithms with optimized MAC code - similar to how the 68080 can use AMMX to get a speed gain.

     


Mr Niding

Posts 459
01 Dec 2017 09:56


@Steve Ferrell
 
I have no comment on whether or not you are correct on any given topic, as your knowledge of hardware and software is eons ahead of mine, but:

Why do you always resort to sharp-elbowed postings? Do they result in more efficient communication?
CRM research (Crew Resource Management or Customer Relations Management) suggests the contrary.


Kolbjørn Barmen
(Needs Verification)
Posts 219/ 2
01 Dec 2017 09:58


There are a few 68000 systems that in general run circles around several 020+ systems - for example the Minimig, which runs a 68000 at 50MHz (well, 7 * 7.14MHz, iirc) - and of course it is annoying when programs you want to use do not run due to demands for 020+. For example, I wanted to use a library for talking over the serial port with ARexx (dignet - http://aminet.net/package/util/libs/DigNet), but it doesn't work on the 68000. Likewise, there are bits and pieces in OS3.9 that (for no good reason, really) are 020+ only, without any 68000 equivalents. Yeah, OS3.5+ were sold as 020+ only, not that I ever saw any good technical reason for it.
 
  Anyways, then there is this, that anyone interested can improve on :)
 
  EXTERNAL LINK


Markus B

Posts 209
01 Dec 2017 10:06


I am impressed by the nonsense you're discussing here.


Aksel Andersen

Posts 120
01 Dec 2017 10:20


Aren't you guys a bundle of joy..


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Dec 2017 15:13


Account for sale wrote:

Anyways, then there is this, that anyone interested can improve on :)

Unfortunately this does not allow you to run 020 code.
The 68020 adds several new EA modes, which are used often.
Without emulating them you cannot run 020+ programs.



Renee Cousins
(Apollo Team Member)
Posts 142
01 Dec 2017 15:46


Gunnar von Boehn wrote:
Again running the 68080 datatype in an emulator on an 68000 - is not a sensible option. If an 68000 CPU would emulate the 68080 instruction, then the datatype would maybe reach only 5% speed.

The performance is unchanged for the 90% of the datatype code that's just regular MOVE, DBRA and similar instructions. You're right in general: a 68080 emulator on a 68000 will be slower than an optimized 68000 version of that code, but by how much depends on how efficiently you handle the exceptions, how frequently the new instructions appear in the code, and how the routine performs against the optimized-for-68000 version. A quick back-of-the-napkin check shows performing four 8-bit additions in series to be about as slow as doing them in parallel with some bit twiddling, so the algorithm itself is unlikely to be a big component.

Also, let the record show that there is even an MMX emulator for chips without MMX (e.g., 386/486) -- EXTERNAL LINK. So yes, these things are real and they get used.

But to be clear, I am in NO WAY expecting the Apollo Team to EVER provide this level of support. That would be insane to expect of resources already stretched so thin -- if this ever came to the light of day, it would have to be by the hands of some crafty developer who loves to tinker and sees a genuine need for it. Personally, my time is money, and it's a LOT cheaper for me to just buy a Vampire.


Renee Cousins
(Apollo Team Member)
Posts 142
01 Dec 2017 17:10


"Code like this is normal:
    FMUL #0.5,Fp0"

Honestly, using float literals like this is a little wasteful. If you need that same constant anywhere else in your program, you've just wasted eight bytes -- and that can add up. Using FMUL (d16,PC) or (d16,Ax) will be exactly as fast as your example and conserves constant memory. It's common practice on the ColdFire to use A5 as your "constant base", but using PC-relative code would be more in keeping with the Amiga way.

You can always point to corner cases where some 68060 operations will be faster than on the ColdFire, but generally, the ColdFire is faster than the 68060, clock-for-clock.

Exceptions take 19~23 cycles on the 68060 and four on the ColdFire.
RTS on the 68060 takes seven clock cycles and two on the ColdFire.
RTE on the 68060 takes seventeen clock cycles and fifteen on the ColdFire.

The ColdFire has several new, very handy instructions like MVS and MVZ, as well as the (E)MAC, which the 68060 does not have and which can provide many DSP- and AMMX-like operations. Using the DSP library and EMAC macros to perform parallel arithmetic can quickly exceed the processing abilities of the 68060. I do miss the CPU32 table-lookup-and-interpolate instructions, though; those are super handy.

So yeah, sometimes you don't have the same depth of addressing modes, and if all you do is write assembler code, that MIGHT be a bit of a pain. But the addressing modes that are left are still MUCH deeper than on most RISC processors (e.g., ARM), and generally you can figure out how to accomplish the same thing without losing a single clock cycle.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Dec 2017 20:47


Renee Cousins wrote:

  You can always point to corner cases where some 68060 operations will be faster than on the ColdFire, but generally, the ColdFire is faster than the 68060, clock-for-clock.
 

The Coldfire V4e is not bad, and is the best Coldfire available.

The Coldfire V4e has 1 EA Unit and 1 ALU unit.
In comparison the 68060 CPU has 2 EA Units and 2 ALU units.
The Coldfire V4e can execute max 1 ALU instruction per cycle,
the 68060 can execute max 2 ALU instructions per cycle.

Example:


AND.L D0,D1
EOR.L D2,D3

The 68060 can execute both instructions together in 1 cycle.
The Coldfire V4e needs 2 cycles.
 


Steve Ferrell

Posts 424
02 Dec 2017 16:53


Gunnar von Boehn wrote:

Renee Cousins wrote:

  You can always point to corner cases where some 68060 operations will be faster than on the ColdFire, but generally, the ColdFire is faster than the 68060, clock-for-clock.
 

  The Coldfire V4e is not bad, and is the best Coldfire available.
 
  The Coldfire V4e has 1 EA Unit and 1 ALU unit.
  In comparison the 68060 CPU has 2 EA Units and 2 ALU units.
  The Coldfire V4e can execute max 1 ALU instruction per cycle,
  the 68060 can execute max 2 ALU instructions per cycle.
 
  Example:
 

  AND.L D0,D1
  EOR.L D2,D3
 

  The 68060 can execute both instructions together in 1 cycle.
  The Coldfire V4e needs 2 cycles.
 

I find it interesting that people who don't even own a 68060, nor have done any CPU design, want to lecture you about how a Coldfire or some other 68K variant is faster than a 68080. You would think they would provide some benchmarks to back up their claims, even if they were synthetic benchmarks. That would be the responsible thing to do.


Saladriel Amrael

Posts 166
02 Dec 2017 18:08


Renee Cousins wrote:

   
Saladriel Amrael wrote:
I think what is happening here with Vampire is the same thing: some software will make sense to be recompiled with backward compatibility in mind, some others not.

      Agreed. So far, all AMMX "optimized" programs have non-AMMX versions (either developed in parallel or simply as matter of legacy). Aside from a few demos and highly specific utilities for the beta testers, I know of nothing that has been exclusively made for the Vampire, and honestly, it wouldn't make a whole lot of sense to do that right now. We're still in the "optimized for AGA" stage of this with a handful of apps having a considerable boost if they leverage the features, but can still run fine on 68040 and 68060 systems.
     
      Post-V4, with a few hundred or maybe thousand sales behind us, we'll start seeing stuff that has zero consideration for non-Apollo-Core compatibility.
   

   
I can't see anything wrong with your view. In all truth, it's what I hope for: evolution.
 


Kolbjørn Barmen
(Needs Verification)
Posts 219/ 2
03 Dec 2017 00:21


Renee Cousins wrote:

So far, all AMMX "optimized" programs have non-AMMX versions

Yes, all one of them :)


Roy Gillotti

Posts 517
03 Dec 2017 03:04


Account for sale wrote:

Renee Cousins wrote:

  So far, all AMMX "optimized" programs have non-AMMX versions
 

 
  Yes, all one of them :)

We have an AMMX-optimized SDL library, used in MilkyTracker and a few other ports, so make that more than one.


Thierry Atheist

Posts 644
03 Dec 2017 09:05


Account for sale wrote:

Renee Cousins wrote:
So far, all AMMX "optimized" programs have non-AMMX versions

Yes, all one of them :)

That would be because the IAR (program counter) starts at zero.


Mallagan Bellator

Posts 393
03 Dec 2017 14:38


Gunnar von Boehn wrote:

    68080 does support:
    100% of all 68000 instructions
    100% of all 68030 instructions
    100% of all 68040 instructions
    100% of all 68060 instructions
    So 68080 is fully compatible

Does the 030 hold all instructions of the 010 and the 020?
For the 080 to be fully compatible, it will need all of their instructions, or support for them.

Gunnar von Boehn wrote:

    68080 has, of all CPUs in the 68K family, the widest instruction set support - and is the most compatible.

This is great, and I have understood this for a long time now.
However, just for clarification, "widest" doesn't mean "complete". If the 080 instruction set IS complete in regard to supporting at least ALL previous 68k versions, including the 010 and 020, that's awesome. In that case, one could use the fastest and best instructions from across all the previous versions while porting games/programs to the 080, as well as making new games.

Now, I do understand that the software base that will only work on the 010 may be very slim, if such software even exists at all (lol), but I would believe that all games made with a bare-bones A1200 or A4000 in mind would be programmed with 020 instructions. I don't think previous CPUs would support the AGA chipset at all.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
03 Dec 2017 15:29


Mallagan Bellator wrote:

Gunnar von Boehn wrote:

68080 does support:
  100% of all 68000 instructions
  100% of all 68030 instructions
  100% of all 68040 instructions
  100% of all 68060 instructions
So 68080 is fully compatible

     
This is great, and I have understood this for a long time now.
However, just for clarification, ”widest” doesn’t mean ”complete”.

 
Yes, but 100% means 100% complete. :-)
And "widest" means that there is no other 68K-family CPU supporting as many 68K instructions as the 68080.
 
