
Performance and Benchmark Results!

Why Does UAE Cheat In Benchmarks?

Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 07:10


You know that our goal is always to improve CPU performance.
For this we continuously improve the APOLLO 68080 CPU core.
 
You might have seen in the latest benchmarks that we have started to improve the 68080's top speed even more.
The 68080 can now execute up to 5 instructions per cycle at peak!

This means the 68080 is by far the world's fastest 68K CPU.
Compared to the TG68 used in other systems, many benchmarks show that the APOLLO 68080 is 25-30 times faster.
 
Most people know that UAE on modern x86 cores is very fast too.
Looking at some benchmarks in detail, we noticed something very interesting.
 
UAE cheats on benchmarks.

On several benchmarks UAE does not execute the instructions as intended; it cheats and so gets unrealistically high scores.

In some tests the benchmark scores are over 10 times higher because of these cheats.

Of course this is misleading to users.

An interesting question is: why was this cheat feature added to UAE?


Mister Cartoonmonkey

Posts 57
01 Jul 2019 07:47


I don't think you have to worry about selling your product versus UAE. Could you be confusing their benchmark scores with the fact that they might run them using JIT? To be fair, I have never really paid attention to whether they publish their scores with JIT on or off... Okay, please ship me a Vampire 1200 right now. Thank you! Haha


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 08:17


Mister Cartoonmonkey wrote:

  Could you be confusing their benchmark scores with the fact that they might run them using JIT?
 

 
If you monitor the UAE JIT then you will clearly see
that there is a "benchmark mode" active which is SKIPPING instructions.

Let me give you an example:
Some benchmarks have code like this to measure the speed of a certain instruction:

  MOVE.L #1000,D7
  LOOP:
    ADDI.L #1,D0
    ADDI.L #1,D2
    ADDI.L #1,D3
    ADDI.L #1,D4

    SUBQ.L #1,D7
    BNE    LOOP
 

 

What is the intended purpose of the benchmark code?
The purpose is to measure how long the execution of 6000 instructions takes.

Now what does the UAE JIT do here?
Does it really execute 6000 instructions?

No, it does NOT.

The emulated code will produce the same result.
But it executes orders of magnitude FEWER instructions.

The purpose of the code is not just to put #1000 in D0, D2, D3 and D4.
Changing the code here will result in false benchmark results.


UAE scores many times higher MIPS in such tests
than even native x86 code could reach!
 

The net effect for users is that SYSINFO and other benchmarks show false stellar scores.
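
The instruction counts in the example above can be checked with a small simulation. The following is a minimal Python sketch, not UAE's actual JIT: `run_literal` steps through the loop one instruction at a time and counts them, while `run_folded` is a hypothetical folded equivalent (both function names are invented for illustration) that reaches the same register contents with a handful of instructions. Counted exactly, the literal loop executes 6001 instructions: the 6000 in the loop body plus the initial MOVE.

```python
REGS = ("D0", "D2", "D3", "D4")  # the registers the loop increments


def run_literal(iterations=1000):
    """Execute the benchmark loop instruction by instruction and count them."""
    regs = {name: 0 for name in REGS}
    executed = 0
    regs["D7"] = iterations          # MOVE.L #1000,D7
    executed += 1
    while True:
        for name in REGS:            # four ADDI.L #1,Dn
            regs[name] += 1
            executed += 1
        regs["D7"] -= 1              # SUBQ.L #1,D7
        executed += 1
        executed += 1                # BNE LOOP (executed whether taken or not)
        if regs["D7"] == 0:
            break
    return regs, executed


def run_folded(iterations=1000):
    """Hypothetical folded version: same final registers, far fewer instructions."""
    regs = {name: 0 for name in REGS}
    executed = 0
    for name in REGS:                # one ADDI.L #1000,Dn per register
        regs[name] += iterations
        executed += 1
    regs["D7"] = 0                   # the loop counter ends at zero
    executed += 1
    return regs, executed


if __name__ == "__main__":
    literal_regs, literal_count = run_literal()
    folded_regs, folded_count = run_folded()
    print(literal_regs == folded_regs)   # same architectural result
    print(literal_count, folded_count)   # very different amounts of work
```

A benchmark that divides its assumed instruction count by the measured time cannot tell these two runs apart by their results, only by how long they take.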
 
 


C. Nicolakakis

Posts 5
01 Jul 2019 08:26


That is why I don't trust such benchmarks too much.
I prefer benchmarks like SysSpeed's 'ADPro' and 'ImageStudio' tests, which use real applications to do real tasks like JPEG load, Blur, etc.

Incidentally, in those tests WinUAE on my old Core2 Quad 3.4 GHz is on average about 10 times faster than a 50 MHz 060.



Gernt Gerloff

Posts 49
01 Jul 2019 08:38


google: "JIT Optimization"

That's how Java is sometimes able to be even faster than native C code.
Dead code especially is a good example of what a JIT can optimize better than an ahead-of-time compiler, but GCC would also remove such a construct.
If you make a useless for loop count to 100000 or something, gcc replaces that with a simple assignment.

And so does JIT optimization. It's not really a benchmark mode; it will happen in every application that has such code parts.

for example: EXTERNAL LINK


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 08:48


Gernt Gerloff wrote:

  If you make a useless for loop count to 100000 or something, gcc replaces that with a simple assignment.
 

 
This is clear.
 
But replacing the benchmark loop with different code will result in totally wrong MIPS scores in SYSINFO and friends.
 
 
 
Let's make a car example:
With cars, an important measure of SPEED is how many seconds the car needs to reach a real speed of 100 km/h.

Does your car need 9 seconds, or 7, or only 5?

The real work here is to accelerate the car to 100 km/h -
not only to move the needle on the speedometer to 100 units.

If the JIT changes the work to only moving the needle on the speedometer to 100, and no longer moves the car,
then the result means nothing anymore.


Gernt Gerloff

Posts 49
01 Jul 2019 09:06


Gunnar von Boehn wrote:

  Let's make a car example:
  With cars, an important measure of SPEED is how many seconds the car needs to reach a real speed of 100 km/h.

  Does your car need 9 seconds, or 7, or only 5?

  The real work here is to accelerate the car to 100 km/h -
  not only to move the needle on the speedometer to 100 units.

  If the JIT changes the work to only moving the needle on the speedometer to 100, and no longer moves the car,
  then the result means nothing anymore.

Exactly, a very good example, because normal people do not care about these numbers, including me.
I only care about how fast it brings me to my destination, and if it has a way to find a faster (equally safe) way of doing that (a secret tunnel through the Alps :-P only for me), I'm all for it.

As Nicolakakis wrote... who cares about a MIPS number? I'm only interested in how fast it loads a JPEG or renders my picture (as long as it does so without errors); the rest is academic.


Pedro Cotter
(Apollo Team Member)
Posts 308
01 Jul 2019 09:27


What I find important and interesting in this thread is that people can better understand what is being benchmarked. In the end, users have the knowledge to tell the difference between JIT-enabled benchmarks and real HW.

For example, my 3.2 GHz Intel Amithlon machine, with JIT disabled, is slower than the Vampire.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 09:29


Gernt Gerloff wrote:

  the rest is academics.
 

You are right.
 
But many people are not aware of this.
Some people might look at a SYSINFO score and believe what they see.
 
Let us make another example:
Let's say there are these instructions:
 

  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
  addi.l #1,D0
 

 
A clever (read: cheating) JIT might replace this with:

  addi.l #8,d0

 
The result in D0 is the same.
But not the result of the benchmark.

The benchmark SYSINFO will believe the CPU executed 8 instructions, not 1.
Therefore it will calculate a MIPS number based on 8 instructions.

That is, it will print out an 8 times higher MIPS figure.

Is this number correct or wrong?
Will the user be misled?
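
The arithmetic behind the inflated figure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SysInfo's actual code: the function `reported_mips`, the block count and the 10 ns cost per executed instruction are all invented for the example. The key point is that the benchmark divides the instruction count it assumes by the time it measures.

```python
def reported_mips(assumed_instructions, elapsed_ns):
    """MIPS as a benchmark would compute it: assumed count over measured time."""
    # instructions per nanosecond * 1000 = instructions per microsecond = MIPS
    return assumed_instructions * 1000 / elapsed_ns


BLOCKS = 1_000_000        # hypothetical number of times the 8-ADDI block runs
NS_PER_INSTRUCTION = 10   # hypothetical cost of one executed instruction

assumed = 8 * BLOCKS      # the benchmark always assumes 8 instructions per block

honest_time = 8 * BLOCKS * NS_PER_INSTRUCTION   # all 8 ADDI are really executed
folded_time = 1 * BLOCKS * NS_PER_INSTRUCTION   # only one ADDI #8 is executed

honest = reported_mips(assumed, honest_time)
folded = reported_mips(assumed, folded_time)

if __name__ == "__main__":
    print(honest, folded, folded / honest)
```

With these made-up numbers the same per-instruction speed is reported as 100 MIPS in one case and 800 MIPS in the other, purely because the assumed count no longer matches the executed count.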
 
 
The Apollo Team always works on improving the APOLLO 68080 CPU.
We REALLY execute all instructions in those benchmarks; we do NOT skip any.

What would you think if we added a PRE-PROCESSOR which alters such instruction streams so that APOLLO indeed does the same here and also only executes
 


  addi.l #8,d0
 

 
Wouldn't this be nice?

Our benchmark scores would instantly be an order of magnitude higher.
SYSINFO reports ~200 MIPS today.

What would you call such a pre-processor, which instantly gives us 1000 or 2000 MIPS scores in SYSINFO and friends?
 


Andy Hearn

Posts 374
01 Jul 2019 09:37


just for the sake of my interest, could the same conceptual (and very basic) parallel be drawn with PCTask's dynamic instruction buffer?
why bother emulating a set of instructions, when you may have  already done that piece of work previously, and already know the result?

with UAE you can see the JIT vs non-JIT scores are crazy - again, how much this translates to real world performance is what we're after.

never mind the fact that Quake2 on an Amiga500/600 is now a possibility. :D


Gernt Gerloff

Posts 49
01 Jul 2019 09:40


As long as I do not care about the MIPS number, of course it would be nice if you had a JIT inside the Vampire.

If you want to look at it in such an academic way, then in principle branch prediction, pipelining and even caching are "cheats" that spoil the real result of the benchmark MIPS calculation. But who would say that... nobody. So where do you draw the line?

I do not buy my computers by GHz/FLOPS/MIPS... if by a number, then by fps in a recent game I would like to play, or maybe 3DMark, which is a fairly good test because it is so close to actual games.

Same with my car: I did not buy it because it has xxx PS or makes 0 to 60 mph in x.x s (I have never pressed the pedal to the metal in my life, so why should I care). I drove it for a 2-day test and it was good, therefore I bought it.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 09:46


Gernt Gerloff wrote:

  So where do you draw the line?
 

 
I think, you ask the right question!

Let's look at this:
 


  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
  addi.l #1,d0
 

 
The fact is, that no real program would have this silly code.
Only benchmarks have this code.
 
So rewriting these 8 instructions to 1 instruction will not work  for normal programs.  It will NOT make your computer run any program faster.

This trick only triggers so well in SYSINFO and SYSSPEED.
 

So this "feature" looks great in benchmarks but is useless for real programs.
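
Both sides of this argument can be made concrete with a toy peephole pass. The Python sketch below is an illustration under stated assumptions, not the UAE JIT: `fold_addi` and the tuple encoding of instructions are invented here, and the pass ignores condition codes, which a real translator would have to preserve. It merges runs of consecutive ADDI instructions that target the same register, and is then applied to the benchmark-style sequence and to a made-up mixed sequence.

```python
def fold_addi(instructions):
    """Merge consecutive ("addi", imm, reg) entries that target the same register."""
    folded = []
    for ins in instructions:
        prev = folded[-1] if folded else None
        if (prev is not None and ins[0] == "addi" and prev[0] == "addi"
                and ins[2] == prev[2]):
            folded[-1] = ("addi", prev[1] + ins[1], prev[2])  # sum the immediates
        else:
            folded.append(ins)
    return folded


# The benchmark-style sequence from the post: eight ADDI.L #1,D0 in a row.
benchmark = [("addi", 1, "d0")] * 8

# A made-up sequence with mixed operations (hypothetical, for contrast only).
mixed = [
    ("move", "(a0)+", "d0"),
    ("addi", 1, "d0"),
    ("cmp", "d1", "d0"),
    ("addi", 4, "d2"),
    ("move", "d0", "(a1)+"),
]

if __name__ == "__main__":
    print(fold_addi(benchmark))     # [('addi', 8, 'd0')]
    print(len(fold_addi(mixed)))    # 5, nothing to merge
```

How often real programs contain such foldable runs is exactly what is disputed in this thread; the sketch only shows that the pass changes nothing when they are absent.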
 
 
Here is the difference with real CPU improvements
like Branch-prediction or Data-caches - those features improve all programs.

I think this is where the line needs to be drawn.

E.g. if your DIV instruction takes 32 cycles, and you internally tweak the DIV to be faster and do the same work in 16 cycles, then this improves all programs and maybe also a benchmark.
So this is a real improvement.




Gernt Gerloff

Posts 49
01 Jul 2019 10:04


I strongly disagree on that statement.
I do a lot of Java profiling and it's amazing how that JIT optimising works. Whole parts of code are skipped or replaced by a single instruction because it already knows the result from a previous loop cycle or even from a very different part of the code. You can't even follow why it's doing that. The only problem arises if there is a glitch inside and it actually introduces bugs, but this has become rare these days.


Olaf Schoenweiss

Posts 690
01 Jul 2019 10:07


Sorry, why should Toni Wilen "cheat" with benchmarks so that UAE looks faster than it is in those Amiga-specific benchmarks?

That is a really silly accusation.

You could say you found some JIT-specific optimizations that affect Amiga benchmarks.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 10:09


Gernt Gerloff wrote:

I strongly disagree on that statement.
I do a lot of Java profiling ...

Hi Gernt,

Which statement do you disagree with?

* That BRANCH PREDICTION helps all programs alike?
* That replacing 8 ADDI by 1 ADDI has no real-world usage?

One question: how well can you read 68K ASM syntax?



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 10:14


Olaf Schoenweiss wrote:

    Sorry, why should Toni Wilen "cheat" with benchmarks so that UAE looks faster than it is in those Amiga-specific benchmarks?

      That is a really silly accusation.

      You could say you found some JIT-specific optimizations that affect Amiga benchmarks.
   

   
Olaf,
   
Please read carefully. :-)

I did not mention Toni's name at all.
I did not accuse him.
And btw, did Toni even write the UAE JIT? I think not.
What I said is that UAE does not execute the benchmark code as intended.
 
A marathon is a distance of 42195 meters.
So if you run a marathon you are supposed to run 42195 meters.
If you "skip" 42000 of those meters and only run 195 meters, your result is not correct. People call this cheating.

Say a benchmark loop has 42195 instructions.
The intention of the benchmark is that the CPU executes all of them.
Now if UAE (for whatever reason) does not execute 42195 instructions but only 195,
it is cheating too.
 
     
What we clearly see is the result:
Instructions are getting "condensed", which results in totally wrong MIPS numbers.
The calculated score is simply wrong, false, and incorrect.

This outcome is a fact - an indisputable fact.

Popular AMIGA benchmarks are misled by this and calculate much too high MIPS numbers.
 
 
I did NOT say Toni did this with evil intention.
Nor did I say Toni wrote this code at all!
In fact Toni did not write the 68k JIT.

Nevertheless it is a fact that this "adding up" of ADDI instructions has a very high impact on benchmarks - and only there.
And I can see no benefit from this tuning for real-world programs.
AMIGA games will not run faster.
DOOM will not be faster because of ADDI summing up.
QUAKE will not be any faster.



Gernt Gerloff

Posts 49
01 Jul 2019 10:21


So rewriting these 8 instructions to 1 instruction will not work  for normal programs.  It will NOT make your computer run any program faster.

This 8-instruction skip is nothing. As I said, I saw in Java profiling how the JIT optimiser removed whole code parts of maybe 100 lines, so the improvement is off the scale. My point is that the benchmark optimisation is a pure side effect of the VERY helpful JIT optimiser, which speeds up every program. To remove that behaviour, the optimiser would need a special benchmark mode.

I'm not really into 68k syntax but I can identify a move or an add if I see it ;-)


Stefan "Bebbo" Franke

Posts 139
01 Jul 2019 14:53


you should use vamos not winuae ^^  /duck


Marian Nowicki
(Needs Verification)
Posts 22/ 1
01 Jul 2019 17:22


That's how JIT works.
It takes a block of code, translates it, optimizes it and then executes it.
The JIT has been in UAE since 2000. It works very well.
Of course, if Doom/Q1 etc. contain some stupid code like the code in SysInfo, it will optimize that too, and Doom/Q1 etc. will work better.
Maybe it is time to switch from SysInfo to a better benchmark.




Markus (mfro)

Posts 99
01 Jul 2019 18:32


wouldn't call that cheating.

With JIT activated, you already indicated you want this to run as fast as possible and do not care about cycle accuracy. If the JIT compiler is smart enough, why shouldn't it take the possible shortcuts it recognizes?

You most likely also take shortcuts (on a slightly different level) when the FPGA coder is smart enough to realize one can execute instructions super-scalar and/or in a deep pipeline. Would you call that cheating as well?
