



Performance and Benchmark Results!

Why Does UAE Cheat In Benchmarks?

Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 19:49


Markus (mfro) wrote:

  If the JIT compiler is smart enough, why shouldn't it take the possible shortcuts it recognizes?
 

 
Please tell me, what do you think is the purpose of such benchmark programs?
Is it not that these benchmarks want to "measure" the speed of a CPU, to give the user an indication of how fast the CPU is?
 

Let's say your benchmark program does


REPEAT 42
  addi.l #1,D0
END REPEAT

This means we have a block of 42 instructions.

The "smart" JIT translates this into
addi.l #42,D0
This means the JIT translates the whole block into one single instruction.

Your JIT CPU will now look like it can do 42 instructions per cycle.
This is impressive!
And no real CPU on earth can do this!
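
To make the arithmetic concrete, here is a minimal Python sketch of the kind of folding described above. It is not UAE's code; the tuple format `(op, imm, reg)` and the function names are made up for illustration, and the sketch ignores condition codes, which a real translator would have to preserve.

```python
MASK32 = 0xFFFFFFFF

def fold_adjacent_adds(program):
    """Merge runs of adjacent addi instructions that target the same register."""
    folded = []
    for op, imm, reg in program:
        prev = folded[-1] if folded else None
        if prev and op == "addi" and prev[0] == "addi" and prev[2] == reg:
            # Replace the previous addi with one that adds both immediates.
            folded[-1] = ("addi", prev[1] + imm, reg)
        else:
            folded.append((op, imm, reg))
    return folded

def run(program):
    """Execute the toy program and return the final register values."""
    regs = {}
    for op, imm, reg in program:
        if op == "addi":
            regs[reg] = (regs.get(reg, 0) + imm) & MASK32
    return regs

benchmark = [("addi", 1, "D0")] * 42      # REPEAT 42 / addi.l #1,D0
folded = fold_adjacent_adds(benchmark)

print(len(benchmark), len(folded))        # 42 1
print(run(benchmark) == run(folded))      # True
```

The register ends up with the same value either way, but a benchmark that counts 42 instructions per pass would credit the folded version with 42 times the work it actually did.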

 
But is this super speed "true" for other programs as well?
Can you name any real-life program which benefits from this "cheat" the same way?

I bet there is no real-life program which benefits from this the same way.
Only the benchmark will look super fast - in real life this JIT will not perform the way the benchmark wrongly reports.

You can name it "cheat"
you can name it "tuning working optimally in benchmarks",
you can name it "smart rewrite which benefits benchmarks the most"

The result is the same.


C. Nicolakakis

Posts 5
01 Jul 2019 20:26


I have a feeling that we are trying to assign blame for these inaccurate sysinfo numbers where there isn't any.

When SysInfo was written there was no such thing as JIT-enabled emulators so it was tested on existing Amigas and did OK for that hardware.

Years later JIT compiling was used in WinUAE to speed up CPU emulation and also did OK despite its flaws.

JIT compiling in WinUAE negatively affects more programs than just SysInfo.
There are some WHDLoad games that run so fast with JIT enabled that they are unplayable, and I have been unable to slow them down.

This isn't the JIT compiler's fault but just happens because that was the way the game was written.
And you can't really blame the game's programmer for taking shortcuts that would cause problems decades later on emulated Amiga systems or other Amiga hardware that didn't exist when that game was written.




Mike Kopack

Posts 268
01 Jul 2019 20:29


Gunnar:
 
  While I don't disagree with you in terms of it being a "cheat", realize that it's not like UAE included the JIT just to get higher scores on benchmarks - the JIT is there to get the best possible performance under emulation by taking advantage of optimizations available for the emulation environment on its host.
 
  If one wants a realistic impression of simulated performance vs. hardware, one should turn off the JIT, and turn on every option necessary to try to do 1:1 hardware matching. In such a case, the emulator should produce benchmark scores that closely match what the actual emulated hardware would produce.
 
  That would make for the most accurate "apple to apples" comparison, taking out any performance advantages of the different host system hardware/software.
 
  BUT - consider - people are NOT using UAE to get cycle-accurate Amigas (except maybe when playing games) - they're using it to get a hyper-fast Amiga, for instance to do things like Lightwave rendering. In such a scenario, they're certainly going to turn on the JIT and every other option that will make UAE run as fast as possible, using every ounce of performance the host system can provide to the emulator to make the Lightwave render go as fast as possible. In that sort of scenario, it's probably going to outperform even the Vampire.
 
  And you know what, that's PERFECTLY OK!!! 
 
  For those who want to run on the old hardware, or who want the stand alone to act as a "modern" hardware Amiga - the Vampire is a fantastic solution, even if it can't quite keep up with UAE running on a modern high power PC. Hell, even running UAE on my Macbook uses orders of magnitude more electricity than the V4SA will.
 
  For those who want absolutely the fastest they can get no matter what, or who don't care about having an independent system or prefer VMs (like me with Linux: other than RasPis, I only ever run Linux inside of VMs on my Macs), there's the emulation option.
 
  To each their own. Don't take it as an insult. Just gently remind folks that comparing a modern machine running UAE with a JIT and tons of optimizations against the Vampire is not a fair fight - apples and oranges. Show me another hardware-based Amiga running 68K (not PPC) that can compete with a Vampire; they fail miserably. That's honestly why so many people who like the old hardware want the Vampire for the A1200 and the V4 for the 500/600, etc.
 
  So are they "cheating" the benchmarks - yes and no... It just depends on how you look at it and what your end goal is.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 Jul 2019 20:43


Mike Kopack wrote:

  Gunnar:
 
  While I don't disagree with you in terms of it being a "cheat", realize that it's not like UAE included the JIT just to get higher scores on benchmarks -
 

 

Maybe it makes sense to think about this.
 
INTERPRETING:

A CPU instruction of a foreign CPU can be interpreted.
UAE does this, PC-Task can do this, other emulators such as C64 emulators do this too.

If you "interpret" an instruction you first need to identify what instruction it is, and then you need to jump to the subroutine emulating this instruction.
As a ballpark number, you need roughly 40 clocks to interpret a single foreign opcode.
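
The two steps of interpreting can be sketched in a few lines of Python. This is only a toy model, not how UAE or PC-Task are written; the instruction tuples and handler names are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def op_addi(regs, imm, reg):
    regs[reg] = (regs.get(reg, 0) + imm) & MASK32

def op_moveq(regs, imm, reg):
    regs[reg] = imm & MASK32

# Dispatch table: maps an opcode name to the subroutine that emulates it.
HANDLERS = {"addi": op_addi, "moveq": op_moveq}

def interpret(program, passes):
    """Run the program `passes` times, identifying every instruction every time."""
    regs = {}
    decodes = 0
    for _ in range(passes):
        for op, imm, reg in program:
            handler = HANDLERS[op]    # step 1: identify the instruction
            decodes += 1
            handler(regs, imm, reg)   # step 2: jump to its emulation routine
    return regs, decodes

loop_body = [("addi", 1, "D0"), ("addi", 2, "D1")]
regs, decodes = interpret(loop_body, passes=1000)
print(regs["D0"], regs["D1"], decodes)  # 1000 2000 2000
```

The identify step is paid again on every pass through the loop, which is the overhead a JIT tries to remove.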
 
JIT:
The main idea of JIT is to remove this "identify what instruction it is" overhead. This means you translate each instruction ONCE and store the translated instructions.
This saves the identification step and greatly improves speed - for long loops!

PC-Task can do this, UAE can do this.
 
 
 
PC-Task for example will translate 1 to 1.
This means each instruction is translated and saved.
This means benchmarks and real-life programs run in PC-Task behave and score the same. This is realistic.
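
A 1-to-1 translation with a cache might look like the following Python sketch. The class name `OneToOneJit`, the block address, and the closures standing in for host instructions are all hypothetical; no real emulator is structured exactly like this.

```python
MASK32 = 0xFFFFFFFF

def translate(instr):
    """Translate one guest instruction into one host callable (1 to 1)."""
    op, imm, reg = instr
    if op == "addi":
        def host(regs):
            regs[reg] = (regs.get(reg, 0) + imm) & MASK32
    elif op == "moveq":
        def host(regs):
            regs[reg] = imm & MASK32
    else:
        raise ValueError(f"unknown opcode: {op}")
    return host

class OneToOneJit:
    def __init__(self):
        self.cache = {}          # guest block address -> translated host code
        self.translations = 0    # how many guest instructions were translated

    def run_block(self, addr, block, regs):
        if addr not in self.cache:
            # Translate ONCE and store the result.
            self.cache[addr] = [translate(instr) for instr in block]
            self.translations += len(block)
        for host in self.cache[addr]:
            host(regs)           # every guest instruction is still executed

jit = OneToOneJit()
regs = {}
loop_body = [("addi", 1, "D0"), ("addi", 2, "D1")]
for _ in range(1000):
    jit.run_block(0x1000, loop_body, regs)

print(jit.translations, regs["D0"], regs["D1"])  # 2 1000 2000
```

The identification cost is paid twice instead of 2000 times, yet the number of executed instructions is unchanged, so a benchmark and a real program see the same kind of speedup.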
 
 
Now UAE added an extra "optimization" step.
Mind that an extra step will, first of all, slow you down!

This optimization step happens to greatly affect benchmark code.
This tweak does what we have talked about here.

This "tweak" needs extra instructions = more cycles during the JIT.
This means the "tweak" costs more cycles and makes "normal" code actually take longer to JIT!
= it will normally slow you down!

But it makes benchmarks shine greatly!
So you will believe you have super speed.

Let's be frank here:
This extra code takes time!
I personally would not have included it in the UAE JIT, as it does not benefit most real-life programs at all.
 



Markus (mfro)

Posts 99
01 Jul 2019 20:58


Gunnar von Boehn wrote:

Markus (mfro) wrote:

  If the JIT compiler is smart enough, why shouldn't it take the possible shortcuts it recognizes?
 

 
  Please tell me, what do you think is the purpose of such Benchmark programs?

I've been in the IT industry for nearly 40 years and I'd say: benchmarks were originally invented by end users to see what their CPUs can do.

That lasted only a few years, until marketing discovered them.

Since then, their main purpose has been to sell hardware and compilers.

  Nowadays, SAP servers are optimized (=cheating) to look good in the SAP SD benchmark and Intel CPUs look good with Intel compilers...



Renee Cousins
(Apollo Team Member)
Posts 142
01 Jul 2019 22:24


The UAE JIT does **NOT** perform peephole optimization.


ExiE CZEX

Posts 48
01 Jul 2019 22:33


Or we can say it's just a bad benchmark. The Amiga needs some kind of Cinebench instead of synthetic tests...

BTW, how many people run benchmark tests under WinUAE? I would never bother doing such things. For almost ten years most PCs emulating an Amiga have been so much faster than real Amigas that nobody would care...


Tygre Chingu

Posts 32
09 Jul 2019 23:00


Dear Gunnar and all,
I understand your point, Gunnar, in theory but disagree in principle. Yes, we would like benchmarks to be as fair as possible and measure fairly the same things. As such, if UAE or some CPU doesn't execute all instructions, I understand that you see that as cheating.

But it is in the nature of UAE, its JIT, etc. to do so! So, in principle, you cannot blame them for skipping some instructions. It is fair game.

Just as it is fair game to compare processors with and without branch prediction, caches, etc.

Thank you though, Gunnar, for your amazing work!


Don Adan

Posts 38
10 Jul 2019 00:30


There is an easy way to defeat the JIT optimisation trick: the test loop must use all 68000 registers, 9 address registers and 8 data registers. Some real routines use up to 16 data (word) registers (via the swap instruction). This will slow every available JIT down to its real speed.
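
As a rough illustration of this idea, the Python sketch below spreads the adds round-robin over the eight data registers. It assumes a very naive optimizer that only merges adjacent adds to the same register; that optimizer is hypothetical, and one that tracks each register separately could still fold such code.

```python
DATA_REGS = [f"D{i}" for i in range(8)]

def fold_adjacent_adds(program):
    """Hypothetical optimizer: merge only ADJACENT addi ops on the same register."""
    folded = []
    for op, imm, reg in program:
        prev = folded[-1] if folded else None
        if prev and op == "addi" and prev[0] == "addi" and prev[2] == reg:
            folded[-1] = ("addi", prev[1] + imm, reg)
        else:
            folded.append((op, imm, reg))
    return folded

# 64 adds to one register versus 64 adds spread round-robin over D0-D7.
same_reg = [("addi", 1, "D0")] * 64
round_robin = [("addi", 1, DATA_REGS[i % 8]) for i in range(64)]

print(len(fold_adjacent_adds(same_reg)))     # 1
print(len(fold_adjacent_adds(round_robin)))  # 64
```

Under that assumption the round-robin loop keeps all 64 instructions, so the measured speed reflects what was actually executed.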


Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
10 Jul 2019 01:59


Gunnar von Boehn wrote:

  This means 68080 is by far the world fastest 68K CPU.
  Compared to TG68 used in other systems many benchmarks show that APOLLO 68080 is really 25-30 times faster than TG68.
   
  Most people know that UAE on modern x86 cores is very fast too.

 
  * Side note: Igor did start with tweaking TG68? And then Apollo team jumped in ... ? :)
 
  Every emulation is a cheat, but convenient one.
 
  Modern x86s are very fast too, and UAE is nice in being able to exploit that. I really love how the emulator improves and spreads. UAE works fine on ARM and Android, on Linux too, and even OS4 needs one from time to time. There were even people trying UAE m68k on a Vampire to run some OCS games that would otherwise fail.

  Legal mumbo jumbo requires the OS and WB; if there weren't that limit, it would be more like AmigaForever Light or even AmiKit XE by now :)
 
  It's preservation. Until now. But gladly, we are the generation that cares about real hardware. And besides, UAE can't emulate the Vampire, yet.


Renee Cousins
(Apollo Team Member)
Posts 142
10 Jul 2019 02:40


Let's put a pin in this. The following code was created with .rept 16 and .rept 256.

EXTERNAL LINK 
I compiled it using gcc under UAE and ran each version.

EXTERNAL LINK 
There is *more* than a sixteen fold increase in execution speed.

I repeat, UAE does not perform peephole optimizations. There were versions of the UAE JIT that did. Hatari still uses this version, maybe others do as well. But since this was ONLY ever a benchmark cheat, it was removed since it slowed regular (e.g., non-'optimizable') code down a lot.
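
The reasoning behind comparing .rept 16 with .rept 256 can be sketched as follows. The two cost models are assumptions for illustration only; they count executed host operations and are not measurements from UAE or any other emulator.

```python
def host_ops_one_to_one(rept):
    """A 1-to-1 translator executes one host operation per repeated instruction."""
    return rept

def host_ops_folding(rept):
    """A folding optimizer collapses the whole repeated run into one operation."""
    return 1 if rept > 0 else 0

def scaling(cost_model, short=16, long=256):
    """How much more work the long version costs compared to the short one."""
    return cost_model(long) / cost_model(short)

print(scaling(host_ops_one_to_one))  # 16.0
print(scaling(host_ops_folding))     # 1.0
```

If the run time grows roughly in step with the repeat count, the instructions are really being executed; a folding optimizer would leave the two versions nearly equal.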


Markus B

Posts 209
10 Jul 2019 10:40


I wouldn't call it cheating if the benchmark routine can be highly optimized with the help of the JIT compiler.

Isn't there something similar in the Apollo Core to detect unoptimized code and replace it with better code? But maybe I remember it completely wrong here.

Comparing UAE JIT/non-JIT with something like C4D should give a much better impression of 080 performance compared to current emulation systems.


Renee Cousins
(Apollo Team Member)
Posts 142
10 Jul 2019 15:04


Markus B wrote:
Isn't there something similar in the Apollo Core to detect unoptimized code and replace it with better code? But maybe I remember it completely wrong here.

Fusing: EXTERNAL LINK 
And UAE does not do this either.


Markus B

Posts 209
10 Jul 2019 17:23


But this fusing is done on the AC level, right? Old code gets optimized automatically.


Samuel Devulder

Posts 248
10 Jul 2019 18:25


Gunnar von Boehn wrote:

    Now what does UAE JIT do here?
    Does it really execute 6000 instructions?
     
    No, it does NOT.
     
    The emulated code will produce the same result.
    But it executes magnitudes LESS instructions.
   

This is indeed a very good JIT optimizer. This is not proper cheating, it is smart optimisation.
     
   
Gunnar von Boehn wrote:

    The purpose of the code is not just to put #1000 in D0,D1,D2,D4.
   

This is arguable. In the end that's what we get. Of course the optimizer has no way to decide if this dummy code should be kept as-is or optimized. What matters for the JIT is getting the correct result fast, not the way it is computed. If it can optimize, then why not let it roll?
   

    Changing the code here will result in false benchmark results.
   

Right, but a good result anyway. What's important for a JIT: providing good results pretty fast, or accurate benchmarks? The answer is clear: JITs are there to get results fast. (Benchmarks are of no real interest in everyday use of a computer.)
   

    The net effect for the users is that SYSINFO, and other benchmarks have false stellar scores.
   

Rule of thumb: never trust a benchmark which is too simple or too old anyway. Benchmarks regularly need to be updated to counteract compiler-based or JIT-based optimizations. SYSINFO is simply too out of date to be a precise benchmark tool. We should switch to something more up-to-date to measure the speed of Amigas IMHO.


Renee Cousins
(Apollo Team Member)
Posts 142
11 Jul 2019 02:26


UAE.

DOES.

NOT.

DO.

THIS.


Don Adan

Posts 38
11 Jul 2019 03:24


No, this is not a very good JIT optimizer, this is a very poor/stupid JIT optimizer. For example, a simple delay loop of the kind often used on the 68000 could end up looking like this after such an optimizer:

from:
  moveq #127,D0
loop
  dbf D0,loop

to

  move.l #$ffff,d0

The result in D0 is the same.
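
The claim about D0 can be checked with a small Python simulation of the loop. The helper names are made up; the model follows the documented behaviour of dbf (decrement the low word of the register, branch unless it became -1) and leaves the upper word untouched.

```python
def moveq(value):
    """moveq loads a sign-extended 8-bit immediate into a 32-bit register."""
    return value & 0xFFFFFFFF

def run_dbf_delay(d0):
    """Simulate 'loop: dbf D0,loop' and count how often dbf executes."""
    executed = 0
    while True:
        executed += 1
        low = ((d0 & 0xFFFF) - 1) & 0xFFFF   # dbf decrements the low word only
        d0 = (d0 & 0xFFFF0000) | low
        if low == 0xFFFF:                    # counter reached -1: fall through
            break
    return d0, executed

d0, executed = run_dbf_delay(moveq(127))
print(hex(d0), executed)  # 0xffff 128
```

D0 does end up as $0000FFFF, the same value the single move.l produces, but the original loop executes dbf 128 times, and that delay is the whole point of the loop.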



Renee Cousins
(Apollo Team Member)
Posts 142
11 Jul 2019 03:45


UAE.
 
  DOES.
 
  NOT.
 
  DO.
 
  THIS.
 
  There are no optimizers in UAE. There were. The peephole optimizers were removed back around 2006. Some others survived up to 2016, but are gone now and even then, did not perform peephole optimization.
 
  The optimizers' overhead proved to make things slower in most use cases, except for pointless benchmarks like SysInfo. The optimization was quite smart, but it was too much work and totally counterproductive.


Samuel Devulder

Posts 248
11 Jul 2019 06:42


So, why the original statement of BigGun?


Renee Cousins
(Apollo Team Member)
Posts 142
12 Jul 2019 04:13


Samuel Devulder wrote:

So, why the original statement of BigGun?

"There are three kinds of lies: lies, damned lies, and statistics." -- Mark Twain

I think the best takeaway here is that a simple benchmark like SysInfo can be too easily manipulated. It lacks 'algorithmic complexity' and is subject to all manners of "cheats". Not that JIT optimization and instruction fusing are "cheats" in the proper sense of this word.

Nonetheless, I feel too many will continue to rely on SysInfo as a valid measure of benchmarking -- and it can be as long as we're honest about what we're doing.
