Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.

EmuTOS for Amiga and Vampire

Gunnar von Boehn
(Apollo Team Member)
Posts 6214
01 Mar 2017 20:54


Vincent Rivière wrote:

  Thanks :-)
   
    For modes up to 1024x768, I just used the VESA settings from the Picasso96 driver. I don't know the exact pixel clock, as it is computed automatically.
 

There might be room for much better settings.
Please ping me and we can check.


OneSTone O2o

Posts 159
02 Mar 2017 15:34


Why is the FPU performance so low in the Kronos benchmark? Kronos sysinfo reports an SFP004, which means that at least a memory-mapped 68881 has been detected by Kronos. (The SFP004 is Atari's original 68881 FPU expansion card for the Mega ST's 68000 system bus: EXTERNAL LINK )


Vincent Rivière

Posts 87
02 Mar 2017 15:55


The current Apollo 68080 has no FPU.
I believe the SFP004 detection in FreeMiNT (or Kronos?) is completely bogus on the 68080 (this will have to be fixed).
When I ran the Kronos FPU and OpenGL tests, I chose the 68000 variant, while the reference CT60 tests had obviously been run on the 68060 FPU.
Soft-float performance will always be poor compared to a real FPU.

However, it would be interesting to see the results of the CT60 using the 68000 soft-float FPU test. That would be a good CPU benchmark.


Markus (mfro)

Posts 99
02 Mar 2017 18:11


Vincent Rivière wrote:
... I believe that FreeMiNT (or Kronos?) SFP004 detection is completely bogus on 68080...

 
  I would assume this is caused by the Amiga/Apollo not raising a bus error on access to 0xfffffa40 (the memory-mapped SFP004 location).

I would also assume the same happens on a standard Amiga running Kronos on EmuTOS?
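Markus's hypothesis can be illustrated with a toy Python model of a probe that treats "the access did not fault" as "the device is present". The `Bus` class, its `faults_on_unmapped` flag and the open-bus value are hypothetical, and this is not FreeMiNT's actual detection code; it is only a sketch of how such a probe gives a false positive on hardware that does not raise a bus error for unmapped addresses.

```python
SFP004_BASE = 0xFFFFFA40  # memory-mapped SFP004 location quoted above


class BusError(Exception):
    """Raised when an access hits an address that no device decodes."""


class Bus:
    """Hypothetical address space: `devices` maps decoded addresses to values.

    With faults_on_unmapped=False, unmapped reads silently return open_bus
    instead of raising, which is the behaviour suspected on Amiga/Apollo
    (an assumption for illustration, not a verified hardware model).
    """

    def __init__(self, devices, faults_on_unmapped, open_bus=0xFFFF):
        self.devices = devices
        self.faults_on_unmapped = faults_on_unmapped
        self.open_bus = open_bus

    def read_word(self, addr):
        if addr in self.devices:
            return self.devices[addr]
        if self.faults_on_unmapped:
            raise BusError(hex(addr))
        return self.open_bus


def naive_probe(bus, addr=SFP004_BASE):
    """Report a device as present whenever the access does not fault."""
    try:
        bus.read_word(addr)
    except BusError:
        return False
    return True


atari_without_fpu = Bus({}, faults_on_unmapped=True)
amiga_like = Bus({}, faults_on_unmapped=False)
print(naive_probe(atari_without_fpu))  # False
print(naive_probe(amiga_like))         # True, a false positive
```

A more robust probe would also check that the register returns plausible data, but what "plausible" means for the SFP004 is not covered here.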


Vincent Rivière

Posts 87
02 Mar 2017 22:07


Thanks to everyone for your hints about SAGA, especially Flype.
I made tests with Picasso96Mode on AmigaOS. My monitor starts to have intermittent blanks when the pixel clock is set too high.
I will update the SAGA fVDI driver accordingly.


Olivier Landemarre

Posts 147
03 Mar 2017 20:00


There is no FPU detection in Kronos; Kronos only reports the FPU information from the "_FPU" cookie, which is provided by the system.

It would be interesting to have a pure 68000 OpenGL test on the CT60. This is quite easy to do: just force the FPU choice before starting the test. I have no CT60 to do this.

Olivier
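Since Kronos only reads the `_FPU` cookie, a wrong FPU report comes from whatever the system put in the cookie jar. A minimal Python sketch of reading a cookie jar image, assuming the usual TOS layout (8-byte big-endian entries, each a 4-character ID followed by a 32-bit value, terminated by an entry whose ID is zero). The `build_jar` helper is hypothetical, and the `_FPU` value used here is arbitrary and deliberately not decoded.

```python
import struct


def parse_cookie_jar(jar: bytes) -> dict:
    """Return {cookie_id: value} from a raw big-endian cookie jar image."""
    cookies = {}
    for offset in range(0, len(jar) - 7, 8):
        raw_id, value = struct.unpack_from(">4sI", jar, offset)
        if raw_id == b"\0\0\0\0":  # end-of-jar marker
            break
        cookies[raw_id.decode("ascii")] = value
    return cookies


def build_jar(entries, slots):
    """Hypothetical helper: build a jar image ending with the zero-ID marker."""
    data = b"".join(struct.pack(">4sI", k.encode("ascii"), v) for k, v in entries)
    return data + struct.pack(">4sI", b"\0\0\0\0", slots)


# "_FPU" value below is an arbitrary placeholder, not a real FPU code.
jar = build_jar([("_CPU", 40), ("_FPU", 0x00010000)], slots=16)
cookies = parse_cookie_jar(jar)
print("_FPU" in cookies, hex(cookies["_FPU"]))
```

A benchmark written this way can only be as accurate as the code that filled in the cookie.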





Vincent Rivière

Posts 87
05 Mar 2017 00:33


New EmuTOS snapshot:
EXTERNAL LINK 
emutos-amiga-rom-*.zip now contains emutos-vampire.rom. It is a special ROM optimized for Vampire V2:
- compiled for 68040
- hardcoded FastRAM settings
No AROS code is used. The provided ROM is full-featured; only AUTOCONFIG is missing, but that doesn't matter.

This ROM has the same features as the EmuTOS floppy optimized for Vampire. With the MapROM feature of the upcoming GOLD3, it will be easy to run EmuTOS from hard disk. Until then, you still need to boot the EmuTOS floppy (very easy with a Gotek).


Olivier Landemarre

Posts 147
05 Mar 2017 08:34


I have received results from a friend running Kronos in full 68000 mode; based on the OpenGL test, this gives a performance of 71% of this CT60 (68060 running at 95 MHz).

I am sending you the results to compare. I think we need some time to understand the results before sharing them.

Olivier





Vincent Rivière

Posts 87
05 Mar 2017 15:02


I have updated the fVDI driver for SAGA:
EXTERNAL LINK 
First I must thank Flype, Gunnar, Markus and others for their hints.

There was absolutely no issue with memory bandwidth limitation, as I had incorrectly suspected. The instability problem (occasional blanks) simply occurs when the pixel clock is set too high. On my monitor, it must stay below 60 MHz to be completely stable.

There are several solutions to reduce the Pixel Clock frequency:
- use RBT (Reduced Blanking Timing), which works just fine with LCD monitors
- decrease the vertical refresh rate. My LCD monitor can handle rates down to 24 Hz without any trouble.

I have used the umc tool to compute new modelines:
EXTERNAL LINK 
Basically:
- I have kept standard timings for 4:3 modes
- I have used RBT timings for 16:10 modes
- I have reduced the frame rate to 30 Hz for 1440x900 and 1680x1050 video modes, in order to reduce the pixel clock.
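The relationship behind these choices is simple: the pixel clock is the total number of pixels per line (including blanking) times the total number of lines per frame (including blanking) times the refresh rate. A minimal Python sketch follows. The 1344 x 806 totals are the standard VESA 1024x768 @ 60 Hz timing; the 1840 x 1080 totals are assumed CVT reduced-blanking values for 1680x1050, not the exact modelines umc produced here, so the results will not match the figures in this thread exactly. The 60 MHz limit is the one observed on this particular monitor.

```python
def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock in MHz: (pixels per line) * (lines per frame) * (frames per second).

    h_total and v_total include the blanking intervals, not just the visible area.
    """
    return h_total * v_total * refresh_hz / 1e6


def max_refresh_hz(h_total, v_total, limit_mhz=60.0):
    """Highest refresh rate that keeps the pixel clock at or below limit_mhz."""
    return limit_mhz * 1e6 / (h_total * v_total)


# Standard VESA 1024x768 @ 60 Hz: 1344 x 806 total pixels/lines.
print(round(pixel_clock_mhz(1344, 806, 60), 2))   # 65.0
# Assumed CVT reduced-blanking totals for 1680x1050: 1840 x 1080.
print(round(pixel_clock_mhz(1840, 1080, 60), 2))  # 119.23
print(round(pixel_clock_mhz(1840, 1080, 30), 2))  # 59.62
print(round(max_refresh_hz(1840, 1080), 1))       # 30.2
```

Reduced blanking shrinks the totals, and halving the refresh rate halves the clock, which is why both levers appear in the list above.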

Results:

1680x1050 (my native monitor resolution) is perfectly stable and crisp at 30 Hz:
EXTERNAL LINK  The only drawback of 30 Hz is that the mouse pointer is not quite as smooth as usual (not a big issue).
Of course, everything is very small at this resolution.

840x525 is exactly half of my native resolution. It is also perfectly stable, even at 60 Hz. There are no visible artifacts due to rescaling. This resolution is very good, as icons remain big enough, and the screen ratio is respected.
EXTERNAL LINK 
So: everything just works well, no more mysteries related to SAGA :-)


OneSTone O2o

Posts 159
05 Mar 2017 15:50


This is very impressive. The next step might be to extend fVDI to 24 and 32 bit?


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
05 Mar 2017 17:32


Vincent, very impressive what you did!
EmuTOS looks really nice.



OneSTone O2o

Posts 159
05 Mar 2017 21:17


Gunnar, it's even running MiNT (multitasking), as you can see from the GEMView picture viewer embedded on the GEM desktop.

Now a Kronos benchmark would be interesting again, to see whether the performance is about the same as at the lower resolution.


Philippe Flype
(Apollo Team Member)
Posts 299
06 Mar 2017 08:17


Kronos results should be better at a lower resolution (at least for some tests), since the SAGA video DMA bandwidth needed to sustain the corresponding pixel clock decreases significantly.
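A rough Python estimate of the scanout bandwidth behind this argument: the display DMA must read width x height x bytes per pixel for every frame. The sketch assumes 16-bit pixels, counts only the visible area (no data is fetched during blanking) and uses 1 MB = 10^6 bytes; it says nothing about how SAGA actually arbitrates memory between the display and the CPU.

```python
def scanout_mb_per_s(width, height, bits_per_pixel, refresh_hz):
    """Visible-frame bytes the display DMA must read per second, in MB/s (10**6)."""
    return width * height * bits_per_pixel / 8 * refresh_hz / 1e6


# Video modes used in this thread, assuming 16-bit pixels.
for width, height, hz in [(1024, 768, 60), (640, 480, 60), (1680, 1050, 30)]:
    rate = scanout_mb_per_s(width, height, 16, hz)
    print(f"{width}x{height} @ {hz} Hz: {rate:.1f} MB/s")
```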


Vincent Rivière

Posts 87
06 Mar 2017 21:59


You are right, next step was indeed to run other Kronos tests. So I did.

Still using GOLD2 core.

To my surprise, the results in 1280x1024 were slightly different from the previous ones, by maybe about 5%. Since the previous test, I had recompiled EmuTOS and fVDI. That could explain some differences, but certainly not a difference in CPU performance. Maybe there is something slightly random somewhere.

The reference (100%) is still fVDI + FreeMiNT in 1024x768 60 Hz, but I ran it again before the other tests to rule out any difference caused by the software I had recompiled.

1) fVDI + FreeMiNT in 1680x1050 30 Hz EXTERNAL LINK  Resolution is higher, but the frame rate is half. Pixel clock is 58.75 MHz (vs. 64.35 MHz in 1024x768).
The CPU result is slightly lower; the other ones are almost equal. I can't understand why the CPU result is lower when the other results didn't change. Maybe some randomness. Anyway, everything is almost identical.

2) fVDI + FreeMiNT in 640x480 60 Hz EXTERNAL LINK  Frame rate is identical, resolution is lower, pixel clock is 24.05 MHz (vs. 64.35 MHz in 1024x768).
We can see a +20% boost on CPU and Disk! Memory speed also increased a bit. Strangely, the other values stayed almost the same. I still wonder whether the disk benchmark is really relevant.

3) fVDI + EmuTOS in 1024x768 60 Hz EXTERNAL LINK  FreeMiNT provides preemptive multitasking, but it must be verified that it does not add too much overhead. So I tested the same setup with plain single-task EmuTOS, still with fVDI.
CPU, soft-FPU and memory results are identical. This means that FreeMiNT does not add any significant overhead, great OS :-)
VDI (graphics) is a bit faster (is this relevant?); maybe this is because the AES (windowing environment) is different (EmuTOS AES vs XaAES).
Disk is *much* worse with plain EmuTOS. This is mainly because FreeMiNT uses disk caching and a different filesystem driver.

4) Plain EmuTOS in monochrome (no SAGA) EXTERNAL LINK  Of course, the ultimate test is without SAGA at all. So I ran Kronos with the normal Amiga display, without SAGA. Please note that, at bootup, the Vampire card displays a Vampire logo in 640x480, and that logo stays displayed on the DIGITAL-VIDEO output until software uses SAGA itself. If I understand correctly, SAGA still runs behind the scenes in this case. The video mode used to display the logo might be lower than 16-bit, though.
Result: +25% CPU boost!
The soft-FPU and OpenGL results are identical. I can hardly understand how this is possible, as the CPU is supposed to be significantly faster.
VDI (graphics) is much lower; this is because the EmuTOS VDI is used in that case (instead of fVDI + SAGA driver), and bitplane manipulation is more complicated for the CPU than TrueColor.
The disk result is still bad due to EmuTOS vs FreeMiNT.
Memory and video bus speed seem to be much worse... but they are not! This is because Kronos mixes the results of 2 different tests. See details: EXTERNAL LINK  Beware: on this detail screen, "Your computer" is the second bar from the top, while the reference is the first bar.
We can see that STRam (= Chip RAM) performance is identical.
On the other hand, video results are very bad. They actually use VDI functions! So it is actually a VDI (graphics API) test, not raw memory performance.

My conclusion:
Everything looks good, with expected performance. We can go ahead with other stuff.


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
06 Mar 2017 22:47


Dear Vincent,

very nice result.
Impressive as always!

This doesn't look bad either.

BTW the recent cores (pre-GOLD3) do also support planar modes over SAGA.
This means you can run ATARI style monochrome, or 2bit planar / 4bit planar also over DIGITAL-VIDEO. If you like :-)

And you can freely program their resolution, so not only 640x480 but also 1024x768 or whatever size you want.

I look forward to seeing Kronos scores of an x13 BLACK VAMPIRE core in the future. :-)



Vincent Rivière

Posts 87
06 Mar 2017 23:08


Last benchmark: Vampire V2 GOLD2 vs CT60 soft-float (no hardware FPU)
  Olivier Landemarre sent me the reference file from a CT60 user. That CT60 runs a 68060 at 100 MHz and is considered one of the fastest Atari machines (if not the fastest).
 
  General: EXTERNAL LINK  CPU: EXTERNAL LINK  soft-FPU: EXTERNAL LINK  Memory: EXTERNAL LINK  soft-FPU OpenGL: EXTERNAL LINK 
  Comments:
 
  - Memory: Much better performance on Vampire, as can be seen in the CPU details. This can't really be seen in the memory details, as only ST-RAM and the video API are tested there, which is irrelevant here.
 
  - CPU, including soft-FPU tests: Globally, Vampire runs at 80% of 68060 speed.
 
  Conclusion: This confirms again that Vampire has very high FastRAM performance. But regarding the CPU, it is only 80% of the power of the fast 68060 @ 100 MHz. Even if performance is very good, it is not a 68060 killer.


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
07 Mar 2017 06:02


Vincent Rivière wrote:

    Conclusion: This confirms again that Vampire has very high FastRAM performance. But regarding the CPU, it is only 80% of the power of the fast 68060 @ 100 MHz. Even if performance is very good, it is not a 68060 killer.
   

   
Hmm, your results look a little ODD.
1st: At what clock do you run your 68080?
2nd: Did you enable SuperScalar on the 68080? What is the PCR value?
3rd: What is your CACR setting?
4th: The disk I/O is surprisingly low. Did you enable Fast-IDE?
5th: What screen resolution do you run in parallel?
6th: Is your VBR in Chip RAM or in Fast RAM?


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
07 Mar 2017 07:04


Vincent Rivière wrote:

  But regarding the CPU, it is only 80% of the power of the fast 68060 @ 100 MHz.
 

 
Let me clarify this. :)
 
The 68060 and the 68080 are internally very similar.
The 68060 and the 68080 have nearly the same internal pipeline architecture.
 
But there are some differences.
The 68060 can "eat" up to 4 bytes of instructions per clock.
The 68080 can "eat" up to 16 bytes of instructions per clock.
 
This means a 68060 can continuously execute code
like these 2 instructions in parallel, as together they are 4 bytes:
ADDQ.L #1,D0
ADDQ.L #1,D1
 
Instructions like these
ADD.L #$11111,D0
ADD.L #$11111,D1
which are 6+6 = 12 bytes, would run slowly on the 68060 but fast on the 68080.
 
The 68060 can process 1 to 2 instructions per cycle.
The 68080 can process 1 to 4 instructions per cycle.
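The fetch and issue limits above can be turned into a simple lower bound on cycles for a group of independent instructions. This Python sketch assumes no register dependencies, an always-hitting instruction cache and no other pipeline restrictions, so it shows only the fetch/issue effect described in the post, not real timings.

```python
import math

# Per-clock limits as stated in the post: (fetch bytes, max instructions issued).
CORES = {"68060": (4, 2), "68080": (16, 4)}


def min_cycles(instr_lengths, core):
    """Lower bound on cycles for independent instructions, limited only by
    instruction fetch bandwidth and issue width (no latencies, no cache misses)."""
    fetch_bytes, issue_width = CORES[core]
    fetch_bound = math.ceil(sum(instr_lengths) / fetch_bytes)
    issue_bound = math.ceil(len(instr_lengths) / issue_width)
    return max(fetch_bound, issue_bound)


addq_pair = [2, 2]  # ADDQ.L #1,D0 / ADDQ.L #1,D1: 2 bytes each
addl_pair = [6, 6]  # ADD.L #$11111,D0 / ADD.L #$11111,D1: 6 bytes each
for core in CORES:
    print(core, min_cycles(addq_pair, core), min_cycles(addl_pair, core))
```

Under these assumptions the ADDQ pair needs 1 cycle on both cores, while the ADD.L pair needs 3 cycles on the 68060 (12 bytes at 4 bytes per clock) and 1 cycle on the 68080.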
 
The 2nd pipeline on the 68060 was limited and could not execute all instructions. The 68080 removes some of these limitations and can execute more instructions in parallel.
 
The 68080 is very similar to the 68060, but generally an evolution of it. Many things were improved in the 68080.
For example, RTS is much faster, misaligned cache reads are free, misaligned JUMPs/LOOPs no longer have a penalty, certain branches are never mispredicted, and DBRA knows when it will end the loop.
 
Of course, comparing the cores depends very much on the code executed.
If, for example, you execute a code sequence like:

  ADDQ.l #1,D0
  ADDQ.l #1,D0
  ADDQ.l #1,D0
  ADDQ.l #1,D0

This code is simple but fully sequential (each instruction needs the D0 result of the previous one), so it will run with the same IPC on both the 68060 and the 68080.
 
Then your result will not be influenced by any core improvements, only by the clock rate.
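For a dependency chain like this, extra pipes and wider fetch cannot help, so the time is just cycles divided by clock rate. A small Python sketch, assuming a 1-cycle latency per ADDQ and using 100 MHz (the CT60's 68060 in this thread) and 78 MHz (a standard Vampire) as example clocks:

```python
def chain_time_ns(n_instructions, clock_mhz, latency_cycles=1):
    """Execution time of a fully dependent instruction chain, in nanoseconds.

    Each instruction waits for the previous result, so cycles = n * latency
    regardless of how many pipes the core has (1-cycle latency is assumed).
    """
    cycles = n_instructions * latency_cycles
    return cycles * 1000.0 / clock_mhz


t_68060 = chain_time_ns(4, 100)  # four dependent ADDQ.L #1,D0 at 100 MHz
t_68080 = chain_time_ns(4, 78)   # the same chain at 78 MHz
print(round(t_68080 / t_68060, 3))  # 1.282, i.e. exactly 100 / 78
```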
 
Regarding clock rate, the normal Vampire is shipped as x11 = 78 MHz.
The Black Vampires are tested to run at x13 = 93 MHz.
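Combined with Vincent's result of 80% of a 68060 @ 100 MHz, these clock rates allow a rough per-clock comparison. This Python sketch assumes his board ran at the standard x11 = 78 MHz (Gunnar's questions above show this was not confirmed) and that scores scale linearly with clock, which memory-bound tests will not do; the function names are hypothetical.

```python
def per_clock_ratio(score_ratio, ref_mhz, test_mhz):
    """Relative work per clock: (test score / ref score) * (ref MHz / test MHz)."""
    return score_ratio * ref_mhz / test_mhz


def projected_ratio(score_ratio, old_mhz, new_mhz):
    """Score ratio after a clock change, assuming perfectly linear scaling."""
    return score_ratio * new_mhz / old_mhz


print(round(per_clock_ratio(0.80, 100, 78), 3))  # 1.026
print(round(projected_ratio(0.80, 78, 93), 3))   # 0.954
```

Under these assumptions, 80% at 78 MHz corresponds to slightly more work per clock than the 68060, and an x13 core would land at roughly 95% of the 100 MHz CT60.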

The 68060 is generally a very good CPU.
The 68080 is from design an improved 68060 core.
Where the 68060 had some design limitations, the 68080 tries to remove most of them.

A huge difference between the 68060 and 68080 is the AMMX instruction enhancement.
The 68080 does not only support BYTE/WORD/LONG operations but also 64-bit and 128-bit operations.
If you are able to use these AMMX operations,
e.g. in a video driver or in a game, you can expect up to a 10x speed-up compared to old 68k code.
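Much of that speed-up comes from operating on several values per instruction. A minimal Python sketch of the packed-arithmetic idea (SIMD within a register) on a 64-bit word holding four 16-bit lanes; it only illustrates the concept and does not model any specific AMMX instruction or its exact semantics.

```python
# Four 16-bit lanes in one 64-bit word. Not a model of any AMMX instruction.
LANE_TOP_BITS = 0x8000_8000_8000_8000  # bit 15 of every 16-bit lane


def pack_u16x4(lanes):
    """Pack four 16-bit values into a 64-bit word (lane 0 = lowest bits)."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFFFF) << (16 * i)
    return word


def unpack_u16x4(word):
    """Split a 64-bit word back into four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add_u16x4(a, b):
    """Add four lanes with one 64-bit addition, wrapping within each lane.

    Clearing bit 15 of every lane first means no carry can cross into the
    next lane; the correct bit 15 is then restored with an XOR.
    """
    low_sum = (a & ~LANE_TOP_BITS) + (b & ~LANE_TOP_BITS)
    return low_sum ^ ((a ^ b) & LANE_TOP_BITS)


a = pack_u16x4([1, 2, 3, 0xFFFF])
b = pack_u16x4([1, 1, 1, 1])
print(unpack_u16x4(packed_add_u16x4(a, b)))  # [2, 3, 4, 0]
```

Hardware with native packed instructions does this in one operation, without the masking trick needed here.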


Markus (mfro)

Posts 99
07 Mar 2017 08:00


Gunnar von Boehn wrote:

 
Vincent Rivière wrote:

    But regarding the CPU, it is only 80% of the power of the fast 68060 @ 100 MHz.
   

    ... This code is simple but fully sequential and will run with the same IPC on both 68060 and 68080.
   
  Then your result will not be influenced by any core improvements but only by the clockrate...

 
  in other words: you'll either need handcrafted assembler code, a smarter compiler that does instruction scheduling based on 68080 capabilities or a core that does out-of-order execution to unleash the full power of the Apollo CPU?
 
  In the interest of fairness: while I'm sure Vincent's numbers are correct, I'd consider the comparison not exactly fair to the Apollo CPU.
 
  Apparently, these numbers are from a CT60 Falcon tuned to its limits. Although it's not clear whether the Falcon base has been accelerated (some boards are accelerated from 16 to 25 MHz on the Falcon side), there is obviously a Radeon card inside. This not only greatly improves video performance (as the drivers use accelerated video), but also ST RAM performance (since the Falcon's original video circuit is then not in use, the drivers can reduce the bus load it causes on ST RAM to an absolute minimum). I'd also assume that video memory performance is measured against the PCI memory, at least to a certain extent.
  Since I'd expect Kronos cannot fully disable interrupts during CPU performance tests (as this would also stop the timers needed for the measurements), there will always be ST RAM accesses (this is where the exception vectors reside, which are probably used more frequently in TOS than in AmigaOS) influencing the CPU numbers.
 
  Nevertheless, it is exciting to see that a 15-year-old design is still competitive.

[edit: Vincent's CPU numbers suddenly disappeared? At least for me, the corresponding link goes nowhere]
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
07 Mar 2017 08:42


Markus (mfro) wrote:

in other words: you'll either need handcrafted assembler code, a smarter compiler that does instruction scheduling based on 68080 capabilities or a core that does out-of-order execution to unleash the full power of the Apollo CPU?

 
Actually I think it's the other way around. :-)
 
68K instructions have variable length.
The shortest instructions are 2 bytes long.
But there are also 4-byte, 6-byte, 8-byte, 10-byte... instructions; in fact, instructions can even be over 20 bytes long.
 
So while 68K code naturally contains long and very long instructions, the 68060 has a limitation: it can "eat" at most 4 bytes per clock, and this for 2 pipes. This means that for best speed on the 68060, a coder needs to try to use only 2-byte instructions. If longer instructions are used, the 68060 will lose performance.
 
This means the 68060 will only reach a good IPC with carefully handcrafted code that uses only the shortest, 2-byte instructions.
 
This means that code using longer instructions, which includes "normal" compiler-generated 68k code, will of course not run optimally on the 68060.
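Put differently, fetch bandwidth divided by average instruction length caps the achievable IPC, and the issue width caps it again. A Python sketch using the per-clock limits stated earlier in this thread (4 bytes and 2 instructions for the 68060, 16 bytes and 4 instructions for the 68080); the average lengths are illustrative values, not measurements of real compiler output.

```python
# Per-clock limits as stated in the thread: (fetch bytes, issue width).
LIMITS = {"68060": (4, 2), "68080": (16, 4)}


def fetch_ipc_ceiling(avg_instr_bytes, core):
    """Upper bound on instructions per clock from fetch bandwidth and issue width."""
    fetch_bytes, issue_width = LIMITS[core]
    return min(float(issue_width), fetch_bytes / avg_instr_bytes)


for avg in (2, 4, 6):  # illustrative average instruction lengths in bytes
    print(f"{avg}-byte average: "
          f"68060 <= {fetch_ipc_ceiling(avg, '68060'):.2f} IPC, "
          f"68080 <= {fetch_ipc_ceiling(avg, '68080'):.2f} IPC")
```

With 2-byte instructions the 68060 can still reach its 2-instruction issue limit, but at a 4-byte average its fetch limits it to 1 IPC, while the 68080's ceiling stays at its issue width until instructions average more than 4 bytes.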
 
But the 68080 fixes this.
This means "normal", "average" or typical code generated for the 68040 will run better on the 68080 out of the box.
 
So it's the other way around.
On average, all code should run better on the 68080 than on the 68060.
Only carefully handcrafted code that was tuned for the best IPC on the 68060 will run the same on the 68080.
 
Also, the 68060 has several limitations; for example, it cannot execute SWAP in the 2nd pipe. This and several other limitations are fixed. This means that with average code, including normal compiler-generated code, the 68080 will reach a higher IPC than the 68060 out of the box.

So, generally, coding for the 68080 or generating code for it has become easier than coding for the 68060.

I think the improvements/fixes we added to the 68080 are the same ones Motorola would have made to the 68060 if they had continued developing it.
