
Welcome to the Apollo Forum




Information about the Apollo CPU and FPU.


Roman S.

Posts 149
05 May 2017 19:24


Gunnar - having the FPU stripped down to 64 bits to save some FPGA space is probably acceptable; UAE has offered 80-bit FPU emulation only very recently, and there are very few problems if the precision is 64-bit only. Stripping the FPU down to 68060 compatibility (no trigonometric instructions) wouldn't be much of a problem either, if we could use the 68060.library.

But having an Amiga that can run RiVA really fast while being unable to run FFMPEG, Quake, or full-quality MP3 codecs, and struggling with Real3D, VistaPro, and Lightwave, is... well, not that fun.



Gunnar von Boehn
(Apollo Team Member)
Posts 6214
05 May 2017 20:12


Roman S. wrote:

full quality MP3 codecs,

MP3 is integer code.


M Rickan

Posts 177
05 May 2017 20:28


Kolbjørn Barmen wrote:

quite a few of us never really used it much for games as we used it for various kinds of arts and production...

I can't speak for Gunnar, but it sounds like he just encouraged those wishing to prioritize the FPU implementation to identify an app that would both test and showcase the technology.

Sounds like a challenge to take some initiative.


Andrew Copland

Posts 113
05 May 2017 20:44


Gunnar von Boehn wrote:

Andrew Copland wrote:

  Gunnar, I'm glad to see that what you've got is already quite capable. Much like the AMMX stuff, though: if you don't release it, it can't be used.
 

 
  I think my point is not clear.
  The old 68k FPU calculates everything in 80 bits.

  Does any Amiga software need this? NO!
  Does any other software in the world need this? NO!
  Other architectures, like IBM machines, cannot even do this.

My apologies Gunnar, I thought you had a split 32-bit / 64-bit / 80-bit FPU, with only 64-bit operations extended to 80 bits for the exponent.

In fact, I thought you guys started out with a plain 32-bit FPU originally.

Most FPUs I've worked with operate at different speeds (cycles) depending on the instruction. So 32-bit operations used to be fastest, then 64-bit, and... well, I've never used extended for anything, but I assume it was even slower :)

So you do _everything_ in the full 80-bit format and don't have a split implementation?


Rob M

Posts 60
05 May 2017 20:54


Gunnar von Boehn wrote:

Roman S. wrote:

  full quality MP3 codecs,
 

  MP3 is integer code.

mpega.library comes in integer and FPU versions.  The author claims that the FPU code is more accurate.
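As an aside on what "integer code" means here: an FPU-less decoder replaces float math with fixed-point integer arithmetic. A minimal C sketch of the idea, using a generic 16.16 format (illustrative only; this is not the actual mpega.library internals):

```c
#include <stdint.h>

typedef int32_t fix16;    /* 16.16 fixed point: 16 integer bits, 16 fraction bits */

/* convert a compile-time double constant to 16.16, with rounding */
#define FIX(x) ((fix16)((x) * 65536.0 + ((x) >= 0 ? 0.5 : -0.5)))

/* multiply two 16.16 values: widen to 64 bits, then drop the extra
   16 fraction bits - one integer multiply instead of an FMUL */
static inline fix16 fix_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}
```

The FPU version can be more accurate, as Rob notes, because fixed-point keeps only 16 fraction bits, while a single float carries 24 bits of relative precision.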


Crow Mohikan

Posts 78
05 May 2017 21:40


Guys, maybe you will agree with me, but a lot of people are waiting for the Vampire 1200 and Vampire 3000/4000. We have no Vampire card for those machines yet. First the card should be available for all Amigas; then the team can focus on improving the core for the FPU or whatever else.


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
05 May 2017 22:50


Andrew Copland wrote:

  So you do _everything_ at full 80-bit format and don't have a split implementation?
 

 
The original 68k FPU has 80-bit registers.
The original FPU expands every input to 80 bits, and it always does all calculations in 80 bits.
Our current FPU design works 100% the same.
So we also work in 80 bits, like the original.

But to be honest, no software depends on 80-bit accuracy.
That is why today all architectures go for 64 bits maximum, and not one does 80 bits anymore...

Our FPU has a significant latency, of course.
You as a coder will need to handle this.

We are aware that the future FPGA we plan to use next year will allow us to reach GIGAFLOPS! So yes, really 1000 MegaFLOPS.
But we cannot have this speed and also do 80 bits!

So this is the dilemma. We now have to make a decision.
In reality, the existing software on AMIGA mostly uses SINGLE.
That we internally calculate it in EXTENDED is senseless overkill.

Shall we continue this overkill in the future - even though no software needs it?

Or shall we rather take advantage of the super FPUs that next-gen FPGAs will otherwise give us?
 


Daniel Sevo

Posts 299
05 May 2017 23:59


Ok, my 2 cents..
First, lets just look at this upcoming FPGA generation split (that Gunnar mentions for a moment). If future Vampires will use new more powerful FPGA, it will not change the situation for the thousands sold until now. Those people will not benefit from any future GFLOPS architecture anyway as I understand it will not be suitable for current Cyclone III so for that userbase, it is not a dilemma of choosing between FPUs its a dilemma about getting one at all.

Lets remember that a lot of users already have overclocked 060s (me included) and unless current gen vampires get an FPU they will never truly replace the 060 equipped Amigas. Yes it will play movies in 640x360 but in 95% of games Vampire will not offer a useful speed advantage over 060 in regular games and the really advanced games (Quake I, Quake II etc) that really benefited from a fast 060 will not run at all..
Same goes for a lot of the productivity software (and far from all are open source so that they could be patched or recompiled). (Although using any Amiga for 3d rendering in this day and age is pure madness, I don't feel sorry for you guys :-)

So maybe one way to look at is to say that very few apps really need/use FPU so its poor usage of resources to put it there.
Another way to look at is it to say that the design will eventually run out of space in current FPGA so a split is inevitable at some point in the future. Let this one be the true 68080 (not 68EC080) that replaces the 060 once and for all for all current gen Amiga software. And let us worry about what might come in the future when the Apollo core has grown into a seriously capable and modern CPU no longer constrained by budget FPGAs.

Also.. I remember my brother buying a Pentium for the equivalent of €2000 in 1996 to play Quake as the 486DX was dead slow. ;-)
One game /does/ make people upgrade their hardware. ;-)




Gunnar von Boehn
(Apollo Team Member)
Posts 6214
06 May 2017 00:06


Daniel,
 
thanks for the post, but can you please help me understand what your point was?

Quake uses 32-bit floats - it makes no difference in Quake whether you calculate them in 32, 64, or 80 bits...

Using more than 32 bits for them will only make things slower and fill up our FPGA quicker, leaving no room for anything else you might like to have...


Kolbjørn Barmen
(Needs Verification)
Posts 219
06 May 2017 01:14


So, is a list of software using FPU something that the team wants? All the constant talk about Quake is tedious, I am starting to really dislike that game.


M Rickan

Posts 177
06 May 2017 04:30


Kolbjørn Barmen wrote:

So, is a list of software using FPU something that the team wants? All the constant talk about Quake is tedious, I am starting to really dislike that game.

I think the point, rather, would be to identify a standout application offering a popular proof of concept that would bring perceived value to Vampire users.

Games may not be your thing, but they are the most popular way to showcase the use of an FPU.

It only takes one application to get the ball rolling...


Marcus Sackrow

Posts 37
06 May 2017 08:26


Just my 2 cents:

It seems most people here do not know what they are talking about.
I'm a scientific software developer, which means floating-point calculations are my bread and butter.

Single/float (32-bit) is usually enough for most calculations.
A 32-bit float has 6 to 7 significant digits. That means: in a 3D scene 1-9 km wide you can still describe positions with 1 mm precision, without any tricks - more than enough. For bigger scenes you use tricks anyway, because you do not keep the whole scene in memory: you only care about the 1 km around you, and far-away things no longer need to be so precise.

And that's a single 32-bit float. If you take a double it becomes really crazy: a double (64-bit) has 15-16 significant digits, which means a 1,000 km wide area can be described down to roughly 10^-10 m (sub-nanometre), or a 1,000,000 km wide area (remember, the circumference of the Earth is about 40,000 km) down to roughly 10^-7 m (remember, the wavelength of visible light is around 500 nm).
(That is the reason I need double precision for my Mapparium program: coordinates on the Earth's surface need rather high precision - higher than single, but still far from exhausting double precision.)

There are very few occasions where you need more than double, and to be honest, in those cases 80-bit extended is not enough either: you need to implement 128-bit or 256-bit math (google "Octuple-precision floating-point" ;)).

Did you notice anything odd about floating-point calculations on your x86_64 PC compared with your i386 PC? No? Good - that means you do not need 80-bit floats.
AMD/Intel effectively deprecated the x87 FPU for x86_64, and the 64-bit programs you are using do not use it at all, because it is slow and old. New floating-point calculations on Intel/AMD processors are done with SSE2/SSE3, which has no extended (80-bit) mode, only single (32-bit) and double (64-bit). SSE2/SSE3 does not even have SIN/COS/EXP instructions (as the x87 FPU had); instead Intel and AMD publish special routines (google "Intel Approximate Math"). These routines are long (about a page of assembler) but much faster than the single FPU instruction.

@Gunnar:
Just curious: a gigaflop FPU? Do you have an FPGA running at GHz or more? :-O Or can you do 10 FPU operations in one cycle (100 MHz x 10 FPU ops/cycle = 1 GFLOPS)? :O I'm a little surprised, but if so, I'm very happy.
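Marcus's digit counts are easy to check. A minimal C sketch using only the standard `<float.h>` constants (nothing Apollo-specific; the 9 km / 40 km figures below are illustrative):

```c
#include <float.h>

/* FLT_DIG / DBL_DIG are the decimal digits guaranteed by IEEE-754
   single and double precision: 6 and 15, matching the counts above. */

/* Can a float, storing a position in metres, still resolve a 1 mm
   step at the given distance from the origin? The spacing between
   adjacent floats at 9000.0f is 2^-10 m, just under 1 mm, so a
   1 mm step is still visible there. */
static int mm_step_visible_at(float metres) {
    return (metres + 0.001f) > metres;   /* 1 = yes, 0 = no */
}
```

For example, `mm_step_visible_at(9000.0f)` yields 1, while at 40 km (well past the 1-9 km range given above) the float spacing grows to about 4 mm and the same call yields 0.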




Thierry Atheist

Posts 644
06 May 2017 08:40


Gunnar von Boehn wrote:
d) You spend the time and money on a compatible hardware FPU - but you also make the effort to keep it compatible with future FPGA technology, and make it fast enough to enable software that was never possible in Amiga land before.

We are clearly in favour of option d)


Gunnar offers the best options... I'm for option "d)". :-)


Thierry Atheist

Posts 644
06 May 2017 08:51


Andrew Copland wrote:
Most FPU's I've worked with operate at different speed (cycles) depending on the instruction. So 32-bit operations used to be fastest, then 64-bit and ... well I've never used extended for anything but I assume it was even slower :)

So you do _everything_ at full 80-bit format and don't have a split implementation?


Well, ASICs are like ROM: set in stone.

FPGAs, on the other hand, are defined at boot-up, so 32-, 64-, 80- or even 96-bit operations can probably execute in the same number of cycles? The only limit is the number of bits retrieved from RAM per fetch. Is your CPU 32-bit, 64-bit, or "other", externally on the motherboard?


Aksel Andersen

Posts 120
06 May 2017 09:24


My two cents.

For my V500+ to be the complete and ultimate all-in-one Amiga package, I also need a backwards-compatible FPU and MMU.

Only then can I retire my A4000 (which makes me a little bit sad ;)

Don't get me wrong: the work done by the team is awesome and we all appreciate it.

<child mode>But you all promised us an FPU!!!!</child mode>

cheers! :)


Saladriel Amrael

Posts 166
06 May 2017 09:48


Not an Amiga developer myself, but... I've been a programmer for as long as I can remember, so here are my 2 cents.

People are asking for an FPU in order to run their favourite software.
But demanding an FPU is demanding HARDWARE compatibility, while what actually seems to matter is SOFTWARE compatibility.
Now, Gunnar has stated more than once that developing a fully 68k-compatible FPU would waste precious FPGA space and would not be as effective as people expect, and after reading all the tech info two or three posts above, I can understand why.
And I have no reason not to trust Gunnar and the rest of the Apollo team.

That said, is there a possibility to write a library that converts FPU calls into CPU calls (or does one already exist)? That would let any FPU-requiring software run even without one, and I bet it would be faster than most 68882/040 hardware out there. Maybe not faster than an 060, but that can be acceptable: Vampire owners would have super fast integer performance (90% of Amiga software), super fast GFX performance (100% of Amiga software), lots of really fast RAM, very fast IDE, and the equivalent of an 060 FPU. That seems more than fair to me.

Returning to the Apollo FPU: I clearly understand that 80-bit float is insane - anyone reading the well-written post above can understand it; even Intel and AMD deprecated it.
So, if the choice the Apollo team has to make is whether to follow the same route as Intel and AMD (scrap the FPU for something that works very fast with 32/64-bit operations), let's do it - it has proven to work very well. I bet the Amiga user base will adapt to it once it is real: software will be recompiled accordingly, libraries will be written, compilers will be built.
Just look at what is happening with SAGA: Cannonball, Rick Dangerous, and other games I can't recall right now are being ported to a work-in-progress gfx system just for passion's sake. I bet the same will happen with whatever idea you come up with for your "FPU".
How do I know? Because you've always done an excellent job despite everything, and because I know the passion and love the Amiga community has inside.




Saladriel Amrael

Posts 166
06 May 2017 09:55


Marcus Sackrow wrote:

  Single/Float (32Bit) is for most of the calculations usually enough.
  [...]
  There are very few occasion where you need more than double..

Fully agree: I worked 20 years in CAD/CAM development, and we never once had to use 64-bit precision; that would have been overkill.
32-bit is just enough to get 0.1 mm precision on a 20 m long drawing, 2D or 3D. I do not know much about ray tracing, but I have reason to believe things do not change much there.


Nixus Minimax

Posts 416
06 May 2017 14:11


@Marcus Sackrow:

I think some people on a1k could use some of your wisdom on this subject... :)


Crow Mohikan

Posts 78
06 May 2017 14:21


Like acopro.lha on aminet.


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
06 May 2017 15:00


Marcus Sackrow wrote:

  [...]
  @Gunnar:
  Just curious: a Gigaflop FPU? you are having a FPGA running with GHz and more? :-O or you can do 10 fpu commands in one cycle (100 MHz x 10 FPU commands/cycle = 1GFlops)? :O I'm a little surprised. But if so I'm very happy.


Excellent post!!
Yes, a GigaFLOP is really no problem.
Superscalar with 2 instructions per cycle would be the architecture's first goal. But it is all a little too early to go into now.
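The GFLOPS arithmetic in Marcus's question above is just clock rate times issue width. A toy helper in C (the example figures are illustrative, not Apollo specs):

```c
/* back-of-envelope peak throughput:
   peak MFLOPS = clock (MHz) x FLOPs issued per cycle */
static long peak_mflops(long clock_mhz, long flops_per_cycle) {
    return clock_mhz * flops_per_cycle;
}
```

So Marcus's own example, `peak_mflops(100, 10)`, gives 1000 MFLOPS (1 GFLOPS), and the same figure is also reachable at other clock/issue combinations, e.g. a hypothetical 500 MHz part issuing 2 FLOPs per cycle.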

What would make real sense now is to create a really good use case.

This use case should have the following features:

a) not overly complex, so it is easy to profile

b) written in ASM

c) not completely artificial and useless, but something sensible

d) very FPU-heavy, to really be able to use 100 MFLOPS -
    mostly FADD / FMUL

e) written with high parallelism in mind,
    to be able to fully use the superscalar FPU.
    Our FPU has a LATENCY of several cycles,
    so the test needs to work well with this.
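Point (e) can be sketched in C (a hypothetical example, not Apollo code): a dot product written as several independent accumulator chains, so consecutive FADDs do not depend on each other and the multi-cycle FPU latency stays hidden.

```c
/* dot product with four independent accumulators: the FADD feeding
   s0 never waits on the FADD feeding s1/s2/s3, so a pipelined FPU
   with multi-cycle latency can keep all four chains in flight */
float dot4(const float *a, const float *b, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        s0 += a[i]     * b[i];       /* four independent FMUL+FADD */
        s1 += a[i + 1] * b[i + 1];   /* chains per iteration       */
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)               /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);    /* pairwise final reduction */
}
```

A naive single-accumulator loop computes the same result but serializes every FADD behind the previous one, paying the full latency on each iteration.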


