

Polygon Pushing Performance of the 080

Steve Ferrell

Posts 424
09 Oct 2018 22:27


@Markus B
   
  I don't think he understands how FPGAs work.  Why advocate designing a dedicated, virtual 3D GPU co-processor and adding it to the existing core?  The glue logic alone to connect this virtual GPU to the rest of the system would make it impractical and much slower than just adding 3D functions/instructions to the existing core.


Andy Hearn

Posts 374
09 Oct 2018 23:19


OK, just for the heck of it:
Vampire V500, 2.10 core @x12, r53coffin+Wazp3D

GLQuake 320x240x8-bit: 2.4 fps
All lighting and transparency effects were fine; perspective texture correction got a bit wonky in places.

Quake 320x240x8-bit: 20.5 fps

And cow3D: sub-1 fps, even with the "b" option. At least it reported that it had created an 8.4 MB buffer; I didn't see that in either the Virge or Permedia2 testing.



Louis Dias
(Needs Verification)
Posts 55/ 1
10 Oct 2018 00:58


@Markus - thank you - I'm glad someone gets why I mentioned AKIKO to begin with.

@Andy Hearn - those stats are kind of meaningless unless you also know how many texture-mapped and shaded polygons are being rendered per second.
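
Something like this would give a number you can actually compare (plain C sketch, untested; the dummy fill just stands in for whatever renderer is being measured):

/* Quick-and-dirty triangles-per-second counter (plain C, untested
   sketch).  fill_triangle() is a dummy flat fill into an offscreen
   buffer -- swap in the real thing (a Wazp3D call, an AMMX routine)
   to get comparable numbers. */
#include <stdio.h>
#include <time.h>

static unsigned char fb[320 * 240];          /* fake 8-bit framebuffer */

static void fill_triangle(void)
{
    /* stand-in workload: write a few spans like a small triangle */
    for (int y = 0; y < 10; y++)
        for (int x = 0; x < 20; x++)
            fb[y * 320 + x] = (unsigned char)(x + y);
}

int main(void)
{
    long tris = 0;
    clock_t start = clock();
    while (clock() - start < 5 * CLOCKS_PER_SEC) {   /* run ~5 seconds */
        fill_triangle();
        tris++;
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%.0f triangles/sec (%ld total)\n", tris / secs, tris);
    return 0;
}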


Steve Ferrell

Posts 424
10 Oct 2018 01:29


@ Louis Dias
     
We "get" why you mentioned it but you're not "getting" it when it comes to modern FPGA core design.  It makes no sense to design a discrete, virtual 3D GPU and add it to the existing core because the glue logic would eat up far too many gates for it to be fast OR efficient.
     
Back in 1985, adding co-processor chips to boards was all the rage, but it's 2018 and there's no need for it now.  Intel and AMD have both moved to consolidate their CPUs/FPUs/IGPUs for obvious engineering and performance reasons.  It eliminates memory bottlenecks, wait states, bus contention and tons of glue logic.  It also shrinks dies and reduces power consumption and heat.  Many of these gains can be realized in FPGA systems as well by NOT engineering like it's 1985.


Markus B

Posts 209
10 Oct 2018 07:19


Isn't this the point?
Compare it to the GPUs integrated into CPUs: they have hundreds of shaders, and they work massively in parallel, independently of the CPU.

I think that is the idea: to have multiple AMMX units that are independent of the CPU.


Daniel Sevo

Posts 299
10 Oct 2018 08:10


I think a more meaningful question would be: "How can the current AMMX implementation be modified further to maximize the 3D performance relevant to the software and 3D APIs we currently have available for the 68k Amiga?"
I think Thellier Alain has already pointed out a couple of things that would speed up an AMMX version of Wazp3D.

Whatever would be nice to have, we need to remember we are working with mid-to-late-90s performance levels, both CPU- and graphics-wise.

Should the team switch to a higher-end FPGA in the future, it might become meaningful to start scaling things up toward Pentium II 300 MHz + NVIDIA TNT performance, but that is not going to happen on the current platform, so it's best to stay realistic and constructive.

To stick with the right era, it's interesting and fun to look at how the 3DO, Sega Saturn, etc. managed to do 3D graphics back in the day. They did not have dedicated 3D hardware in the sense that the PlayStation and Nintendo 64 had, yet they managed to squeeze out decent 3D at the time.


Markus B

Posts 209
10 Oct 2018 09:35


Speculating here, but maybe a dedicated AMMX unit is rather small within the FPGA, so parallel execution could be a possible speedup.

That's how every current GPU works, isn't it?


Ronnie Beck
(Apollo Team Member)
Posts 199
10 Oct 2018 09:49


Steve Ferrell wrote:

    @ Louis Dias
           
      We "get" why you mentioned it but you're not "getting" it when it comes to modern FPGA core design.  It makes no sense to design a discrete, virtual 3D GPU and add it to the existing core because the glue logic would eat up far too many gates for it to be fast OR efficient.
   

   
    It makes sense to design a discrete GPU because a GPU has some fundamental differences from a CPU.  Firstly, GPUs are purpose-built for graphics, with instructions that target exactly that; CPUs must deal with more generic computational needs.  Optimising a pipeline for a specific workload almost certainly becomes easier and yields performance gains.  Secondly, GPUs yield higher performance through massive parallelism.  CPUs are typically limited in core count (the Vampire, like all Amiga 68k accelerators, has just one core); in stark contrast, recent top-end NVIDIA GPUs have several thousand cores.  It isn't hard to guess who wins the pixel-pushing competition there.  The GPU!
   
    A key advantage an FPGA has, and which lends itself very well to a discrete GPU, is that multiple operations can be executed with every tick of the clock, because the logic units are discrete and independent of each other.  A CPU-only solution will always be limited by the efficiency of the pipeline, clock speed and memory bandwidth.  The good news for Vampire owners is that the Apollo Core is highly modern and optimised, with a memory controller that is second to none in the Amiga world.  I would imagine a dedicated GPU could work quite well.
   
   
   
Steve Ferrell wrote:

    Intel and AMD have both moved to consolidate their CPUs/FPUs/IGPUs for obvious engineering and performance reasons.  It eliminates memory bottlenecks, wait states, bus contention and tons of glue logic.  It also shrinks dies and reduces power consumption and heat.  Many of these gains can be realized in FPGA systems as well by NOT engineering like it's 1985.
   

   
    This is an imaginative explanation for integrated GPUs, but sadly the statements are axiomatically incorrect and factually inaccurate.  The integrated GPUs are discrete blocks, not simply an extended instruction set.  It also flies in the face of benchmarks, which show that the best discrete GPU will ALWAYS beat the fastest CPU with integrated gfx.  This isn't a controversial statement; a quick Google search and one can see it for oneself.  High-performance GPUs alleviate bus/memory bottlenecks by having their own memory and buses in addition to the system's native buses.  You can figure out on your own why having dedicated memory and a dedicated bus is better than sharing a bus/memory with the CPU.
   
    AMD and NVIDIA still produce discrete GPU chips in addition to their integrated solutions, and it is a core part of their business.  Why?  Performance, obviously!  NVIDIA graphics card shortages in recent months were attributed to cryptocurrency-mining companies buying these cards because of the performance benefits of a GPU's massive parallelism, which exceeds what even the best Intel CPUs can deliver.
   
    But I would conjecture that integrated GPU+CPU solutions have more to do with reducing manufacturing costs than with performance (i.e. designing and manufacturing a computer around two chips, a CPU and a GPU, costs more than designing around a single CPU+GPU chip; when you consider the support electronics each chip needs, an all-in-one solution also consumes less space on the motherboard, which is crucial for laptops and small-form-factor PCs).  There are certainly engineering advantages to an integrated solution, such as the CPU having direct pathways to the GPU without needing an external (and slower?) bus.  But these will never scale to compete with NVIDIA's dedicated graphics chips, for example.
   
    That said, some part of the 3D graphics drawing will always need to be done by the CPU, because the software, which runs on that CPU, must decide what should be drawn.  Therefore it helps to have CPU instructions optimised for this, so you are correct to say that having CPU instructions for this purpose is a bonus.  Hence (A)MMX and 3DNow!
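
    To make that concrete, the per-vertex transform below is exactly the kind of loop where packed multiply-accumulate instructions pay off, because the four dot products are independent of each other (plain C sketch, my own names; AMMX being integer SIMD, the floats here would become fixed-point):

    /* 4x4 matrix * vertex transform -- the classic CPU-side 3D job.
       The four dot products per vertex are independent, so packed
       multiply-accumulate (MMX/AMMX style) can cut the scalar
       instruction count dramatically.  Plain C sketch. */
    typedef struct { float x, y, z, w; } Vec4;

    void transform(const float m[4][4], const Vec4 *in, Vec4 *out, int n)
    {
        for (int i = 0; i < n; i++) {
            out[i].x = m[0][0]*in[i].x + m[0][1]*in[i].y + m[0][2]*in[i].z + m[0][3]*in[i].w;
            out[i].y = m[1][0]*in[i].x + m[1][1]*in[i].y + m[1][2]*in[i].z + m[1][3]*in[i].w;
            out[i].z = m[2][0]*in[i].x + m[2][1]*in[i].y + m[2][2]*in[i].z + m[2][3]*in[i].w;
            out[i].w = m[3][0]*in[i].x + m[3][1]*in[i].y + m[3][2]*in[i].z + m[3][3]*in[i].w;
        }
    }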
   
   
Steve Ferrell wrote:

      Back in 1985, adding co-processor chips to boards was all the rage, but it's 2018........
   

    And in 2018 we have a separate chip for audio, SATA, gfx, USB... it is all multi-chip.  The design and implementation are totally different, but clearly discrete logic is still "all the rage".
   
    To bring this a little closer to the reality of the Vampire cards: what is effectively possible is constrained by the number of logic units in the FPGA.  You would be correct if you had simply stated that a discrete GPU would require lots of logic gates and there simply might not be enough, even on the V4, for this.  But we can only speculate here, because we don't know how many gates the final version of Gold3 will consume.  A member of the Apollo Team will likely have a good estimate, though.
   
    Still this is an interesting topic.


Samuel Crow

Posts 424
10 Oct 2018 10:40


Re:APU vs. Discrete GPU
 
  Keep in mind that the Vampire stand-alone is supposed to compete with a RasPi more than an i9 or Ryzen Threadripper.
 
  I realize that a discrete GPU will yield more performance but let's walk before we run.  The discrete GPU discussion can wait.
 
  To those who followed the GitHub link on the Vulkan software implementation thread: it's being rewritten in Rust, a programming language well suited to parallel design and compile-time memory checking.  I think the runtimes for Rust would be a better use of the development time than a hardware-only implementation, due to the sheer number of software developers vs. the 2 or 3 guys familiar with VHDL on an FPGA.


Ronnie Beck
(Apollo Team Member)
Posts 199
10 Oct 2018 12:12


Samuel Crow wrote:

To those who followed the GitHub link on the Vulkan software implementation thread: it's being rewritten in Rust, a programming language well suited to parallel design and compile-time memory checking.  I think the runtimes for Rust would be a better use of the development time than a hardware-only implementation, due to the sheer number of software developers vs. the 2 or 3 guys familiar with VHDL on an FPGA.

The core benefit of Rust is safe concurrent programming, which is a challenge in pre-emptive multitasking environments and/or multi-core systems.  But the real benefit of concurrent programming is being able to utilise multiple processing pipelines, and that is severely limited on a single-core CPU.  I seriously doubt that the effort of porting Rust to the Amiga (or is Google not telling me about an existing implementation?) and then getting the Rust implementation of Vulkan running on it would be easier than some dedicated logic programmed in VHDL.  If one wanted to take a purely software route, then assembler code optimised for the Apollo Core (as already suggested in this thread) would yield much faster results.  And I would bet that there are more Amiga ASM programmers than Rust programmers in the Vampire scene.

I personally would bet on assembler optimised for the Apollo Core rather than a Rust-based approach, because it is a capability that already exists in hardware and is largely untapped.  We have the compilers/assemblers already, we have the instructions in hardware already, and there are people already programming with them.


Markus B

Posts 209
10 Oct 2018 12:32


I am not saying we should really mess with the problems of SMP on the Amiga.

But I'm curious whether AMMX units could serve as co-processors.
So: a single-core 68080 with multiple AMMX units, and the work split across them. It depends on the size of those units and how they can be fed with work, I assume.
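
Roughly what I picture, as a sketch (plain C; ammx_unit_render() is purely hypothetical and only marks where one unit would do its work):

/* Split the screen into interleaved scanline groups, one group per
   AMMX unit.  No two units ever touch the same line, so in principle
   they could run in parallel without locking.  ammx_unit_render()
   is hypothetical -- just a placeholder for one unit's work. */
#define NUM_UNITS 4
#define HEIGHT    240

extern void ammx_unit_render(int unit, int y);   /* hypothetical hook */

void render_frame(void)
{
    for (int unit = 0; unit < NUM_UNITS; unit++)        /* unit k gets      */
        for (int y = unit; y < HEIGHT; y += NUM_UNITS)  /* lines k, k+4, .. */
            ammx_unit_render(unit, y);
}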


Gerardo G.

Posts 54
10 Oct 2018 13:35


I think it's much better to work on AMMX implementations of the Amiga's 3D APIs and libraries than to create a fully independent 3D core. FSAA, bilinear filtering, DOF, shaders... not really needed, and a waste of power and effort in my POV.

I think with AMMX it is possible to get a decent, 'close to Matrox Mystique' result too, including perspective correction and maybe slightly better lighting and shadow rendering.
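
The per-pixel divide is the expensive part of perspective correction; something like this inner loop (plain C, untested sketch) is what AMMX would need to chew through:

/* Perspective-correct textured span, the classic way: u/z, v/z and
   1/z interpolate linearly in screen space, so each pixel needs one
   divide to recover u and v.  That divide is what span-subdivision
   tricks (and wider SIMD ops) try to amortise.  Plain C sketch,
   assumes a 256x256 8-bit texture. */
void tex_span(unsigned char *dst, const unsigned char *texture, int len,
              float uoz, float voz, float ooz,
              float duoz, float dvoz, float dooz)
{
    for (int i = 0; i < len; i++) {
        float z = 1.0f / ooz;              /* recover true z        */
        int u = (int)(uoz * z) & 255;      /* wrap to texture size  */
        int v = (int)(voz * z) & 255;
        dst[i] = texture[v * 256 + u];
        uoz += duoz; voz += dvoz; ooz += dooz;
    }
}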

In the future it would be possible to get better results with higher clock rates, better FPGAs, or SoCs with a full Apollo core implementation and/or parallel execution units.

Years ago I found a very fast, software-only OpenGL ES implementation: Vincent 3D.

EXTERNAL LINK 
I'm sure it's old and outdated, but maybe it has a few interesting pieces of code.


Louis Dias
(Needs Verification)
Posts 55/ 1
10 Oct 2018 15:03


Gerardo wrote:
"I think it's much better to work on AMMX implementations of the Amiga's 3D APIs and libraries than to create a fully independent 3D core. FSAA, bilinear filtering, DOF, shaders... not really needed, and a waste of power and effort in my POV."

Oh - 'the Amiga' has a standard 3D API?
Where?


Samuel Crow

Posts 424
10 Oct 2018 15:08


Ugh!  Where do I start?

The Apollo core has 2 threads per core with the second used for Blitter emulation currently.  Adding 3D drivers to it would be almost ideal.

Runtime libraries are typically written in Assembly.  Even though Rust is tricky, we'd only have to profile existing code to look for portions that would be handy to implement in hardware.

Adding another 68080 in parallel with 2 more threads can fit better in the Vampire v4 than a discrete GPU.


Thellier Alain

Posts 141
10 Oct 2018 15:19


>Amiga has a standard 3D API? Where?
  Warp3D
 
  Warp3D v4 on Classic Amigas (68k & PPC)
  Warp3D v5 on NG Amigas (v5 adds multitexturing)
  Goa on MorphOS (also a Warp3D v4 implementation)
  Wazp3D on AROS (also a Warp3D v4 implementation that has hardware 3D support via Mesa3D)
 
 
 


Gerardo G.

Posts 54
10 Oct 2018 16:23


Thellier Alain wrote:

>Amiga has a standard 3D API? Where?
  Warp3D
 
  Warp3D v4 on Classic Amigas (68k & PPC)
  Warp3D v5 on NG Amigas (v5 adds multitexturing)
  Goa on MorphOS (also a Warp3D v4 implementation)
  Wazp3D on AROS (also a Warp3D v4 implementation that has hardware 3D support via Mesa3D)

It would be great to see your Wazp3D running on 080, Alain :)


Louis Dias
(Needs Verification)
Posts 55/ 1
10 Oct 2018 16:35


I guess I should rephrase...
What standard HARDWARE does this 3D API target?

It seems to me that the Vampire is about setting new and higher standards... I don't want to have to dumpster-dive for 20-year-old graphics cards that would still be limited by an archaic bus...


Ronnie Beck
(Apollo Team Member)
Posts 199
10 Oct 2018 17:27


Samuel Crow wrote:

Ugh!  Where do I start?

From the beginning.  Always from the beginning.  :-)

Samuel Crow wrote:

The Apollo core has 2 threads per core with the second used for Blitter emulation currently.  Adding 3D drivers to it would be almost ideal.

"It" being what exactly?

Samuel Crow wrote:

Runtime libraries are typically written in Assembly.  Even though Rust is tricky, we'd only have to profile existing code to look for portions that would be handy to implement in hardware.

Implement what exactly, into which hardware?  I have the impression you are talking about using Rust to create hardware logic, akin to this:

EXTERNAL LINK 
I really like that idea, to be honest.  It makes much more sense than throwing away all that time on a Rust implementation for the Amiga.

Samuel Crow wrote:

Adding another 68080 in parallel with 2 more threads can fit better in the Vampire v4 than a discrete GPU.

Yes, much more pragmatic and would leverage existing working technology.  This idea gets my vote.
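
For what it's worth, this is roughly how I would picture feeding such a second thread: the main thread pushes draw commands into a ring buffer and the helper thread drains it, the same producer/consumer scheme a GPU command queue uses (plain C sketch, all names my own; real code would need proper memory barriers):

#include <stdint.h>

#define RING_SIZE 256                      /* must be a power of two */

typedef struct { uint16_t op; int16_t x, y; uint32_t arg; } Cmd;

static Cmd ring[RING_SIZE];
static volatile uint32_t head, tail;       /* producer / consumer    */

/* main thread: queue a command, spin while the ring is full */
void push_cmd(Cmd c)
{
    while (head - tail >= RING_SIZE) ;     /* wait for space         */
    ring[head & (RING_SIZE - 1)] = c;
    head++;
}

/* drawing thread: drain commands forever */
void drawer_loop(void)
{
    for (;;) {
        while (tail == head) ;             /* wait for work          */
        Cmd c = ring[tail & (RING_SIZE - 1)];
        tail++;
        (void)c;  /* dispatch on c.op here: span fill, blit, ...     */
    }
}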


Louis Dias
(Needs Verification)
Posts 55/ 1
10 Oct 2018 17:40


Ok, no one seems to like a discrete GPU, so let's look at what was available circa 1993-1996:  EXTERNAL LINK 
Now can someone fill in numbers for the Vampire so we can compare?


Steve Ferrell

Posts 424
10 Oct 2018 18:29


Ronnie Beck wrote:

   
         
          It makes sense to design a discrete GPU because a GPU has some fundamental differences from a CPU.  Firstly, GPUs are purpose-built for graphics, with instructions that target exactly that; CPUs must deal with more generic computational needs.  Optimising a pipeline for a specific workload almost certainly becomes easier and yields performance gains.  Secondly, GPUs yield higher performance through massive parallelism.  CPUs are typically limited in core count (the Vampire, like all Amiga 68k accelerators, has just one core); in stark contrast, recent top-end NVIDIA GPUs have several thousand cores.  It isn't hard to guess who wins the pixel-pushing competition there.  The GPU!
         
          A key advantage an FPGA has, and which lends itself very well to a discrete GPU, is that multiple operations can be executed with every tick of the clock, because the logic units are discrete and independent of each other.  A CPU-only solution will always be limited by the efficiency of the pipeline, clock speed and memory bandwidth.  The good news for Vampire owners is that the Apollo Core is highly modern and optimised, with a memory controller that is second to none in the Amiga world.  I would imagine a dedicated GPU could work quite well.
         
         
   
          This is an imaginative explanation for integrated GPUs, but sadly the statements are axiomatically incorrect....
   

   
   
Now you're being intellectually dishonest as well as just plain incorrect.  You bring up I/O interface chipset controllers for USB and SATA as an example of why a GPU should be kept discrete from a CPU.  I stand by my statement as to why the big chip makers have integrated GPUs/FPUs and other co-processors into their CPU designs.  This thread is about GPUs and polygon-pushing capabilities on an FPGA board, not about adding external I/O devices and storage.
   
Next you speak of optimizing pipelines.  You don't optimize pipelines by adding co-processors and the buses that connect them to other devices and chips; you optimize systems by eliminating co-processors and consolidating chips/chipsets and functions... it's called LSI, or Large-Scale Integration.  It's been a "thing" for quite a long time now, but you and a couple of other folks are still living in 1985.
   
Of course adding a REAL, discrete GPU to the Vampire is going to provide it with more GPU performance than it has now... as it has NO discrete GPU.  But chaining this real GPU to a virtual CPU that can only perform at about 20% of a real CPU would be a ridiculous move.  And the glue logic to interface this real GPU to the rest of the system would require a much larger FPGA to hold said logic, as well as a total board redesign just to accommodate the bridge interface for this real GPU.
   
And adding a discrete "virtual" GPU would still require more glue logic and gates than are available in the current Vampire FPGAs.  It would eliminate the need for a bridge interface but not eliminate any of the other problems.
   
As it stands, there are 3 choices for adding 3D capabilities to a Vampire.
   
1.  Add a discrete, REAL 3D chip or board to the Vamp.  Requires a new board design and a larger FPGA to accommodate the bus logic.
   
2.  Add a discrete, VIRTUAL 3D chip to the Vampire.  Requires a much larger FPGA to accommodate the new 3D instruction set and bus logic.
   
3.  Add 3D instructions to the existing Vampire core. May require a larger FPGA depending on how many 3D instructions can be squeezed into the existing design.
   
From the perspective that this is an FPGA hobby board, the only solution that even comes close to being practical from an engineering and a performance standpoint is number 3.  I haven't even touched on the financial aspect.
   
   
