
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.
Please read the forum usage manual.



Information about the Apollo CPU and FPU.

"Fast Blitter Operations"page  1 2 3 4 

Gunnar von Boehn
(Apollo Team Member)
Posts 4840
17 Apr 2020 10:55


A1200 coder wrote:

But for AmigaOS performance, the blitter is generally no good - there is a program on Aminet called FBlit that patches blitter calls with CPU calls. It makes Workbench run clearly faster on accelerated Amigas with fast RAM.

Correct.
And FBlit changes AmigaOS to use only the CPU for certain blit jobs. This means the OS will do printF() with the CPU only and not use the Blitter.
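
As a rough illustration of what "doing a blit job with the CPU" means, here is a minimal Python sketch of a rectangle clear done by a plain loop. It is not FBlit's code. The function name `cpu_clear_rect` and its parameters are made up for this example; the only idea borrowed from the blitter is the "modulo", the number of bytes skipped at the end of each row to reach the start of the next one.

```python
def cpu_clear_rect(plane: bytearray, start: int, width_bytes: int,
                   height: int, modulo: int) -> None:
    """Clear `height` rows of `width_bytes` bytes each.

    After each row, `modulo` bytes are skipped to reach the next row,
    similar in spirit to the blitter's destination modulo.
    (Hypothetical helper, for illustration only.)
    """
    offset = start
    for _ in range(height):
        # Overwrite one row of the rectangle with zero bytes.
        plane[offset:offset + width_bytes] = bytes(width_bytes)
        offset += width_bytes + modulo


# A bitplane 4 bytes wide and 3 rows high, completely filled.
plane = bytearray([0xFF] * 12)

# Clear a block 2 bytes wide and 2 rows high, starting at row 0, byte 1.
# Row width is 4 bytes, so the modulo is 4 - 2 = 2.
cpu_clear_rect(plane, start=1, width_bytes=2, height=2, modulo=2)
```

On a real accelerated Amiga the same loop would be 68k code working in fast RAM or on the bitmap directly; the sketch only shows the addressing logic.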


Olle Haerstedt

Posts 102
17 Apr 2020 23:23


A1200 coder wrote:

Olle Haerstedt wrote:

  OK, but it *could* do something in parallel, in another scenario? Since it's a help processor? Or am I getting it wrong?
 

 
  Yes, the blitter can be useful in some special cases:
  -The blitter can CLEAR (not copy data) chip RAM in parallel with the CPU accessing chip RAM at the same time, without slowdowns (useful, for example, for a 3D-stars effect)
  -In a flight simulator, the blitter can update the lower screen with various meters while the CPU is calculating the next frame in fast RAM
  -Any time the CPU is doing some longer calculations in fast RAM, you can fire up the blitter with some task
  -more ideas...
 
  But for AmigaOS performance, the blitter is generally no good - there is a program on Aminet called FBlit that patches blitter calls with CPU calls. It makes Workbench run clearly faster on accelerated Amigas with fast RAM.

OK, so there might be a future use-case for a (pimped) blitter even with AMMX in 080?


Markus B

Posts 195
17 Apr 2020 23:28


No, the blitter was a helpful co-processor when there was the 68000 around. So stock A500 etc.


Gunnar von Boehn
(Apollo Team Member)
Posts 4840
18 Apr 2020 07:44


Olle Haerstedt wrote:

  OK, so there might be a future use-case for a (pimped) blitter even with AMMX in 080?
 

The CPU is a lot more flexible than the Blitter.
If you want to help the CPU in the future, then the most flexible, the most powerful, and the most multitasking-friendly solution is to add a 2nd 68080.

But you need to mind one thing:
The more components you add, the more complex the coding gets.
A game "just" using the CPU is a lot easier to code than a game using 2 CPUs, and a lot easier than one using the blitter.
And easy does not only mean "getting it done" but also being bug-free and stable.


Olle Haerstedt

Posts 102
18 Apr 2020 08:55


Gunnar von Boehn wrote:

Olle Haerstedt wrote:

  OK, so there might be a future use-case for a (pimped) blitter even with AMMX in 080?
 

  The CPU is a lot more flexible than the Blitter.
  If you want to help the CPU in the future, then the most flexible, the most powerful, and the most multitasking-friendly solution is to add a 2nd 68080.

Good point. Or a second 68000! Just kidding. ;)


Ray Couzens

Posts 79
18 Apr 2020 11:01


Gunnar von Boehn wrote:

  If you want to help the CPU in the future, then the most flexible, the most powerful, and the most multitasking-friendly solution is to add a 2nd 68080.
 

 
  That would potentially give the Vampire a huge performance boost, especially if AROS was made to utilise it.  But I don't know how this might affect compatibility with older Amiga software and how easy it would be to overcome.
 
  I'm sure I'll be happy with the current V4 (when I get one) for some time to come. It's already significantly faster than any Amiga I've ever used.
 


Samuel Crow

Posts 388
18 Apr 2020 14:53


Re:Second 68080

Sounds like a plan!  But we'll have to start using the up-to-date branches of AROS to get SMP support, and that will certainly compromise compatibility.  The only way around it I can think of is to patch QBlit() in Graphics.library to use work loops other than those specified in the Blitter's register map.  Of course, a second 68080 could dequeue from the Blit list independently of the main CPU, so the interrupt and its associated overhead could be eliminated from the system completely.  I just wish QBlit() was documented better so that the CPU could use the blitter interrupt more efficiently in multitasking applications.


Ray Couzens

Posts 79
18 Apr 2020 16:40


Interesting for sure, and we all like extra processing power, but I would hesitate if the Vampire moved away from being as compatible as it is now.  I like that the current V4 has all that extra speed, the enhanced CPU registers, improved audio and video, but at the same time, and correct me if I'm wrong, it can run software written for the Amiga 500.

Maybe there would be a market for a second edition that offered a certain level of compatibility, say 80% (complete guess!) but had the dual 68080 processor for a more modern type of Amiga?  Not sure what people think of that concept.

All this talk of the next generation of Vampire and I haven't even got a first generation yet :-)


Olle Haerstedt

Posts 102
18 Apr 2020 16:46


Just add the option to turn the extra CPU off, to keep the same level of compatibility?


Gunnar von Boehn
(Apollo Team Member)
Posts 4840
18 Apr 2020 17:01


Ray Couzens wrote:

correct me if I'm wrong, it can run software written for the Amiga 500.

Yes, you are right.
And the talk about AROS with SMP is not serious.
Nothing like this makes any sense or is planned from our side.




Manfred Bergmann

Posts 182
18 Apr 2020 17:09


I understood that the second CPU would be used to drive graphics, and not as a 'real' second CPU, which would require SMP.


Ray Couzens

Posts 79
18 Apr 2020 17:24


Gunnar von Boehn wrote:

  And the talk about AROS with SMP is not serious.
  Nothing like this makes any sense or is planned from our side.

I think that is good; this means we stay with a 100% Amiga machine, but with its improvements.



Stefan "Bebbo" Franke

Posts 136
18 Apr 2020 23:14


/sigh - no Amiga Thread Ripper with 128 cores...


Andrew Miller

Posts 223
18 Apr 2020 23:15


It probably wouldn't be optimal, but for compatibility, would it not be better to implement the multi-core use inside the CPU rather than in the OS?
I mean that the OS just sees a 68080, but the 68080 internally runs on multiple cores, though this probably would look more like a single CPU core with multiple execution units within it.


Gunnar von Boehn
(Apollo Team Member)
Posts 4840
19 Apr 2020 08:14


Andrew Miller wrote:

  It probably wouldn't be optimal, but for compatibility, would it not be better to implement the multi-core use inside the CPU rather than in the OS?
 

 
Yes, you are absolutely right.
For compatibility one very strong 68080 is the optimum.
 
In general one can say that for the programmer, too,
it's a lot easier to have 1 strong CPU than several weak ones.

There is a famous saying about this:
What do you prefer to pull your cart?
One strong horse or a hundred hens?
 

 
It's a simple fact that coding multithreaded for many CPUs adds a lot of complexity. It does not only make it a lot harder to code; managing many CPUs also adds problems which can make programs fail, lock up, or calculate wrong results. And the worst part for the programmer is that such multithreading problems do not show up every time. In general such issues only show up 1 time in 100, which makes finding and fixing them extremely difficult.
If you look behind the curtain of today's professional companies like IBM/SAP/ORACLE that develop multithreaded software, then you will see the problems of this.
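
A minimal sketch of why such bugs only show up some of the time: two "CPUs" each add 1 to a shared counter, and the result depends purely on the order in which their steps happen to interleave. To keep the outcome reproducible, the interleaving is scripted by hand here instead of using real threads. All names (`increment`, `run`, the schedule lists) are made up for this illustration.

```python
def increment(shared):
    """One 'CPU' doing counter += 1 as two separate steps."""
    value = shared["counter"]       # step 1: read the current value
    yield                           # the other CPU may run here
    shared["counter"] = value + 1   # step 2: write back (maybe stale)


def run(schedule):
    """Run two increments.

    `schedule` lists which CPU (0 or 1) executes its next step.
    """
    shared = {"counter": 0}
    cpus = [increment(shared), increment(shared)]
    for cpu in schedule:
        next(cpus[cpu], None)       # advance that CPU by one step
    return shared["counter"]


# CPU 0 finishes before CPU 1 starts: the usual, correct case.
good = run([0, 0, 1, 1])   # counter ends at 2

# Both CPUs read 0 before either writes: one update is lost.
bad = run([0, 1, 0, 1])    # counter ends at 1
```

On real hardware nobody scripts the schedule; the unlucky ordering simply happens now and then, which is what makes such a bug so hard to reproduce.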
 

The truth is that the efforts of software and hardware development pull in opposite directions.
 
For a software developer, having a nice complex CPU which can do "ALL", like the 68080, is best. The programmer does not need to worry about alignment, and the CPU handles all of it for him. The programmer does not need to worry about missing instructions or missing operations - the CPU provides them.
 
For the hardware developer the situation is the opposite.
For a hardware developer, developing something complex and complete like the 68080 is a lot more effort than developing a simple chip with reduced features - like a RISC chip.
 
 
For the software developer, one high-performance CPU which is able to execute many instructions fast is an order of magnitude easier to handle than a bunch of simple CPUs which are each slow but together have the same speed.
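
The horse-versus-hens point can be put into numbers with a small back-of-the-envelope model. This is only a sketch under simple assumptions, not a measurement of any real chip: one fast CPU has speed 1, each of the n slow CPUs has speed 1/n (so their combined speed is also 1), and a fixed fraction of the program cannot be split up and has to run on a single slow CPU. The function name and the example fraction of 10% are made up for the illustration.

```python
def slowdown_vs_one_fast_cpu(serial_fraction: float, n_cpus: int) -> float:
    """Run time on n CPUs of speed 1/n, relative to one CPU of speed 1.

    A result of 1.0 means "equally fast"; larger means the
    many-slow-CPUs machine takes that many times longer.
    """
    # The non-splittable part runs on one slow CPU: n times slower.
    serial_time = serial_fraction * n_cpus
    # The splittable part uses all n CPUs: combined speed 1.
    parallel_time = 1.0 - serial_fraction
    return serial_time + parallel_time


# Perfectly parallel program: the hundred hens keep up with the horse.
perfect = slowdown_vs_one_fast_cpu(0.0, 100)   # 1.0

# If only 10% of the work cannot be split, the hundred slow CPUs
# need about 10.9 times as long as the one fast CPU.
typical = slowdown_vs_one_fast_cpu(0.1, 100)
```

This ignores all synchronisation overhead, so in practice the many-CPU case tends to look worse, not better.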
 
And again, for the hardware developer it's much easier to add more power by putting several existing CPUs together than to "invent" ideas to make one CPU stronger.
 
 
In recent years we can see that hardware development often went the easy route for the hardware developers and made life more complex for software developers.


Olle Haerstedt

Posts 102
19 Apr 2020 09:46


One good example of what Gunnar is talking about is the CELL CPU in Playstation. IIRC, it took game companies many years to learn to code for it, and porting games from PC to Playstation was more involved than to e.g. Xbox.
 
  Programming languages have not been able to keep up with changes in the hardware. That's why hardware still "pretends" to be a PDP-11 machine, as argued in this article: EXTERNAL LINK   
 
  > The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. This is essential because it allows C programmers to continue in the belief that their language is close to the underlying hardware.
 
 
  The only programming language that easily scales to any number of processors is Erlang, because it uses message passing as a main abstraction, and this language is considered "esoteric".
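
  To show what "message passing as the main abstraction" looks like, here is a minimal sketch. It is written in Python rather than Erlang and only imitates the style: the workers share no variables and talk to the outside world only through an inbox and an outbox. The names `worker` and `parallel_sum` and the use of `None` as a stop message are choices made for this example. It illustrates the pattern, not a speedup.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Sum the numbers received as messages; None means 'no more work'."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:
            break
        total += msg
    # The only way results leave the worker is as a message.
    outbox.put(total)


def parallel_sum(numbers, n_workers=4):
    """Distribute numbers to workers by message and add up their replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)          # one stop message per worker
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in threads)


result = parallel_sum(range(1, 101))   # 5050, whatever the worker count
```

  Because no worker ever touches another worker's `total`, the number of workers can be changed without changing the result, which is the property that makes this style scale.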

Also, in games, there aren't logically different parts that can be split up. Usually you have game logic and the graphic pipeline. The graphic goes to GPU and game logic to CPU. If you have a physics engine, this can go to another CPU or to a GPU that supports it. If you have a deep, agent-based simulation, this can perhaps be split. But note that not even Dwarf Fortress, a game entirely based on simulation with NO graphics (basically), supports multiple CPUs last time I checked.


Gunnar von Boehn
(Apollo Team Member)
Posts 4840
19 Apr 2020 10:17


Olle Haerstedt wrote:

  That's why hardware still "pretends" to be a PDP-11 machine,
 

 
PDP-11 can be viewed as the mother of CISC Chips.
Both 68K and x86 are children of it.
Some might call 68K the nice child, and x86 the ugly one.
 
Maybe "pretending" is not the right word.
This is like saying that today's Ford car engines "pretend" to be Otto engines. They are based on this invention.

Fact is that the PDP-11 ISA was very clever and very good.
And that it was good is the reason why people copied it, or based their developments on it.
 


Gunnar von Boehn
(Apollo Team Member)
Posts 4840
19 Apr 2020 10:32


Olle Haerstedt wrote:

  One good example of what Gunnar is talking about is the CELL CPU in Playstation. IIRC, it took game companies many years to learn to code for it, and porting games from PC to Playstation was more involved than to e.g. Xbox.
 

 
The CELL chip is a chip combining
- 1 very high clocked but per clock weak PowerPC
- 8 Vector-Cores (of which 6 are usable by game coders on the PS3)

The PPC core supported Hyperthreading, so it looked virtually like 2 CPUs. But the hardware hyperthreading was plagued with bugs and dependencies which made using it efficiently nearly impossible.

If you wonder why the CELL used extra Vector units, then
it might help to know that the PS2 already had a design of 1 CPU + 2 Vector Cores.
Therefore the PS3-CELL is the logical evolution of this.
 

The chip behind the XBOX, named XENON, also came from IBM,
and it's actually a chip containing 3 copies of the
very high clocked but per clock weak PowerPC used in the CELL.
This means you have 6 virtual CPUs, each of which is WEAK,
and your multithreading is plagued with bugs and side effects.

 
I think both Systems are not that easy at all to code for.
CELL might be more challenging to program than XENON.


Andrew Copland

Posts 113
19 Apr 2020 13:23


CELL was a pain to code for, but a lot of that fault falls at the feet of Sony. Their API and documentation were very basic.

Xenon was much easier to get decent performance out of, and the API, documentation, libraries and tools supporting it were very good.

So getting something working, and working well on the Xbox360/Xenon, took much less time and effort than doing the same task on PS3.

Even figuring out where you had LHS penalties was easier on Xenon. If I knew our game had performance issues on the PPC core on PS3, I'd profile it on the X360 and fix it there, then reprofile it on PS3 to resolve it because the loop for doing so was much quicker and the tools would give me more information!

So yeah it took developers longer to exploit the "power" of PS3, but that's not so much because it was hard to understand, it's just that making games commercially is a business. We're not there to just piss around learning "cool" hardware.

There was one feature we were implementing and it took 3 full weeks to get it working on PS3, with a few evenings thrown in too, and 2 hours on X360.

Once you'd figured out this "awesome power" of the CELL architecture... it turns out there wasn't really much that it was better at. Certainly not much that you usually need at runtime. The SPUs just aren't that flexible and you spend a lot of time scheduling and marshalling jobs for them.

Whereas on the X360 you just got on with it, and if it was slow you optimised it and iterated the algorithms and data you used. You could do the same on PS3 but it was a LOT more work to iterate design and data flow for SPU jobs.

If you did all that extra work then yes, you could sometimes get better performance, but the price you paid in time required just wasn't worth it.


Samuel Crow

Posts 388
19 Apr 2020 13:59


@Gunnar
Wasn't the reason you wanted a software blitter on the 68080 core in the first place that you wanted it to have AMMX acceleration?  What happened?  Did that plan fall through?

Also, if the memory bus can be saturated by the CPU, that means you need faster memory and a controller to go with it.  That doesn't mean the blitter is useless or that the second core to run work loops like the blitter used to do is useless either.  It just means the v2 boards with 128 bit SDRAM are reaching the peak of their useful life and that the v4 boards can shine brighter with a slave processor.
