APOLLO CPU Knowledge Forum


Information about the Apollo CPU and FPU.

"Fast Blitter Operations"


Olle Haerstedt

Posts 110
10 Apr 2020 22:39

The Apollo wiki mentions "Fast Blitter operations" as a property of SAGA. Is there any more detailed information about this? Wiki link: https://wiki.apollo-accelerators.com/doku.php/saga:video

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
13 Apr 2020 11:28

In general, I would recommend not using the Blitter in the future.

I can give you some reason for this:

A major part of the "beauty" of the Amiga was its very coder-friendly 68K CPU.
Because the 68K CPU is so elegant to code for, the Amiga had such a big coder scene.

While the 68K CPU family is great to code for, only the 68030 and higher CPUs are really fast at bit-shifting operations.
The CPU used in the low-end AMIGA, the 68000, has a performance problem here.
The Blitter is a "fix" for this.
The Blitter allows bit-shift/bit-combine operations useful for GFX operations at a speed like that of a 68020/68030 CPU.

This means the AMIGA Blitter is a performance fix for the 68000 CPU.
But the Blitter also has some "drawbacks".

The AMIGA OS is a Multitasking OS.
The Blitter does NOT support saving a context. This means several tasks can NOT use the Blitter at the same time or share it elegantly. Each task always has to WAIT for the Blitter job of the previous task to fully finish. Also, manipulating GFX in parallel with Blitter and CPU can create problems, and in general the AMIGA OS will "busy-wait" for the Blitter to finish to prevent them.
Even a single task using the Blitter has to carefully wait for its first job to finish before starting the next one, as starting a job while the Blitter is not fully finished will cause bad things, even a crash.
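
The sharing rule described above can be sketched as a toy model. This is not Amiga code: `ToyBlitter`, `run_jobs` and `demo` are hypothetical names, and a Python lock stands in for exclusive ownership of the hardware (on AmigaOS, graphics.library offers OwnBlitter(), DisownBlitter() and WaitBlit() for this purpose). The model only shows that a second task cannot slip a job in between the jobs of the first.

```python
import threading


class ToyBlitter:
    """Toy model of one shared Blitter that cannot save a context."""

    def __init__(self):
        self._owner_lock = threading.Lock()  # only one task may own the Blitter
        self.completed = []                  # finished jobs, in execution order

    def run_jobs(self, task_name, jobs):
        # Take exclusive ownership; any other task blocks here until we are done.
        with self._owner_lock:
            for job in jobs:
                # Real code would wait for the previous job to finish before
                # touching the registers. In this model every job runs to
                # completion before the next one is started.
                self.completed.append((task_name, job))


def demo():
    """Run two tasks against one ToyBlitter and return the job log."""
    blitter = ToyBlitter()
    threads = [
        threading.Thread(target=blitter.run_jobs,
                         args=(name, ["blit0", "blit1", "blit2"]))
        for name in ("taskA", "taskB")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return blitter.completed
```

Whichever task gets the lock first, its three jobs appear together in the log, and the other task simply waits. That waiting is the cost the post describes.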

This means the Amiga Blitter is a compromise.
It improves bit-operation speed to be comparable to 68020 level, but it also introduces drawbacks for multitasking, and deadlock problems.
It also makes coding more complex, and many problems in games come from Blitter code that is not 100% correct.

Ray Couzens

Posts 93
13 Apr 2020 12:10

Gunnar von Boehn wrote:

This means the Amiga Blitter is a compromise.
It improves bit-operation speed to be comparable to 68020 level, but it also introduces drawbacks for multitasking, and deadlock problems.
It also makes coding more complex, and many problems in games come from Blitter code that is not 100% correct.

Thanks for that! It is this kind of information that is so very useful and interesting. I never realized that the blitter would not have been so good in later Amigas because they did not need it, except I guess, for backward compatibility.

Pity the blitter chip was not multitasking friendly.


Olle Haerstedt

Posts 110
13 Apr 2020 15:13

Thanks for the answer! Some questions:
1) Does multitask matter for a graphical co-cpu? Doesn't the same argument apply to the copper?
2) Could a modern blitter be made faster than 080 in moving data? And thus still be able to complement it?

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
13 Apr 2020 15:20

Olle Haerstedt wrote:

1) Doesn't the same argument apply to the copper?

No, not at all.
The way you use the Copper and Blitter is very different.
The Copper is there to "control" the display.
This means for the display you have 1 copper list to control it.

Olle Haerstedt wrote:

2) Could a modern blitter be made faster than 080 in moving data? And thus still be able to complement it?

AMMX is very fast.
The 68080 with AMMX can fully saturate the memory interface.
And if the memory interface is saturated there is no "room" left for a Blitter.

AMMX is Multitasking friendly and deadlock free.
I can highly recommend you using it for rendering.

Maybe you can think about the Blitter as a MEMCOPY function.
It is about 4 times faster than the 68000 CPU at doing this.
But it is slower than a 68030 doing the same.

Every time a program would call a memcopy, it instead calls a Blitter memcopy. The main program still needs to prepare the parameters for the function, and will basically wait for the function to finish.
The overhead of creating the parameters for the Blitter is significant. This means that for a short memcopy the Blitter is slower than doing it on the 68000, because of the extra startup overhead.

Using the Blitter actually makes programs more complex and more error-prone. But it paid off, as it is faster than the old 68000.
A big problem with the Blitter/HW memcopy function is that in a multitasking environment, if 2 programs call it at the same time, all hell breaks loose, so it needs to be protected with a semaphore.
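
The startup-overhead argument can be written as a small break-even calculation. This is a sketch with made-up units: the 4:1 per-word ratio comes from the "4 times faster than the 68000" remark above, while the setup cost of 60 and the function names are assumptions chosen only for illustration.

```python
import math


def cpu_copy_time(words, cpu_per_word):
    """Time for the CPU to copy `words` words (no setup cost)."""
    return words * cpu_per_word


def blitter_copy_time(words, setup, blit_per_word):
    """Time for the Blitter to copy `words` words, including its setup cost."""
    return setup + words * blit_per_word


def break_even_words(setup, cpu_per_word, blit_per_word):
    """Smallest word count at which the Blitter is strictly faster, or None."""
    if blit_per_word >= cpu_per_word:
        return None  # the Blitter never catches up
    # setup + n*b < n*c   <=>   n > setup / (c - b)
    return math.floor(setup / (cpu_per_word - blit_per_word)) + 1


# Hypothetical numbers: CPU needs 4 units per word, Blitter 1 unit per word
# (the 4x ratio from the post), plus an assumed setup cost of 60 units.
SHORT_COPY_LIMIT = break_even_words(setup=60, cpu_per_word=4, blit_per_word=1)
```

With these assumed numbers every copy shorter than `SHORT_COPY_LIMIT` words is faster on the CPU. With a CPU whose per-word cost is at or below the Blitter's, the function returns `None`, which matches the claim that the 68030 is in general faster.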


Kamelito Loveless

Posts 261
14 Apr 2020 11:18

Since Fblit is open source I suppose that it could be optimized for 68080/AMMX.

A1200 Coder

Posts 74
14 Apr 2020 11:39

The greatest problem with blitter is that it is SLOW.

It's still possible to use the blitter from different interrupt levels, without making faulty blits. This would correspond to multitasking or context switches.

Before writing into any blitter registers, the blitter setup values need to be saved into memory and then if there's an interrupt, the interrupt needs to write these saved values from previous blit back into blitter registers before returning. This will ensure that the previous blit will not fail because its registers were messed up before it had the chance to start the blitter operation.
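
The save-and-restore scheme described here can be modelled roughly as follows. This is a toy model, not hardware code: `ToyBlitterRegs` and `interrupt_blit` are hypothetical, the register names are used only as dictionary keys, and waiting for the blitter to go idle before and after the interrupt's own blit is left out.

```python
class ToyBlitterRegs:
    """Toy model: hardware registers plus a copy kept in ordinary memory."""

    def __init__(self):
        self.hw = {}      # stands in for the blitter registers
        self.shadow = {}  # values the interrupted code has set up so far

    def write(self, name, value):
        # Save the value to memory first, then write the register.
        self.shadow[name] = value
        self.hw[name] = value


def interrupt_blit(regs, irq_setup):
    """Model of an interrupt that uses the blitter and then cleans up."""
    saved = dict(regs.shadow)   # setup of the code we interrupted
    regs.hw.update(irq_setup)   # our own blit overwrites the registers
    # ... the interrupt's blit would be started and awaited here ...
    regs.hw.update(saved)       # write the saved values back before returning
    return saved


# Main code has set up two registers when the interrupt arrives.
regs = ToyBlitterRegs()
regs.write("BLTCON0", 0x09F0)
regs.write("BLTAMOD", 4)
interrupt_blit(regs, {"BLTCON0": 0x0100, "BLTAMOD": 0, "BLTDMOD": 2})
```

After the interrupt returns, the two registers the main code had already written hold their original values again, so its pending blit starts with the setup it expects.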

Of course you also need to wait until the blitter finishes before starting a new blitter operation. To alleviate this problem, no large blits should be allowed in a multitasking environment; larger blits should instead be broken up into smaller ones.
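
Breaking a large blit into smaller ones could look like this in outline. The helper `split_blit` and the row limit are assumptions for illustration. A real implementation would also have to advance the source and destination pointers for each chunk.

```python
def split_blit(total_rows, max_rows_per_blit):
    """Return (first_row, row_count) pairs that cover total_rows in order."""
    if max_rows_per_blit <= 0:
        raise ValueError("max_rows_per_blit must be positive")
    chunks = []
    row = 0
    while row < total_rows:
        count = min(max_rows_per_blit, total_rows - row)
        chunks.append((row, count))
        row += count
    return chunks


# A 200-row blit with an assumed limit of 64 rows per job.
CHUNKS = split_blit(200, 64)
```

Each chunk is a short job, so another task or interrupt level never has to wait longer than one chunk. The price is one extra setup per chunk.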

Ray Couzens

Posts 93
14 Apr 2020 12:24

A1200 coder wrote:

The greatest problem with blitter is that it is SLOW.

It's still possible to use the blitter from different interrupt levels, without making faulty blits. This would correspond to multitasking or context switches.

Before writing into any blitter registers, the blitter setup values need to be saved into memory and then if there's an interrupt, the interrupt needs to write these saved values from previous blit back into blitter registers before returning. This will ensure that the previous blit will not fail because its registers were messed up before it had the chance to start the blitter operation.

Of course you also need to wait until the blitter finishes before starting a new blitter operation. To alleviate this problem, no large blits should be allowed in a multitasking environment; larger blits should instead be broken up into smaller ones.

I guess this is only of value to 68000-based Amigas, but does using the Blitter this way make programs more efficient considering the overhead required in saving states and still having to wait for the previous blitter operation? I'm not an Amiga programmer, although I did learn C programming on my A500 in the 80's. So asking out of curiosity.

A1200 Coder

Posts 74
16 Apr 2020 08:19

Ray Couzens wrote:

The blitter is certainly useful on all m68k-Amigas. On a 68020-68060 with fast ram, it can be used for gfx operations in parallel with CPU when CPU is accessing only fast ram and doing other things. The efficiency here comes from having the blitter do something useful on the chip ram bus while the CPU is doing something else.

Saving blitter state causes some overhead, but not much. Instead of busy waiting for blitter ready bit, one can also use blitter finished interrupt, to get an interrupt-driven feeding of new blitter operations.
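
Interrupt-driven feeding can be sketched as a small job queue. This is a toy model under stated assumptions: `BlitQueue` is hypothetical, `start_blit` stands in for whatever code programs the registers, and `on_blit_finished` is meant to be called from the blitter-finished interrupt.

```python
from collections import deque


class BlitQueue:
    """Feed blit jobs one at a time, driven by a 'finished' notification."""

    def __init__(self, start_blit):
        self._pending = deque()
        self._busy = False
        self._start_blit = start_blit  # hypothetical: programs the blitter

    def submit(self, job):
        # Start at once if the blitter is idle, otherwise queue the job.
        if self._busy:
            self._pending.append(job)
        else:
            self._busy = True
            self._start_blit(job)

    def on_blit_finished(self):
        # Called from the blitter-finished interrupt: start the next job.
        if self._pending:
            self._start_blit(self._pending.popleft())
        else:
            self._busy = False


# Usage: record which jobs get started, and in which order.
started = []
queue = BlitQueue(started.append)
queue.submit("a")
queue.submit("b")
queue.submit("c")
```

Only job "a" starts immediately. "b" and "c" start one by one as `on_blit_finished` is called, and the submitting code never spins on the busy bit.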

The reason why one would want to use the blitter from several interrupt levels is that, e.g., you have some gfx objects that need to be updated at 50 FPS, and you also want to allocate the rest of the blitter time to some non-timing-critical operations.

A blitter chip would also be useful on Vampire, but it would need to be improved, and memory bandwidth is a problem as CPU and blitter would then share the same memory bus, which is not enough for both operating in parallel.

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
16 Apr 2020 08:27

A1200 coder wrote:

The blitter is certainly useful on all m68k-Amigas. On a 68020-68060 with fast ram, it can be used for gfx operations in parallel with CPU when CPU is accessing only fast ram and doing other things.

Yes for a "special case" like a game.
But mind that this is NOT how Amiga OS works today.
AMIGA OS will often wait for the Blitter to finish.

Also you need to mind that the Blitter is slower than a 68030 CPU for many tasks - this means many GFX operations like printing TEXT will run slower if you use the Blitter instead of the CPU.

Ray Couzens

Posts 93
16 Apr 2020 08:45

Gunnar von Boehn wrote:

A1200 coder wrote:

The blitter is certainly useful on all m68k-Amigas. On a 68020-68060 with fast ram, it can be used for gfx operations in parallel with CPU when CPU is accessing only fast ram and doing other things.

Yes for a "special case" like a game.
But mind that this is NOT how Amiga OS works today.
AMIGA OS will often wait for the Blitter to finish.

Also you need to mind that the Blitter is slower than a 68030 CPU for many tasks - this means many GFX operations like printing TEXT will run slower if you use the Blitter instead of the CPU.

This is very interesting and I get the idea that only experience through trial and error, or asking here, will let us know when using the Blitter would benefit. It doesn't guarantee us better performance, it could be slower in many cases, but in others, it may offer improvements, depending on our implementation. For now, probably not something I should worry about until I become more experienced with ASM.

Thanks for the information.

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
16 Apr 2020 11:50

Ray Couzens wrote:

You are 100% right.

The Blitter is faster than 68000.
But setting the Blitter up costs extra.
So for small operations even the 68000 will be faster than the Blitter.

The 68030 is generally faster than the Blitter.
The 68030 can use fastmem, and can benefit from caches
and the CPU can often use routines which are "smarter" than the normal blit job.

If ALL sources are in Chipmem the Blitter "can" have an advantage, as the Blitter can under some circumstances get more DMA slots to chipmem. On the other hand, on a machine with 32bit chipmem the CPU can be twice as efficient as the Blitter per memory access. Which solution is better depends on many factors and differs from case to case. As a rule of thumb, a 68020/68030 CPU is in general faster than the Blitter.

The OS will generally not use CPU and Blitter in parallel.
On the other hand, for a game you could do this - but how much it helps depends on the case. There are often situations in a game where you fire many blit jobs at the Blitter, and the time the CPU needs to prepare the next one is similar to the extra overhead for setting the Blitter up. This means the benefit of parallelism is minor, and in the end only who does the copy faster counts.

Also, when doing a blit job you have choices as a coder.
Take for example a typical 5-plane (32-color) BOB blit.
This "normally" means 5 blit jobs per BOB.
The amount of overhead is very high here.

You can lower the overhead to 1 job if you create a 5 times bigger MASK for the BOB in memory - but this wastes precious chipmem.

As you see the whole calculation does depend on many factors.
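
The trade-off between blit jobs and mask memory can be put into numbers. This is a rough sketch: `bob_costs` is a hypothetical helper, the BOB width is assumed to be a multiple of 8 pixels, and the extra word usually needed for shifting is ignored.

```python
def bob_costs(width_px, height_px, planes=5):
    """Compare one-job-per-plane blitting with a single job and a big mask."""
    plane_bytes = (width_px // 8) * height_px  # one bitplane of the BOB
    gfx_bytes = planes * plane_bytes           # image data, same in both cases

    per_plane = {
        "blit_jobs": planes,                   # one setup per bitplane
        "mask_bytes": plane_bytes,             # a single-plane mask is reused
        "total_bytes": gfx_bytes + plane_bytes,
    }
    single_job = {
        "blit_jobs": 1,                        # one setup for the whole BOB
        "mask_bytes": planes * plane_bytes,    # mask repeated for every plane
        "total_bytes": gfx_bytes + planes * plane_bytes,
    }
    return per_plane, single_job


# A 32x32 pixel BOB in 5 bitplanes (32 colors).
PER_PLANE, SINGLE_JOB = bob_costs(32, 32)
```

For this assumed BOB size, the single-job variant saves four setups per BOB but needs 512 more bytes of chipmem for the larger mask.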


Kamelito Loveless

Posts 261
16 Apr 2020 19:54

IIRC you can do it in one Blitter operation if the bitplanes are interleaved.

Olle Haerstedt

Posts 110
16 Apr 2020 21:01

Does it really matter if the CPU is faster than the blitter if they can work in parallel?

> The OS will generally not use CPU and Blitter in parallel.

The OS, or the game engines?

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
16 Apr 2020 21:12

Olle Haerstedt wrote:

Does it really matter if the CPU is faster than the blitter if they can work in parallel?

Let's say you want to PRINTF some text of 20 chars.

Let's say the Blitter needs per char

Time=2 setup with CPU
Time=5 Blitting
Time = 20*7 == 140 total

Let's say the CPU needs time 4 per char
Time = 20*4 == 80 total

Does this example make it clearer?
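
The arithmetic of this example, written out so the numbers can be varied. The time units are the abstract ones from the post, and the constant and function names are only for illustration.

```python
CHARS = 20       # length of the text
BLIT_SETUP = 2   # CPU time to set up one blit
BLIT_TIME = 5    # time the Blitter needs per char
CPU_TIME = 4     # time the CPU needs per char when it draws by itself


def blitter_total(chars=CHARS):
    """Total time when every char goes through the Blitter."""
    return chars * (BLIT_SETUP + BLIT_TIME)


def cpu_total(chars=CHARS):
    """Total time when the CPU draws every char itself."""
    return chars * CPU_TIME
```

With the numbers from the post, `blitter_total()` gives 140 and `cpu_total()` gives 80.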

Olle Haerstedt

Posts 110
16 Apr 2020 21:25

Gunnar von Boehn wrote:

Olle Haerstedt wrote:

Does it really matter if the CPU is faster than the blitter if they can work in parallel?

Let's say you want to PRINTF some text of 20 chars.

Let's say the Blitter needs per char

Time=2 setup with CPU
Time=5 Blitting
Time = 20*7 == 140 total

Let's say the CPU needs time 4 per char
Time = 20*4 == 80 total

Does this example make it clearer?

No? Since the CPU is free to do other stuff while the blitter is blitting. 2*20 = 40 CPU setup time, which gives 40 cycles more for the CPU than if it did all the work. Right?

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
16 Apr 2020 21:40

Olle Haerstedt wrote:

Gunnar von Boehn wrote:

Olle Haerstedt wrote:

Does it really matter if the CPU is faster than the blitter if they can work in parallel?

No? Since the CPU is free to do other stuff while the blitter is blitting. 2*20 = 40 CPU setup time, which gives 40 cycles more for the CPU than if it did all the work. Right?

The CPU does a PRINTF() call.
The CPU will do nothing in parallel until this is done.
The CPU will busywait for the Blitter to finish each char.
Using the Blitter will just make it slower.
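
The two views in this exchange can be put side by side with the same abstract numbers. This sketch assumes that the setup for the next char cannot begin until the previous blit has finished, so setup and blit never overlap. `printf_timeline` is a hypothetical helper.

```python
def printf_timeline(chars=20, setup=2, blit=5, cpu_only=4):
    """Where the time goes when text is printed with and without the Blitter."""
    blitter_elapsed = chars * (setup + blit)
    return {
        # Wall-clock time until the text is on screen.
        "cpu_only_elapsed": chars * cpu_only,
        "blitter_elapsed": blitter_elapsed,
        # With busy-waiting the CPU is tied up for the whole Blitter run.
        "cpu_busy_with_busywait": blitter_elapsed,
        # Without busy-waiting only the setup work would occupy the CPU...
        "cpu_setup_only": chars * setup,
        # ...but the freed time arrives as many short gaps, one per char.
        "gap_count": chars,
        "gap_length": blit,
    }


TIMELINE = printf_timeline()
```

Even in the ideal case the text takes 140 units instead of 80 to appear, and the 100 units the CPU would gain come as 20 separate gaps of 5 units each.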

Clear now?

Olle Haerstedt

Posts 110
16 Apr 2020 21:48

OK, but it *could* do something in parallel, in another scenario? Since it's a help processor? Or am I getting it wrong?

> As the blitter is an asynchronous coprocessor, the 680x0 CPU continues to run as the blit is executing.

EXTERNAL LINK

Gunnar von Boehn
(Apollo Team Member)
Posts 6263
16 Apr 2020 22:05

Olle, you asked this question:

Olle Haerstedt wrote:

Does it really matter if the CPU is faster than the blitter if they can work in parallel?

I tried to give you a simple and clear real-life example of why it matters. In REAL LIFE the Blitter does slow a fast CPU down in many cases. PRINTF() is one simple example.
The CPU can do nothing sensible in parallel.

In our example the time to print 1 char might be 30 or 40 instructions. In a multitasking environment this time cannot be used either, as "timeslices" for multitasking switches are measured in MILLIONs. This means there are cases where a slow Blitter will make a fast CPU wait, and this time is then lost.

A1200 Coder

Posts 74
17 Apr 2020 09:43

Olle Haerstedt wrote:

OK, but it *could* do something in parallel, in another scenario? Since it's a help processor? Or am I getting it wrong?

Yes, the blitter can be useful in some special cases:
- Blitter can CLEAR (not copy data) chip ram in parallel with CPU accessing chip ram also at same time without slowdowns (useful for example for 3D-stars effect)
- in a flight simulator blitter can update the lower screen with various meters, when cpu is calculating next frame in fast ram
- anytime when CPU is doing some longer calculations in fast ram, you can fire up the blitter with some task
- more ideas...

But for AmigaOS performance, the blitter is generally no good - there is a program that patches blitter calls with CPU calls called FBlit on Aminet. It makes Workbench run clearly faster on accelerated Amigas with fast ram.
