
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.



Performance and Benchmark Results!

OpenGL On Vampire Cards (page 1 of 5)

Norbert Kett
(Apollo Team Member)
Posts 36
10 Mar 2017 13:12


Hello :)

A few weeks ago I found the TinyGL library on Aminet and started working with it. I reworked it and added many new features:
EXTERNAL LINK 
I would like to know whether Vampire cards are powerful enough for this OpenGL subset library. My test subject is Quake 1; I am using BSzili's GLQuake SDL (AROS) port. After some improvements it works with TinyGL (color blending is not implemented yet).

Since I'm 12,000 km away from my Amiga machines, I can only use WinUAE for now. So first I would like to ask for help testing this Quake build.
Does it work on Vampire cards? If yes, what is the timedemo result with demo1? If it's not too bad, I can start optimizing and speeding up the rendering. I hope the Apollo core has some useful new asm instructions to make the rendering (and color blending) faster.
Steps to test this Quake build:
- make sure you have SDL.library, RTG and AHI installed
- download the build from EXTERNAL LINK
- set up a 16-bit Workbench screen
- set the stack to 10000000 (10 MB)
- run "launch" from the quake-tinygl folder
- in the in-game console type: timedemo demo1
(full-screen mode does not work for me in UAE)

Thanks ;)




Jan Vonka

Posts 37
10 Mar 2017 13:26


Wow, OpenGL would be such a cool feature.

But isn't an FPU mandatory for Quake?


Wawa T

Posts 400
10 Mar 2017 13:54


Software rendering will be very, very slow unless you find some way to hardware-accelerate it on particular hardware. Perhaps AMMX functions may help here, or something like a dedicated 3D core, as was announced for the Natami.

Maybe it's out of your scope, but ideally the backend should be abstracted from the library itself, with hardware drivers for Apollo, Voodoo3, CV3D and the like.

Previously this was usually handled by Warp3D, which provided dedicated drivers for supported hardware or fell back to software mode. Warp3D has its limitations, as it doesn't support a number of more advanced functions that the cards it supports actually provide.

Alternatively there is the open-source Wazp3D replacement, basically a software renderer, which can however use a backend wrapper. On WinUAE it wraps the Warp3D calls directly to the Windows-side API; on AROS it wraps to Gallium/Mesa.

I don't think TinyGL provides such infrastructure out of the box. Alternatively there is StormMesa, which, like MiniGL, sits on top of Warp3D or Wazp3D and may be a more flexible solution. The sources are on Aminet. An improved version that was worked on a few years ago is here:
EXTERNAL LINK

I think it would be good to consider some design decisions up front, so as not to make Vampire 3D solutions an isolated case. It wouldn't be beneficial if applications had to be compiled exclusively for it, meaning generic Amiga ports would not work on it and vice versa. Also, the hardware might change and the binaries might become incompatible.

I don't know how TinyGL is being linked. Is it an Amiga shared library? If so, it is less problematic. Sorry for the lengthy post, and if you are already aware of all this. Good luck!

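Wawa's suggestion of keeping the hardware backend separate from the GL library can be sketched in C as a table of driver hooks with a software fallback. This is a minimal illustration under assumptions: RasterBackend, select_backend, the hook names and the probe logic are all hypothetical, not TinyGL, Warp3D or Wazp3D API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver interface: the GL library only calls these hooks;
   each supported chip (or the software fallback) provides its own table. */
typedef struct {
    const char *name;
    int  (*probe)(void);   /* returns 1 if this backend can be used */
    void (*fill_span)(uint16_t *dst, int n, uint16_t color);
} RasterBackend;

/* Portable software fallback: always available. */
static int sw_probe(void) { return 1; }

static void sw_fill_span(uint16_t *dst, int n, uint16_t color)
{
    for (int i = 0; i < n; i++)
        dst[i] = color;
}

/* Placeholder for an Apollo/AMMX backend: real hardware detection and an
   AMMX span routine would go here; in this sketch it reports "not present". */
static int apollo_probe(void) { return 0; }

static const RasterBackend backends[] = {
    { "apollo",   apollo_probe, sw_fill_span },  /* would point at AMMX code */
    { "software", sw_probe,     sw_fill_span },
};

/* Pick the first backend whose probe succeeds; applications never see this. */
const RasterBackend *select_backend(void)
{
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (backends[i].probe())
            return &backends[i];
    return NULL;
}
```

With this split, applications talk only to the GL library, and supporting new hardware means adding a table entry rather than changing the applications.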

Michael Nurney

Posts 197
10 Mar 2017 15:19


I've just tried it and the Amiga reboots just as something happens in the window...
Not sure if it's FPU related...

Hang on, I missed the stack setting.


Michael Nurney

Posts 197
10 Mar 2017 15:40


I've just tried it and the Amiga reboots just as something happens in the window...

Not sure if it's FPU related...


Niclas A
(Apollo Team Member)
Posts 127
10 Mar 2017 15:45


Norbert Kett wrote:

Does it work on Vampire cards? If yes, what is the timedemo result with demo1?

Well, it starts. But I don't get a menu and it is slooooooooow :)
EXTERNAL LINK 


Niclas A
(Apollo Team Member)
Posts 127
10 Mar 2017 16:03


Got the menu; I just had to hide the console.
It works and starts, but it's slow as hell... maybe 0.5 FPS as a guesstimate.


Wawa T

Posts 400
10 Mar 2017 16:18


0.5 FPS as a guesstimate

That's well within the range of expected performance. AFAIR, cube compiled against StormMesa was giving me around 3-5 fps on an A4000/Mediator/Voodoo, and that was hardware accelerated, even allowing for the bottlenecks of Warp3D and the slow bus to RTG.

Software rendering is not a practicable option, except when you only need to move a few low-textured polys around. It is a good option for testing a library before and while implementing a dedicated hardware driver, but that's it.


Gunnar von Boehn
(Apollo Team Member)
Posts 2845
10 Mar 2017 16:39


Wawa T wrote:

  Software rendering is not a practicable option

Yes, with generic 68k code you are right.

On the other hand, AMMX has GPU instructions.
They give you ASM features comparable to the Voodoo.

If you use AMMX for pixel texturing you should see HUGE speedups.

So if you use AMMX in a software renderer, games like Quake should be playable.

Of course only in combination with the fully activated FPU.
As we are busy today with the V1200 and other hardware stuff, you will have to wait a little to use the FPU.


Norbert Kett
(Apollo Team Member)
Posts 36
10 Mar 2017 17:46


Yes, I examined the real-FPU and soft-FPU asm output of GCC...
Without an FPU we are wasting a lot of time.

Where can I find documentation about the Apollo-specific instructions?

EXTERNAL LINK
This page lists some, but no descriptions are available.


Niclas A
(Apollo Team Member)
Posts 127
10 Mar 2017 20:34


Norbert Kett wrote:

Yes, I examined the real-FPU and soft-FPU asm output of GCC...
  Without an FPU we are wasting a lot of time.

  Where can I find documentation about the Apollo-specific instructions?

  EXTERNAL LINK
  This page lists some, but no descriptions are available.

Have you looked here?
EXTERNAL LINK  and
EXTERNAL LINK


Norbert Kett
(Apollo Team Member)
Posts 36
11 Mar 2017 08:41


Thanks Niclas ;)

Gunnar: can you tell me the (68080) CPU cycles of the following two instructions: MOVEM.L and STORE? And what are those GPU instructions you mentioned?



Niclas A
(Apollo Team Member)
Posts 127
11 Mar 2017 09:34


You could maybe use the cycle count instruction mentioned here.
CLICK HERE


Gunnar von Boehn
(Apollo Team Member)
Posts 2845
11 Mar 2017 11:35


Norbert Kett wrote:

  Thanks Niclas ;)

  Gunnar: can you tell me the (68080) CPU cycles of the following two instructions: MOVEM.L and STORE? And what are those GPU instructions you mentioned?

STORE writes 64 bits in 1 cycle.
MOVEM.L writes 1 register per cycle.

What routine/algorithm do you want to do exactly?
Maybe I can help?

Regarding GPU instructions, there are several which might help you.

We have instructions for color blending / byte averaging / alpha blending.
We have instructions for saturation,
so there is no need for CMP and Bcc.
We have instructions for graphics format conversion,
from 32-bit RGB to 16-bit and back.
We can convert 128-bit input to 64-bit output in 1 cycle.

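For reference, here is what those operations compute, written as portable C on 32-bit 0xAARRGGBB pixels. This is not AMMX code and does not use the instruction names; it is only a sketch of the per-byte arithmetic (averaging, saturation, 32-bit to RGB565 conversion) that the instructions are said to do in one step. The function names are hypothetical.

```c
#include <stdint.h>

/* Average two 0xAARRGGBB pixels byte by byte (rounds down).
   Uses a+b = 2*(a&b) + (a^b); the 0xFE mask stops each byte's lowest
   bit from leaking into the neighbouring byte when shifting right. */
uint32_t avg_pixels(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Saturating add of two 8-bit channel values without compare/branch:
   a sum above 255 has bit 8 set, which turns the OR mask into all ones. */
uint8_t sat_add_u8(uint8_t x, uint8_t y)
{
    unsigned s = (unsigned)x + y;
    s |= 0u - (s >> 8);
    return (uint8_t)s;
}

/* Convert 0x..RRGGBB (alpha ignored) to RGB565 by keeping the top 5/6/5 bits. */
uint16_t argb_to_rgb565(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xF800u) | ((p >> 5) & 0x07E0u) | ((p >> 3) & 0x001Fu));
}
```

Each function is a handful of integer operations per pixel in plain C; the point of the dedicated instructions is to do this work for several pixels at once.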

Norbert Kett
(Apollo Team Member)
Posts 36
11 Mar 2017 14:02


Thanks for your answer, Gunnar. I have no Amiga with me now, so I can't measure any execution cycles. But the performance registers that Niclas linked are useful.

So with STORE I can write 2/4 pixels in one cycle, and with MOVEM I can write 1/2 pixels in one cycle (32/16-bit pixel format). I can use it to speed up glClear() (Z and color buffer clearing). The current glClear() code is very slow.

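A minimal C sketch of the faster glClear() idea for a 16-bit buffer: replicate the pixel value into a 64-bit word and write four pixels per store (it would be two per store for a 32-bit buffer). The function name clear16 is hypothetical, and it is not assumed that a compiler turns the 8-byte memcpy into Apollo's STORE; a hand-written asm loop would use STORE directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a 16-bit buffer (Z buffer or RGB565 color buffer) four pixels at a time.
   Each 8-byte memcpy can compile to a single 64-bit store on a CPU that has one;
   memcpy is used instead of a uint64_t* cast to stay within C aliasing rules. */
void clear16(uint16_t *buf, size_t count, uint16_t value)
{
    /* replicate the 16-bit value into all four lanes of a 64-bit word */
    uint64_t pattern = (uint64_t)value * 0x0001000100010001ULL;
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        memcpy(buf + i, &pattern, sizeof pattern);
    for (; i < count; i++)            /* leftover 0-3 pixels */
        buf[i] = value;
}
```

Because all four lanes hold the same value, the result does not depend on byte order.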
The other thing is the render buffer copy. I can avoid it with full-screen mode, so there is no need to optimize it.

The hardest part is the triangle drawing; I am still examining how it works. Looking at the asm output, TinyGL uses a lot of instructions to produce one pixel, and I see many memory accesses due to the limited number of registers. The PUT_PIXEL part is integer-only, so the extra registers may give a huge speedup.
Currently I don't see which AMMX instructions would be useful in rendering. Here is the PUT_PIXEL part:

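#define PUT_PIXEL(_a)                                                 \
{                                                                     \
  zz = z >> ZB_POINT_Z_FRAC_BITS;       /* integer part of z */       \
  if ( ZCMP(zz,pz[_a]) )                /* z buffer read + compare */ \
  {                                                                   \
    /* texel fetch: t and s masked, shifted, ORed into a byte offset */ \
    pp[_a] = *(PIXEL *)((char *)texture+(( ((t & mask)<<sh1) | (s & mask) ) >> sh2)); \
    pz[_a] = zz;                        /* z buffer write */          \
  }                                                                   \
  z += dzdx;                            /* step to the next pixel */  \
  s += dsdx;                                                          \
  t += dtdx;                                                          \
}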

This is used for the 3D rendering; the 2D rendering does not use Z read/write. The texture pointer, the mask and the two shift values are the same for the whole triangle, so they should be kept in registers.

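As a starting point for the single-rasterline test routine discussed further on, here is a hypothetical C span loop around the same per-pixel steps as PUT_PIXEL. Assumptions: a 16-bit PIXEL, a ZCMP where "greater or equal" passes, 16.16 fixed-point s/t, and a plain row/column texel index instead of TinyGL's mask/sh1/sh2 trick. Z_FRAC_BITS and texture_span are stand-ins, not TinyGL names.

```c
#include <stdint.h>

typedef uint16_t PIXEL;      /* assumption: 16-bit (RGB565) render buffer */
#define Z_FRAC_BITS 14       /* hypothetical stand-in for ZB_POINT_Z_FRAC_BITS */

/* Texture-map n pixels of one rasterline with a Z test. */
void texture_span(PIXEL *pp, uint16_t *pz, int n,
                  const PIXEL *texture, unsigned logw, unsigned logh,
                  int32_t z, int32_t dzdx,
                  uint32_t s, uint32_t dsdx,
                  uint32_t t, uint32_t dtdx)
{
    const uint32_t wmask = (1u << logw) - 1;   /* loop-invariant: keep in registers */
    const uint32_t hmask = (1u << logh) - 1;
    for (int i = 0; i < n; i++) {
        uint16_t zz = (uint16_t)(z >> Z_FRAC_BITS);
        if (zz >= pz[i]) {                      /* assumed ZCMP: larger z is closer */
            uint32_t u = (s >> 16) & wmask;     /* integer texel column, wrapped */
            uint32_t v = (t >> 16) & hmask;     /* integer texel row, wrapped */
            pp[i] = texture[(v << logw) | u];
            pz[i] = zz;
        }
        z += dzdx;
        s += dsdx;
        t += dtdx;
    }
}
```

Keeping the masks, shifts and texture pointer outside the loop is the register-allocation point made above; the loop body itself is what an asm version would optimize.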
<dreaming mode>
Two kinds of new instructions would be very good:
- A special move instruction for texture reading, "MOVET". Similar to MOVEM, but listing the registers for this special operation. Then this line would be just one single instruction:
pp[_a] = *(PIXEL *)((char *)texture+(( ((t & mask)<<sh1) | (s & mask) ) >> sh2));
- A special CMPZ instruction for Z read, compare, write and branch.
</dreaming mode>



Gunnar von Boehn
(Apollo Team Member)
Posts 2845
11 Mar 2017 14:47


Norbert Kett wrote:

I can use it to speed up glClear() (Z and color buffer clearing).

Yes, a 64-bit store is good there.
But if you know you render the whole screen (like a game such as Quake),
then you don't even need to clear Z.
Use signed Z as a trick: alternate between positive and negative values every other frame.

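One way to read the signed-Z trick, sketched in C under stated assumptions: depth values are 0..32767 with larger meaning closer (as with a greater-or-equal ZCMP), even frames store +z and odd frames store -z, so anything left from the previous frame always loses to a new fragment. It only works if every pixel is written every frame, as Gunnar says; a pixel skipped for one frame would keep a same-sign value from two frames back. ZState and the function names are hypothetical.

```c
#include <stdint.h>

/* Per-frame state for a Z buffer that is never cleared. */
typedef struct {
    int16_t *zbuf;   /* must be zero-filled once, before the first frame */
    int sign;        /* +1 on even frames, -1 on odd frames */
} ZState;

void z_begin_frame(ZState *zs, unsigned frame)
{
    zs->sign = (frame & 1u) ? -1 : 1;
}

/* z is in [0, 32767], larger = closer. Returns 1 and stores the fragment's
   depth if it is visible, else returns 0. */
int z_test_and_write(ZState *zs, int idx, int16_t z)
{
    /* For a value written this frame, sign * old is its depth again.
       For a value left from the previous frame (opposite sign) it is <= 0,
       so any new z >= 0 passes: that is why no clear is needed. */
    int old = zs->sign * zs->zbuf[idx];
    if (z >= old) {
        zs->zbuf[idx] = (int16_t)(zs->sign * z);
        return 1;
    }
    return 0;
}
```

The cost is one multiply by plus or minus one (or a conditional negate) per Z read and write, in exchange for skipping the full Z-buffer clear each frame.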
Norbert Kett wrote:

The other thing is the render buffer copy. I can avoid it with full-screen mode, so there is no need to optimize it.

Yes

Norbert Kett wrote:

Currently I don't see which AMMX instructions would be useful in rendering.

Nearest-texel rendering looks bad, like PS1. :-(
Use bilinear filtering of 4 texels and 2 mipmaps per rendered pixel.
This looks like PS3.
And for this we have instructions.

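A plain C sketch of the bilinear part: four neighbouring texels weighted by the fractional texture coordinates. Assumptions: 32-bit 0xAARRGGBB power-of-two textures with wrapping, 16.16 s/t, and 8-bit weights. Blending two mipmap levels would mean computing this twice and mixing the results, which is not shown, and the AMMX instructions Gunnar mentions are not used here. sample_bilinear is a hypothetical name.

```c
#include <stdint.h>

/* Bilinear sample of a power-of-two 0xAARRGGBB texture (wrapping).
   s and t are 16.16 fixed point; the top 8 fraction bits are the weights. */
uint32_t sample_bilinear(const uint32_t *tex, unsigned logw, unsigned logh,
                         uint32_t s, uint32_t t)
{
    uint32_t wmask = (1u << logw) - 1, hmask = (1u << logh) - 1;
    uint32_t x0 = (s >> 16) & wmask, x1 = (x0 + 1) & wmask;
    uint32_t y0 = (t >> 16) & hmask, y1 = (y0 + 1) & hmask;
    uint32_t fx = (s >> 8) & 0xFF, fy = (t >> 8) & 0xFF;   /* weights 0..255 */
    uint32_t c00 = tex[(y0 << logw) | x0], c10 = tex[(y0 << logw) | x1];
    uint32_t c01 = tex[(y1 << logw) | x0], c11 = tex[(y1 << logw) | x1];
    uint32_t out = 0;
    for (unsigned sh = 0; sh < 32; sh += 8) {               /* B, G, R, A */
        /* horizontal blends of the top and bottom texel pairs (value * 256) */
        uint32_t top = ((c00 >> sh) & 0xFF) * (256 - fx) + ((c10 >> sh) & 0xFF) * fx;
        uint32_t bot = ((c01 >> sh) & 0xFF) * (256 - fx) + ((c11 >> sh) & 0xFF) * fx;
        /* vertical blend, then drop the 16 fractional bits */
        out |= ((top * (256 - fy) + bot * fy) >> 16) << sh;
    }
    return out;
}
```

Per pixel this costs four texel reads and a dozen or so multiplies per channel in plain C, which is the work that byte-parallel blend instructions are meant to cut down.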
Norbert Kett wrote:

  <dreaming mode>
  Two kinds of new instructions would be very good:
  - A special move instruction for texture reading, "MOVET". Similar to MOVEM, but listing the registers for this special operation. Then this line would be just one single instruction:
  pp[_a] = *(PIXEL *)((char *)texture+(( ((t & mask)<<sh1) | (s & mask) ) >> sh2));

What type is PIXEL in this example? BYTE?
We have extra instructions for bilinear mixing and alpha blending.

Norbert Kett wrote:

  - A special CMPZ instruction for Z read, compare, write and branch.
  </dreaming mode>

We have conditional instructions/MOVE.
They do exactly what you want!




Norbert Kett
(Apollo Team Member)
Posts 36
11 Mar 2017 15:08


Haha, TinyGL doesn't support mipmaps yet, and PS1 quality would be super good with decent speed. What are these extra instructions?

PIXEL is unsigned short or unsigned long, depending on the render buffer type (TinyGL's rendering code is very tricky). No blending or bilinear filtering is needed right now, just a quick texture read :)

Is there any chance to work together on a few "less generic" instructions (texture reading, 4x4 matrix multiplication, etc.)? It would be good to be able to say: "on Vampire cards you can now use OpenGL 1.x or ES1 with decent speed".

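For scale, this is the per-vertex 4x4 matrix multiplication in plain C, using OpenGL's column-major layout (the element at row r, column c is m[c*4 + r]). It costs 16 multiplies and 12 adds per vertex, all floating point, which is why the FPU matters here. mat4_mul_vec4 is a hypothetical name.

```c
/* OpenGL column-major 4x4 matrix times a vec4: out = m * v.
   Element (row r, column c) is m[c*4 + r]. out must not alias v. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    for (int r = 0; r < 4; r++)
        out[r] = m[0 * 4 + r] * v[0] + m[1 * 4 + r] * v[1]
               + m[2 * 4 + r] * v[2] + m[3 * 4 + r] * v[3];
}
```

With a translation matrix (identity plus 1, 2, 3 in m[12..14]), the point (1, 1, 1, 1) maps to (2, 3, 4, 1).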
What is the plan for 3D support in the future?

(Please describe that conditional move/branch instruction.)



Gunnar von Boehn
(Apollo Team Member)
Posts 2845
11 Mar 2017 17:04


Norbert Kett wrote:

Haha, TinyGL doesn't support mipmaps yet, and PS1 quality would be super good with decent speed.

PS1 quality is not enough for my taste.
We should aim for PS2 quality level.

Norbert Kett wrote:

What are these extra instructions?

We have MULA (MUL-ALPHA) and PIXMERGE.
I can give you examples.

Norbert Kett wrote:

PIXEL is unsigned short or unsigned long.

16-bit (5/6/5) is OK-ish for games.
But 24-bit will look even better.

Using 32 bits to store 24-bit pixels in the render buffer makes no sense,
as APOLLO can do 3-byte and 6-byte MOVEs, both in a single cycle.
So we can maintain 24-bit screens very efficiently.

Norbert Kett wrote:

  No blending or bilinear filtering is needed right now, just a quick texture read :)

Yes, for a quick test I agree.

But for a real demo we should aim for high color
or true color, and for high-quality filtering.

Norbert Kett wrote:

Is there any chance to work together on a few "less generic" instructions (texture reading, 4x4 matrix multiplication, etc.)? It would be good to be able to say: "on Vampire cards you can now use OpenGL 1.x or ES1 with decent speed".

Yes, we have a good toolbox for this.

OK, let us make an example.
Can you make a small test routine which does 1 specific test,
like texture mapping one rasterline?

I need the ASM code for this, and then we can work together
and make an example of how to best use the existing instructions.
Norbert Kett wrote:

  What is the plan for 3D support in the future?

Let's start step by step and use what is there.

Norbert Kett wrote:

(Please describe that conditional move/branch instruction.)

APOLLO will automatically convert certain combinations of Bcc + instruction
into a conditional instruction.
This means no branch is needed, and it can never be mispredicted.
This can improve ZBuffer handling a lot.
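The thread does not say exactly which Bcc + instruction pairs APOLLO fuses, so this C sketch shows two forms of the Z-buffer update: a branching form (a compare followed by a short forward branch over the stores) and a select form that needs no branch at all. Both assume a greater-or-equal ZCMP on 16-bit values; the function names are hypothetical.

```c
#include <stdint.h>

/* Branching form: compare, then skip the two stores if the fragment is hidden. */
void z_update_branch(uint16_t *pp, uint16_t *pz, uint16_t zz, uint16_t color)
{
    if (zz >= *pz) {          /* assumed ZCMP: larger z is closer */
        *pp = color;
        *pz = zz;
    }
}

/* Select form: no branch. A mask of all ones (visible) or all zeros (hidden)
   picks between the new and old values; both stores always happen. */
void z_update_select(uint16_t *pp, uint16_t *pz, uint16_t zz, uint16_t color)
{
    uint16_t oldz = *pz;
    uint16_t mask = (uint16_t)(0u - (unsigned)(zz >= oldz));
    *pp = (uint16_t)((color & mask) | (*pp & (uint16_t)~mask));
    *pz = (uint16_t)((zz & mask) | (oldz & (uint16_t)~mask));
}
```

The select form trades the branch for a few extra logic operations and two unconditional stores; whether that beats the branching form on APOLLO depends on the fusion described above.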

Cheers


Norbert Kett
(Apollo Team Member)
Posts 36
12 Mar 2017 05:59


Gunnar von Boehn wrote:

  PS1 quality is not enough for my taste.
  We should aim for PS2 quality level.

If you think it's possible, I'm not against it, of course :) But I feel a dedicated 'texturizer' unit is required for this, like the blitter for line drawing and filling.
 
Gunnar von Boehn wrote:

  Using 32 bits to store 24-bit pixels in the render buffer makes no sense,
  as APOLLO can do 3-byte and 6-byte MOVEs, both in a single cycle.
  So we can maintain 24-bit screens very efficiently.

TinyGL can use a 16/24/32-bit render buffer. The PIXEL types, in order, are: ushort, uchar, uint. Accessing a 24-bit buffer is more complicated with non-Apollo 68k code.
 
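A portable C sketch of packed 24-bit pixel access, showing why it is awkward with plain 68k code: each pixel is 3 bytes, so most pixels do not start on a 4-byte boundary and the straightforward approach is three byte moves per pixel. The R, G, B byte order and the function names are assumptions. For a 640x400 screen this layout needs 768,000 bytes, against 1,024,000 bytes when each pixel is stored in 32 bits.

```c
#include <stdint.h>

/* Packed 24-bit pixels: 3 bytes each, assumed stored in R, G, B order. */
void put_pixel24(uint8_t *row, int x, uint32_t rgb)
{
    uint8_t *p = row + 3 * x;   /* byte offset 3*x: usually not 4-byte aligned */
    p[0] = (uint8_t)(rgb >> 16);
    p[1] = (uint8_t)(rgb >> 8);
    p[2] = (uint8_t)rgb;
}

uint32_t get_pixel24(const uint8_t *row, int x)
{
    const uint8_t *p = row + 3 * x;
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}
```

A 3-byte move instruction, as Gunnar describes, would replace each of these three-store sequences with one move.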
 
Gunnar von Boehn wrote:

  OK, let us make an example.
  Can you make a small test routine which does 1 specific test,
  like texture mapping one rasterline?

  I need the ASM code for this, and then we can work together
  and make an example of how to best use the existing instructions.

OK, I'll do an example.

What I quoted before is a generic texture read. It's good for any buffer type (16/24/32-bit) and any (POT) texture size, but with old 68k code it's very complicated and requires many CPU cycles.
 


Norbert Kett
(Apollo Team Member)
Posts 36
14 Mar 2017 15:28


I made a quick calculation: if I assume the CPU runs at 100 MHz and I want 30 frames per second at 640x400 pixels, then I have 13 CPU cycles to compute one pixel, which is not much. Usually an application needs to compute many other things too. So, if I'm correct, without special texturing hardware we cannot achieve PS2 quality. I feel AMMX won't help.

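The cycle budget above, written out (every other per-frame task comes out of the same budget):

```latex
\text{cycles per pixel} = \frac{f_{\mathrm{CPU}}}{\text{fps} \times W \times H}
= \frac{100 \times 10^{6}}{30 \times 640 \times 400}
= \frac{10^{8}}{7\,680\,000} \approx 13.0
```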
Posts: 88