APOLLO CPU Knowledge Forum

The team will post updates and news about our project here

Good News: FPU Upgrades

Gunnar von Boehn
(Apollo Team Member)
Posts 6223
22 Jan 2024 07:41

We have some very good news:

1) I have changed the implementation of the FINT instruction
and made it much faster.
FINT now needs only 1 cycle.

2) I have reviewed the source code of many PC games.
I found that many PC games use an FPU operation for which the 68K FPU family has no hardware instruction.

Many PC games cast "FLOAT to Unsigned Integer" or "DOUBLE to Unsigned Integer".
While the 68K family has an instruction for casting to SIGNED int,
there is no instruction for a cast to unsigned in the 68K architecture.

As a workaround, compilers insert an instruction sequence to emulate this.

Here is the sequence that is typically used.


  _ftoUint:
          fsmove.s (4,sp),fp0
          fcmp.s #0x4f000000,fp0
          fjge .L2
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          rts
  .L2:
          fssub.s #0x4f000000,fp0
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          add.l #-2147483648,d0
          rts

The above could be replaced with the following new instruction:
fmoveU.l fp0,d0

We decided to improve this and to enrich the 68k FPU instruction set with two new instructions
that support casting from floats to unsigned integers and back.

This will speed this operation up considerably.
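
For readers who prefer C, the emulation sequence above can be sketched roughly as follows. This is only an illustration: the helper name ftou_emulated is hypothetical, and the sketch assumes inputs in the range [0, 2^32). The constant 0x4f000000 in the assembly is the single-precision bit pattern of 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): emulates a float -> unsigned
 * 32-bit cast using only signed conversions, like the sequence above. */
static uint32_t ftou_emulated(float x)
{
    /* 0x4f000000 is the IEEE 754 single-precision bit pattern of 2^31. */
    const float two31 = 2147483648.0f;

    /* Assumption: only inputs in [0, 2^32) are considered in this sketch. */
    assert(x >= 0.0f && x < 4294967296.0f);

    if (x >= two31) {
        /* .L2 path: shift into signed range, truncate, then add 2^31 back. */
        int32_t t = (int32_t)(x - two31);
        return (uint32_t)t + 0x80000000u;
    }
    /* Fall-through path: the value already fits a signed 32-bit integer. */
    return (uint32_t)(int32_t)x;
}
```

The compare, the branch and the second conversion path are the overhead that a single float-to-unsigned instruction would remove.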

John William

Posts 566
22 Jan 2024 20:41

Gunnar von Boehn wrote:


   _ftoUint:
           fsmove.s (4,sp),fp0
           fcmp.s #0x4f000000,fp0
           fjge .L2
           fintrz.x fp0,fp0
           fmove.l fp0,d0
           rts
   .L2:
           fssub.s #0x4f000000,fp0
           fintrz.x fp0,fp0
           fmove.l fp0,d0
           add.l #-2147483648,d0
           rts

But if this line of code works:

   _ftoUint:
           fsmove.s (4,sp),fp0
           fcmp.s #0x4f000000,fp0
           fjge .L2
           fintrz.x fp0,fp0
           fmove.l fp0,d0
           rts
   .L2:
           fssub.s #0x4f000000,fp0
           fintrz.x fp0,fp0
           fmove.l fp0,d0
           add.l #-2147483648,d0
           rts

Why replace it?

Gunnar von Boehn
(Apollo Team Member)
Posts 6223
23 Jan 2024 07:29

John William wrote:

But if this line of code works:

 
    _ftoUint:
              fsmove.s (4,sp),fp0
              fcmp.s #0x4f000000,fp0
              fjge .L2
              fintrz.x fp0,fp0
              fmove.l fp0,d0
              rts
      .L2:
              fssub.s #0x4f000000,fp0
              fintrz.x fp0,fp0
              fmove.l fp0,d0
              add.l #-2147483648,d0
              rts

Why replace it?

Yes, the code works correctly.
But of course this function takes some extra time,
even if you inline it:


              fcmp.s #0x4f000000,fp0
              fjge .overmax
              fintrz.x fp0,fp0
              fmove.l fp0,d0
              bra   .next
      .overmax:
              fssub.s #0x4f000000,fp0
              fintrz.x fp0,fp0
              fmove.l fp0,d0
              add.l #-2147483648,d0
      .next:

Then you still have 9 instructions,
and these need about 23 cycles.

Replacing 9 instructions with a single instruction that needs only 1 cycle is always good and gives a speedup.

How important is this speedup?

Converting float to int is an operation which is not uncommon.
Some programs might even do this very often.

If a program does this only rarely... then of course this speedup will not matter much.

But for programs that use this operation very often... maybe once per row, or even once per pixel - for them this tuning can make a huge difference.
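
To give a feeling for the per-pixel case, here is a small C sketch. The cycle counts (about 23 versus 1) are the ones quoted above. The function names and the 320x200 frame size are only assumptions for illustration, not taken from any real program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycle counts as quoted in the post; treat them as rough figures. */
enum { CYCLES_EMULATED = 23, CYCLES_SINGLE = 1 };

/* Hypothetical inner loop: one float -> unsigned conversion per pixel.
 * Inputs are assumed to be in [0, 2^32). */
static void convert_row(const float *src, uint32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)src[i];   /* the cast discussed in this thread */
}

/* Estimated cycles saved for a given number of conversions. */
static unsigned long cycles_saved(unsigned long conversions)
{
    return conversions * (CYCLES_EMULATED - CYCLES_SINGLE);
}
```

With one conversion per pixel on an assumed 320x200 frame, cycles_saved(320UL * 200UL) gives 1,408,000 cycles per frame under these figures.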


Kamelito Loveless

Posts 260
23 Jan 2024 16:39

This is great. Any idea about the gain for Robin Hood?
Next step is to improve gcc so those 2 instructions could benefit all programs.

Gunnar von Boehn
(Apollo Team Member)
Posts 6223
24 Jan 2024 11:08

Kamelito Loveless wrote:

This is great. Any idea about the gain for Robin Hood?

I think all programs using the FPU do float to int conversion.
For conversion to signed integer they need 2 instructions, and for unsigned int 9. We can improve this to 1 instruction each.

Yes, the Robin Hood code also does float to int conversion about 700 hundred times.

Kamelito Loveless wrote:

Next step is to improve gcc so those 2 instructions could benefit all programs.

Yes this is a good idea.


Rollef 2000

Posts 29
25 Jan 2024 05:57

Moin, this development is the meaning of CISC, isn't it? Very good!


Carles Bernat Martorell

Posts 22
25 Jan 2024 23:42

Thank you for your commitment and hard work!

Gunnar von Boehn
(Apollo Team Member)
Posts 6223
29 Jan 2024 06:01

We have added FMOVERZ and FMOVEURZ to the Core.
You can find their documentation now in the 68080 instruction list.

FMOVERZ converts a float to a signed integer, rounding toward zero.
FMOVEURZ converts a float to an unsigned integer, rounding toward zero.

As the C language standard requires rounding toward zero, these two instructions help C compilers make this common conversion as efficient as possible.
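
As a small illustration of the C rule mentioned above, the casts below drop the fractional part (round toward zero). The helper names are hypothetical, and whether a given compiler actually emits FMOVERZ or FMOVEURZ for these casts is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: float -> signed 32-bit, truncating toward zero. */
static int32_t to_signed(float x)
{
    /* Out-of-range inputs are undefined in C, so the sketch excludes them. */
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    return (int32_t)x;
}

/* Hypothetical wrapper: float -> unsigned 32-bit, truncating toward zero. */
static uint32_t to_unsigned(float x)
{
    assert(x > -1.0f && x < 4294967296.0f);
    return (uint32_t)x;
}
```

For example, to_signed(2.7f) gives 2 and to_signed(-2.7f) gives -2, so the result moves toward zero in both cases instead of always rounding down.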
