Apollo-Computer

Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.
Please read the forum usage manual.
Please visit our Apollo-Discord Server for support.



The team will post updates and news about our project here

Good News: FPU Upgrades

Gunnar von Boehn
(Apollo Team Member)
Posts 6258
22 Jan 2024 07:41


We have some very good news:
 
1) I have changed the implementation of the FINT instruction
and made it much faster.
FINT now needs only 1 cycle.
 
 
2) I have reviewed the source code of many PC games.
And I found that many PC games use an FPU operation for which the 68K FPU family has no hardware instruction.
 
Many PC games cast "FLOAT to Unsigned Integer", or cast "DOUBLE to Unsigned Integer".
While the 68K family has an instruction for casting to SIGNED int,
there is no instruction for a cast to unsigned in the 68K architecture.
 
As a workaround, compilers emit an instruction sequence to emulate this.
 
Here is the sequence that is typically used.
 

  _ftoUint:
          fsmove.s (4,sp),fp0
          fcmp.s #0x4f000000,fp0
          fjge .L2
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          rts
  .L2:
          fssub.s #0x4f000000,fp0
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          add.l #-2147483648,d0
          rts
 

The above could be replaced with a single new instruction:
fmoveU.l fp0,d0
 
We decided to improve this and to enrich the 68k FPU instruction set with two new instructions
that support casting from floats to unsigned and back.
 

This will speed up this operation considerably.
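
For readers who prefer C, the compiler sequence quoted earlier in this post can be sketched roughly as follows. This is only an illustration: the helper name `ftou_emulated` and the macro `TWO_POW_31` are made up for this example, and the sketch assumes the input lies between -1 and 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* 2^31 as a float: the same value as the constant #0x4f000000 in the assembly. */
#define TWO_POW_31 2147483648.0f

/*
 * Hypothetical helper: float -> uint32_t using only a signed conversion,
 * following the two branches of the quoted sequence.
 * Assumes -1 < x < 2^32; anything else is outside the scope of this sketch.
 */
uint32_t ftou_emulated(float x)
{
    assert(x > -1.0f && x < 4294967296.0f);

    if (x < TWO_POW_31) {
        /* Fits into a signed 32-bit integer: truncate and move. */
        return (uint32_t)(int32_t)x;
    }

    /* Too large for a signed integer: subtract 2^31 first,
       convert, then add 2^31 back on the integer side. */
    int32_t low = (int32_t)(x - TWO_POW_31);
    return (uint32_t)low + 0x80000000u;
}
```

The compare, the branch and the second path are the extra work that a direct float-to-unsigned instruction would make unnecessary.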


John William

Posts 578
22 Jan 2024 20:41


But if this code works:

  _ftoUint:
          fsmove.s (4,sp),fp0
          fcmp.s #0x4f000000,fp0
          fjge .L2
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          rts
  .L2:
          fssub.s #0x4f000000,fp0
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          add.l #-2147483648,d0
          rts
 
Why replace it?



Gunnar von Boehn
(Apollo Team Member)
Posts 6258
23 Jan 2024 07:29


Yes, the code works correctly.
But of course this function takes some extra time,
even if you inline it:
 

          fcmp.s #0x4f000000,fp0
          fjge .overmax
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          bra  .next
  .overmax:
          fssub.s #0x4f000000,fp0
          fintrz.x fp0,fp0
          fmove.l fp0,d0
          add.l #-2147483648,d0
  .next:
 

 
Then you still have 9 instructions,
and these need about 23 cycles.
 
Replacing 9 instructions with a single instruction that needs only 1 cycle is always good and gives a speedup.
 

How important is this speedup?

Converting float to int is an operation which is not uncommon.
Some programs might even do this very often.
 
If a program does this only rarely... then of course this speedup will not matter much.

But for programs that use this operation very often... maybe once per row, or even once per pixel - for them this tuning can make a huge difference.
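
To make the "once per pixel" case concrete, here is a minimal C sketch. The function names and the brightness example are hypothetical; the cycle figures are the approximate ones quoted above (about 23 cycles for the emulated sequence, 1 cycle for a single instruction), so the result is only a rough estimate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: scale normalised brightness (0.0 .. 1.0) to 0 .. 255.
   Each pixel needs one float-to-unsigned cast. */
void brightness_to_bytes(const float *src, uint8_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned int level = (unsigned int)(src[i] * 255.0f); /* the cast in question */
        dst[i] = (uint8_t)level;
    }
}

/* Rough estimate of cycles saved when every cast drops from
   about 23 cycles to 1 cycle (figures taken from the post above). */
unsigned long estimated_cycles_saved(unsigned long casts)
{
    return casts * (23ul - 1ul);
}
```

For a hypothetical 320x200 screen with one cast per pixel, `estimated_cycles_saved(320ul * 200ul)` comes to 1,408,000 cycles per frame.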


Kamelito Loveless

Posts 261
23 Jan 2024 16:39


This is great. Any idea about the gain for Robin Hood?
The next step is to improve gcc so that those 2 instructions can benefit all programs.


Gunnar von Boehn
(Apollo Team Member)
Posts 6258
24 Jan 2024 11:08


Kamelito Loveless wrote:

This is great. Any idea about the gain for Robin Hood?

 
I think all programs that use the FPU do float to int conversions.
For a conversion to signed integer they need 2 instructions, and for unsigned int 9. We can improve this to 1 instruction each.
 
Yes, the Robin Hood code also does float to int conversion about 700 hundred times.
 
 
Kamelito Loveless wrote:

The next step is to improve gcc so that those 2 instructions can benefit all programs.

 
Yes this is a good idea.


Rollef 2000

Posts 29
25 Jan 2024 05:57


Hi,
this kind of development is what CISC is all about, isn't it?
Very good!


Carles Bernat Martorell

Posts 22
25 Jan 2024 23:42


Thank you for your commitment and hard work!


Gunnar von Boehn
(Apollo Team Member)
Posts 6258
29 Jan 2024 06:01


We have added FMOVERZ and FMOVEURZ to the Core.
You can find their documentation now in the 68080 instruction list.

FMOVERZ converts a float to a signed integer, rounding toward zero.
FMOVEURZ converts a float to an unsigned integer, rounding toward zero.
 
As the C language standard requires rounding toward zero (truncation) for float to integer casts, these two instructions help C compilers make this common conversion as efficient as possible.
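
As a small illustration of the C rule mentioned above, the following sketch shows that a cast from floating point to integer discards the fractional part instead of rounding to nearest. The function names are made up for this example, and nothing here claims that a particular compiler already emits the new instructions for these casts.

```c
#include <assert.h>

/* Hypothetical wrappers around the two kinds of cast discussed in this thread. */
int cast_signed(double x)
{
    return (int)x;            /* truncates toward zero */
}

unsigned int cast_unsigned(double x)
{
    return (unsigned int)x;   /* truncates toward zero, x must be in range */
}

/* Small self-check of the truncation behaviour. */
void check_truncation(void)
{
    assert(cast_signed(2.7) == 2);     /* not rounded up to 3 */
    assert(cast_signed(-2.7) == -2);   /* toward zero, not down to -3 */
    assert(cast_unsigned(3.999) == 3u);
}
```

Because the current FPU rounding mode is normally round-to-nearest, a plain conversion would give 3 for 2.7, which is why a dedicated "round to zero" conversion matches what C code expects.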
 
