APOLLO CPU Knowledge Forum

68k FPU Coding Challenge - Win a Prize!


Samuel Devulder

Posts 248
12 Sep 2018 17:17

As you can see, most of the code is FPU-based. There are very few instructions that can fit in the 2nd pipe. Most bubbles are filled. There is a big one on fp0 while looping back. It seems difficult to eliminate. The only way I see to eliminate it is to use an extra floating-point register. All FPs are used, so we'll have to use one E register (E9) for that purpose.

Samuel Devulder

Posts 248
12 Sep 2018 18:17

[EDIT] Sorry for the duplicate. The post just above was submitted by mistake, and I don't know how to delete it. Please consider only the following text:
--------------------------------------------------------------------
As you can see, most of the code is FPU-based. There are very few instructions that can fit in the 2nd pipe. It is possible to split the movem inside the loop into several moves (by the way, does the 2nd pipe accept memory accesses?), but this would re-create a 3-cycle bubble that was exactly filled by the movem.

Most bubbles are filled. There is a big one on fp0 while looping back. It seems difficult to eliminate. The only way I see to eliminate it is to use another floating-point register. All FPs are used, so we'll have to use one E register (E9) for that purpose.

If I do this, I am able to remove all bubbles:

loop:
    fadd    fp0,fp1     ; 1 
       
    fmul3   d2,e5,fp5   ; 1
    fmul3   d0,e6,fp6   ; 1
            
    fmul3   d1,e7,fp7   ; 1             
    fmul3   d2,e8,E9    ; 1             
    fmul3   d0,e0,fp0   ; 1
                      
    fadd    fp3,fp4     ; 1             
                      
    fadd    fp1,fp2     ; 1             
    fadd    fp6,fp7     ; 1                  
               
    movem.l (a0)+,d0-d2 ; 3 (was 3 bubbles) -- is it possible to split this into 3 free moves ?
    fadd    fp4,fp5     ; 1                  
               
    fmove.s fp2,(a1)+   ; 1                  
    fadd    fp7,E9      ; 1                  
         
    fmul3   d1,e1,fp1   ; 1 (was 3 bubbles)
    fmul3   d2,e2,fp2   ; 1    "    " 
    fmul3   d0,e3,fp3   ; 1    "    " 
    fmove.s fp5,(a1)+   ; 1               
              
    fmul3   d1,e4,fp4   ; 1 (was 1 bubble)   
    fmove.s E9,(a1)+    ; 1                
   
    dbra    d7,loop     ; 0
   ; total = 21 cycles/loop

It is still possible to gain 3 extra cycles if the movem can be spread into 3 separate memory accesses placed after a few FPU instructions. In addition, if the FPU were enabled on the 2nd pipe (not the case at the moment, I think), we could gain even more cycles.

Note: The code looks awful. It is a huge mess :( I can't spot any symmetry in the way it is written. I think it needs a good cleanup/rewrite so that the symmetry of the datapath is easy to spot. With cleaner code, places where optimisation is still possible might show up more easily.
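
When rescheduling a loop like this, a plain reference model of the arithmetic can help confirm that the output is unchanged. The sketch below is in Python rather than 68k assembly. It assumes, from the operand pairing in the listing, that d0-d2 hold one x, y, z input and that e0-e8 hold a 3x3 matrix in row-major order (e0-e2 feed the first stored result, e3-e5 the second, e6-e8 the third). The name `transform3x3` is hypothetical.

```python
def transform3x3(e, v):
    """Reference model: multiply a row-major 3x3 matrix e[0..8] by v = (x, y, z).

    Assumption: e[0..8] mirror registers e0-e8 and v mirrors d0-d2,
    as inferred from the operand pairing in the assembly listing.
    """
    x, y, z = v
    return (
        e[0] * x + e[1] * y + e[2] * z,  # first value stored (fp2 in the listing)
        e[3] * x + e[4] * y + e[5] * z,  # second value stored (fp5)
        e[6] * x + e[7] * y + e[8] * z,  # third value stored (E9)
    )


if __name__ == "__main__":
    identity = [1.0, 0.0, 0.0,
                0.0, 1.0, 0.0,
                0.0, 0.0, 1.0]
    print(transform3x3(identity, (1.0, 2.0, 3.0)))
```

Because the loop is software-pipelined, the results for one input are stored in a later iteration than the one that loads it, so it is easier to compare whole output arrays than single iterations. A comparison of that kind would also catch scheduling mistakes such as mixing terms from two consecutive inputs.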

Don Adan

Posts 38
12 Sep 2018 18:59

Samuel Devulder wrote:

loop:
     fadd    fp0,fp1     ; 1 
        
     fmul3   d2,e5,fp5   ; 1
     fmul3   d0,e6,fp6   ; 1
             
     fmul3   d1,e7,fp7   ; 1             
     fmul3   d2,e8,E9    ; 1             
     fmul3   d0,e0,fp0   ; 1
                       
     fadd    fp3,fp4     ; 1             
                       
     fadd    fp1,fp2     ; 1             
     fadd    fp6,fp7     ; 1                  
                
     movem.l (a0)+,d0-d2 ; 3 (was 3 bubbles) -- is it possible to split this into 3 free moves ?
     fadd    fp4,fp5     ; 1                  
                
     fmove.s fp2,(a1)+   ; 1                  
     fadd    fp7,E9      ; 1                  
          
     fmul3   d1,e1,fp1   ; 1 (was 3 bubbles)
     fmul3   d2,e2,fp2   ; 1    "    " 
     fmul3   d0,e3,fp3   ; 1    "    " 
     fmove.s fp5,(a1)+   ; 1               
               
     fmul3   d1,e4,fp4   ; 1 (was 1 bubble)   
     fmove.s E9,(a1)+    ; 1                
    
     dbra    d7,loop     ; 0
    ; total = 21 cycles/loop

If I understand Gunnar's info correctly, you can use the following code:

loop:
     fadd    fp0,fp1     ; 1 
        
     fmul3   d2,e5,fp5   ; 1
     fmul3   d0,e6,fp6   ; 1
             
     fmul3   d1,e7,fp7   ; 1             
     fmul3   d2,e8,E9    ; 1             
     fmul3   d0,e0,fp0   ; 1
                       
     fadd    fp3,fp4     ; 1             
     move.l (a0)+,d0     ; free
     fadd    fp1,fp2     ; 1          
     move.l (a0)+,d1     ; free 
     fadd    fp6,fp7     ; 1                  
                
     move.l (a0)+,d2     ; free
     fadd    fp4,fp5     ; 1                  
                
     fmove.s fp2,(a1)+   ; 1                  
     fadd    fp7,E9      ; 1                  
          
     fmul3   d1,e1,fp1   ; 1 (was 3 bubbles)
     fmul3   d2,e2,fp2   ; 1    "    " 
     fmul3   d0,e3,fp3   ; 1    "    " 
     fmove.s fp5,(a1)+   ; 1               
               
     fmul3   d1,e4,fp4   ; 1 (was 1 bubble)   
     fmove.s E9,(a1)+    ; 1                
    
     dbra    d7,loop     ; 0
    ; total = 18 cycles/loop if the three move.l are free (21 with movem)
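
As a quick check of the totals, the per-instruction cycle annotations can simply be added up. The Python sketch below is only bookkeeping on the numbers written in the listing comments, not a measurement. It assumes the three move.l instructions really do issue for free, which is the premise of this version of the loop rather than something confirmed here.

```python
# Cycle annotations taken from the comments in the listings:
FPU_OPS = 18       # fadd / fmul3 / fmove.s instructions, 1 cycle each as annotated
MOVEM_CYCLES = 3   # movem.l (a0)+,d0-d2 as annotated
DBRA_CYCLES = 0    # dbra annotated as free


def cycles_per_loop(load_cycles):
    """Sum the annotated cycles for one loop iteration."""
    return FPU_OPS + load_cycles + DBRA_CYCLES


with_movem = cycles_per_loop(MOVEM_CYCLES)  # listing with the movem
with_free_moves = cycles_per_loop(0)        # assumes the three move.l are free

print(with_movem, with_free_moves)
```

With these annotations the tally comes to 21 cycles for the movem version and 18 cycles if the three loads cost nothing.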

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
12 Sep 2018 21:21

A little bit more CPU info:

The APOLLO 68080 can do a free DCache read per cycle.
So technically, instead of loading values into registers in advance, you can also just do this:


  FMUL.S (a0),Fp0 
  FMUL.S 4(a0),Fp1 
  FMUL.S 8(a0),Fp2

Every instruction can read from the cache. Even re-reading the same value carries no penalty.

Cheers


Thellier Alain

Posts 143
12 Sep 2018 22:08

NICE, well done Samuel!
You wrote 21 cycles, but it is 18 if the moves for reading x, y, z are free, no?
If writing x, y, z is not free (this is what your listing says), then perhaps using movem to write y, z or x, y, z is possible.
Anyway, your code is for a 3x3 matrix. I am almost sure it is a 4x4 (used as 4x3) that is needed.
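
To make the 3x3 versus 4x4-used-as-4x3 difference concrete, here is a small Python reference model. It is a sketch under assumptions: "4x4 used as 4x3" is taken to mean that the fourth row is ignored and the fourth column holds a translation, and the matrix is assumed to be row-major. The layout actually required by the challenge may differ, and the name `transform4x3` is hypothetical.

```python
def transform4x3(m, v):
    """Apply the top three rows of a row-major 4x4 matrix m[0..15] to (x, y, z, 1).

    Assumptions: the fourth row is ignored and the fourth column
    (m[3], m[7], m[11]) holds the translation.
    """
    x, y, z = v
    return tuple(
        m[4 * row + 0] * x + m[4 * row + 1] * y + m[4 * row + 2] * z + m[4 * row + 3]
        for row in range(3)
    )


if __name__ == "__main__":
    # Identity rotation with a translation of (10, 20, 30).
    m = [1, 0, 0, 10,
         0, 1, 0, 20,
         0, 0, 1, 30,
         0, 0, 0, 1]
    print(transform4x3(m, (1, 2, 3)))
```

Compared with the 3x3 case, each output needs one more addition, so a vertex costs 9 multiplies and 9 additions instead of 9 multiplies and 6 additions.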

Szyk Cech

Posts 191
25 Sep 2018 14:17

Gunnar von Boehn wrote:

We would like to invite you to participate in a little coding challenge.

Who won this challenge?!?
