GCC Improvement for 68080

Stefan "Bebbo" Franke

Posts 139
12 Jul 2019 13:59


Gunnar von Boehn wrote:

Bebbo did you see my question above?

Yes, you forgot to add the keyword restrict.


Stefan "Bebbo" Franke

Posts 139
12 Jul 2019 14:14


have a look at the most recent version here:

EXTERNAL LINK 



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
12 Jul 2019 14:19


Stefan "Bebbo" Franke wrote:

have a look at the most recent version here:
 
  EXTERNAL LINK 
 

Looks much better - Thanks!


Samuel Devulder

Posts 248
12 Jul 2019 15:19


Stefan "Bebbo" Franke wrote:

Everything is fine, because the pointers may alias.
Use
     

      void Scale0(double scalar, double* restrict b, double* restrict c)
     


This gives:

_Scale0:
        fmovem fp2/fp3,-(sp)
        fdmove.d (28,sp),fp0
        move.l (40,sp),a0
        fdmove.d (a0)+,fp3
        fdmul.x fp0,fp3
        fdmove.d (a0)+,fp2
        fdmul.x fp0,fp2
        fdmove.d (a0)+,fp1
        fdmul.x fp0,fp1
        fdmul.d (a0),fp0
        move.l (36,sp),a0
; 0 wait-cycle
        fmove.d fp3,(a0)+
; 0 wait-cycle
        fmove.d fp2,(a0)+
        fmovem (sp)+,fp3/fp2 (<== 2 cycles)
; 0 wait-cycle
        fmove.d fp1,(a0)+
; 0 wait-cycle
        fmove.d fp0,(a0)
        rts
Perfect!! I like this code :)
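
For readers following along, here is a minimal sketch of what the restrict change means in C. The body of Scale0 is not shown in the thread; the four-element version below is an assumption inferred from the four load/multiply/store groups in the listing above, and Scale0_alias is a hypothetical name for the variant without restrict.

```c
/* Without restrict: b and c may overlap, so a store to b[i] might change
   a later c[j]. The compiler cannot hoist the later loads above the
   earlier stores. */
void Scale0_alias(double scalar, double *b, double *c)
{
    b[0] = scalar * c[0];
    b[1] = scalar * c[1];
    b[2] = scalar * c[2];
    b[3] = scalar * c[3];
}

/* With restrict: the caller promises that b and c do not overlap, so the
   compiler is free to issue all four loads before the first store. */
void Scale0(double scalar, double *restrict b, double *restrict c)
{
    b[0] = scalar * c[0];
    b[1] = scalar * c[1];
    b[2] = scalar * c[2];
    b[3] = scalar * c[3];
}
```

Both versions give the same result when b and c are separate arrays. Calling the restrict version with overlapping arrays is undefined behaviour, so the keyword is a promise by the caller, not something the compiler checks.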
   


Samuel Devulder

Posts 248
12 Jul 2019 21:46


@bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling.
snd_mix.c: In function 'S_TransferPaintBuffer':
  snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
    }
    ^
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See EXTERNAL LINK for instructions.

  Strangely enough, when I submit this code to version 6.5.0b on cex, it doesn't crash (though it does crash with 6.5.0, 8.2.0 and 9.0 on cex).
 
  My version is
GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
          compiled by GNU C version 7.4.0, GMP version 6.1.2,
  MPFR version 4.0.2, MPC version 1.1.0, isl version none
  GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
  Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4

  Are you interested in that piece of code?
 
  Note: this occurs with many C files in my project. I can work around it by compiling for 68030 (no scheduling implied), but in the end the linker complains about a missing symbol:
 
(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e):
  undefined reference to `__fixdfsi'
This may be a totally different issue.
 
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
13 Jul 2019 06:48


Hi Bebbo,

Many thanks for the GCC improvements.
I think GCC got much better through your work!

Can we brainstorm about one FPU topic?
Let's look at this snippet:


          fdmove.d (a0)+,fp3
          fdmul.x fp0,fp3

Code like this is very common.
We need 2 instructions for this with the "normal" 68k 2-operand encoding.

This could be coded in several ways on 68k.


1)
    fdmove.d (a0)+,fp3
    fdmul.x fp0,fp3


2)
    fdmove.x fp0,fp3
    fdmul.d  (a0)+,fp3


3)
    fdmul.d (a0)+,fp0,fp3  ** Done with BANK

With the new BANK opcode-word we can make this in 1 instruction.
Doing it in 1 instruction would increase speed a lot.
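
To make the data flow of the three variants explicit, here is a rough C model. It is only a sketch: the FPU registers are modelled as plain doubles (ignoring extended precision), the function names are hypothetical, and nothing here reflects the actual opcode encoding.

```c
/* Variant 1: load into fp3, then multiply by fp0 (two instructions). */
double variant1(const double **a0, double fp0)
{
    double fp3 = *(*a0)++;   /* fdmove.d (a0)+,fp3 */
    fp3 *= fp0;              /* fdmul.x  fp0,fp3   */
    return fp3;
}

/* Variant 2: copy fp0 into fp3, then multiply by memory (two instructions). */
double variant2(const double **a0, double fp0)
{
    double fp3 = fp0;        /* fdmove.x fp0,fp3   */
    fp3 *= *(*a0)++;         /* fdmul.d  (a0)+,fp3 */
    return fp3;
}

/* Variant 3: the 3-operand form does the same work in one step. */
double variant3(const double **a0, double fp0)
{
    return fp0 * *(*a0)++;   /* fdmul.d (a0)+,fp0,fp3 */
}
```

All three leave fp0 untouched, advance a0 by one double and produce the same value in fp3; the only difference is how many instructions are needed.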




Stefan "Bebbo" Franke

Posts 139
13 Jul 2019 09:39


Samuel Devulder wrote:

@bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling.
snd_mix.c: In function 'S_TransferPaintBuffer':
    snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
    }
    ^
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See EXTERNAL LINK for instructions.

    Strangely enough, when I submit this code to version 6.5.0b on cex, it doesn't crash (though it does crash with 6.5.0, 8.2.0 and 9.0 on cex).
   
    My version is
GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
            compiled by GNU C version 7.4.0, GMP version 6.1.2,
  MPFR version 4.0.2, MPC version 1.1.0, isl version none
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4

    Are you interested in that piece of code?
   
    Note: this occurs with many C files in my project. I can work around it by compiling for 68030 (no scheduling implied), but in the end the linker complains about a missing symbol:
   
(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e):
  undefined reference to `__fixdfsi'
This may be a totally different issue.
   
   

I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option)

About `__fixdfsi':  try adding -lm ?
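
As background on the symbol: in stock GCC, __fixdfsi is the libgcc helper that converts a double to a signed int. The sketch below shows the kind of C code that can end up as a call to it on a build without a hardware instruction for the conversion. Whether this is what pulls in the symbol in Samuel's build is an assumption; the thread does not resolve it, and to_int is a hypothetical name.

```c
/* A plain double-to-int cast. With hardware floating point this is a
   single conversion instruction; on a soft-float build GCC typically
   lowers it to a call to the runtime helper __fixdfsi. */
int to_int(double d)
{
    return (int)d;   /* truncates toward zero */
}
```

If the helper lives in a library that is listed before the object or archive that needs it, a single-pass linker can miss it, so the order of the libraries on the link line is worth checking as well.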



Stefan "Bebbo" Franke

Posts 139
13 Jul 2019 09:43


Gunnar von Boehn wrote:

Hi Bebbo,
 
  Many thanks for the GCC improvements.
  I think GCC got much better through your work!
 
 
  Can we brainstorm about one FPU topic?
  Let's look at this snippet:
 

          fdmove.d (a0)+,fp3
          fdmul.x fp0,fp3
 

 
  Code like this is very common.
  We need 2 instructions for this with the "normal" 68k 2-operand encoding.
 
  This could be coded in several ways on 68k.
 
 

  1)
      fdmove.d (a0)+,fp3
      fdmul.x fp0,fp3
 

 

  2)
      fdmove.x fp0,fp3
      fdmul.d  (a0)+,fp3
 

 

  3)
      fdmul.d (a0)+,fp0,fp3  ** Done with BANK
 

 
  With the new BANK opcode-word we can make this in 1 instruction.
  Doing it in 1 instruction would increase speed a lot.
 
 

To start with 68080-specific support, I need the GNU assembler to work with the new asm insns. So I'd like to get what has already been done there.

Plus you have to make some final decisions:

- a0-a15 or a0-a7, b0-b7?
- d0-d15, e0-e15 or d0-d7, e0-e24

The asm should not contain "bank" insns; it should use the new registers, and the assembler will handle it properly.




Stefan "Bebbo" Franke

Posts 139
13 Jul 2019 09:54


And you can't use -m68080 with other gcc versions.


Samuel Devulder

Posts 248
13 Jul 2019 11:11


Stefan "Bebbo" Franke wrote:

    I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option)

Done (https://github.com/bebbo/gcc/issues/107) Cex at EXTERNAL LINK   

    About `__fixdfsi':  try adding -lm ?
 

It is present! I guess, a new issue is required then ;)
 


Grom 68k

Posts 61
13 Jul 2019 11:18


Gunnar von Boehn wrote:

Hi Bebbo,
 
  Many thanks for the GCC improvements.
  I think GCC got much better through your work!
 
 
  Can we brainstorm about one FPU topic?
  Let's look at this snippet:
 

          fdmove.d (a0)+,fp3
          fdmul.x fp0,fp3
 

 
  Code like this is very common.
  We need 2 instructions for this with the "normal" 68k 2-operand encoding.
 
  This could be coded in several ways on 68k.
 
 

  1)
      fdmove.d (a0)+,fp3
      fdmul.x fp0,fp3
 

 

  2)
      fdmove.x fp0,fp3
      fdmul.d  (a0)+,fp3
 

 

  3)
      fdmul.d (a0)+,fp0,fp3  ** Done with BANK
 

 
  With the new BANK opcode-word we can make this in 1 instruction.
  Doing it in 1 instruction would increase speed a lot.
 
 

Is it possible to have this one?

fdsub.d fp0,(a0)+,fp3



Samuel Devulder

Posts 248
13 Jul 2019 22:21


What do you mean by "fdsub.d fp0,(a0)+,fp3" ?
 
      1) fp3 = (a0)+ - fp0
      2) fp3 = fp0 - (a0)+
     
I'm unsure case 1) is possible. All examples seen so far, like

              dc.w    $7340
              fadd.s  (a0)+,fp0      ; (a0)+ + Fp0 => FP1
seem to indicate that memory isn't allowed as the 2nd source. It seems that only f<op> <EA>,<reg1>,<reg2> is allowed (<reg2> = <reg1> <op> <EA>).
     
  This implies that interpretation 1 is invalid. To get it, you'll have to do the reverse, "fp3 = fp0 - (a0)+", and then apply "fneg fp3" (i.e. 2 FPU operations).
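
The operand-order argument above can be checked with a small C model. This is a sketch under the assumption stated in the post, namely that the 3-operand form computes <reg2> = <reg1> <op> <EA>; the function names are hypothetical and registers are modelled as plain doubles.

```c
/* Model of "fsub <EA>,<reg1>,<reg2>": returns reg2 = reg1 - EA. */
double model_fsub3(double ea, double reg1)
{
    return reg1 - ea;
}

/* Model of "fneg <reg>". */
double model_fneg(double reg)
{
    return -reg;
}

/* Interpretation 1, fp3 = (a0)+ - fp0, built from two FPU operations. */
double mem_minus_reg(double mem, double fp0)
{
    double fp3 = model_fsub3(mem, fp0);  /* fp3 = fp0 - mem */
    return model_fneg(fp3);              /* fp3 = mem - fp0 */
}
```

With mem = 5.0 and fp0 = 2.0, the single subtract yields -3.0 and the negate turns it into the wanted 3.0.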
   


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
14 Jul 2019 09:04


I tried Quake compiled with GCC650b by Sam.
On my V4, the 68030 version works well. No problems with it.

The 68040 and 68060 builds freeze after 8 seconds.
The 68080 build also crashes after 8 seconds, but more elegantly, because it gives a Guru 8 000000 B (once I got the error "Bad Entity type -7505165").

The same sources compiled with GCC641 for the 68040/68060 work well... it seems there is a typo somewhere in GCC 650b.


Samuel Devulder

Posts 248
14 Jul 2019 12:36


On my V2 the exe runs for at least ~30 seconds without issues (the time it takes for "timedemo demo1"), but yes, there are still a couple of issues with gcc650b. It is a work in progress. As a side note: in the past I added a lot of hand-optimized inline asm code (typically the cross-product function). Now, with gcc650b, these inline functions kind of work against the instruction scheduler (especially the cross-product, which is used everywhere). The source code will require a bit of de-asm-ification to fully benefit from gcc650b.
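
As an illustration of what "de-asm-ification" could look like, here is a cross-product written in plain C. Samuel's inline asm is not shown in the thread, so this is only a generic sketch; the vec3 type and the cross_product name are hypothetical.

```c
typedef float vec3[3];

/* Plain C cross-product: out = a x b. Because there is no inline asm,
   the compiler sees every multiply and subtract and can schedule them
   together with the surrounding code. out must not overlap a or b. */
static inline void cross_product(const vec3 a, const vec3 b, vec3 out)
{
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}
```

For example, crossing the x axis (1,0,0) with the y axis (0,1,0) gives the z axis (0,0,1).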




Stefan "Bebbo" Franke

Posts 139
14 Jul 2019 16:02


Samuel Devulder wrote:

On my V2 the exe runs for at least ~30 seconds without issues (the time it takes for "timedemo demo1"), but yes, there are still a couple of issues with gcc650b. It is a work in progress. As a side note: in the past I added a lot of hand-optimized inline asm code (typically the cross-product function). Now, with gcc650b, these inline functions kind of work against the instruction scheduler (especially the cross-product, which is used everywhere). The source code will require a bit of de-asm-ification to fully benefit from gcc650b.

Yes, it is "work in progress" - but all releases are passing
- the gcc-torture-execute tests for 68000|68020 and -|baserel|baserel32
- all projects added to my amiga-stuff project at github.

=> Prepare your project to be added to my amiga-stuff, and future builds won't break it.



Grom 68k

Posts 61
15 Jul 2019 13:14


Stefan "Bebbo" Franke wrote:

have a look at the most recent version here:
 
  EXTERNAL LINK 

Hi bebbo,

I tried testing -mregparm, and gcc uses fp2 unnecessarily.


#include <string.h>

void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
{
      size_t j;
      for (j=640; j; j--){
          *b++ = scalar * *c++;
      }
}

-O2 -mregparm -m68080


_Scale8:
        move.l #640,d0
        fmovem fp2,-(sp)
        fdmove.x fp1,fp2

.L2:
        fdmove.d (a1)+,fp0
        fdmul.x fp2,fp0
        fmove.d fp0,(a0)+
        subq.l #1,d0
        jne .L2
        fmovem (sp)+,fp2
        rts

-Os looks good.


Grom 68k

Posts 61
15 Jul 2019 13:36


Samuel Devulder wrote:

What do you mean by "fdsub.d fp0,(a0)+,fp3" ?
   
      1) fp3 = (a0)+ - fp0
      2) fp3 = fp0 - (a0)+
     
  I'm unsure case 1) is possible. All examples seen so far, like

                dc.w    $7340
                fadd.s  (a0)+,fp0      ; (a0)+ + Fp0 => FP1
seem to indicate that memory isn't allowed as the 2nd source. It seems that only f<op> <EA>,<reg1>,<reg2> is allowed (<reg2> = <reg1> <op> <EA>).
     
  This implies that interpretation 1 is invalid. To get it, you'll have to do the reverse, "fp3 = fp0 - (a0)+", and then apply "fneg fp3" (i.e. 2 FPU operations).
   

Hi Samuel,

Yes, I mean fp3 = (a0)+ - fp0.
I know that memory isn't allowed as the 2nd source, because it is also the output in the 2-operand encoding.
I had a doubt about the 3-operand encoding.

Samuel Devulder wrote:

    Is it in this link: EXTERNAL LINK ?
   
    Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder when needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK ).
   

I am on W10 too. Did you succeed in extracting the headers from the file in the link?

Thanks


Stefan "Bebbo" Franke

Posts 139
15 Jul 2019 13:54


Grom 68k wrote:

 
Samuel Devulder wrote:

      Is it in this link: EXTERNAL LINK ?
     
      Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder when needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK ).
     
 

 
  I am on W10 too. Did you succeed in extracting the headers from the file in the link?
 
  Thanks

To extract the headers, you might use the Linux tgz file: EXTERNAL LINK


Samuel Devulder

Posts 248
15 Jul 2019 13:59


Quick response:
void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
Notice that scalar0 isn't used in the code. If it were used, then fp3 would probably be used too. But if you remove it, the asm looks OK (-O2 -mregparm -m68080):
_Scale8:
            move.l #640,d0
    .L2:
            fdmove.d (a1)+,fp1
            fdmul.x fp0,fp1
  ; 5 cycles waiting for fp1
            fmove.d fp1,(a0)+
            subq.l #1,d0
            jne .L2
            rts
Anyway, it is strange that the unused scalar0 changes the produced asm. Note: this asm can be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.
 
Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive file (sometimes 7z allows viewing exes as archives, which is quite handy). But I have found what causes the problems in my setup: the anti-virus (Avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, producing issues like invalid pointer accesses. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE, for instance), but this is not applicable here because there seems to be some kind of binary modification embedding the installation folder in some of the exes. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.


Grom 68k

Posts 61
15 Jul 2019 14:26


Samuel Devulder wrote:

Quick response:
void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
Notice that scalar0 isn't used in the code. If it were used, then fp3 would probably be used too. But if you remove it, the asm looks OK (-O2 -mregparm -m68080):
_Scale8:
            move.l #640,d0
    .L2:
            fdmove.d (a1)+,fp1
            fdmul.x fp0,fp1
  ; 5 cycles waiting for fp1
            fmove.d fp1,(a0)+
            subq.l #1,d0
            jne .L2
            rts
Anyway, it is strange that the unused scalar0 changes the produced asm. Note: this asm can be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.
 
  Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive file (sometimes 7z allows viewing exes as archives, which is quite handy). But I have found what causes the problems in my setup: the anti-virus (Avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, producing issues like invalid pointer accesses. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE, for instance), but this is not applicable in that case because there seems to be some kind of binary modification embedding the installation folder in some of the exes. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.

subq.l #1,d0 mustn't be moved for -m68080

Gunnar von Boehn wrote:

Bebbo
 
  here are some examples of FUSINGs:
 
  MOVE.L (an)+,(am)+
  MOVE.L (an)+,(am)+
 
  ...
 
  SUBQ.L #1,Dn
  BNE.s  LOOP
 
 
 
  Above you see examples of FUSING of 2 instructions, which execute together in a single cycle in 1 ALU.
 
  Both ALUs can execute such bundles.

It fails only with this configuration; if you use -Os or swap scalar and scalar0, it looks very good.

