APOLLO CPU Knowledge Forum

Information about the Apollo CPU and FPU.

GCC Improvement for 68080

Stefan "Bebbo" Franke

Posts 142
12 Jul 2019 13:59

Gunnar von Boehn wrote:

Bebbo did you see my question above?

Yes, you forgot to add the keyword restrict.


Stefan "Bebbo" Franke

Posts 142
12 Jul 2019 14:14

Have a look at the most recent version here: EXTERNAL LINK

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
12 Jul 2019 14:19

Stefan "Bebbo" Franke wrote:

have a look at the most recent version here:

EXTERNAL LINK

Looks much better - Thanks!

Samuel Devulder

Posts 248
12 Jul 2019 15:19

Stefan "Bebbo" Franke wrote:

Everything is fine, because the pointers may alias.
Use


      void Scale0(double scalar, double* restrict b, double* restrict c)

This gives:


_Scale0:
         fmovem fp2/fp3,-(sp)
         fdmove.d (28,sp),fp0
         move.l (40,sp),a0
         fdmove.d (a0)+,fp3
         fdmul.x fp0,fp3
         fdmove.d (a0)+,fp2
         fdmul.x fp0,fp2
         fdmove.d (a0)+,fp1
         fdmul.x fp0,fp1
         fdmul.d (a0),fp0
         move.l (36,sp),a0
; 0 wait-cycle
         fmove.d fp3,(a0)+
; 0 wait-cycle
         fmove.d fp2,(a0)+
         fmovem (sp)+,fp3/fp2      ; <== 2 cycles
; 0 wait-cycle
         fmove.d fp1,(a0)+
; 0 wait-cycle
         fmove.d fp0,(a0)
         rts

Perfect!! I like this code :)
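
For reference, a minimal C sketch of the kind of function discussed above. The body is an assumption reconstructed from the posted assembly (four doubles loaded from c, multiplied by scalar, stored to b); the original source of Scale0 is not shown in this part of the thread, and the assert is only added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed body: the thread only shows the signature and the generated code. */
void Scale0(double scalar, double *restrict b, double *restrict c)
{
    assert(b != NULL && c != NULL);

    /* b and c are restrict-qualified, so a store to b[i] cannot change
       any c[j]; the compiler may hoist all loads above the stores. */
    b[0] = scalar * c[0];
    b[1] = scalar * c[1];
    b[2] = scalar * c[2];
    b[3] = scalar * c[3];
}
```

Without restrict the compiler has to assume that writing b[0] could modify c[1], so it must keep each store ahead of the next load. That matches the "pointers may alias" remark quoted above.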

Samuel Devulder

Posts 248
12 Jul 2019 21:46

@bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling.

snd_mix.c: In function 'S_TransferPaintBuffer':
   snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
    }
    ^
   Please submit a full bug report,
   with preprocessed source if appropriate.
   See EXTERNAL LINK for instructions.

Strangely enough, when I submit this code to version 6.5.0b on cex, it doesn't crash (though it does crash with 6.5.0, 8.2.0 and 9.0 on cex).

My version is

GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
           compiled by GNU C version 7.4.0, GMP version 6.1.2, 
  MPFR version 4.0.2, MPC version 1.1.0, isl version none
   GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
   Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4

Are you interested in that piece of code?

Note: this occurs with many C files in my project. I can work around it by compiling for 68030 (no scheduling implied), but then the linker complains about a missing symbol:

(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e):
  undefined reference to `__fixdfsi'

This may be a totally different issue.
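
For context: __fixdfsi is the libgcc helper that converts a double to a signed int. GCC emits a call to it when the selected target has no hardware instruction for that conversion; that this is what happens in the 68030 build here is an assumption. A cast as small as the one below is enough to create the reference. The function name truncate_to_int is made up for illustration, and the range check assumes a 32-bit int.

```c
#include <assert.h>

/* Hypothetical helper: on a soft-float target this cast is typically
   compiled into a call to libgcc's __fixdfsi instead of an FPU instruction. */
int truncate_to_int(double value)
{
    /* The conversion is only defined when the truncated value fits in int
       (32-bit int assumed here). */
    assert(value > -2147483649.0 && value < 2147483648.0);
    return (int)value; /* rounds toward zero */
}
```

When the reference comes from inside a static library, as in the message above, GNU ld only resolves it from archives that appear later on the link line, so the position of the library providing the symbol can matter.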

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
13 Jul 2019 06:48

Hi Bebbo,

Many thanks for the GCC improvements.
I think GCC got much better through your work!

Can we brainstorm about one FPU topic?
Let's look at this snippet:


          fdmove.d (a0)+,fp3
          fdmul.x fp0,fp3

Code like this is very common.
We need 2 instructions for this with the "normal" 68K 2-operand encoding.

This could be coded in several ways on 68k.


1)
     fdmove.d (a0)+,fp3
     fdmul.x fp0,fp3

2)
     fdmove.x fp0,fp3
     fdmul.d (a0)+,fp3

3)
     fdmul.d (a0)+,fp0,fp3 ** Done with BANK

With the new BANK opcode word we can do this in 1 instruction.
Doing it in 1 instruction would increase speed a lot.
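
The three encodings above compute the same value. A small C model may make that easier to see; it is only a sketch of the arithmetic, not of the hardware, and the function names are invented for illustration. The pointer-to-pointer parameter stands in for a0 with post-increment.

```c
#include <assert.h>
#include <stddef.h>

/* 1) fdmove.d (a0)+,fp3 ; fdmul.x fp0,fp3 */
double mul_load_first(const double **a0, double fp0)
{
    assert(a0 != NULL && *a0 != NULL);
    double fp3 = *(*a0)++;   /* load from memory, post-increment a0 */
    fp3 *= fp0;              /* destination is also the second source */
    return fp3;
}

/* 2) fdmove.x fp0,fp3 ; fdmul.d (a0)+,fp3 */
double mul_copy_first(const double **a0, double fp0)
{
    assert(a0 != NULL && *a0 != NULL);
    double fp3 = fp0;        /* register copy */
    fp3 *= *(*a0)++;         /* multiply by the memory operand */
    return fp3;
}

/* 3) fdmul.d (a0)+,fp0,fp3 : one three-operand instruction */
double mul_three_operand(const double **a0, double fp0)
{
    assert(a0 != NULL && *a0 != NULL);
    return fp0 * *(*a0)++;
}
```

Variants 1 and 2 each need an extra move only because the 2-operand form overwrites one of its sources; variant 3 removes that move.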

Stefan "Bebbo" Franke

Posts 142
13 Jul 2019 09:39

Samuel Devulder wrote:

@bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling.

snd_mix.c: In function 'S_TransferPaintBuffer':
    snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
     }
     ^
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See EXTERNAL LINK for instructions.

Strangely enough, when I submit this code to version 6.5.0b on cex, it doesn't crash (though it does crash with 6.5.0, 8.2.0 and 9.0 on cex).

My version is

GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
            compiled by GNU C version 7.4.0, GMP version 6.1.2, 
   MPFR version 4.0.2, MPC version 1.1.0, isl version none
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4

(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e):
   undefined reference to `__fixdfsi'

This may be a totally different issue.

I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option).

About `__fixdfsi': try adding -lm?

Stefan "Bebbo" Franke

Posts 142
13 Jul 2019 09:43

Gunnar von Boehn wrote:

Hi Bebbo,

Many thanks for the GCC improvements.
I think GCC got much better through your work!

Can we brainstorm about one FPU topic?
Let's look at this snippet:

fdmove.d (a0)+,fp3
fdmul.x fp0,fp3

Code like this is very common.
We need 2 instructions for this with the "normal" 68K 2-operand encoding.

This could be coded in several ways on 68k.

1)
fdmove.d (a0)+,fp3
fdmul.x fp0,fp3

2)
fdmove.x fp0,fp3
fdmul.d (a0)+,fp3

3)
fdmul.d (a0)+,fp0,fp3 ** Done with BANK

With the new BANK opcode word we can do this in 1 instruction.
Doing it in 1 instruction would increase speed a lot.

To start with 68080-specific support, I need the GNU assembler to work with the new asm insns, so I'd like to get what has already been done there.

Plus you have to make some final decisions:

- a0-a15 or a0-a7, b0-b7?
- d0-d15, e0-e15 or d0-d7, e0-e24

The asm should not contain "bank" insns, it should use the new registers. And the assembler will handle it properly.


Stefan "Bebbo" Franke

Posts 142
13 Jul 2019 09:54

And you can't use -m68080 with other gcc versions.

Samuel Devulder

Posts 248
13 Jul 2019 11:11

Stefan "Bebbo" Franke wrote:

I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option)

Done (https://github.com/bebbo/gcc/issues/107) Cex at EXTERNAL LINK

About `__fixdfsi': try adding -lm ?

It is present! I guess, a new issue is required then ;)

Grom 68k

Posts 61
13 Jul 2019 11:18

Gunnar von Boehn wrote:

Is it possible to have this one?

fdsub.d fp0,(a0)+,fp3

Samuel Devulder

Posts 248
13 Jul 2019 22:21

What do you mean by "fdsub.d fp0,(a0)+,fp3" ?

1) fp3 = (a0)+ - fp0
2) fp3 = fp0 - (a0)+

I'm unsure whether case 1) is possible. All examples seen so far, like


               dc.w    $7340
               fadd.s  (a0)+,fp0      ; (a0)+ + Fp0 => FP1

seem to indicate that memory isn't allowed as the 2nd source. It seems that only the form f<op> <EA>,<reg1>,<reg2> is allowed (<reg2> = <reg1> <op> <EA>).

This implies that interpretation 1 is invalid. To get it, you'll have to do the reverse, "fp3 = fp0 - (a0)+", and then apply "fneg fp3" (i.e. 2 FPU operations).
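
The workaround can be written out in C. This is a sketch under the assumption stated above, namely that only <reg2> = <reg1> <op> <EA> is encodable; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Form described above as supported: reg2 = reg1 - <EA> */
double sub_reg_minus_mem(double reg1, const double *ea)
{
    assert(ea != NULL);
    return reg1 - *ea;
}

/* Wanted: reg2 = <EA> - reg1, built from the supported form plus a negate. */
double sub_mem_minus_reg(double reg1, const double *ea)
{
    double reg2 = sub_reg_minus_mem(reg1, ea); /* fdsub: reg1 - <EA> */
    return -reg2;                              /* fneg  */
}
```

With reg1 = 2.0 and a memory operand of 5.0, the first function yields -3.0 and the second 3.0. The negated result equals a direct <EA> - reg1 apart from corner cases such as the sign of a zero result when both operands are equal.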

Claudio Guglielmotti
(Apollo Team Member)
Posts 185
14 Jul 2019 09:04

I tried Quake compiled with GCC650b by Sam.
On my V4, the 68030 version works well. No problems with it.

The 68040 and 68060 builds freeze after 8 sec of running.
The 68080 build also crashes after 8 sec, but more elegantly, because it gives a Guru 8 000000 B (once I got the error "Bad Entity type -7505165").

The same sources compiled with GCC641 for the 68040/68060 work well, so it seems there is a typo somewhere in GCC 650b.

Samuel Devulder

Posts 248
14 Jul 2019 12:36

On my v2 the exe runs for at least ~30 secs without issues (that is the time "timedemo demo1" takes), but yes, there are still a couple of issues with gcc650b. It is a work in progress. As a side note: in the past I added a lot of hand-optimized inline asm code (typically the cross-product function). Now, with gcc 650b, these inline functions tend to work against the instruction scheduler (especially the cross-product, which is used everywhere). The source code will need a bit of de-asm-ification to fully benefit from gcc650b.

Stefan "Bebbo" Franke

Posts 142
14 Jul 2019 16:02

Samuel Devulder wrote:

Yes, it is "work in progress" - but all releases are passing
- the gcc-torture-execute tests for 68000|68020 and -|baserel|baserel32
- all projects added to my amiga-stuff project at github.

=> Prepare your project to be added to my amiga-stuff and future builds won't break it.

Grom 68k

Posts 61
15 Jul 2019 13:14

Stefan "Bebbo" Franke wrote:

have a look at the most recent version here:

EXTERNAL LINK

Hi bebbo,

I tried out -mregparm, and gcc uses fp2 unnecessarily.


#include <string.h>

void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
{
       size_t j;
       for (j=640; j; j--){
           *b++ = scalar * *c++;
       }
}

-O2 -mregparm -m68080


_Scale8:
         move.l #640,d0
         fmovem fp2,-(sp)
         fdmove.x fp1,fp2
.L2:
         fdmove.d (a1)+,fp0
         fdmul.x fp2,fp0
         fmove.d fp0,(a0)+
         subq.l #1,d0
         jne .L2
         fmovem (sp)+,fp2
         rts

-Os looks good.

Grom 68k

Posts 61
15 Jul 2019 13:36

Samuel Devulder wrote:

What do you mean by "fdsub.d fp0,(a0)+,fp3" ?

1) fp3 = (a0)+ - fp0
2) fp3 = fp0 - (a0)+

I'm unsure whether case 1) is possible. All examples seen so far, like


                dc.w    $7340
                fadd.s  (a0)+,fp0      ; (a0)+ + Fp0 => FP1

Hi Samuel,

Yes, I mean fp3 = (a0)+ - fp0.
I know that memory isn't allowed as the 2nd source with the 2-operand encoding, because it is also the output.
I had a doubt about the 3-operand encoding.

Samuel Devulder wrote:

Is it in this link: EXTERNAL LINK ?

Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder whenever needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK ).

I am on W10 too. Did you manage to extract the headers from the file in the link?

Thanks

Stefan "Bebbo" Franke

Posts 142
15 Jul 2019 13:54

Grom 68k wrote:

Samuel Devulder wrote:

Is it in this link: EXTERNAL LINK ?

Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder whenever needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK ).

I am on W10 too. Did you manage to extract the headers from the file in the link?

Thanks

To extract the headers you might use the Linux tgz file: EXTERNAL LINK

Samuel Devulder

Posts 248
15 Jul 2019 13:59

Quick response:

void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)

Notice that scalar0 isn't used in the code. If it were used, then fp3 would probably be used too. But if you remove it, the ASM looks OK (-O2 -mregparm -m68080):

_Scale8:
             move.l #640,d0
     .L2:
             fdmove.d (a1)+,fp1
             fdmul.x fp0,fp1
   ; 5 cycles waiting for fp1
             fmove.d fp1,(a0)+
             subq.l #1,d0
             jne .L2
             rts

Anyway, it is strange that the unused scalar0 changes the generated asm. Note: this asm could be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.
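
For reference, a sketch of the variant described above, with the unused scalar0 parameter dropped. Apart from that removal the loop is the one from the earlier post; the SCALE8_COUNT name and the assert are additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SCALE8_COUNT 640 /* loop count taken from the posted code */

/* Same loop as the posted Scale8, minus the unused first parameter. */
void Scale8(double scalar, double *restrict b, double *restrict c)
{
    size_t j;

    assert(b != NULL && c != NULL);

    for (j = SCALE8_COUNT; j; j--) {
        *b++ = scalar * *c++;
    }
}
```

With -mregparm the first double argument is now scalar itself, which would explain why the listing above multiplies by fp0 directly and needs no callee-saved fp2; that reading of the register assignment is inferred from the two listings, not confirmed in the thread.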

Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive (7z sometimes allows viewing exes as archives, which is quite handy). But I have found what is causing the problems on my machine: the anti-virus (Avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, which produces issues like invalid pointer accesses. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE, for instance), but this is not applicable here because some of the exes seem to carry a kind of binary modification containing the installation folder. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.

Grom 68k

Posts 61
15 Jul 2019 14:26

Samuel Devulder wrote:

Quick response:

void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)

Notice that scalar0 isn't used in the code. If it were used, then fp3 would probably be used too. But if you remove it, the ASM looks OK (-O2 -mregparm -m68080):

_Scale8:
             move.l #640,d0
     .L2:
             fdmove.d (a1)+,fp1
             fdmul.x fp0,fp1
   ; 5 cycles waiting for fp1
             fmove.d fp1,(a0)+
             subq.l #1,d0
             jne .L2
             rts

Anyway, it is strange that the unused scalar0 changes the generated asm. Note: this asm could be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.

Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive (7z sometimes allows viewing exes as archives, which is quite handy). But I have found what is causing the problems on my machine: the anti-virus (Avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, which produces issues like invalid pointer accesses. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE, for instance), but this is not applicable here because some of the exes seem to carry a kind of binary modification containing the installation folder. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.

subq.l #1,d0 mustn't be moved for -m68080

Gunnar von Boehn wrote:

Bebbo

Here are some examples of FUSING:

MOVE.L (an)+,(am)+
MOVE.L (an)+,(am)+

...

SUBQ.L #1,Dn
BNE.s LOOP

Above you see examples of FUSING of 2 instructions which execute together in a single cycle in 1 ALU.

Both ALUs can execute such bundles.

It fails only with this configuration: if you use -Os or swap scalar and scalar0, the result looks very good.
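
As an illustration of the fusing examples quoted above, here is a C loop whose natural 68k translation contains both patterns: two consecutive post-increment moves, and a decrement followed directly by a conditional branch. Whether GCC actually emits those exact sequences depends on the version and flags, so treat the comments as an assumption. The function name is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies 2 * pairs 32-bit words from src to dst. */
void copy_long_pairs(uint32_t *restrict dst, const uint32_t *restrict src,
                     size_t pairs)
{
    assert(dst != NULL && src != NULL);

    while (pairs != 0) {
        *dst++ = *src++;  /* candidate for MOVE.L (an)+,(am)+ */
        *dst++ = *src++;  /* second move of the pair          */
        pairs--;          /* candidate for SUBQ.L #1,Dn ...   */
    }                     /* ... followed directly by BNE.s   */
}
```

Keeping the counter update right next to the loop branch, as here, is the arrangement the reply above asks to preserve for -m68080.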
