Information about the Apollo CPU and FPU. |
|
---|
| | Stefan "Bebbo" Franke
Posts 142 12 Jul 2019 13:59
| Gunnar von Boehn wrote:
| Bebbo did you see my question above?
|
Yes, you forgot to add the restrict keyword.
| |
| | Stefan "Bebbo" Franke
Posts 142 12 Jul 2019 14:14
| have a look at the most recent version here: EXTERNAL LINK
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 12 Jul 2019 14:19
| Stefan "Bebbo" Franke wrote:
| have a look at the most recent version here: EXTERNAL LINK
|
Looks much better - Thanks!
| |
| | Samuel Devulder
Posts 248 12 Jul 2019 15:19
| Stefan "Bebbo" Franke wrote:
| Everything is fine, because the pointers may alias. Use void Scale0(double scalar, double* restrict b, double* restrict c)
|
This gives:
_Scale0:
    fmovem   fp2/fp3,-(sp)
    fdmove.d (28,sp),fp0
    move.l   (40,sp),a0
    fdmove.d (a0)+,fp3
    fdmul.x  fp0,fp3
    fdmove.d (a0)+,fp2
    fdmul.x  fp0,fp2
    fdmove.d (a0)+,fp1
    fdmul.x  fp0,fp1
    fdmul.d  (a0),fp0
    move.l   (36,sp),a0
    ; 0 wait-cycle
    fmove.d  fp3,(a0)+
    ; 0 wait-cycle
    fmove.d  fp2,(a0)+
    fmovem   (sp)+,fp3/fp2    (<== 2 cycles)
    ; 0 wait-cycle
    fmove.d  fp1,(a0)+
    ; 0 wait-cycle
    fmove.d  fp0,(a0)
    rts
Perfect!! I like this code :)
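For readers following along, here is a minimal C sketch of what the restrict version could look like. Only the signature appears in the thread; the four-element body is an assumption inferred from the four load/multiply/store groups in the listing above.

```c
/* Assumed body: the thread shows only the signature of Scale0.
 * restrict promises that b and c never overlap, so the compiler may
 * issue all four loads before the first store instead of reloading
 * from c after every write to b. */
void Scale0(double scalar, double *restrict b, double *restrict c)
{
    b[0] = scalar * c[0];
    b[1] = scalar * c[1];
    b[2] = scalar * c[2];
    b[3] = scalar * c[3];
}
```

Calling it with overlapping arrays (for example `Scale0(2.0, buf, buf + 1)`) would break the promise made by restrict and is undefined behaviour.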
| |
| | Samuel Devulder
Posts 248 12 Jul 2019 21:46
| @bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling:
snd_mix.c: In function 'S_TransferPaintBuffer':
snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
 }
 ^
Please submit a full bug report, with preprocessed source if appropriate.
See EXTERNAL LINK for instructions.
Strangely enough, when submitting this code to version 6.5.0b on cex, it doesn't crash (but it does crash with 6.5.0, 8.2.0 and 9.0 on cex). My version is:
GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
    compiled by GNU C version 7.4.0, GMP version 6.1.2, MPFR version 4.0.2, MPC version 1.1.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4
Are you interested in that piece of code? Note that this occurs with many C files in my project. I can work around it by compiling for 68030 (no scheduling implied), but in the end the linker complains about a missing symbol:
(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e): undefined reference to `__fixdfsi'
This may be a totally different issue.
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 13 Jul 2019 06:48
| Hi Bebbo, many thanks for the GCC improvements. I think GCC got much better through your work! Can we brainstorm about one FPU topic? Let's look at this snippet:
    fdmove.d (a0)+,fp3
    fdmul.x  fp0,fp3
Code like this is very common. With the "normal" 68k 2-operand encoding we need 2 instructions for it. It could be coded in several ways on 68k:
1) fdmove.d (a0)+,fp3
   fdmul.x  fp0,fp3
2) fdmove.x fp0,fp3
   fdmul.d  (a0)+,fp3
3) fdmul.d  (a0)+,fp0,fp3 ** Done with BANK
With the new BANK opcode word we can do this in 1 instruction. Doing it in 1 instruction would increase speed a lot.
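To make the equivalence of the three variants concrete, here is a small C model. It is only a sketch: the `FpuState` struct and the `variant1`..`variant3` names are hypothetical, and the model covers only the arithmetic, not timing or the real encoding.

```c
/* Hypothetical model: fp0/fp3 stand for FPU registers, a0 for the
 * address register used with post-increment. Not Apollo documentation. */
typedef struct {
    double fp0;
    double fp3;
    const double *a0;
} FpuState;

/* 1) fdmove.d (a0)+,fp3 ; fdmul.x fp0,fp3 */
void variant1(FpuState *s)
{
    s->fp3 = *s->a0++;        /* load, then post-increment a0 */
    s->fp3 = s->fp3 * s->fp0; /* 2-operand: destination is also a source */
}

/* 2) fdmove.x fp0,fp3 ; fdmul.d (a0)+,fp3 */
void variant2(FpuState *s)
{
    s->fp3 = s->fp0;
    s->fp3 = s->fp3 * *s->a0++;
}

/* 3) fdmul.d (a0)+,fp0,fp3 : 3-operand form, one instruction */
void variant3(FpuState *s)
{
    s->fp3 = s->fp0 * *s->a0++;
}
```

All three leave the same value in fp3 and advance a0 by one double; the difference is only in how many instructions are needed.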
| |
| | Stefan "Bebbo" Franke
Posts 142 13 Jul 2019 09:39
| Samuel Devulder wrote:
| @bebbo, I have a piece of self-contained C code that makes the compiler crash when using -m68080 -fselective-scheduling:
snd_mix.c: In function 'S_TransferPaintBuffer':
snd_mix.c:135:1: internal compiler error: in final_scan_insn, at final.c:2980
 }
 ^
Please submit a full bug report, with preprocessed source if appropriate.
See EXTERNAL LINK for instructions.
Strangely enough, when submitting this code to version 6.5.0b on cex, it doesn't crash (but it does crash with 6.5.0, 8.2.0 and 9.0 on cex). My version is:
GNU C11 (GCC) version 6.5.0b 190711211949 (m68k-amigaos)
    compiled by GNU C version 7.4.0, GMP version 6.1.2, MPFR version 4.0.2, MPC version 1.1.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 1eb154ad23737cd2dbbbf8e4ff368eb4
Are you interested in that piece of code? Note that this occurs with many C files in my project. I can work around it by compiling for 68030 (no scheduling implied), but in the end the linker complains about a missing symbol:
(...snap...)m68k-amigaos/libnix/lib/libm.a(__vfprintf_total_size.o):(.text+0x74e): undefined reference to `__fixdfsi'
This may be a totally different issue.
|
I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option).
About `__fixdfsi': try adding -lm?
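Some background on the missing symbol: `__fixdfsi` is the libgcc helper that converts a double to a signed int, rounding toward zero. GCC emits a call to it when the conversion is done in software rather than with an FPU instruction. A minimal sketch of code that can produce such a call (the function name `to_int` is hypothetical; whether a call is emitted depends on the target and floating-point options):

```c
/* Hypothetical helper. On a soft-float build GCC typically lowers this
 * cast to a call to libgcc's __fixdfsi; with FPU instructions enabled
 * the conversion is usually done inline instead. */
int to_int(double d)
{
    return (int)d; /* C semantics: truncate toward zero */
}
```

Since GNU ld resolves static libraries in command-line order, a library that defines a symbol has to come after the objects or libraries that reference it.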
| |
| | Stefan "Bebbo" Franke
Posts 142 13 Jul 2019 09:43
| Gunnar von Boehn wrote:
| Hi Bebbo, many thanks for the GCC improvements. I think GCC got much better through your work! Can we brainstorm about one FPU topic? Let's look at this snippet:
    fdmove.d (a0)+,fp3
    fdmul.x  fp0,fp3
Code like this is very common. With the "normal" 68k 2-operand encoding we need 2 instructions for it. It could be coded in several ways on 68k:
1) fdmove.d (a0)+,fp3
   fdmul.x  fp0,fp3
2) fdmove.x fp0,fp3
   fdmul.d  (a0)+,fp3
3) fdmul.d  (a0)+,fp0,fp3 ** Done with BANK
With the new BANK opcode word we can do this in 1 instruction. Doing it in 1 instruction would increase speed a lot.
|
To start with 68080-specific support, I need the GNU assembler to work with the new asm insns, so I'd like to get what has already been done there. Plus you have to make some final decisions:
- a0-a15 or a0-a7, b0-b7?
- d0-d15, e0-e15 or d0-d7, e0-e24?
The asm should not contain "bank" insns; it should use the new registers, and the assembler will handle it properly.
| |
| | Stefan "Bebbo" Franke
Posts 142 13 Jul 2019 09:54
| and you can't use -m68080 with other gcc versions.
| |
| | Samuel Devulder
Posts 248 13 Jul 2019 11:11
| Stefan "Bebbo" Franke wrote:
| I'm always interested. Please file a bug at EXTERNAL LINK and provide a link from cex (use the share option). |
Done (https://github.com/bebbo/gcc/issues/107). Cex at EXTERNAL LINK
| About `__fixdfsi': try adding -lm? |
It is present! I guess a new issue is required then ;)
| |
| | Grom 68k
Posts 61 13 Jul 2019 11:18
| Gunnar von Boehn wrote:
| Hi Bebbo, many thanks for the GCC improvements. I think GCC got much better through your work! Can we brainstorm about one FPU topic? Let's look at this snippet:
    fdmove.d (a0)+,fp3
    fdmul.x  fp0,fp3
Code like this is very common. With the "normal" 68k 2-operand encoding we need 2 instructions for it. It could be coded in several ways on 68k:
1) fdmove.d (a0)+,fp3
   fdmul.x  fp0,fp3
2) fdmove.x fp0,fp3
   fdmul.d  (a0)+,fp3
3) fdmul.d  (a0)+,fp0,fp3 ** Done with BANK
With the new BANK opcode word we can do this in 1 instruction. Doing it in 1 instruction would increase speed a lot.
|
Is it possible to have this one?
    fdsub.d fp0,(a0)+,fp3
| |
| | Samuel Devulder
Posts 248 13 Jul 2019 22:21
| What do you mean by "fdsub.d fp0,(a0)+,fp3"?
1) fp3 = (a0)+ - fp0
2) fp3 = fp0 - (a0)+
I'm unsure case 1) is possible. All examples seen so far, like
    dc.w $7340
    fadd.s (a0)+,fp0 ; (a0)+ + FP0 => FP1
seem to indicate that memory isn't allowed as the 2nd source. It seems that only f<op> <EA>,<reg1>,<reg2> is allowed (<reg2> = <reg1> <op> <EA>). This implies that interpretation 1 is invalid. To get it, you'll have to do the reverse, "fp3 = fp0 - (a0)+", and then apply "fneg fp3" (i.e. 2 FPU operations).
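The subtract-then-negate workaround can be checked with a tiny C model. This is a sketch only; `sub_reversed` is a hypothetical name and the parameters merely stand in for fp0 and the memory operand.

```c
/* Models "fp3 = fp0 - (a0)+" followed by "fneg fp3". */
double sub_reversed(double fp0, double mem)
{
    double fp3 = fp0 - mem; /* the operand order the 3-operand form allows */
    return -fp3;            /* fneg: result now equals mem - fp0 */
}
```

One subtle difference from a direct `mem - fp0`: when both operands are equal, the negated result is -0.0 rather than +0.0 under the default rounding mode. The two still compare equal, so this rarely matters.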
| |
| | Claudio Guglielmotti (Apollo Team Member) Posts 185 14 Jul 2019 09:04
| I tried Quake compiled with GCC 6.5.0b by Sam. On my V4, the 68030 version works well, no problems with it. The 68040 and 68060 builds freeze after 8 seconds. The 68080 build also crashes after 8 seconds, but more elegantly, because it gives a Guru 8 000000 B (once I had the error "Bad Entity type -7505165"). The same sources compiled with GCC 6.4.1 for the 68040/68060 work fine ... it seems there is a typo somewhere in GCC 6.5.0b.
| |
| | Samuel Devulder
Posts 248 14 Jul 2019 12:36
| On my v2 the exe works for at least ~30 secs without issues (that is the time it takes for "timedemo demo1"), but yeah, there are still a couple of issues with GCC 6.5.0b. It is a work in progress. As a side note: in the past I added a lot of hand-optimized inline asm code (typically the cross-product function). Now, with GCC 6.5.0b, these inline functions kind of work against the instruction scheduler (especially the cross-product, which is used everywhere). The source code will require a bit of de-asm-ification to fully benefit from GCC 6.5.0b.
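As an illustration of what "de-asm-ification" could mean for the cross-product, here is a plain C version. It is a sketch, not the code from the project: the `vec3_t` typedef and the `CrossProduct` signature are assumptions modelled on Quake-style sources.

```c
typedef float vec3_t[3]; /* assumption: Quake-style 3-float vector */

/* Plain C cross product. Without inline asm the compiler sees every
 * multiply and subtract, so its scheduler is free to interleave them
 * with the surrounding code. out must not alias a or b. */
static inline void CrossProduct(const vec3_t a, const vec3_t b, vec3_t out)
{
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}
```

An inline asm block, by contrast, is opaque to the scheduler: it can move the block as a whole but cannot reorder the instructions inside it.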
| |
| | Stefan "Bebbo" Franke
Posts 142 14 Jul 2019 16:02
| Samuel Devulder wrote:
| On my v2 the exe works for at least ~30 secs without issues (that is the time it takes for "timedemo demo1"), but yeah, there are still a couple of issues with GCC 6.5.0b. It is a work in progress. As a side note: in the past I added a lot of hand-optimized inline asm code (typically the cross-product function). Now, with GCC 6.5.0b, these inline functions kind of work against the instruction scheduler (especially the cross-product, which is used everywhere). The source code will require a bit of de-asm-ification to fully benefit from GCC 6.5.0b.
|
Yes, it is "work in progress", but all releases pass
- the gcc-torture-execute tests for 68000|68020 and -|baserel|baserel32
- all projects added to my amiga-stuff project at GitHub.
=> Prepare your project to be added to my amiga-stuff and the next builds won't break it.
| |
| | Grom 68k
Posts 61 15 Jul 2019 13:14
| Stefan "Bebbo" Franke wrote:
| have a look at the most recent version here: EXTERNAL LINK |
Hi bebbo, I tried the -mregparm option, and GCC uses fp2 unnecessarily.
#include <string.h>
void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
{
    size_t j;
    for (j=640; j; j--){
        *b++ = scalar * *c++;
    }
}
-O2 -mregparm -m68080
_Scale8:
    move.l   #640,d0
    fmovem   fp2,-(sp)
    fdmove.x fp1,fp2
.L2:
    fdmove.d (a1)+,fp0
    fdmul.x  fp2,fp0
    fmove.d  fp0,(a0)+
    subq.l   #1,d0
    jne      .L2
    fmovem   (sp)+,fp2
    rts
-Os looks good.
| |
| | Grom 68k
Posts 61 15 Jul 2019 13:36
| Samuel Devulder wrote:
| What do you mean by "fdsub.d fp0,(a0)+,fp3"?
1) fp3 = (a0)+ - fp0
2) fp3 = fp0 - (a0)+
I'm unsure case 1) is possible. All examples seen so far, like
    dc.w $7340
    fadd.s (a0)+,fp0 ; (a0)+ + FP0 => FP1
seem to indicate that memory isn't allowed as the 2nd source. It seems that only f<op> <EA>,<reg1>,<reg2> is allowed (<reg2> = <reg1> <op> <EA>). This implies that interpretation 1 is invalid. To get it, you'll have to do the reverse, "fp3 = fp0 - (a0)+", and then apply "fneg fp3" (i.e. 2 FPU operations).
|
Hi Samuel, yes, I mean fp3 = (a0)+ - fp0. I know that memory isn't allowed as the 2nd source in the 2-operand encoding, because there it is also the output. I had a doubt about the 3-operand encoding.
Samuel Devulder wrote:
| Is it in this link: EXTERNAL LINK? Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder when needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK.)
|
I am on W10 too. Did you succeed in extracting the headers from the file in the link? Thanks.
| |
| | Stefan "Bebbo" Franke
Posts 142 15 Jul 2019 13:54
| Grom 68k wrote:
| Samuel Devulder wrote:
| Is it in this link: EXTERNAL LINK? Is there a plain ZIP file containing just the content of the setup? (My anti-virus doesn't like the setup, and I usually prefer a plain zip so I can easily move the installation folder when needed. And most important: on my W10 machine the setup.exe produces this error: EXTERNAL LINK.) |
I am on W10 too. Did you succeed in extracting the headers from the file in the link? Thanks.
|
To extract the headers you can use the Linux tgz file: EXTERNAL LINK
| |
| | Samuel Devulder
Posts 248 15 Jul 2019 13:59
| Quick response:
void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
Notice that scalar0 isn't used in the code. If it were used, fp3 would probably be used too. But if you remove it, the ASM looks OK (-O2 -mregparm -m68080):
_Scale8:
    move.l   #640,d0
.L2:
    fdmove.d (a1)+,fp1
    fdmul.x  fp0,fp1
    ; 5 cycles waiting for fp1
    fmove.d  fp1,(a0)+
    subq.l   #1,d0
    jne      .L2
    rts
Anyway, it is strange that the unused scalar0 changes the produced asm.
Notice: this asm can be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.
Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive file (sometimes 7z allows viewing exes as archives, which is quite handy). But I have found what causes the problems in my setup: the anti-virus (avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, producing issues like invalid pointer access. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE for instance), but this is not applicable here because some of the exes seem to contain binary modifications that embed the installation folder. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.
| |
| | Grom 68k
Posts 61 15 Jul 2019 14:26
| Samuel Devulder wrote:
| Quick response:
void Scale8(double scalar0, double scalar, double* restrict b, double* restrict c)
Notice that scalar0 isn't used in the code. If it were used, fp3 would probably be used too. But if you remove it, the ASM looks OK (-O2 -mregparm -m68080):
_Scale8:
    move.l   #640,d0
.L2:
    fdmove.d (a1)+,fp1
    fdmul.x  fp0,fp1
    ; 5 cycles waiting for fp1
    fmove.d  fp1,(a0)+
    subq.l   #1,d0
    jne      .L2
    rts
Anyway, it is strange that the unused scalar0 changes the produced asm.
Notice: this asm can be optimized by moving the subq instruction before the preceding fmove, so that the subtraction would be free (there are 5 cycles available there). But the subq might already be free if it is combined with the jne. I don't know. BigGun might give a hint here about which option is best.
Concerning the amiga-gcc.exe setup file, I am not able to get at its content by opening it as an archive file (sometimes 7z allows viewing exes as archives, which is quite handy). But I have found what causes the problems in my setup: the anti-virus (avast). Anti-virus programs tend to dislike compressed data inside non-standard exes. In that case the setup is run in a kind of sandbox, producing issues like invalid pointer access. That's why I usually prefer portable apps that can be packaged into a single zip file (like the Eclipse IDE for instance), but this is not applicable here because some of the exes seem to contain binary modifications that embed the installation folder. Well, that's not a big issue. I just need to disable parts of the anti-virus while installing amiga-gcc.
|
subq.l #1,d0 mustn't be moved for -m68080.
Gunnar von Boehn wrote:
| Bebbo, here are some examples of FUSINGs:
    MOVE.L (an)+,(am)+
    MOVE.L (an)+,(am)+
    ...
    SUBQ.L #1,Dn
    BNE.s  LOOP
Above you see examples of FUSING of 2 instructions which execute together in a single cycle in 1 ALU. Both ALUs can execute such bundles.
|
The problem only shows up with this configuration; if you use -Os or swap scalar and scalar0, the code looks very good.
| |
|
|
|