
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.
Running Games and Apps.

Quake ;)

Simo Koivukoski
(Apollo Team Member)
Posts 601
02 Nov 2017 10:07


WinUAE:
quake.sasc.030.881 -- textures ok
quake.sasc.030.882 -- textures ok
quake.sasc.040.882 -- textures ok
femu:
19.1 fps quake.sasc.030.881 -- bad textures
18.7 fps quake.sasc.030.882 -- bad textures
21.5 fps quake.sasc.040.882 -- bad textures
xx.x fps quake.sasc.68040.68882 -- textures ok (timedemo does not show fps)



Samuel Devulder

Posts 248
02 Nov 2017 11:54


Thanks! This means the texture problems are not related to 881/882 nor to 030/040, but to the ASM optimizations.
 
This is very interesting!
 
I have the ideal suspect[*] right before my eyes: a function that is heavily used in the precalculation of texture data and that also modifies the FPU control register (FPCR) to alter the rounding mode. This may be where the FPU emulation is inaccurate.
@FloorDivMod
 
  ; set rounding mode towards minus infinity
    fmove.l fpcr,d0
    move.l  d0,-(sp)
    moveq  #%10,d1
    bfins  d1,d0{26:2}
    fmove.l d0,fpcr
 
  ; if (numer >= 0.0)
    ftst    fp0
    fblt    .1
 
  ; x = floor(numer / denom);
  ; q = (int)x;
  ; r = (int)floor(numer - (x * denom));
    fmove.d fp0,-(sp)
    fdiv    fp1,fp0
    fmove.l fp0,(a0)
    fmul.l  (a0),fp1
    fmove.d (sp)+,fp0
    fsub    fp1,fp0
    fmove.l fp0,(a1)
    fmove.l (sp)+,fpcr
    rts
 
  ; else /* numer < 0.0 */
  .1:
  ; x = floor(-numer / denom);
  ; q = -(int)x;
  ; r = (int)floor(-numer - (x * denom));
 
 
    fmovem.x fp2/fp3,-(sp)
    fneg    fp0
    fmove  fp0,fp2
    fdiv    fp1,fp0
    fmove.l fp0,d0
    fmove  fp1,fp3
    fmul.l  d0,fp1
    neg.l  d0
    fsub    fp1,fp2
    move.l  d0,(a0)
    fmove.l fp2,d0
    move.l  d0,d1
    beq.b  .2
    fmove.l fp3,d1
    subq.l  #1,(a0)
    sub.l  d0,d1
  .2:
    move.l  d1,(a1)
    fmovem.x (sp)+,fp2/fp3
 
    fmove.l (sp)+,fpcr
    rts
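
For readers who don't speak 68k: the comments in the assembly above spell out the C logic, and the instructions after `.1:` add a fix-up so that the remainder stays non-negative. Below is a minimal sketch of that logic, reconstructed from those comments and instructions. It is not necessarily the exact code in the recompiled build. `floor_small` is a hypothetical helper introduced here only so the sketch does not depend on the math library, and the `assert` on `denom` is an assumption about valid input.

```c
#include <assert.h>

/* Hypothetical helper (not in the original): floor() for values that
   fit in a long long, so the sketch has no libm dependency. */
static double floor_small(double v)
{
    double t = (double)(long long)v;   /* cast truncates toward zero */
    return (t > v) ? t - 1.0 : t;      /* step down for negative fractions */
}

/* Floor-based division: numer == (*quotient) * denom + (*rem),
   with 0 <= *rem < denom. */
static void FloorDivMod(double numer, double denom, int *quotient, int *rem)
{
    int q, r;
    double x;

    assert(denom > 0.0);               /* assumed precondition */

    if (numer >= 0.0) {
        x = floor_small(numer / denom);
        q = (int)x;
        r = (int)floor_small(numer - (x * denom));
    } else {
        /* work with positive values, then fix up to make it floor-based */
        x = floor_small(-numer / denom);
        q = -(int)x;
        r = (int)floor_small(-numer - (x * denom));
        if (r != 0) {                  /* corresponds to the beq.b .2 test */
            q--;                       /* subq.l #1,(a0) */
            r = (int)denom - r;        /* sub.l d0,d1 */
        }
    }

    *quotient = q;
    *rem = r;
}
```

Note that this version never touches the rounding mode: the explicit floor replaces the "round towards minus infinity" setting that the ASM version writes to FPCR.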

 
Here is a recompiled SAS/C version with that function in plain C instead of ASM. Let's see if it goes any better...
 
EXTERNAL LINK 
____
[*] I'm a bit optimistic, but let's take a shortcut this time. ;)


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
02 Nov 2017 12:47


I still have the texture bug, also with quake.sasc.040.882-floorDivMod_in_C

(it gives 16,8 fps if someone is interested)


Samuel Devulder

Posts 248
02 Nov 2017 12:56


Thank you Claudio! So it looks like I was too optimistic by targeting a single math function :-(

Now let's get rid of a bunch of math functions at once: EXTERNAL LINK

As for the speed, which core are you running? x11 or faster?


Nixus Minimax

Posts 416
02 Nov 2017 12:58


claudio guglielmotti wrote:

I still have the texture bug, also with quake.sasc.040.882-floorDivMod_in_C
 
  (it gives 16,8 fps if someone is interested)

Simo is on v500 and you are on v600, right? Are you sure you have the same core and femu?



Claudio Guglielmotti
(Apollo Team Member)
Posts 185
02 Nov 2017 13:05


The last Quake build (mathlib in C) is corrupted. It is only 139Kb.

I use the x11 core for my tests.
(I am on IRC now...)

Wait, it seems to work now, but it is not the mathlib function.
I still have the texture issue.

Speed = 16,4 fps


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
02 Nov 2017 13:22


Nixus Minimax wrote:

Simo is on v500 and you are on v600, right? Are you sure you have the same core and femu?

Biggun is very democratic, he puts the same bugs in the V500 and in the V600.

By the way, I use a V500 with the core V500_TEST17.jic
(last one available)


Vincent Viaule

Posts 5
02 Nov 2017 14:16


Quake Version History
EXTERNAL LINK 


Samuel Devulder

Posts 248
02 Nov 2017 14:52


So it's not in mathlib.c. Too bad. Yet this reduces the search space to 15 asm files. Still quite a few.

Let's try with two more asm files removed. They contain the du/dv stuff related to textures: EXTERNAL LINK 


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
02 Nov 2017 15:19


Nothing yet.

  Quake edge in C   -> bad textures -> 15,3 fps
  Quake polyse in C -> bad textures -> 14,3 fps

And when it loaded demo2 it gave the error "Bad Surface Extents VID_Shutdown":
Packfile pak0.pak : maps /e1m4,bsp
s-> extents: 480
float: 125876 26313 0x4367ec00 0x43f3f600
float : [0  480]
float : [0 30]
float : [0 30]
int : [0 30] 480

   


Samuel Devulder

Posts 248
02 Nov 2017 19:23


Ok... bad textures for all of them. I'm afraid I'll have to either go through all of the 13 remaining asm files one by one or do a dichotomy (binary search, 4 different tests). Better to do that via IRC instead of polluting the forum.
 
(The "bad extents" output doesn't show the bad FPU computation itself. The error only indicates that a special texture is too big, which shouldn't happen. The wrong computation occurred before the error was detected, I'm afraid.)
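
To illustrate why 4 tests are enough for 13 files, here is a minimal sketch of the dichotomy. It assumes there is exactly one culprit file, and that each "test" is a build in which only files `[lo, hi)` stay in ASM while everything else is compiled from C. The names `bug_oracle`, `bisect_culprit` and `simulated_oracle` are hypothetical. In practice the oracle is a person running the exe and looking at the textures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical oracle: nonzero if a build that keeps files [lo, hi)
   in ASM (all others in C) still shows the texture bug. */
typedef int (*bug_oracle)(size_t lo, size_t hi, void *ctx);

/* Returns the index of the single culprit among n files and stores
   the number of builds that had to be tested in *tests. */
static size_t bisect_culprit(size_t n, bug_oracle shows_bug, void *ctx,
                             unsigned *tests)
{
    size_t lo = 0, hi = n;

    assert(n > 0);
    *tests = 0;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        ++*tests;
        if (shows_bug(lo, mid, ctx))
            hi = mid;      /* culprit is in the lower half */
        else
            lo = mid;      /* culprit is in the upper half */
    }
    return lo;
}

/* Stand-in for the human tester: ctx points to the culprit's index. */
static int simulated_oracle(size_t lo, size_t hi, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return culprit >= lo && culprit < hi;
}
```

With 13 candidates the range shrinks 13 -> 7 -> 4 -> 2 -> 1 in the worst case, hence 4 builds. If more than one file were faulty, this simple scheme would only find one of them.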


Simo Koivukoski
(Apollo Team Member)
Posts 601
02 Nov 2017 20:42


WinUAE:
GKcm36duxUr_quake.sasc.040.882.mathlib-in-C -- textures ok
quake.sasc.040.882.d_polyse-in-c            -- textures ok
quake.sasc.040.882.r_edge-in-c              -- textures ok
quake.sasc.040.882-FloorDivMod_in_C        -- textures ok
femu
20.8 fps GKcm36duxUr_quake.sasc.040.882.mathlib-in-C -- bad textures
18.8 fps quake.sasc.040.882.d_polyse-in-c            -- bad textures
21.1 fps quake.sasc.040.882.r_edge-in-c              -- bad textures
21.3 fps quake.sasc.040.882-FloorDivMod_in_C        -- bad textures



Samuel Devulder

Posts 248
03 Nov 2017 08:38


Last night with Flype on IRC (big thanks to him as well), we used dichotomy to find the probable suspect: d_scan68k.s. To be really sure, one last test is needed: an exe with everything in C except that single file. Here it is:  EXTERNAL LINK
   
If the texture error appears with that EXE, then this means that d_scan68k.s contains code that is badly executed by FEmu. This is a big ASM file containing the routines that render the scans (lines of polygons), so this makes sense. The odd thing is that this code doesn't do anything exceptional with the FPU. No rounding-mode change in there, just plain 68882 arithmetic.


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
03 Nov 2017 09:57


so you can be happy !!
the last Quake "only d scan" still has the texture bug !

(and plays at 12,8 fps)


Simo Koivukoski
(Apollo Team Member)
Posts 601
03 Nov 2017 10:59


WinUAE:
quake.sasc.040 (1).882 -- textures ok
quake.sasc.040 (2).882 -- textures ok
quake.sasc.040 (4).882 -- textures ok
quake.sasc.040 (3).882 -- textures ok
femu:
13.2 fps quake.sasc.040 (1).882 -- textures ok
13.4 fps quake.sasc.040 (2).882 -- textures ok
19.9 fps quake.sasc.040 (4).882 -- bad textures
19.2 fps quake.sasc.040 (3).882 -- bad textures



Samuel Devulder

Posts 248
03 Nov 2017 12:00


Okay, we've trapped the culprit. Well done sir ! :D
     
Let's do some Unix tool wizardry:
$ grep -e "\s\sf" d_scan68k.s | grep -v "^[*]" | \
sort | uniq | sed -E -e "s/        / /;s/^\s+/  /"
and the bad boy must be one of:

          fadd    fp0,fp2          ;fp2 = w + AMP2*2
          fadd    fp1,fp0
          fadd    fp1,fp3          ;fp3 = h + AMP2*2
          fadd    fp3,fp0          ;sdivz += sdivz16stepu
          fadd    fp3,fp0          ;sdivz += sdivz8stepu
          fadd    fp3,fp4
          fadd    fp4,fp1          ;tdivz += tdivz16stepu
          fadd    fp4,fp1          ;tdivz += tdivz8stepu
          fadd    fp5,fp2          ;zi += zi16stepu
          fadd    fp5,fp2          ;zi += zi8stepu
          fadd    fp5,fp4          ;fp4 = d_ziorigin + fp3 + fp4
          fadd    fp6,fp0          ;sdivz += fp6
          fadd    fp6,fp1
          fadd    fp6,fp1          ;tdivz += fp6
          fadd    fp7,fp2
          fadd    fp7,fp2          ;zi += fp7
          fadd.s  (a1)+,fp0        ;sdivz = d_sdivzorigin + fp0 + fp1
          fadd.s  (a1)+,fp1        ;tdivz = d_tdivzorigin + fp1 + fp6
          fadd.s  (a1)+,fp2        ;zi = d_ziorigin + fp2 + fp7
          fadd.s  (a6)+,fp0        ;sdivz = d_sdivzorigin + fp0 + fp1
          fadd.s  (a6)+,fp1        ;tdivz = d_tdivzorigin + fp1 + fp6
          fadd.s  (a6)+,fp2        ;zi = d_ziorigin + fp2 + fp7
          fbne    @D_DrawSpans16
          fcmp.s  #0,fp0
          fdiv    fp2,fp0          ;fp0=wratio*w/(w+AMP2*2)
          fdiv    fp2,fp6          ;z = (float)0x10000 / zi
          fdiv    fp2,fp7          ;z = (float)0x10000 / zi;
          fdiv    fp3,fp1          ;fp1=hratio*h/(h+AMP2*2)
          fmove  fp2,fp3
          fmove  fp6,fp3
          fmove  fp6,fp7
          fmove  fp7,fp1          ;fp1 = d_zistepu
          fmove  fp7,fp4
          fmove  fp7,fp6
          fmove.d _cl+CL_TIME,fp0  ;get cl.time
          fmove.l d0,fp2          ;fp2 = (float)u
          fmove.l d0,fp3          ;fp3 = (float)v
          fmove.l d1,fp2          ;du = (float)pspan->u
          fmove.l d2,fp7          ;dv = (float)pspan->v
          fmove.l d2,fp7          ;spancountminus1 = (float)(r_turb_spancount-1)
          fmove.l d2,fp7          ;spancountminus1 = (float)(spancount-1)
          fmove.l fp0,d0          ;(int)(cl.time*SPEED)
          fmove.l fp0,d4          ;(int)(cl.time*SPEED)
          fmove.l fp1,d4          ;izistep = d4
          fmove.l fp2,d1          ;d1 = (int)fp2
          fmove.l fp3,d1          ;d1 = (int)fp3
          fmove.l fp4,d3          ;convert to integer
          fmove.l fp6,d4          ;convert to integer
          fmove.l fp6,d7          ;convert to integer
          fmove.l fp7,d5          ;convert to integer
          fmove.l fp7,d6          ;convert to integer
          fmove.l REFDEF_VRECT+VRECT_HEIGHT(a4),fp1
          fmove.l REFDEF_VRECT+VRECT_WIDTH(a4),fp0
          fmove.s #16,fp7
          fmove.s #32768*65536,fp0
          fmove.s #65536,fp6
          fmove.s #65536,fp7
          fmove.s #8,fp7
          fmove.s #AMP2*2,fp2
          fmove.s (a1)+,fp0
          fmove.s (a1)+,fp1
          fmove.s (a1)+,fp6
          fmove.s (a6)+,fp0
          fmove.s (a6)+,fp1
          fmove.s (a6)+,fp6
          fmove.s .szstpu(sp),fp3
          fmove.s .tzstpu(sp),fp4
          fmove.s .zistpu(sp),fp5
          fmove.s _d_subdiv16+CVAR_VALUE,fp0
          fmove.s _d_ziorigin,fp5
          fmove.s _d_zistepu,fp7
          fmove.s _d_zistepv,fp6
          fmovem.x (sp)+,fp2/fp3
          fmovem.x (sp)+,fp2-fp7
          fmovem.x (sp)+,fp3-fp7
          fmovem.x fp2/fp3,-(sp)
          fmovem.x fp2-fp7,-(sp)
          fmovem.x fp3-fp7,-(sp)
          fmul    fp0,fp0          ;w*w
          fmul    fp0,fp1          ;multiply by $8000*$10000
          fmul    fp0,fp2          ;(float)u * wratio*w/(w+AMP2*2)
          fmul    fp0,fp4          ;izi = zi * $8000 * $10000
          fmul    fp0,fp6          ;fp2 = sdivz * z
          fmul    fp0,fp6          ;fp6 = sdivz * z
          fmul    fp0,fp7          ;fp7 = sdivz * z
          fmul    fp1,fp1          ;h*h
          fmul    fp1,fp3          ;(float)v*hratio*h/(h+AMP2*2)
          fmul    fp1,fp6          ;fp6 = tdivz * z
          fmul    fp1,fp7          ;fp7 = tdivz * z
          fmul    fp2,fp0          ;fp0 = du * d_sdivzstepu
          fmul    fp2,fp1          ;fp1 = du * d_tdivzstepu
          fmul    fp7,fp1          ;fp1 = dv * d_sdivzstepv
          fmul    fp7,fp3          ;sdivz16stepu = d_sdivzstepu * 16
          fmul    fp7,fp3          ;sdivz8stepu = d_sdivzstepu * 8
          fmul    fp7,fp4          ;tdivz16stepu = d_tdivzstepu * 16
          fmul    fp7,fp4          ;tdivz8stepu = d_tdivzstepu * 8
          fmul    fp7,fp5          ;zi16stepu = d_zistepu * 16
          fmul    fp7,fp5          ;zi8stepu = d_zistepu * 8
          fmul    fp7,fp6          ;fp6 = dv * d_tdivzstepv
          fmul.l  d0,fp4          ;fp4 = du * d_zistepu
          fmul.l  d1,fp3          ;fp3 = dv * d_zistepv
          fmul.l  d6,fp2          ;* (float)scr_vrect.width
          fmul.l  d7,fp3          ;* (float)scr_vrect.height
          fmul.s  #SPEED,fp0      ;fp0 = cl.time*SPEED
          fmul.s  (a1)+,fp2        ;fp2 = du * d_zistepu
          fmul.s  (a1)+,fp7        ;fp7 = dv * d_zistepv
          fmul.s  (a6)+,fp2        ;fp2 = du * d_zistepu
          fmul.s  (a6)+,fp7        ;fp7 = dv * d_zistepv
          fmul.s  .szstpu(sp),fp6  ;fp6 = d_sdivzstepu * spancountminus1
          fmul.s  .tzstpu(sp),fp6  ;fp6 = d_tdivzstepu * spancountminus1
          fmul.s  .zistpu(sp),fp7  ;fp7 = d_zistepu * spancountminus1
Well... nothing really fancy in there! It's only a long list of long-tested instructions. I think it's now time for Jari to come into play.
   
@Jari: do you want the plain asm source for examination, or do you already have an idea about what is not working with these?
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
03 Nov 2017 12:18


Hi Sam,

what rounding mode is active at this moment in the core?


Samuel Devulder

Posts 248
03 Nov 2017 15:02


Gunnar von Boehn wrote:

      Hi Sam,
       
        what rounding mode is active at this moment in the core?

Hello! The default one, since all the ASM code that alters the rounding mode has now been replaced by the C version, which doesn't change it.
     
The error is very strange because there is no real trick in the ASM code (see: EXTERNAL LINK ). The code deals mainly with 16.16 fixed-point values for texture drawing and behaves differently on the 060 than on the 080.
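
As a rough illustration of what "16.16 fixed point" means here, the sketch below follows the comments from the instruction list (`z = (float)0x10000 / zi`, then `sdivz * z` converted to integer). The function names and the `fixed16_t` type are hypothetical and not taken from d_scan68k.s; only the arithmetic mirrors those comments.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed16_t;   /* 16.16 fixed point: real value * 65536 */

/* s = (int)(sdivz * z) with z = 0x10000 / zi, as in the ASM comments. */
static fixed16_t perspective_to_fixed16(double sdivz, double zi)
{
    assert(zi > 0.0);                  /* assumed: point is in front of the eye */
    double z = 65536.0 / zi;
    return (fixed16_t)(sdivz * z);     /* a C cast truncates toward zero */
}

/* The texel coordinate is the upper 16 bits (non-negative input assumed). */
static int32_t fixed16_to_texel(fixed16_t s)
{
    assert(s >= 0);
    return s >> 16;
}
```

For example, `perspective_to_fixed16(0.375, 0.25)` gives `0x18000`, i.e. 1.5, which maps to texel 1. Two things make this sensitive to the FPU: `0x1FFFF` still maps to texel 1 while `0x20000` maps to texel 2, so an off-by-one in the converted value can move a texel, and, as far as I know, `fmove.l fpN,dN` converts using the rounding mode in FPCR whereas the C cast always truncates. So a small difference in how the emulation rounds or converts could plausibly show up as texture errors, though this sketch alone does not prove that is the cause.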
 
By default, the rendering is done in D_DrawSpans16, but if you do on the console

  d_subdiv16 0

it'll be done with D_DrawSpans8. If the texture issue disappears in the latter case, this means the issue is related to things being done differently between the two versions. This will reduce the search range. If both drawing methods display the texture bug, then this means that the defect is in the common parts of the algorithm. In either case we'll learn something about the issue. It's a good thing to check.


Samuel Devulder

Posts 248
05 Nov 2017 20:42


Ok, no change between D_DrawSpans8 and D_DrawSpans16.
     
However, Flype & I have isolated the issue in one single function: D_DrawSpans16 ( EXTERNAL LINK ). There isn't really any magic trick in there. Just standard floating-point + fixed-point computations. No rounding-mode change seems to be involved, because even when I force the rounding mode to the default in the function prologue, the texture defect remains.
     
For me this issue is a complete mystery. I'm totally clueless. :-/
   
Note that this ASM function contains lots of pipeline bubbles. I'm confident that once this issue is fixed there'll be plenty of room for speed improvements.
