Writing 3D Engine for 68080 In ASM

Gunnar von Boehn
(Apollo Team Member)
Posts 6207
14 Jan 2020 11:14


Kyle Blake wrote:

Insistence on C on Amiga in general is because the Amiga is a real computer platform, with a real OS. .. ASM coding practices do not scale or adapt.

   
Actually there is no insistence on C on AMIGA.
The official AMIGA HW coding books are full of ASM examples.
 
The AMIGA OS is actually designed to be used from ASM.
Parameters to OS functions are passed in registers, not on the stack. This means that to use the AMIGA OS from C you need special C compiler patches.

 
Fact: Major parts of the AMIGA OS are written in ASM.
Simply because ASM is much faster.
   
Fact: Important graphics routines of the OS and the RTG drivers are written in ASM on AMIGA, for the same reason.
   
Fact: Amiga video players like RIVA are written in 100% ASM, for the same reason.
   
In high FPS games (Doom etc.) the render code is always in ASM, again for the same reason.
   
The Vampire Diablo port has the render function written in ASM.
This more than doubled the FPS.
   
Rewriting the render code of the NEOGEO emulator from C to ASM improved the speed by a factor of 5.
   
   
Writing non time critical code in C is fully OK.
For time critical routines like render / GFX code ASM is always the best choice.
 
On AMIGA a huge number of programs and tools are written in pure ASM.
 
The most important part is of course the ALGORITHM.
Tuning a screen copy in ASM makes no sense if you can avoid the copy altogether by doing a simple PTR swap.
 
Or writing a BubbleSort in ASM is the wrong approach
if MergeSort or QuickSort would be the better algorithm to use.
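
A minimal sketch of such a pointer swap, for illustration only. The names FrontPtr, BackPtr, BufferA and BufferB are hypothetical and the buffer size is just an example; how the front pointer is handed to the display hardware is not shown.

```asm
; Swap two frame buffer pointers instead of copying pixels.
; FrontPtr/BackPtr are hypothetical variables, not part of any OS API.
SwapBuffers:
    move.l  FrontPtr,d0         ; remember the buffer that was shown
    move.l  BackPtr,FrontPtr    ; the freshly rendered buffer becomes the front
    move.l  d0,BackPtr          ; the old front buffer is rendered into next
    rts

FrontPtr:   dc.l BufferA
BackPtr:    dc.l BufferB

    section Frames,bss
BufferA:    ds.b 320*240*4      ; example size: 320x240 at 4 bytes per pixel
BufferB:    ds.b 320*240*4
```

Three moves replace a copy of the whole frame, no matter how large the frame is.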
 
 
If you write a text editor then either C or ASM can be used.
But Pascal, Modula or Oberon2 would also be good choices.
 
But if your goal is to write a fast 3D FPS game
then ASM is really the best option!
 
 
 


A1200 Coder

Posts 74
14 Jan 2020 14:18


I also agree that coding in asm is the best thing you can do with the Amiga. You can also make more complex applications in asm. I once made an ANSI/VT100 terminal client without using any OS calls on the A500. I just cut the OS off, stole the keyboard routine from some game, and used sprites for the cursor in the terminal.


Kamelito Loveless

Posts 260
14 Jan 2020 17:12


Does this mean that critical AROS code that needs speed will be rewritten in ASM? AmigaOS Exec is pure ASM, for instance.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
14 Jan 2020 17:15


Kamelito Loveless wrote:

  Does this mean that critical AROS code that needs speed will be rewritten in ASM? AmigaOS Exec is pure ASM, for instance.
 

I know that AROS got 68k ASM tuning for IDE access the other day.
So yes there are improvements done like this today.
But this does not mean that AROS is rewritten in ASM.


Vladimir Repcak

Posts 359
14 Jan 2020 23:26


I spent some time cleaning up the source code, removing all bitplane/copper variables (plus include file) resulting in a single file. Also, I reduced the number of required includes to just 4 from over a dozen.
 
  Hopefully, somebody will find this useful in the future as a starting point for coding on the Vampire. This sample renders a colorful 256x240 rectangle.
 

  .68000
    ; The Base NDK 3.9 doesn't have all files
    ; - cybergraphics_lib.i: taken from ChunkyStartup2 on Aminet
    ; - intuition_lib.i:  taken from amiga-sdk on github/deplinenoise
    ; vasm -m68020 -Fhunkexe -I c:\Vampire\samples\gt\build\include_i -o gt.exe  gt.asm
 
    include intuition/screens.i ; NDK 3.9
    include intuition_lib.i  ; external, see header comment
    include cybergraphics.i  ; NDK 3.9
    include cybergraphics_lib.i ; external, see header comment
  ResX  equ 320
  ResY  equ 240
  BitDepth equ 32
  PixelBytes equ 4
  START:
    ; Open graphics.library
    move.l 4,a6
    move.l #graphicsLibName,a1
    jsr  -552(a6)    ; Open library
    move.l d0,graphicsBase
    ; Open CyberGraphX
    move.l 4,a6
    move.l #CyberGraphXLibName,a1 ; Name
    move.l #41,d0    ; Version
    jsr  -552(a6)    ; Open library
    move.l d0,CyberGraphXBase  ; Store Ptr
    ; Open Intuition
    move.l 4,a6
    move.l #IntuitionLibName,a1 ; Name
    move.l #39,d0    ; Version
    jsr  -552(a6)    ; Open library
    move.l d0,IntuitionBase2  ; Store Ptr
    ; Open Dos
    move.l 4,a6
    move.l #DosLibName,a1  ; Name
    move.l #39,d0    ; Version
    jsr  -552(a6)    ; Open library
    move.l d0,DosBase    ; Store Ptr
    ; Requester
    move.l CyberGraphXBase,a6
    suba.l a0,a0
    lea.l requestertags,a1
    jsr  _LVOCModeRequestTagList(a6)
    move.l d0,mode_insert+4  ; store into screen taglist
    ; Open Screen
    suba.l a0,a0
    lea.l screentags,a1
    move.l IntuitionBase2,a6
    jsr  _LVOOpenScreenTagList(a6)  ; open the screen
    move.l d0,screen
    add.l #sc_RastPort,d0
    move.l d0,rastport ; save rport address
    ; Open Dummy Window
    suba.l a0,a0
    lea.l windowtags,a1
    move.l IntuitionBase2,a6
    jsr  _LVOOpenWindowTagList(a6) ; open a dummy window...
    move.l d0,window
 
    ; Render 256*240 = 61,440 colors out of 16.7M
      lea FrameBuffer,a0
      clr.l d2    ; d2: ARGB Color
      move.l #(ResY-1),d0  ; d0: YPOS loop
      .scrOuter:
      move.l d2,d3  ; Store
      move.l #256-1,d1  ; d1: XPOS loop
      .scrInner:
        move.l d2,(a0)+
        add.l #1,d2  ; Update Blue
        add.l #256*2,d2 ; Update Green
      dbra d1,.scrInner
      move.l d3,d2  ; Restore
      add.l #256*256,d2  ; Update Red
        ; Skip remaining (320-256) pixels
      add.l #(ResX-256)*PixelBytes,a0
      dbra d0,.scrOuter
 
      jsr UpdateBuffer
    ExitWait:
  ; "Wait" Loop without introducing Timer OS calls into the mix (utterly useless for the purpose of this code sample)
  SecondsToWait equ 5
  ; move.l #7000,d2 ; 7000 = 10 seconds (on my computer)
    move.l #SecondsToWait*700,d2
    .FunnyWaitLoopOuter:
    move.l #$FFFF,d3
    move.l #$FFFFFFFF,d1
    .FunnyWaitLoopInner:
      divu #2,d1
    dbra d3,.FunnyWaitLoopInner
    dbra d2,.FunnyWaitLoopOuter
  exit:
    move.l window,a0
    move.l IntuitionBase2,a6
    jsr _LVOCloseWindow(a6)
 
    move.l screen,a0
    move.l IntuitionBase2,a6
    jsr _LVOCloseScreen(a6) 
 
    move.l  graphicsBase,a6
    jsr -270(a6) ; WaitTOF
    move.l $4,a6
    jmp -126(a6) ; Enable
    ; May need more close calls (CGX/DOS ?)
 
    UpdateBuffer:
    movem.l d0-d7,-(sp)
      lea.l FrameBuffer,a0
      move.l rastport,a1
      clr.w d0
      clr.w d1
      move.w #PixelBytes*ResX,d2  ; bytes per line in source
      clr.w d3
      clr.w d4
      move.w #ResX,d5
      move.w #ResY,d6
      move.w #RECTFMT_ARGB,d7
      move.l CyberGraphXBase,a6
      jsr _LVOWritePixelArray(a6)
    movem.l (sp)+,d0-d7
    rts
 
  ; OS Libs (Pointer + name)
  graphicsBase:  dc.l 0
  graphicsLibName  DC.B 'graphics.library',0
 
  DosBase:  dc.l 0
  DosLibName  dc.b 'dos.library',0
 
  IntuitionBase2:  dc.l 0
  IntuitionLibName dc.b 'intuition.library',0
 
  CyberGraphXBase: dc.l 0
  CyberGraphXLibName dc.b 'cybergraphics.library',0
 
  reqtitle  dc.b "Pick a screenmode",0
    even
  requestertags dc.l CYBRMREQ_WinTitle,reqtitle
      dc.l CYBRMREQ_MinWidth,ResX
      dc.l CYBRMREQ_MaxWidth,ResX
      dc.l CYBRMREQ_MinHeight,ResY
      dc.l CYBRMREQ_MaxHeight,ResY
      dc.l CYBRMREQ_MinDepth,8
      dc.l CYBRMREQ_MaxDepth,32
      dc.l 0,0
   
  rastport  dc.l 0
 
  screentags  dc.l SA_Left,0
      dc.l SA_Top,0
      dc.l SA_Width,ResX
      dc.l SA_Height,ResY
      dc.l SA_Depth
  mode_depth  dc.l BitDepth
      dc.l SA_Type,CUSTOMSCREEN
  mode_insert  dc.l SA_DisplayID,0
      dc.l SA_Draggable,0
      dc.l SA_Exclusive,1
      dc.l 0,0 
 
  window  dc.l 0
  windowtags  dc.l WA_Left,0
      dc.l WA_Top,0
      dc.l WA_Width,20
      dc.l WA_Height,20
      dc.l WA_CustomScreen
  screen  dc.l 0
      dc.l WA_Borderless,1
      dc.l WA_BackFill,LAYERS_NOBACKFILL
      dc.l WA_Activate,1
      dc.l 0,0
 
    Section Chunky,BSS_F
 
  FrameBuffer  ds.b ResX*ResY*PixelBytes
 
 



Vladimir Repcak

Posts 359
15 Jan 2020 00:04


Gunnar von Boehn wrote:

Vladimir Repcak wrote:

  Yes, it's going to cost some bandwidth - 320x240x2 = 150 KB per frame, which at 60 fps makes ~9 MB/s.
 

 
  9 MB/s read + 9 MB/s write = 18 MB/s of memory traffic.
  At a higher res like 640x360 this becomes 55 MB/sec.
  This is a lot of time spent for nothing.
 
  The bigger problem of the copy is that it looks like shit.
  You copy to the screen while it is being displayed!
  This means you see the copy/redraw.
  This looks bad.
 
  A much cleaner solution is having 2 buffers,
  rendering the 2nd while the 1st is displayed, and then swapping the PTR. This PTR swap is free and looks a lot better.
  You only need to sync the SWAP time.
 
  The very best solution is having 3 buffers.
  The 1st is displayed, the 2nd is already rendered and waiting, and the 3rd is being rendered.
  Using 3 buffers you can unsync
  the render loop from the display time.
  This means your render routine does not need to wait for the display.
 
  For the final product I can highly recommend you use triple buffering and only do a PTR update.
 
  On Vampire/SAGA the PTR SWAP is auto-synced with the screen display.
  This means all you need to do is 1 MOVE.L and the HW does the rest for you.
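
A rough sketch of the three-buffer rotation described in the quote, under stated assumptions: BufShown, BufReady, BufDraw and the Frame labels are hypothetical names, and all three pointers rotate once per finished frame, which is the simplest possible scheme. The address of the display pointer register is not given in this thread, so the final store to the hardware is deliberately left out.

```asm
; Rotate three frame buffer pointers after a frame has been rendered.
; All names here are hypothetical.
RotateBuffers:
    move.l  BufShown,d0         ; the frame that was on screen is free to reuse
    move.l  BufReady,BufShown   ; the waiting frame goes on screen next
    move.l  BufDraw,BufReady    ; the frame just drawn now waits its turn
    move.l  d0,BufDraw          ; draw the next frame into the freed buffer
    ; Here BufShown would be written to the display hardware pointer
    ; (the single MOVE.L mentioned in the quote). That store is omitted
    ; because the register address is not stated in this thread.
    rts

BufShown:   dc.l FrameA
BufReady:   dc.l FrameB
BufDraw:    dc.l FrameC

    section Frames,bss
FrameA:     ds.b 320*240*4
FrameB:     ds.b 320*240*4
FrameC:     ds.b 320*240*4
```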
 

I presume the copying shouldn't be visible once double/triple buffering is implemented.

But yeah, especially at higher resolutions, this would mean avoidable performance losses.
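
The bandwidth figures quoted above can be checked with a few assembler constants. This is only arithmetic, assuming 2 bytes per pixel and 60 fps for both resolutions (the quote states this for 320x240; for 640x360 it is an assumption that happens to match the 55 MB/sec figure). The constant names are made up for this example.

```asm
; Cost of copying one frame buffer to the screen, per second.
FrameBytes320   equ 320*240*2           ; 153,600 bytes = 150 KB per frame
CopyRate320     equ FrameBytes320*60    ; 9,216,000 bytes/s read (~9 MB/s)
Traffic320      equ CopyRate320*2       ; 18,432,000 bytes/s read + write

FrameBytes640   equ 640*360*2           ; 460,800 bytes per frame
CopyRate640     equ FrameBytes640*60    ; 27,648,000 bytes/s read
Traffic640      equ CopyRate640*2       ; 55,296,000 bytes/s read + write
```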

I just wanted to share the working cleaned-up code before I go implement it.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
15 Jan 2020 06:46


Nice start you have done.

You can also test user input.
E.g. Testing the Left Mouse button


  Wait:
    btst #6,$bfe001    ; test left mouse button
    bne  Wait          ; if not pressed, loop back to Wait

Testing Right Mouse button is also very easy


  btst  #2,$dff016  ; Right mouse button
  beq    RMB_pressed 

Joystick input can be tested just as easily with simple ASM.
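
For example, a short sketch in the same style as the mouse tests above: the fire button of a joystick in port 2 is bit 7 of the same CIA-A register used for the left mouse button, and it is also active low. The label WaitFire is made up for this example.

```asm
WaitFire:
    btst  #7,$bfe001    ; test joystick port 2 fire button (0 = pressed)
    bne.s WaitFire      ; if not pressed, keep waiting
```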

Maybe you can upgrade your demo so that it shows a rotating 3D object?


Kamelito Loveless

Posts 260
15 Jan 2020 07:01


Nice, but yes, you should free all resources you allocate.
There is a jump to Enable() although there is no Disable(). Disable()/Enable() should not be used under AmigaOS, or at least not for a long period.


Nixus Minimax

Posts 416
15 Jan 2020 08:53


Vlad, you open graphics.library without giving a version, so it will fail if there is some junk left in d0 from a previous task. For all I know the OS will not hand you empty registers. Furthermore you don't check whether the OpenLibrary() call succeeds. There is a reason the pointer to the library base is returned in a data register, not an address register. If the pointer is 0, you need to do some error handling.

I also believe your code should have some "even" directives between the strings and the dc.l for the pointers. I haven't counted all the string lengths, but "graphics.library",0 has an odd length. I'm not sure storing the screen pointer within the taglist for opening the window is a good idea, because I don't know whether the OS will preserve the arguments you pass to OpenWindowTagList(), and you will need the screen pointer for closing the screen upon exit.

Because yes, AmigaOS does not have any resource tracking, which is why you must free all allocated resources, close all libraries and so on. Another thing is that you might want to use the exec includes and then use "_LVOOpenLibrary" instead of "-552" and so on.

But you are clearly getting there. If you get the timer.device into the mix and calculate a nice FPS counter, you will soon see that disable()/enable() doesn't make that much of a difference (unless the user is running a raytracing job in the background which would clearly be user error...).
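
A minimal sketch of those points for one library, not a drop-in replacement for the listing: a version in d0, a check of the returned base, an "even" after the string, and the _LVO name instead of a raw offset. It assumes an exec_lib.i that defines _LVOOpenLibrary; the label NoGraphics and the return code handling are illustrative choices.

```asm
    include exec_lib.i              ; assumed to define _LVOOpenLibrary

START:
    move.l  4.w,a6                  ; exec base
    lea     graphicsLibName,a1      ; library name
    moveq   #39,d0                  ; minimum version, so d0 never holds junk
    jsr     _LVOOpenLibrary(a6)
    move.l  d0,graphicsBase         ; MOVE sets the Z flag if the base is 0
    beq.s   NoGraphics              ; open failed: do not use the base
    ; ... open the other libraries and run the program here ...
    ; (a full program closes everything it opened before returning)
    moveq   #0,d0                   ; return code 0: success
    rts

NoGraphics:
    moveq   #20,d0                  ; return code 20: failure
    rts

graphicsLibName:  dc.b 'graphics.library',0
    even                            ; 17 bytes of string, realign for the dc.l
graphicsBase:     dc.l 0
```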



Vladimir Repcak

Posts 359
15 Jan 2020 18:26


Kamelito Loveless wrote:

    Nice, but yes, you should free all resources you allocate.
    There is a jump to Enable() although there is no Disable(). Disable()/Enable() should not be used under AmigaOS, or at least not for a long period.
   

    Thanks. I forgot which sample I took those from, but I have commented them out now.
    I checked the OS docs and found that they explicitly say each OpenLibrary must have a matching CloseLibrary, so that's what I did:
   

    ; Exit
    exit:
      ; Close window+screen
      move.l window,a0
      move.l IntuitionBase2,a6
      jsr  _LVOCloseWindow(a6)
   
      move.l screen,a0
      move.l IntuitionBase2,a6
      jsr  _LVOCloseScreen(a6)
 
      ; Close dos.library
      move.l 4,a6
      move.l DosBase,a1
      jsr  _LVOCloseLibrary(a6)  ; Close library
 
      ; Close intuition.library
      move.l 4,a6
      move.l IntuitionBase2,a1
      jsr  _LVOCloseLibrary(a6)  ; Close library
 
      ; Close cybergraphics.library
      move.l 4,a6
      move.l CyberGraphXBase,a1
      jsr  _LVOCloseLibrary(a6)  ; Close library
     
      ; Close graphics.library
      move.l 4,a6
      move.l graphicsBase,a1
      jsr  _LVOCloseLibrary(a6)  ; Close library

      ; commented out
    ; move.l  graphicsBase,a6
    ; jsr -270(a6) ; WaitTOF
    ; move.l $4,a6
    ; jmp -126(a6) ; Enable
   


   


Vladimir Repcak

Posts 359
15 Jan 2020 18:32


Nixus Minimax wrote:

Vlad, you open graphics.library without giving a version, so it will fail if there is some junk left in d0 from a previous task. For all I know the OS will not hand you empty registers. Furthermore you don't check whether the OpenLibrary() call succeeds. There is a reason the pointer to the library base is returned in a data register, not an address register. If the pointer is 0, you need to do some error handling.

I also believe your code should have some "even" directives between the strings and the dc.l for the pointers. I haven't counted all the string lengths, but "graphics.library",0 has an odd length. I'm not sure storing the screen pointer within the taglist for opening the window is a good idea, because I don't know whether the OS will preserve the arguments you pass to OpenWindowTagList(), and you will need the screen pointer for closing the screen upon exit.

Because yes, AmigaOS does not have any resource tracking, which is why you must free all allocated resources, close all libraries and so on. Another thing is that you might want to use the exec includes and then use "_LVOOpenLibrary" instead of "-552" and so on.
 
  But you are clearly getting there. If you get the timer.device into the mix and calculate a nice FPS counter, you will soon see that disable()/enable() doesn't make that much of a difference (unless the user is running a raytracing job in the background which would clearly be user error...).
 

Nice catch on the missing library version. Looks like I cleaned the code up too much :-)
I included exec_lib.i from github/deplinenoise and used _LVOOpenLibrary.
Error handling will have to wait for now (but will be implemented later for sure). I think I've had enough OS stuff in the last 2 weeks and should start porting the engine :)

Here's the current Initialization section:

  include intuition_lib.i  ; external, from amiga-sdk on github/deplinenoise
  include exec_lib.i  ; external, from github/deplinenoise
  include exec/execbase.i
  include intuition/screens.i ; NDK 3.9
  include cybergraphics.i  ; NDK 3.9
  include cybergraphics_lib.i ; external, from ChunkyStartup2 on Aminet
START:
    ; Open graphics.library
    move.l 4,a6
    move.l #graphicsLibName,a1
    move.l #39,d0    ; Version
    jsr  _LVOOpenLibrary(a6)  ; Open library
    move.l d0,graphicsBase
    ; Open CyberGraphX
    move.l 4,a6
    move.l #CyberGraphXLibName,a1 ; Name
    move.l #41,d0    ; Version
    jsr  _LVOOpenLibrary(a6)  ; Open library
    move.l d0,CyberGraphXBase  ; Store Ptr
    ; Open Intuition
    move.l 4,a6
    move.l #IntuitionLibName,a1 ; Name
    move.l #39,d0    ; Version
    jsr  _LVOOpenLibrary(a6)  ; Open library
    move.l d0,IntuitionBase2  ; Store Ptr
    ; Open Dos
    move.l 4,a6
    move.l #DosLibName,a1  ; Name
    move.l #39,d0    ; Version
    jsr  _LVOOpenLibrary(a6)  ; Open library
    move.l d0,DosBase    ; Store Ptr
  ; Requester
    move.l CyberGraphXBase,a6
    suba.l a0,a0
    lea.l requestertags,a1
    jsr  _LVOCModeRequestTagList(a6)
    move.l d0,mode_insert+4  ; store into screen taglist
  ; Open Screen
    suba.l a0,a0
    lea.l screentags,a1
    move.l IntuitionBase2,a6
    jsr  _LVOOpenScreenTagList(a6)  ; open the screen
    move.l d0,screen
    add.l #sc_RastPort,d0
    move.l d0,rastport ; save rport address
  ; Open Dummy Window
    suba.l a0,a0
    lea.l windowtags,a1
    move.l IntuitionBase2,a6
    jsr  _LVOOpenWindowTagList(a6) ; open a dummy window...
    move.l d0,window





Kamelito Loveless

Posts 260
15 Jan 2020 20:00


I was able to assemble it. It works fine under 320x240 in 16 and 24 bits, but I get an empty screen in 8-bit.
  I was about to tell you about exec_lib.i but I see that you resolved it.
  There's a problem with the release of the allocated resources, as after 3/4 launches I got "not enough memory available".

Update: after more tests I can't reproduce the memory problem. Maybe it is due to the fact that when launched under WinUAE you can choose 8/16/24 bit while your code is aimed at 24 bits. No Enforcer hit, no memory loss when I choose 24 bits.



Vladimir Repcak

Posts 359
16 Jan 2020 01:52


Kamelito Loveless wrote:

  I was able to assemble it. It works fine under 320x240 in 16 and 24 bits, but I get an empty screen in 8-bit.

  Correct, 8-bit isn't supposed to work. I just couldn't easily get rid of it. Only later did I figure out that if I set MinDepth in the requester tags to 24 bits, the 8-bit and 16-bit modes disappear from the dialog, leaving the user with only the 24-bit modes to choose from.
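
The change described above, sketched against the requester tag list from the first listing. Only the MinDepth value differs; everything else is copied from that listing.

```asm
requestertags dc.l CYBRMREQ_WinTitle,reqtitle
      dc.l CYBRMREQ_MinWidth,ResX
      dc.l CYBRMREQ_MaxWidth,ResX
      dc.l CYBRMREQ_MinHeight,ResY
      dc.l CYBRMREQ_MaxHeight,ResY
      dc.l CYBRMREQ_MinDepth,24   ; was 8: hides the 8-bit and 16-bit modes
      dc.l CYBRMREQ_MaxDepth,32
      dc.l 0,0
```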
 
 
Kamelito Loveless wrote:

    There's a problem with the release of the allocated resources, as after 3/4 launches I got "not enough memory available".
 
Interesting. I can usually run it 50 times or more and haven't encountered that message. Perhaps my config in UAE is to blame (I guess I gave it too much RAM).
 
  Are you using the first or the second version? Because first one didn't do any clean-up, only the second one does. But you'd have to replace the exit portion manually, as I didn't upload full code (I figured only the differences would be enough).
 
  I am running avail from the command line - it gives a chip/fast, available/in-use breakdown. It appears that this leaks 432 bytes per run.
 
  I guess there are some other things to release besides closing the window/screen and all libraries?

EDIT: Also, the first version I pasted here didn't have rts at the end :)
So, if that's what you ran, then it surely leaked a lot :)


Vladimir Repcak

Posts 359
21 Jan 2020 05:35


Screenshot from emulator:
  https://pasteboard.co/IQWDYwV.png
 
  Took some time to implement the workarounds for the vasm idiosyncrasies (quite different from my previous assembler), but now all Higgs language features are finally compiling under vasm.
 
  I spent a couple of days implementing the Radiosity lightmapper, as there is no better way to test the 24-bit color space:
  - full 24-bit precision
  - lights are smoothly merged
  - currently there are 2 lights in the scene, but there's no actual limit
  - the lightmaps are generated at run-time
  - any light can have any RGB color (16.7 Mil)
 
  Texturing is implemented using generic 68000 code
  - inner loops run entirely in registers
  - code is fully integer - there is no floating point
  - inner scanline loops touch RAM only for the texel read and the pixel write
  - scene is currently axis-aligned - so while the 3D mesh can be relatively generic, it is best to keep quads axis-aligned
 
  This code could be relatively easily extended into that Star Wars Tunnel demo scene we talked about earlier.
 
 


Nixus Minimax

Posts 416
21 Jan 2020 07:54


Vladimir Repcak wrote:

Screenshot from emulator:
  https://pasteboard.co/IQWDYwV.png

This looks pretty nice!

- code is fully integer - there is no floating point

That's a pity because the 080 has such an amazingly fast FPU. I guess you will be able to make good use of the FPU for point projection and transformation stuff.



Vladimir Repcak

Posts 359
21 Jan 2020 09:52


Nixus Minimax wrote:

  This looks pretty nice!
Thanks. That's Radiosity. It accounts not only for direct lighting but also for indirect lighting (bounced off walls). I believe this particular scene has a 95% energy threshold - meaning the energy kept bouncing until 95% of it had been redistributed.

Then I store the resulting form factors, so I only need to do a single linear multiply+add pass (at runtime) over the texture to get the final result, yet it's still possible to change the light color, intensity and overall scene brightness.

Because this is a 24-bit color space, each additional light is merely added on top, which is very fast (should be one AMMX op, really).
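
A minimal sketch of adding two lightmaps channel by channel in plain 68k code, for illustration only. It is not the engine's actual code: the register assignment and the routine name are made up, and clamping each channel to $FF on overflow is an assumption (the post only says the lights are added). An AMMX version would process several channels per instruction; that is not shown here.

```asm
; a0 = lightmap of light A (one byte per colour channel)
; a1 = lightmap of light B, same layout
; a2 = destination
; d7.w = number of bytes - 1 (DBRA counts 16 bits, so at most 65536 bytes)
AddLights:
.loop:
    move.b  (a0)+,d0
    add.b   (a1)+,d0    ; carry is set if the channel sum exceeds 255
    bcc.s   .store
    st      d0          ; assumed behaviour: clamp the channel to $FF
.store:
    move.b  d0,(a2)+
    dbra    d7,.loop
    rts
```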

The greatest usage of this would be, obviously, for an FPS shooter - imagine classic Wolfenstein with such colored lighting ;)

Another use case could be top-down 3D RPG, like Dungeon Siege or Torchlight or Diablo3. Especially Diablo3 could use the 24-bit color space for its height-based fog (alongside proper wall lighting)...

Nixus Minimax wrote:

 
- code is fully integer - there is no floating point
 

  That's a pity because the 080 has such an amazingly fast FPU. I guess you will be able to make good use of the FPU for point projection and transformation stuff.

Yeah, this is my first code that runs on the emulator - I spent about 3 days updating my Higgs compiler (for the vasm differences) and then about 3 days writing this, so not too bad.

AMMX would be really good to use here - for RGBA processing in a single instruction. The computation of lightmaps would certainly be greatly accelerated.

I suspect I should be able to use the emulator for the floating-point instructions, right? Meaning at least the 68040 FP instructions? I never really used FP in ASM, as the Jaguar's interpretation of FP is suboptimal at best, hence I always rewrote each algorithm in two versions - fixed-point and then integer.


Vladimir Repcak

Posts 359
21 Jan 2020 10:00


Here's an example of the inner loop for the horizontal scanlines:

    Higgs:
  loop (lpMain = xlVisible)
  {
  idxPixel = idxCurrent >> BitShiftR
  idxPixel <<= #2
  texPtr = texPtrStart + idxPixel
  (vidPtr)+ = (texPtr)
 
  idxCurrent += xpAdd
  }


    ASM output:
loop_10_start:
  move.l d3,d4
  lsr.l d5,d4
  lsl.l #2,d4
  move.l a3,a2
  add.l d4,a2
  move.l (a2),(a0)+

  add.l d2,d3
dbra d1,loop_10_start

An FP version would be able to compute the Texel index in parallel.

And, a third version - using the internal texturing unit should be even faster :)

Of course, the example above is axis-aligned, so won't work for generic angled surfaces, but you can still make lots of games with that.


Don Adan

Posts 38
21 Jan 2020 10:29


Vladimir Repcak wrote:

Here's an example of the inner loop for the horizontal scanlines:
 

      Higgs:
  loop (lpMain = xlVisible)
  {
    idxPixel = idxCurrent >> BitShiftR
    idxPixel <<= #2
    texPtr = texPtrStart + idxPixel
    (vidPtr)+ = (texPtr)
   
    idxCurrent += xpAdd
  }
 
 

 
 

      ASM output:
  loop_10_start:
  move.l d3,d4
  lsr.l d5,d4
  lsl.l #2,d4
  move.l a3,a2
  add.l d4,a2
  move.l (a2),(a0)+
 
  add.l d2,d3
  dbra d1,loop_10_start
 
 

 
  An FP version would be able to compute the Texel index in parallel.
 
  And, a third version - using the internal texturing unit should be even faster :)
 
  Of course, the example above is axis-aligned, so won't work for generic angled surfaces, but you can still make lots of games with that.

Perhaps you can use
move.l (a3,d4.l*4),(a0)+
to replace 4 instructions.
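
For illustration, the quoted loop with that suggestion applied might look like the sketch below. It keeps the register use of the original listing (d3 = idxCurrent, d5 = BitShiftR, a3 = texPtrStart, a0 = vidPtr, d2 = xpAdd, d1 = loop counter). The scaled index mode needs a 68020 or later, and a2 no longer receives the texel address, which only matters if later code relies on it.

```asm
loop_10_start:
  move.l d3,d4                  ; d4 = idxCurrent
  lsr.l  d5,d4                  ; d4 = texel index
  move.l (a3,d4.l*4),(a0)+      ; read texel at a3 + index*4, write pixel
  add.l  d2,d3                  ; idxCurrent += xpAdd
  dbra   d1,loop_10_start
```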


Vladimir Repcak

Posts 359
21 Jan 2020 10:42


Don Adan wrote:

   
   
Vladimir Repcak wrote:

        ASM output:
    loop_10_start:
    move.l d3,d4
    lsr.l d5,d4
    lsl.l #2,d4
    move.l a3,a2
    add.l d4,a2
    move.l (a2),(a0)+
   
    add.l d2,d3
    dbra d1,loop_10_start
   

   
    An FP version would be able to compute the Texel index in parallel.
   
    And, a third version - using the internal texturing unit should be even faster :)
   
    Of course, the example above is axis-aligned, so won't work for generic angled surfaces, but you can still make lots of games with that.
 


  Perhaps you can use
  move.l (a3,d4.l*4),(a0)+
  to replace 4 instructions.
 

  Yes, indexed addressing should remove quite a few ops.
 
  On the Jaguar, I occasionally had hard-to-debug issues with the (const, An, Xn.s) displacement modes and mostly used just the simplest (const,a0) displacement.
 
  But this is a different platform, so it's worth trying out.
 
  Thanks for pointing this out !


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
21 Jan 2020 11:02


Nice progress!
Congratulations!

I wonder if going for a 15/16-bit screenmode might be a smart decision.

For games, more FPS may have more value than slightly finer color shades.
What do you think?

Would you like to change the test engine to 15/16-bit?

