Information about the Apollo CPU and FPU.

Instruction Fusing

Nixus Minimax

Posts 416
17 Feb 2016 14:27


The Apollo core supports a feature that increases the number of instructions executed per clock. The core is superscalar, which means that it can execute two instructions per clock cycle if the destination operand of one instruction isn't the source operand of the subsequent instruction. In addition, it can bundle two instructions and execute them as a single instruction. This is possible because the ALU of the Apollo core is internally a 3-operand ALU, while 68k code only has 2-operand instructions:
 
68k: add.l d0,d1 means you add the number in register d0 to the number in d1 and store the result in d1 (in C syntax: d1 = d0 + d1;)
 
3-operand code: add.l d0,d1,d2 means you add the number in d0 to the number in d1 and store the result in a third register d2 (or d1 if you wanted to do exactly what add.l d0,d1 does on a 68k processor). In C syntax: d2 = d0 + d1;
 
In addition to this, the Apollo ALU internally has more operations than the 68k instruction set officially exposes.
 
We can now exploit these two internal "extras" for instruction bundling and "fuse" two instructions into a single (internal) instruction.
 
Example:
 
  move.l d0,d2
  add.l  d1,d2
 
In C syntax:
 
d2 = d0 + d1;
 
If you check again what I wrote about 3-operand code, you will see that this is precisely what the single 3-operand instruction add.l d0,d1,d2 does! The Apollo core recognises such bundles of 68k instructions and executes them together in a single clock cycle. This is not the same as standard superscalar execution, because the second 68k instruction depends on the result of the first, so the pair could not be executed in a single cycle on the 68060.
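
To make the equivalence concrete, here is a minimal sketch in Python rather than 68k assembly. It is not Apollo code: the helper names (move_l, add_l, add_l_3op) are made up for illustration, registers are modelled as plain 32-bit values, and condition codes are ignored. It only checks that the two-instruction 68k sequence and the single 3-operand add leave the same value in d2.

```python
MASK32 = 0xFFFFFFFF  # 68k data registers are 32 bits wide


def move_l(regs, src, dst):
    """move.l src,dst : copy one register to another (2-operand 68k form)."""
    regs[dst] = regs[src] & MASK32


def add_l(regs, src, dst):
    """add.l src,dst : dst = dst + src, wrapping at 32 bits."""
    regs[dst] = (regs[dst] + regs[src]) & MASK32


def add_l_3op(regs, src_a, src_b, dst):
    """Hypothetical internal 3-operand add: dst = src_a + src_b."""
    regs[dst] = (regs[src_a] + regs[src_b]) & MASK32


def run_68k_pair(d0, d1):
    """Execute the two 68k instructions one after the other."""
    regs = {"d0": d0, "d1": d1, "d2": 0}
    move_l(regs, "d0", "d2")  # move.l d0,d2
    add_l(regs, "d1", "d2")   # add.l  d1,d2
    return regs["d2"]


def run_fused(d0, d1):
    """Execute the single fused 3-operand instruction."""
    regs = {"d0": d0, "d1": d1, "d2": 0}
    add_l_3op(regs, "d0", "d1", "d2")  # add.l d0,d1,d2
    return regs["d2"]
```

For example, run_68k_pair(5, 7) and run_fused(5, 7) both return 12, and both wrap to 0 for the inputs 0xFFFFFFFF and 1.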
 
Since the Apollo core is superscalar, it can execute such a fused pair alongside yet another instruction or fused pair in the same clock cycle, so in the best case four 68k instructions complete per clock.
 
This also means that when optimising code for the Apollo you will sometimes get better results by not separating instructions that depend on each other. On an 060 you would try to fit an extra independent instruction between the two instructions mentioned above. On the Apollo you should not do so: keep the pair adjacent and leave it to the core to execute the two instructions together.
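
The scheduling advice can be illustrated with a toy cycle counter. This is a sketch under strong assumptions, not a model of the real Apollo pipeline: every instruction takes one cycle, two slots issue per cycle unless the second reads a register the first writes, and only a move.l directly followed by an ALU op on the same destination is fused. All names (can_fuse, to_slots, count_cycles, ADJACENT, INTERLEAVED) are made up for this example.

```python
from typing import List, Set, Tuple

# One instruction: (mnemonic, registers read, register written).
Instr = Tuple[str, Tuple[str, ...], str]
# One issue slot: (registers read, registers written).
Slot = Tuple[Set[str], Set[str]]

FUSABLE_ALU = {"add.l", "sub.l", "and.l", "or.l"}  # assumed subset


def can_fuse(first: Instr, second: Instr) -> bool:
    """move.l Dn,Dm directly followed by an ALU op that writes the same Dm."""
    return (first[0] == "move.l"
            and second[0] in FUSABLE_ALU
            and first[2] == second[2])


def to_slots(program: List[Instr], fusing: bool) -> List[Slot]:
    """Group the program into issue slots; a fused pair occupies one slot."""
    slots: List[Slot] = []
    i = 0
    while i < len(program):
        if fusing and i + 1 < len(program) and can_fuse(program[i], program[i + 1]):
            move, alu = program[i], program[i + 1]
            # The fused op reads both sources, but not the shared destination.
            reads = (set(move[1]) | set(alu[1])) - {move[2]}
            slots.append((reads, {move[2]}))
            i += 2
        else:
            _, reads_regs, written = program[i]
            slots.append((set(reads_regs), {written}))
            i += 1
    return slots


def count_cycles(program: List[Instr], fusing: bool) -> int:
    """Issue up to two slots per cycle, in order."""
    slots = to_slots(program, fusing)
    cycles = 0
    i = 0
    while i < len(slots):
        cycles += 1
        # The second slot may issue too if it reads nothing the first writes.
        second_is_independent = (
            i + 1 < len(slots) and not (slots[i][1] & slots[i + 1][0])
        )
        i += 2 if second_is_independent else 1
    return cycles


# Dependent pairs kept adjacent (the Apollo-friendly order).
ADJACENT: List[Instr] = [
    ("move.l", ("d0",), "d2"),
    ("add.l", ("d1", "d2"), "d2"),
    ("move.l", ("d3",), "d5"),
    ("add.l", ("d4", "d5"), "d5"),
]
# The same work with the pairs interleaved (the 060-style order).
INTERLEAVED: List[Instr] = [ADJACENT[0], ADJACENT[2], ADJACENT[1], ADJACENT[3]]
```

In this toy model, count_cycles(ADJACENT, True) is 1 and count_cycles(ADJACENT, False) is 3, while the interleaved ordering takes 2 cycles with or without fusing, because separating the pairs leaves nothing to fuse.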
 
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
17 Feb 2016 19:06


APOLLO SILVER2 CORE

supported fusing instruction combinations:


(1)
MOVE.L (An)+,(Am)+
MOVE.L (An)+,(Am)+
=>
MOVE.Q (An)+,(Am)+

(2)
MOVE.B (d16,An),Dn
EXTB.L Dn
=>
MVS.B  (d16,An),Dn

(3)
MOVE.W (d16,An),Dn
EXT.L  Dn
=>
MVS.W  (d16,An),Dn

(4)
MOVE.L Dn,Dm
NOT.X  Dm

(5)
MOVE.L Dn,Dm
NEG.X  Dm

(6)
MOVE.L Dn,Dm
ADDQ.X #,Dm

(7)
MOVE.L Dn,Dm
SUBQ.X #,Dm

(8)
MOVE.L Dn,Dm
ANDI.W #,Dm

(9)
MOVE.L Dn,Dm
OR.X   Do,Dm

(10)
MOVE.L Dn,Dm
AND.X  Do,Dm

(11)
MOVE.L Dn,Dm
ADD.X  Do,Dm

(12)
MOVE.L Dn,Dm
SUB.X  Do,Dm

(13)
MOVEQ  #,Dn
OR.X  Dm,Dn

(14)
MOVEQ  #,Dn
AND.X  Dm,Dn
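
As a rough illustration of the register-only rules (4) to (14), here is a small text-level checker in Python. It is only a sketch: the real core matches these pairs in its decoder, not on source text. The function names (parse, second_after_move, fusable) are made up, and ".X" is assumed to mean any operand size. Rules (1) to (3) use memory operands and are left out.

```python
import re

UNARY = {"NOT", "NEG"}               # rules (4), (5)
QUICK = {"ADDQ", "SUBQ"}             # rules (6), (7)
REG_REG = {"OR", "AND", "ADD", "SUB"}  # rules (9) to (12)
AFTER_MOVEQ = {"OR", "AND"}          # rules (13), (14)


def parse(line):
    """Split 'MOVE.L D0,D2' into ('MOVE', 'L', ['D0', 'D2'])."""
    mnemonic, _, operand_text = line.strip().upper().partition(" ")
    name, _, size = mnemonic.partition(".")
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return name, size, operands


def is_data_reg(op):
    return re.fullmatch(r"D[0-7]", op) is not None


def second_after_move(name, size, ops):
    """Does the second instruction match one of rules (4) to (12)?"""
    if name in UNARY:
        return len(ops) == 1
    if name in QUICK:
        return len(ops) == 2 and ops[0].startswith("#")
    if name == "ANDI":
        # Rule (8) lists ANDI.W only.
        return size == "W" and len(ops) == 2 and ops[0].startswith("#")
    if name in REG_REG:
        return len(ops) == 2 and is_data_reg(ops[0])
    return False


def fusable(first, second):
    """True if the two source lines match one of rules (4) to (14)."""
    n1, s1, ops1 = parse(first)
    n2, s2, ops2 = parse(second)
    if len(ops1) != 2 or not ops2:
        return False
    dest1, dest2 = ops1[-1], ops2[-1]
    # Both instructions must write the same data register.
    if not (is_data_reg(dest1) and dest1 == dest2):
        return False
    if n1 == "MOVE" and s1 == "L" and is_data_reg(ops1[0]):
        return second_after_move(n2, s2, ops2)
    if n1 == "MOVEQ" and ops1[0].startswith("#"):
        return n2 in AFTER_MOVEQ and len(ops2) == 2 and is_data_reg(ops2[0])
    return False
```

For example, fusable("MOVE.L D0,D2", "ADD.L D1,D2") is True, while fusable("MOVE.L D0,D2", "ADD.L D1,D3") is False because the two instructions write different registers.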




John G

Posts 1
18 Feb 2016 13:40


Hi,
  My English is bad, so I will answer quickly. I think the mnemonic mvs.b should be mvsb.l, following the Motorola convention that gave us extb.l.
  Great work anyway.

Note: some links in your instruction section lead to a 404 page.


Krystian Baclawski

Posts 5
29 May 2016 15:03


Are there any plans to fuse conditional branches with the next instruction to avoid branch misprediction? Many modern CPUs do that internally.

Example:

  cmp.l d0,d1
  blt.b label
  add.l d2,d3
label:
  move.l d3,(a0)

... could be fused to:

  cmp.l d0,d1
  add.l.lt d2,d3
  move.l d3,(a0)


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
29 May 2016 15:13


Krystian Baclawski wrote:

Are there any plans to fuse conditional branches with the next instruction to avoid branch misprediction? Many modern CPUs do that internally.
 
  Example:
 
    cmp.l d0,d1
    blt.b label
    add.l d2,d3
  label:
    move.l d3,(a0)
 
  ... could be fused to:
 
    cmp.l d0,d1
    add.l.lt d2,d3
    move.l d3,(a0)

Yes, APOLLO does support this already.
We call this feature conditional-rewrite.



Krystian Baclawski

Posts 5
29 May 2016 15:48


Awesome! I hope you'll share more details on instruction fusing / bonding / rewrite with us.


Gunnar von Boehn
(Apollo Team Member)
Posts 6214
29 May 2016 16:03


Conditional rewrite avoids a conditional branch.

The conditions for it are:
1) a Bcc
2) followed by a single-cycle instruction

Long-running instructions like DIV are never rewritten, because here normal branch prediction can give the higher gain.
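
A minimal sketch of what conditional rewrite achieves, using the cmp/blt/add example from earlier in the thread. It is written in Python and is an illustration only: the names (to_signed, with_branch, conditional_rewrite, eligible) are made up, and the MULTI_CYCLE set is an assumption, since only DIV is named as excluded. The point is that the skipped add can always run, with its register write suppressed when the branch would have been taken.

```python
MASK32 = 0xFFFFFFFF

# The 14 conditional branches of the 68k (Bcc); bra and bsr are not conditional.
BCC = {"bcc", "bcs", "beq", "bne", "bge", "bgt", "ble", "blt",
       "bhi", "bls", "bmi", "bpl", "bvc", "bvs"}
# Assumed examples of long-running instructions; only DIV is named above.
MULTI_CYCLE = {"divs", "divu"}


def to_signed(value):
    """Interpret a 32-bit value as signed, as blt does."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def with_branch(d0, d1, d2, d3):
    """cmp.l d0,d1 / blt.b label / add.l d2,d3 / label: ... returns d3."""
    if to_signed(d1) < to_signed(d0):  # blt taken: the add is skipped
        return d3 & MASK32
    return (d3 + d2) & MASK32


def conditional_rewrite(d0, d1, d2, d3):
    """Same result without a branch: the add runs, the write is conditional."""
    branch_would_be_taken = to_signed(d1) < to_signed(d0)
    result = (d3 + d2) & MASK32            # the add always flows through
    write_enable = not branch_would_be_taken
    return result if write_enable else d3 & MASK32


def eligible(branch, skipped):
    """Rough check of the two conditions: a Bcc over a single-cycle instruction."""
    branch_name = branch.lower().split(".")[0]
    skipped_name = skipped.lower().split(".")[0]
    return branch_name in BCC and skipped_name not in MULTI_CYCLE
```

For example, eligible("blt.b", "add.l") is True and eligible("blt.b", "divs.l") is False, and with_branch and conditional_rewrite return the same d3 for any inputs.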

Fusing:
Fusing is the combination of 2 instructions into 1.
Condition: both instructions need to share the same register write port.

Example:
MOVE.L #$1234567,D0
AND.L  D1,D0

Bonding:

Bonding rewrites instruction inputs and does some form of register renaming to avoid hazards.

Example:
move.l (A0),D0
add.l  D0,D1

is bonded and internally renamed to
move.l  (A0),D0
add.l  (A0),D1

This "trick" allows some forms of dependent instructions to execute in a single cycle without the normal dependency bubble.
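
The bonding example can be sketched in Python as a rewrite of the second instruction's source operand. This is an illustration only, not how the core implements it: the names (bond, read, execute) are made up, only move.l and add.l with register or (An) operands are modelled, and the check that skips (An)+ and -(An) sources is an assumption added here because duplicating such an operand would repeat its side effect.

```python
MASK32 = 0xFFFFFFFF


def bond(first, second):
    """Forward the source of a move.l into the instruction that uses its result.

    Instructions are (mnemonic, source, destination) tuples.
    """
    op1, src1, dst1 = first
    op2, src2, dst2 = second
    has_side_effect = src1.endswith("+") or src1.startswith("-")
    if op1 == "move.l" and src2 == dst1 and dst2 != dst1 and not has_side_effect:
        # The second instruction no longer waits for dst1 to be written.
        return first, (op2, src1, dst2)
    return first, second


def read(operand, regs, mem):
    """Read a register ('D0') or an address-register-indirect operand ('(A0)')."""
    if operand.startswith("("):
        return mem[regs[operand[1:-1]]]
    return regs[operand]


def execute(program, regs, mem):
    """Run a list of move.l / add.l tuples and return the final registers."""
    regs = dict(regs)
    for op, src, dst in program:
        value = read(src, regs, mem)
        if op == "move.l":
            regs[dst] = value
        elif op == "add.l":
            regs[dst] = (regs[dst] + value) & MASK32
        else:
            raise ValueError("unsupported instruction: " + op)
    return regs


original = [("move.l", "(A0)", "D0"), ("add.l", "D0", "D1")]
bonded = list(bond(*original))
```

Here bonded is [("move.l", "(A0)", "D0"), ("add.l", "(A0)", "D1")], and executing either list on the same registers and memory gives the same D0 and D1.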



Philippe Flype
(Apollo Team Member)
Posts 299
29 May 2016 18:18


In addition, there is more information on the wiki, on these pages:
 
  Bonding
  Fusing
