Link Stack Feature of Apollo Core - What Is It?

Niels Boehm

Posts 5
27 Aug 2016 08:58


The Wiki mentions a feature called "Linkstack" for the Apollo core:
http://apollo-accelerators.com/wiki/doku.php?id=apollo_core#features

I'm really curious what this is.

Is it something like the Link Register of RISC CPUs? That would be really awesome, because I always regarded the Link Register as one of the coolest features of RISC CPUs. Using the Link Register (and associated branch instructions), you can get rid of stack memory/cache accesses for pushing/popping the return address of leaf routines (subroutines that don't call other subroutines). And you can write more independent and readable coroutines (subroutines that jump back and forth between each other at possibly different entry points).

If the "Linkstack" feature is something else, would you consider something like a Link Register (and associated branch instructions) in the future?



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
27 Aug 2016 12:44


The Link register on RISC is nothing special.
You can do just the same on 68k already.
 
Example:

  LEA return(pc),A5    ; put the return address in A5
  JMP subroutine       ; jump without touching the stack
return:

Then in the subroutine you end with

  JMP (A5)             ; return through A5
 

As you see, using a LINK register does not help much.
A LINK register avoids a memory access, but with a good cache like the one on the 68080 that memory access is free anyway, so nothing is gained.
 
 
 
 
The LINKSTACK is completely different and a lot more powerful.
The LINKSTACK remembers the return addresses of subroutine calls in a special cache. This allows the CPU to do an RTS in 1 cycle.
LINKSTACK is a feature of modern CPUs (such as the Athlon).
A LINKSTACK will speed up all programs, including old ones.
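
The idea of remembering return addresses in a small dedicated cache can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `LinkStack`, the depth of 4, and the trace format are all hypothetical and say nothing about how the 68080 actually implements the feature.

```python
class LinkStack:
    """Toy return-address predictor (hypothetical, not the 68080 design)."""

    def __init__(self, depth=4):
        self.depth = depth      # assumed size, chosen only for the example
        self.entries = []

    def on_call(self, return_address):
        # BSR/JSR: remember where the matching RTS should go.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # the oldest entry is lost on overflow
        self.entries.append(return_address)

    def predict_return(self):
        # RTS: guess the target without waiting for the stack read.
        return self.entries.pop() if self.entries else None


def run(trace, depth=4):
    """trace holds ("call", return_address) or ("ret", actual_target) pairs."""
    link_stack = LinkStack(depth)
    hits = misses = 0
    for kind, address in trace:
        if kind == "call":
            link_stack.on_call(address)
        elif link_stack.predict_return() == address:
            hits += 1
        else:
            misses += 1
    return hits, misses


# Two nested calls followed by their two returns.
nested = [("call", 0x100), ("call", 0x200), ("ret", 0x200), ("ret", 0x100)]
print(run(nested))  # (2, 0)

# Six nested calls overflow the assumed depth of 4.
deep = [("call", n) for n in range(1, 7)] + [("ret", n) for n in range(6, 0, -1)]
print(run(deep))    # (4, 2)
```

In this model every correctly predicted RTS could continue fetching at once, and only calls nested deeper than the table lose their prediction. Old programs benefit without changes because they already use BSR/JSR and RTS.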
 


Niels Boehm

Posts 5
27 Aug 2016 13:50


Gunnar von Boehn wrote:

 

  LEA return(pc),A5
  JMP subroutine
 

True, you can emulate most of the link register instructions by using the available registers. However, I find a single instruction for this one purpose more concise than 2 instructions. Also, the link register is implicitly loaded with the return address after the branch instruction in a RISC instruction set, which is less flexible, granted, but also has less redundancy in the common case.

Gunnar von Boehn wrote:

 

  return:

  Then in the subroutine you end with

  JMP (A5)
 

Yes, that will successfully emulate a "blr" (PowerPC) for returning from a subroutine, but you cannot easily emulate a "blrl" (branch to link register and link) without using up another multi-purpose register or accessing memory/stack. So doing coroutines (granted, it's a rarely used technique in machine language, but I can still come up with some possible applications) is still more elegant on an architecture with link register instructions.
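
To make the coroutine point concrete, here is a minimal sketch of a toy machine with a program counter and a link register, where "blrl" jumps to the address in the link register and stores the address of the next instruction there. The instruction tuples, the addresses, and the function name `run_coroutines` are invented for this illustration; it is not PowerPC code.

```python
def run_coroutines(program, start_pc, start_lr):
    """Run a toy program; 'blrl' swaps control between two routines."""
    pc, lr = start_pc, start_lr
    output = []
    while True:
        op = program[pc]
        if op[0] == "emit":
            output.append(op[1])
            pc += 1
        elif op[0] == "blrl":
            # Branch to the link register and link: the old pc + 1
            # becomes the resume point of the routine we are leaving.
            pc, lr = lr, pc + 1
        elif op[0] == "halt":
            return output


program = {
    # Routine A starts at address 0.
    0: ("emit", "A1"), 1: ("blrl",),
    2: ("emit", "A2"), 3: ("blrl",),
    4: ("emit", "A3"), 5: ("halt",),
    # Routine B starts at address 10.
    10: ("emit", "B1"), 11: ("blrl",),
    12: ("emit", "B2"), 13: ("blrl",),
    14: ("halt",),
}

print(run_coroutines(program, start_pc=0, start_lr=10))
# ['A1', 'B1', 'A2', 'B2', 'A3']
```

Each "blrl" resumes the other routine exactly where it last left off, with no stack access and no second general-purpose register.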
 
Gunnar von Boehn wrote:

  As you see, using a LINK register does not help much.
  A LINK register avoids a memory access, but with a good cache like the one on the 68080 that memory access is free anyway, so nothing is gained.

Yes, I see your point. A link register wouldn't help performance-wise and it would likely just complicate the core unnecessarily implementation-wise.

I probably just like the link register too much, seeing some beauty in it ;)

Gunnar von Boehn wrote:

  The LINKSTACK is completely different and a lot more powerful.
  The LINKSTACK remembers the return addresses of subroutine calls in a special cache. This allows the CPU to do an RTS in 1 cycle.
  LINKSTACK is a feature of modern CPUs (such as the Athlon).
  A LINKSTACK will speed up all programs, including old ones.

That's really awesome then and good to have, thanks for the explanation :)



Nixus Minimax

Posts 416
27 Aug 2016 15:05


A link register is okay if you have 32 registers. To me the concept was never convincing, as most of the time you have to push the link register to the stack anyway and can't use the register for much else. So on ARM (16 registers) I found it more annoying than useful.
 


Niels Boehm

Posts 5
27 Aug 2016 15:36


Nixus Minimax wrote:

To me the concept was never convincing, as most of the time you have to push the link register to the stack anyway and can't use the register for much else.

True, on those architectures without specialized caching the link register only saves you from accessing memory for the return address in leaf routines. But ultimately there will always be some low-level routines that don't call other routines (leaf routines), so it does help a little on those architectures.
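
A rough way to see how much the leaf-routine saving amounts to is to count the memory accesses made for return addresses under both conventions. The sketch below is a simplified model with made-up routine names: it assumes a stack-based call costs one write and one read, a non-leaf routine with a link register saves and restores it once, and caches are ignored entirely.

```python
def return_address_accesses(call_tree, routine, use_link_register):
    """Count memory accesses for return addresses in one run of `routine`."""
    callees = call_tree.get(routine, [])
    is_leaf = not callees
    # Stack convention: one write on the call, one read on the return.
    # Link-register convention: a leaf touches no memory, a non-leaf
    # saves and restores the link register around its own calls.
    own = 0 if (use_link_register and is_leaf) else 2
    return own + sum(
        return_address_accesses(call_tree, callee, use_link_register)
        for callee in callees
    )


# Hypothetical call tree: "checksum" is the only leaf and is called twice.
call_tree = {"main": ["parse", "checksum"], "parse": ["checksum"]}

print(return_address_accesses(call_tree, "main", use_link_register=False))  # 8
print(return_address_accesses(call_tree, "main", use_link_register=True))   # 4
```

The difference is exactly two accesses per call to a leaf routine, so the benefit depends on how much of the running time is spent in small leaf routines.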

Nixus Minimax wrote:

So on ARM (16 registers) I found it more annoying than useful.

Yeah, I see your point with ARM. ARM's link register is also one of the registers of the register file (r14, so two names for the same register, similar to how "A7" and "SP" are two names for the same register on 68k). In effect it steals a valuable register that you cannot use for other purposes while it serves as the link register.

On PowerPC, however, the link register is separate from the multi-purpose registers.



Thierry Atheist

Posts 644
27 Aug 2016 16:16


Nixus Minimax wrote:
A link register is okay if you have 32 registers.

Hi Nixus,

The AMMX (SIMD) will have 32 registers. Does the CPU also have 32 registers? Does the FPU have 32 registers too?

From what I understood, I think that the FPU on Intel CPUs can't be used if the SSE capability is used, as those work with the same 8 (16?) registers.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
28 Aug 2016 05:54


The 68K instructions

BSR
..
RTS

are very elegant and, when combined with BTC and LINKSTACK, as fast as possible. So in my opinion this is already a perfect solution.

Regarding PowerPC, there is one idea in Power which works nicely if used properly. Power implements not one FLAG register but several. This allows branch conditions to be precomputed and, depending on the algorithm, can let you avoid branch mispredictions. I think this is a nice feature, and for some algorithms it helps a lot. Adding this feature to the 68k architecture could be considered.
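
Why several flag registers make precomputing possible can be shown with a small model. The sketch assumes that arithmetic always overwrites field 0 (as most 68k ALU instructions overwrite the CCR), while a compare may target any field, the way PowerPC compares can target one of the fields cr0 to cr7. The class and function names are invented for the illustration.

```python
class ConditionFields:
    """Toy model of one or more condition (flag) fields."""

    def __init__(self, count):
        self.fields = [0] * count

    def compare(self, field, a, b):
        # Store -1, 0 or 1 for a < b, a == b, a > b in the chosen field.
        self.fields[field] = (a > b) - (a < b)

    def add(self, a, b):
        # Assumption: arithmetic always updates field 0.
        result = a + b
        self.compare(0, result, 0)
        return result


def early_compare_survives(field_count):
    flags = ConditionFields(field_count)
    field = field_count - 1        # use the last available field
    flags.compare(field, 3, 10)    # precompute "3 < 10" well ahead
    flags.add(5, 7)                # unrelated arithmetic in between
    return flags.fields[field] < 0 # the branch decision, taken later


print(early_compare_survives(1))   # False: the add overwrote the only field
print(early_compare_survives(2))   # True: the compare result is still there
```

With a single field the compare has to sit directly before the branch. With several fields it can be issued early, so the outcome can already be known by the time the branch is reached.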
 
