
68080 "Transistor Count"

John Heritage

Posts 111
15 Mar 2017 20:15


heyden m wrote:

Something interesting about the PIII was that regardless of the increase in transistor count from revision to revision actual performance was about the same clock for clock.
 

First - agree 68080 Performance is amazing and would be amazing on an ASIC process.  I'm curious what kind of cache sizes the designers would want to settle on for a given ASIC process.  I'm also very curious how the FPU would compare. 

Now the PIII had 3 different major revisions -- each scaled a little differently.

"First generation" PIII "Katmai" (250nm) had an external cache of 512KB running at 1/2 of the CPU speed.  This was true for 450-600 MHz P3s.

2nd Gen - "Coppermine" (600 MHz-1.10 GHz; 1.13 GHz was recalled; 180nm) had a full-speed 256KB cache on-die.  This on-die but smaller cache improved integer performance about 10% at the same clock speed vs. 1st gen.

3rd Gen - "Tualatin" (1.0-1.4 GHz, 130nm) had a 512KB on-die full speed cache and also added memory prefetching.  The combo of these two added another 5%-10% performance per clock IIRC.

Moving forward, per clock/thread performance has actually increased pretty substantially since P3.  Athlon K8 (~2003) was probably 20% faster per clock than P3,  Core 2 (~2006) was up another 30-40% over K8 per clock (2.4 GHz Core 2 > 3.2 GHz K8).  From 2006 to today (Skylake, Kaby Lake) we're probably up another 60-70% per clock per thread due to wider execution resources, better caches/algorithms, the on-die memory controller, faster memory, and just general brute force execution techniques.

All up, I'd guess a modern Core i3/5/7 is now executing somewhere around 2.5x per clock on integer code vs. a "Coppermine" Pentium 3 per thread.
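
As a rough sanity check on that guess, the per-generation gains quoted above can be multiplied together. This is only a sketch: the factors are the rough ranges from this post, not measurements, and the step names are just labels chosen for illustration.

```python
# Per-clock speedup ranges as quoted in the post (low, high).
# These are rough estimates from the post, not measured values.
steps = [
    ("P3 -> K8", 1.20, 1.20),           # "probably 20% faster per clock"
    ("K8 -> Core 2", 1.30, 1.40),       # "another 30-40%"
    ("Core 2 -> Skylake", 1.60, 1.70),  # "another 60-70%"
]


def compound(steps):
    """Multiply the low and high factors of each step into overall bounds."""
    low, high = 1.0, 1.0
    for _name, lo, hi in steps:
        low *= lo
        high *= hi
    return low, high


low, high = compound(steps)
print(f"Per-clock speedup over P3: {low:.2f}x to {high:.2f}x")
```

The low end of the compounded range lands close to the 2.5x figure, so the guess is at least consistent with the individual estimates.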


Heyden M

Posts 7
16 Mar 2017 21:05


Hi John,

I am talking in a roundabout general way… if that makes sense. So that was the PIII revisions… still I don’t think there was much in the way of performance gain clock/clock through those revisions but I can see now transistor count came from cache.

We are talking about something that can be very subjective… CPU performance… as benchmarks are kind of like statistics. The theoretical performance of modern processors is huge, but real-world performance is less and less close to it. Part of this (a big part) has to do with software optimisation. The other part is that CPU architecture is more and more being built to benchmark well. I mean, if you look at the Xeon line, they have theoretically huge memory bandwidth that SHOULD crush Power8; however, in my experience the Power8 destroys the Xeon in memory-reliant applications (maybe this has to do with software optimisation though… I will concede this).

Also, theoretical ILP has increased… probably you’re right… by 2 or 2.5 times from PIII. However… this is actually inclusive of all this hyper threading, multicore etc etc nonsense with a billion transistors. And in actual reality… most modern processors typically run at half ILP so we are back at where we started. Also… with on-board cache as large as the Xeon’s, many benchmarks can be fooled.

Another thing is it is very tough to vectorise many applications which would actually give huge speedups.

I will agree with you that CPUs are faster now but also that they are not really faster :)



Mo Retro

Posts 239
18 Mar 2017 15:04


If the Apollo Team ever decides to put the Apollo 68080 in an ASIC, would that imply that we'll have 2 or more discrete parts: the Apollo 68080 ASIC CPU & the rest of the Vampire core with DIGITAL-VIDEO, SD, IDE, SPI, etc.?
 
  Tekmos is offering a replacement of an FPGA by a low cost ASIC.
  EXTERNAL LINK 
  They even offer an FPGA to ASIC Conversion Break-Even point:
  EXTERNAL LINK 
  Would be very interesting to see if it's feasible :)
 


John Heritage

Posts 111
18 Mar 2017 18:52


Hi Heyden --

Understood on the P3 revisions..  Yes, it is interesting how even 10-15% gains in IPC through ILP can require 2-4x increases in transistors..  We're seeing a lot of that in recent processors unfortunately.

Actually, the 2-2.5x from P3 per clock is per (real) core, exclusive of multicore.  An i7-6900K is ~2.5x faster than 8 separate Pentium 3 cores running @ 3.5 GHz in compute applications. 

Here's an example:
Pentium 4 - 3.9 GHz -  Cinebench R15 score = 61
i7-6900K - (3.6-4.0 GHz) - Cinebench R15 = 185 (1 core), 1578 (all cores).   

That's 3x faster than the Pentium 4 per clock (and the later P4 generation was ~ 25% slower per clock than P3), and then an additional speedup of 8.5x from the additional cores and threads (1-->16).
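
For anyone who wants to check those ratios, here is a minimal sketch using only the scores quoted above. It assumes the two chips run at roughly comparable clocks (3.9 GHz vs. 3.6-4.0 GHz), so the single-thread ratio is treated as a per-clock ratio; the variable names are just for illustration.

```python
# Cinebench R15 scores as quoted in this post.
p4_single = 61     # Pentium 4 at 3.9 GHz
i7_single = 185    # i7-6900K, one core
i7_all = 1578      # i7-6900K, all cores and threads

# Single-thread gain; read as "per clock" only because the clocks are similar.
single_thread_ratio = i7_single / p4_single

# Extra speedup from going from 1 thread to all 16 threads.
multi_thread_scaling = i7_all / i7_single

print(f"single-thread ratio:  {single_thread_ratio:.2f}x")
print(f"multi-thread scaling: {multi_thread_scaling:.2f}x")
```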

A lot of the apparent performance we've lost is through further and further abstraction imo.  My favorite example is today's demo compos that are "64K"..  they only run after 1 gigabyte worth of Windows OS loads, followed by hundreds of megs of GPU drivers..  then look at the magical "64K" of code go :-P

Xeon vs Power8 is a good topic.  I think it's a combo of the Power8 having an all out design for cache/cores, and also a monster FPU..  Will be interesting to see if this changes in the future.  I applaud IBM for keeping up the fight there..


John Heritage

Posts 111
18 Mar 2017 18:53


Mo Retro wrote:

If the Apollo Team ever decides to put the Apollo 68080 in an ASIC, would that imply that we'll have 2 or more discrete parts: the Apollo 68080 ASIC CPU & the rest of the Vampire core with DIGITAL-VIDEO, SD, IDE, SPI, etc.?
 
  Tekmos is offering a replacement of an FPGA by a low cost ASIC.
  EXTERNAL LINK   
  They even offer an FPGA to ASIC Conversion Break-Even point:
  EXTERNAL LINK   
  Would be very interesting to see if it's feasible :)
 

That's an interesting question..  I would assume the Apollo team would go for an all inclusive SoC ASIC, but anything is possible..  They could probably even license a low power ARM SoC very cheaply and bolt onto that IP if appropriate/feasible.. 


Heyden M

Posts 7
19 Mar 2017 01:47


Hi John,
 
Using your specifics... and to preface all this I have to say I don't know anything about Cinebench... I will lay it out like this
 
P3 is 25% faster per clock than P4 (reasonable estimate)
 
i7-6900K has hyperthreading so divide by 2
 
So, we get new score of
PIII @ 3.9 GHz - 76
i7-6900K 3.6-4.0 GHz (single thread single core) - 92.5 (from single core score)
 
Approximate 20% improvement.
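
The arithmetic behind those two numbers, as a small sketch. Both adjustment factors are this post's own assumptions (the 25% P3-over-P4 estimate and the divide-by-2 for hyperthreading), not measured values, and the constant names are made up for illustration.

```python
# Cinebench R15 scores quoted earlier in the thread.
p4_score = 61      # Pentium 4 at 3.9 GHz
i7_single = 185    # i7-6900K, single-core score

# Assumptions made in this post, not measurements:
P3_OVER_P4 = 1.25  # P3 taken as 25% faster per clock than P4
HT_DIVISOR = 2     # halve the i7 score to discount hyperthreading

p3_estimate = p4_score * P3_OVER_P4    # hypothetical PIII at 3.9 GHz
i7_adjusted = i7_single / HT_DIVISOR   # i7 score after the halving
improvement = i7_adjusted / p3_estimate - 1

print(f"PIII estimate: {p3_estimate:.2f}")
print(f"i7 adjusted:   {i7_adjusted:.1f}")
print(f"improvement:   {improvement:.0%}")
```

Changing `HT_DIVISOR` to 1 shows how much of the conclusion depends on the halving step.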
 
i7 has SSE2, SSE3, SSSE3, SSE4, AVX, AVX2 etc & i7 has huge L3 cache.
 
P3 had a very slow FSB and slow memory; i7 has a huge FSB & fast memory. P3 is 32bit & i7 is 64bit.
 
Above is like comparing banana and apple but it's actually reasonable in a general way.
 
The point I make is that new architecture is pathetic considering all the improvements that have occurred since P3.
 
Absolutely agree with you about abstraction.


John Heritage

Posts 111
21 Mar 2017 19:17


heyden m wrote:

  i7-6900K has hyperthreading so divide by 2
   

You're halving the 6900K unnecessarily.  The 185 score is for Cinebench R15 running the benchmark with a single thread only.  The 1578 wasn't an extrapolation but rather how the processor performed with all 16 threads active.

Agreed on the grand scheme of things: Intel's single core IPC improvement from 1999 to 2017 is about the same as the IPC improvement Motorola saw from the 68030 to the 68040.  OTOH the 68040 took the very low hanging fruit of "pipeline everything to execute something every clock", and you can only do that once.

When including multiple cores, improved memory controllers, etc..  That 61 of a "4 GHz Pentium 4" of 2005 turning into 1578 of an i7-6900K in 2016 doesn't look too bad..  ~ 26x improvement.  2 Year Moore's Law would say 11 years = 2^5.5 = ~45x so it's in the ballpark.
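
A quick sketch of that comparison, assuming a plain "doubling every 2 years" reading of Moore's Law applied directly to benchmark scores (a loose analogy, since the law is about transistor counts). With 11 years and a 2-year period the exponent works out to 5.5.

```python
# Years between the two processors and the assumed doubling period.
years = 2016 - 2005
doubling_period_years = 2  # the "2 year" form of Moore's Law

# Projected growth if the score doubled every period.
projected = 2 ** (years / doubling_period_years)

# Observed growth from the quoted Cinebench R15 scores.
observed = 1578 / 61

print(f"projected: {projected:.1f}x")
print(f"observed:  {observed:.1f}x")
print(f"observed / projected: {observed / projected:.2f}")
```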


Heyden M

Posts 7
23 Mar 2017 09:39


6900K is running 16 threads... P4 or P3 runs 1. I don't see how this is incorrect to divide by 2. My point is thread/thread, clock for clock. In that regard, I still think I am correct.

The current idea is to throw more cores at the problem and create more exotic CPU functions... the reality is that this is not actually viable for much general computing use and many computing problems. There are times I must disable hyper threading and fix affinity to get improved performance for compute tasks. This is partly due to coding, compiler issues, hardware and OS issues. I'd just say this rabbit hole is very deep.

Also, for many serial compute problems I still keep the earlier Xeon X53/54 series, as for some reason they solve problems faster than the E5/7 series. The only downside is the huge sauna-like heat from the compute room.

Let me explain with a comparison. Creating a single system image across multiple machines, with all things being equal, the old X53/54 destroys the newer E5/7 on serial compute problems. And the older hardware all has unbalanced memory because of the cost of sourcing so much "old" RAM. OS is the same, code is the same, interlink is the same. How can this be? I think it is not just the CPU... there is a different architecture at play... but it is still very disappointing given such huge hype and super high benchmarks. The only advantage is lower electricity consumption.

I can see, though, that CPU utilisation on E5/7 is terrible and failure to throttle up is an issue (perhaps linked to utilisation?). I do notice this is task dependent.

Anyway, so much talk about Intel.

I am waiting for 'complete' 68080 core standalone.


John Heritage

Posts 111
23 Mar 2017 16:47


heyden m wrote:

6900K is running 16 threads... P4 or P3 runs 1. I don't see how this is incorrect to divide by 2. My point is thread/thread, clock for clock. In that regard, I still think I am correct.
 

Because the score I provided is for the benchmark running in single thread only mode. 
