Recently I have been looking at the performance of different computing machines over the decades, and how in the 70 years of British computing history we have seen speed of operation and transistor count increase, and cost decrease, by several orders of magnitude.
Mathematician and UCL lecturer Dr Hannah Fry recently hosted an excellent radio series on BBC Radio 4, "Computing Britain" - a 10-part series available as a podcast as well as individual episodes.
It was the first episode "Electronic Brains" that triggered me into taking a closer look at some of the early British machines.
The first computers, built in the five years immediately after World War 2, used thermionic valve (vacuum tube) technology and consumed kilowatts of power. EDSAC, revolutionary in the late 1940s, had a 512 word memory, and in terms of performance about 600 instructions per second was all that could be achieved. This was mostly down to the fact that the ALU handled data in a serial fashion - you really cannot build a parallel 35 bit ALU with just 1500 triode valves, the 1940s switching equivalent of the transistor.
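To see why serial operation is slow, here is a minimal Python sketch (my illustration, not EDSAC code) of a bit-serial adder: a single full-adder stage is reused once per clock tick, so a 35 bit addition takes 35 ticks rather than one.

```python
def serial_add(a, b, width=35):
    """Add two integers with one full adder reused bit by bit,
    the way a bit-serial ALU worked: one bit position per clock tick."""
    carry = 0
    result = 0
    for tick in range(width):          # one clock tick per bit position
        bit_a = (a >> tick) & 1
        bit_b = (b >> tick) & 1
        s = bit_a ^ bit_b ^ carry      # sum bit from the single full adder
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= s << tick
    return result

print(serial_add(12345, 67890))  # 80235, after 35 "clock ticks"
```

A parallel ALU does all 35 bit positions at once, but needs 35 copies of the adder hardware - hopeless with only 1500 valves to spend.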
Jumping forward to 1965 and the PDP-8 - this was the first of the mass-market "mini-computers". By this time digital hardware was transistorised, using DTL (diode-transistor logic): diodes were used to create the "OR" function, and a transistor was used for the invert or "NOT" function - thus allowing the full range of logic gate functions to be synthesised.
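The DTL idea can be sketched in a few lines of Python - a toy truth-table model of my own, not a circuit simulation: the input diodes form an OR, the output transistor inverts it, and the resulting NOR gate is universal.

```python
def dtl_nor(*inputs):
    """One DTL gate: the input diodes wire-OR the inputs, and the
    output transistor inverts the result - together giving NOR."""
    return 0 if any(inputs) else 1

# NOR is universal, so every other gate function follows from it:
def not_gate(a):    return dtl_nor(a)
def or_gate(a, b):  return dtl_nor(dtl_nor(a, b))            # OR = NOT(NOR)
def and_gate(a, b): return dtl_nor(dtl_nor(a), dtl_nor(b))   # De Morgan

print([and_gate(a, b) for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [0, 0, 0, 1]
```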
The first PDP-8 used about 1500 transistors (PNP germanium) and about 3000 diodes. The engineers at DEC worked hard to get the transistor count down, because back then a transistor cost about $2 or $3 - although prices were falling rapidly, as Gordon Moore's law clearly illustrates.
The PDP-8 used magnetic core memory, as was common at that time, and it was the memory cycle time of 1.5uS that had the most influence on the overall processing speed - allowing a typical 2-cycle memory reference instruction (fetch, execute) to run at 0.33 MIPS. Manufacturing core memory was very labour intensive, so the whole 4K word machine sold in 1965 for $18,000 - at a time when a new convertible VW Beetle cost $1750.
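That 0.33 MIPS figure follows directly from the core cycle time - a back-of-envelope check using the numbers quoted above:

```python
cycle_time = 1.5e-6                 # core memory cycle time, seconds
instruction_time = 2 * cycle_time   # fetch cycle + execute cycle = 3 microseconds
mips = 1.0 / instruction_time / 1e6
print(f"{mips:.2f} MIPS")           # prints "0.33 MIPS"
```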
Ten years later, when the 6502 was created, transistor prices had been falling by two orders of magnitude per decade, and the whole CPU could be integrated on a single silicon die - allowing the 3510-transistor 6502 to be sold for about $20. Smaller, integrated transistors meant faster operation, so the 6502 could be clocked at 2MHz - allowing about 1 million operations per second.
Another decade on - now 1985 - and the engineers at Acorn Computers were working on the first ARM processor. Here a tiny British design team took a radical approach that flew in the face of conventional CPU design wisdom, and created a 32-bit RISC processor with just 25,000 transistors. The ARM1 ran at 8MHz and delivered a performance of 4 MIPS. Its contemporary, the Intel 80386, used 275,000 transistors - more than 10X the count.
The ARM1 first ran in April 1985 - and here, I believe, was the start of a revolution in computing devices. Intel continued to plug away at their x86 architecture, with its transistor count and power consumption rapidly spiralling skywards.
By 1995 an Intel Pentium Pro used 5,500,000 transistors on a 307mm2 die, whilst the ARM 700 still used a tenth of this number on a much smaller die area. The bigger the die area, the more likely it is to contain a defect, and this lowers the overall yield from the wafer - hence the price per die increases.
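This die-size effect can be illustrated with the classic Poisson yield model, Y = exp(-D * A). The defect density below is an assumed figure purely for illustration, not a quoted Intel or ARM number:

```python
import math

def die_yield(area_mm2, defects_per_cm2=1.0):
    """Fraction of good dice under the simple Poisson yield model
    Y = exp(-D * A). The defect density is an illustrative assumption."""
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)

big   = die_yield(307)   # a Pentium Pro sized die
small = die_yield(30)    # a die roughly a tenth of the area
print(f"307mm2 die: {big:.0%} good, 30mm2 die: {small:.0%} good")
```

With the same wafer cost spread over far fewer good dice, the larger part costs disproportionately more.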
Intel's insistence on sticking to a 1976 architecture has cost them dearly in terms of complexity, transistor count and cost. This is why ARM processors now dominate the mobile computing market, as well as other low-cost consumer and automotive markets.
Intel hit a brick wall around 2000 with their power-greedy Pentium 4. I had a laptop at the time with a 3.06GHz P4 - it cooked your legs when you used it on your lap. It took Intel a further eight years to manoeuvre out of the P4 road block and come out with their lower-power designs.
There has to be a way to reduce complexity. As Jean-Claude Wippler stated:
"Four decades later, on a 2015-era 4-core 2.8 GHz i7 CPU with
its advanced pipelining and branch prediction, each of the cores can
process billions of instructions per second – with an optimising gforth
compiler for example, the “1000000000 0 do loop” takes around 2 seconds –
that’s 2 nanoseconds per loop iteration"
Well, as you know, the J1 Forth computer, implemented as an open soft core on a $10 FPGA, can also achieve credible results - executing the same billion-iteration empty loop "1000000000 0 DO LOOP" on an 80MHz J1b takes almost exactly 100 seconds. That's about 100nS per loop - not bad for a device running a single core at 1/35th of the clock speed and a tiny fraction of the power.
If the J1 could run at 2.8GHz it would do the task in about 2.85 seconds - only two thirds of the performance of the billion-transistor Intel. What are they doing with all those other transistors?
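The scaling arithmetic behind that claim, using the figures above:

```python
loops = 1_000_000_000
j1_time, j1_clock = 100.0, 80e6    # seconds, Hz - the measured J1b figures
i7_time, i7_clock = 2.0, 2.8e9     # seconds, Hz - Wippler's gforth figure

print(j1_time / loops * 1e9, "ns per loop on the J1")   # 100.0 ns
print(i7_time / loops * 1e9, "ns per loop on the i7")   # 2.0 ns

# If the J1 ran at the i7's clock, the time shrinks by the clock ratio:
j1_scaled = j1_time * (j1_clock / i7_clock)
print(f"J1 scaled to 2.8GHz: {j1_scaled:.2f} s")        # 2.86 s
```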
Here we see that a transistor count of 1 billion is not necessarily the best way to get a task done.
I am looking forward to exciting times.