Saturday, December 12, 2015

The Powers That Be.....

Recently I have been looking at the performance of different computing machines over the decades, and how in 70 years of British computing history we have seen speed of operation and transistor count increase, and cost decrease, by several orders of magnitude.

Mathematician and UCL lecturer Dr. Hannah Fry recently hosted an excellent radio series on BBC Radio 4, "Computing Britain" - a 10-part series available as a podcast, as well as individual episodes.

It was the first episode "Electronic Brains"  that triggered me into taking a closer look at some of the early British machines.

The first computers, built in the five years immediately after World War 2, used thermionic valve (vacuum tube) technology and consumed kilowatts of power. Studying these machines - specifically EDSAC, revolutionary in the late 1940s - shows a 512-word memory and a performance of about 600 instructions per second. That was mostly down to the ALU handling data in a serial fashion: you really cannot build a parallel 35-bit ALU with just 1500 triode valves - the 1940s switch equivalent of the transistor.

Jumping forward some 20 years to 1965 and the PDP-8 - the first of the mass-market "mini-computers". By this time digital hardware was transistorised, using DTL (diode-transistor logic): diodes were used to create the "OR" function, and a transistor was used for the invert or "NOT" function - thus allowing the full range of logic gate functions to be synthesised.
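That last claim - that OR plus NOT is enough to build everything - is easy to demonstrate. A minimal sketch (mine, not from the article), modelling the DTL building blocks as Python functions and synthesising the other gates from the basic NOR:

```python
# DTL primitives: a diode network gives OR, a transistor gives NOT.
def NOT(a):          # single-transistor inverter
    return 1 - a

def OR(a, b):        # diode "OR" network
    return a | b

def NOR(a, b):       # the basic DTL gate: diodes feeding one transistor
    return NOT(OR(a, b))

# Every other gate can be synthesised from NOR alone:
def NOT_n(a):        return NOR(a, a)
def OR_n(a, b):      return NOT_n(NOR(a, b))
def AND_n(a, b):     return NOR(NOT_n(a), NOT_n(b))

# Verify the synthesised gates against their truth tables.
for a in (0, 1):
    for b in (0, 1):
        assert OR_n(a, b)  == (a | b)
        assert AND_n(a, b) == (a & b)
    assert NOT_n(a) == 1 - a
```

In hardware terms each synthesised gate costs one or more transistors, which is exactly why DEC's engineers counted them so carefully.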

The first PDP-8 used about 1500 transistors (PNP germanium) and about 3000 diodes. The engineers at DEC worked hard to get the transistor count down, because back then a transistor cost about $2 or $3 each - though prices were falling rapidly, just as Gordon Moore's 1965 observation predicted.

The PDP-8 used magnetic core memory, as was common at that time, and it was the 1.5 µs memory cycle time that had the most influence on the overall processing speed - allowing a typical two-cycle memory-reference instruction (fetch, execute) to run at about 0.33 MIPS. Manufacturing core memory was very labour intensive, so the whole 4K-word machine sold in 1965 for $18,000 - at a time when a new convertible VW Beetle cost $1750.
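The 0.33 MIPS figure follows directly from the core cycle time. A quick back-of-envelope check, using the numbers quoted above:

```python
# PDP-8 figures from the text: 1.5 us core cycle, two memory cycles
# (fetch, execute) per typical memory-reference instruction.
core_cycle_us = 1.5
cycles_per_instruction = 2

instruction_time_us = core_cycle_us * cycles_per_instruction  # 3.0 us
mips = 1.0 / instruction_time_us                              # 1 / 3 us

print(f"{mips:.2f} MIPS")   # prints "0.33 MIPS"
```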

Ten years later, when the 6502 was created, the transistor price had been falling by around two orders of magnitude per decade, and the whole CPU could be integrated on one silicon die - allowing the 3510-transistor 6502 to be sold for about $20. Smaller, integrated transistors meant faster operation, so the 6502 could be clocked at 2 MHz - allowing roughly 1 million operations per second.

Another decade on - now 1985 - and the engineers at Acorn Computers were working on the first ARM processor. Here a tiny British design team took a radical approach that flew in the face of conventional CPU design wisdom, and created a 32-bit RISC processor with just 25,000 transistors. The ARM1 ran at 8 MHz and delivered a performance of 4 MIPS.

Its contemporary, the Intel 80386, used 275,000 transistors - more than 10X the count.
The ARM1 first ran in April 1985, and here, I believe, was the start of a revolution in computing devices. Intel continued to plug away at their '86 architecture, with its transistor count and power consumption rapidly spiralling skywards.

By 1995 an Intel Pentium Pro used 5,500,000 transistors on a 307 mm² die, whilst the ARM700 still used a tenth of that number on a much smaller die area. The bigger the die area, the more likely it is to contain a defect, which lowers the overall yield from the wafer - and hence the price per die increases.
Intel's insistence on sticking to a 1976 architecture has cost them dearly in terms of complexity, transistor count and cost. This is why ARM processors now dominate the mobile computing market, plus other low-cost consumer and automotive markets.
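The link between die area and yield can be made concrete with the classic Poisson defect model, yield = exp(-D·A). A rough illustration only - the defect density below is an assumed figure for the sake of the example, not a real 1995 process number:

```python
import math

# Poisson yield model: probability a die of area A (mm^2) is defect-free,
# given an average defect density D (defects per mm^2).
def die_yield(area_mm2, defects_per_mm2=0.005):   # D is an assumed figure
    return math.exp(-defects_per_mm2 * area_mm2)

big   = die_yield(307)   # a Pentium Pro sized die
small = die_yield(30)    # roughly a tenth of the area

print(f"307 mm2 die: {big:.0%} yield, 30 mm2 die: {small:.0%} yield")
```

Whatever the exact defect density, the exponential means a tenfold reduction in area always gives a disproportionately better yield - the small die wins on cost per good part.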

Intel hit a brick wall around 2000 with their power-greedy Pentium 4. I had a laptop at the time with a 3.06 GHz P4 - it cooked your legs when you used it on your lap. It took Intel a further eight years to manoeuvre out of the P4 roadblock and come out with their lower-power Atom devices.
There has to be a way to reduce complexity. As Jean-Claude Wippler stated:

"Four decades later, on a 2015-era 4-core 2.8 GHz i7 CPU with its advanced pipelining and branch prediction, each of the cores can process billions of instructions per second – with an optimising gforth compiler for example, the “1000000000 0 do loop” takes around 2 seconds – that’s 2 nanoseconds per loop iteration"

Well, as you know, the J1 Forth computer, implemented as an open soft core on a $10 FPGA, can also achieve credible results - executing the same billion empty loops "1000000000 0 DO LOOP" on an 80 MHz J1b takes almost exactly 100 seconds. That's about 100 ns per loop - not bad for a device running a single core at 1/35th of the clock speed and a tiny fraction of the power.

If the J1 could run at 2.8 GHz it would do the task in about 2.86 seconds - roughly 2/3rds of the performance of the billion-transistor Intel part. What are they doing with all those other transistors........?

Here we see that a transistor count of 1 billion is not the best way to get a task done.

I am looking forward to exciting times ahead.......
