"We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong." -- Harald Lesch
What we do know for a fact is that the individual cores are now leaner, but a lot more of them can be crammed into the same space. If the leaked GTX 680 benchmarks are legitimate, that's a good thing: it means performance per watt (and per mm² of die area) went up.
A newer architecture is of course also better in itself, because newer (better) instructions are being used.
Back to Cent's objection: Cent doubts that "more cores" > "better cores", citing P = I^2 * R.
Apparently, though, that formula is an oversimplification (i.e. the wrong one) for this case.
Wikipedia states P = C * V^2 * f, where
P = power
C = capacitance switched per clock cycle
V = voltage
f = frequency
This means that power draw scales linearly with frequency. HOWEVER, the article also states that when you lower the frequency, the required voltage drops as well. It doesn't explain the exact relation between f and V, so we can't tell precisely how power changes with frequency. What we can say is that it scales faster than linearly, which makes the initial statement that "more cores" > "better cores" true (under the assumption that a given algorithm can be parallelized well enough).
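To make that scaling concrete, here's a minimal sketch. It assumes (and this is an assumption, since the article doesn't spell out the f/V relation) the common first-order approximation that the required voltage scales roughly linearly with frequency, which would make dynamic power grow with f^3. All constants are made up purely for illustration:

```python
# Sketch: dynamic power P = C * V^2 * f, assuming V scales linearly with f.
# All constants are arbitrary illustration values, not real GPU figures.

C = 1.0          # switched capacitance (arbitrary units)
V_PER_GHZ = 0.3  # assumed: volts needed per GHz of clock (made-up constant)

def dynamic_power(freq_ghz: float, cores: int = 1) -> float:
    """Total dynamic power of `cores` identical cores running at `freq_ghz`."""
    voltage = V_PER_GHZ * freq_ghz          # assumption: V proportional to f
    return cores * C * voltage**2 * freq_ghz

# Same nominal throughput two ways: one fast core vs. two slower cores.
one_fast = dynamic_power(freq_ghz=2.0, cores=1)
two_slow = dynamic_power(freq_ghz=1.0, cores=2)

print(f"1 core  @ 2 GHz: {one_fast:.2f} (arbitrary power units)")
print(f"2 cores @ 1 GHz: {two_slow:.2f} (arbitrary power units)")
# Under the V ~ f assumption, power grows with f^3, so the single fast core
# burns ~4x the power of the two slow cores for the same total core-GHz.
```

Under that assumption, spreading the same clock budget over more, slower cores is clearly the cheaper option power-wise.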
So looking at the development of computing hardware, we've seen better cores (as opposed to more cores) for a very long time, simply because it's much easier to write algorithms for serial computation than ones suited for parallel processing. However, a few years ago we hit a roadblock that prevented clock speeds from going higher.
Roadblock explained
Everyone in the industry knew this would eventually happen, but no one was really sure when. It ended up happening earlier than expected -- I think the industry consensus was that it was going to top out at about 10 GHz, which is why Intel made a huge (and ultimately failed) bet on the "Netburst" architecture. It was designed to run optimally at about 6 GHz, but they never got that high, and at low clock rates it made too many concessions.
They're up against physical limits: the speed of light, Planck's constant, the size of atoms, and a couple of others. One big problem is tunneling (quantum leakage). As transistors get smaller, and as you use smaller voltages, there's a greater and greater chance that an electron will jump from the source to the drain even when the FET is "off". The smaller the FET, the more of that you'll see.
You can prevent that by using higher voltages. If the hill is taller, the chance of an electron tunneling is lower. But if the voltage is higher, then it means you have to use more charge in the gate, which makes the switching time slower. And if you don't raise the voltage, eventually there comes a point where the quantum leakage approaches the level of a normal signal, and then you can't tell if the FET is "on" or "off".
Also, using a higher voltage means you use more power, and cooling is a real issue.
We haven't outright topped out yet; it's still possible to make more gains. But we're near the limit of what's possible with MOSFETs. And we're also near the limit of what we can buy with making the devices smaller. Right now it's down to the point where some insulating layers are less than 10 atoms thick. At a certain point when you're trying to get smaller, you start running into granularity issues, and we're near that point.
There are two alternate approaches which could conceivably yield vastly higher switching rates, but both are radically different than anything we're currently using. One is light gates. (I don't know what the official name of this is.) The other is Josephson Junctions. There has been research into both for decades, but neither is remotely close to being ready for prime time. (A "big" device for either right now is 50 gates. There's a non-trivial issue scaling them up, not to mention the weird operational environment needed for Josephson Junctions.)
However, a clock rate stall in MOSFET technology doesn't mean that processors will cease to increase in compute power. There's a lot that can be done in terms of architectural changes to increase compute power without requiring increased clock speeds. Increasing parallelism is the ticket, and that's why dual-core and quad-core processors are becoming more and more common. But there are other things, too.
(source)

This means we have no choice but to make CPUs more efficient and increase parallel processing until new technology is available.
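To illustrate why the "parallelized well enough" caveat from above matters, here's a minimal sketch of Amdahl's law, which bounds the speedup you can get from adding cores when part of a program stays serial. The parallel fractions used are arbitrary examples, not measurements:

```python
# Sketch: Amdahl's law -- speedup limit when only part of the work parallelizes.
# The parallel fractions below are arbitrary examples, not measured values.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup on `cores` cores if `parallel_fraction` of the
    runtime can be split across cores and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for frac in (0.50, 0.90, 0.99):
    print(f"parallel fraction {frac:.0%}: "
          f"8 cores -> {amdahl_speedup(frac, 8):.1f}x, "
          f"1024 cores -> {amdahl_speedup(frac, 1024):.1f}x")
# Even with 1024 cores, a program that is 50% serial can never exceed 2x
# speedup, which is why "more cores" only beats "better cores" for workloads
# that parallelize well.
```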
Addressing the question in the OP:
Kepler's lower per-core performance compared to previous generations probably has several reasons:
- it's difficult (= time consuming) to max out the potential of a GPU right away without running into problems (heat, reliability, power draw, etc.)
- marketing: "improved" Keplers can be developed later and sold at a higher price
- power efficiency (which has hopefully been established by now)