nVidia Kepler Benchmarked (Topic)

Mar 14 2012, 8:47 pm
By:

Aristocrat

Mar 14 2012, 8:47 pm

Aristocrat Post #1

First up is a mobile chip, though.

For reference, here are the specs of the GT 640M, of the previous-gen part its performance is on par with (the GT 555M), and of the GT 560M benchmarked in the Anand article:

GT 555M: 96 CUDA cores, 753/900MHz core/memory clocks, 128-bit memory bus
GT 640M: 384 CUDA cores, 625/1800MHz core/memory clocks, 128-bit memory bus
GT 560M: 192 CUDA cores, 775/1250MHz core/memory clocks, 192-bit memory bus

This worries me slightly: nVidia quadrupled the cores, dropped the clocks slightly, and the performance didn't increase substantially? To complicate matters further, the 560M, which has half as many cores, consistently outperforms the 640M.
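To put a number on that surprise, here is a rough sketch that multiplies core count by core clock for the three parts listed above. This "paper throughput" is a deliberately naive metric assumed for illustration: it ignores shader clock domains, memory bandwidth, and any architectural differences between the generations, so it says nothing about real game performance.

```python
# Naive "paper throughput" = CUDA cores x core clock (MHz).
# Only the specs quoted in the post are used; the metric itself is an
# illustrative assumption, not a real performance model.
specs = {
    "GT 555M": {"cores": 96, "core_mhz": 753},
    "GT 640M": {"cores": 384, "core_mhz": 625},
    "GT 560M": {"cores": 192, "core_mhz": 775},
}


def paper_throughput(spec):
    """Return cores multiplied by core clock, in arbitrary units."""
    return spec["cores"] * spec["core_mhz"]


baseline = paper_throughput(specs["GT 555M"])
relative = {name: paper_throughput(s) / baseline for name, s in specs.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the GT 555M on paper")
```

On paper the 640M lands a bit above 3x the 555M and well ahead of the 560M, which is exactly why benchmark results that put it level with the 555M and behind the 560M look odd.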

From the little information that we can glean from this, it seems that they are now taking the approach of using a lot of shitty cores in a GPU as opposed to using fewer, fatter cores (the former is what AMD/ATI did with their GPUs). What's up with that? I was expecting higher per-core performance with Kepler than Fermi, not the other way around.

Well, as long as performance/watt and performance/price stay good, I have no qualms about the decreased per-core performance.

None.

Mar 16 2012, 4:13 am

rockz Post #2

ᴄʜᴇᴇsᴇ ɪᴛ!

certainly you must understand that it is more efficient to run more cores under ideal circumstances.

Lower core performance means lower power draw. As you increase the frequency and voltage of the core, the power increases by a square (P = R I^2). By adding very efficient cores, you are just adding, not squaring. GPUs make better use of multiple cores too, so it's a very good plan there. CPUs are where single thread performance is very useful, due to poor design.

"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 16 2012, 3:58 pm

Centreri Post #3

Relatively ancient and inactive

Rockz, I'm pretty sure that what you just said was inaccurate technobabble. I'm sure that NVidia knows better than Aristocrat and his criticisms have little merit, but, still... technobabble.

None.

Mar 16 2012, 11:11 pm

NudeRaider Post #4

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

As far as I can tell rockz hit the nail on the head. Why do you think what he said is inaccurate?

Mar 17 2012, 1:26 am

Aristocrat Post #5

Quote from

rockz

certainly you must understand that it is more efficient to run more cores under ideal circumstances.

I'm simply surprised by this architectural overhaul; historically, nVidia GPUs have retained relatively low core counts (relative to ATI/AMD), but kept on-par or higher overall gaming performance due to the higher performance per core, so I wonder what motivated this change to using more, weaker cores. Better compute throughput? Doesn't seem like it.

None.

Mar 17 2012, 5:43 pm

NudeRaider Post #6

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Better power efficiency (as rockz pointed out). If I'm not mistaken, nVidia GPUs were usually hotter than ATi.

Mar 17 2012, 7:29 pm

Centreri Post #7

Relatively ancient and inactive

The end result may be accurate (I wouldn't know), but his physical justification for it seems like balderdash. More cores > better cores because P = RI^2? Why are you squaring current, and using THAT as a justification? Why isn't computing more parallelized, if parallelization is N^2 as efficient? If what he said is true and what Aristocrat said about Nvidia's tendencies to go for fewer cores is true, how did all of those many, many engineers make such a stupid mistake as that, whereas those at ATI didn't?

I am of the opinion that rockz is grossly oversimplifying the technical issues at hand to the point of falsification. Like saying that there are night and day cycles around the world because the sun orbits the earth.

Plus, he said that 'CPU's are were single-threaded is useful, due to poor design' - and that is, again, nonsense. Not every bit of code can be parallelized, and even for the bit of code that can be, it's more expensive and difficult to do so. It's not due to any 'poor design'.

Post has been edited 1 time(s), last time on Mar 17 2012, 7:36 pm by Centreri.

None.

Mar 17 2012, 7:49 pm

NudeRaider Post #8

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Quote from

Centreri

Simple math. As you hopefully know: P = U * I and U = R * I
replacing U with R * I you get P = R * I * I

Quote from

Centreri

Why isn't computing more parallelized, if parallelization is N^2 as efficient?

Because usually you can't (the exception being some graphical applications). Traditionally, computer software has been written for serial computation where you can only continue to compute if you have the result of a previous calculation.
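The limit described here is usually summarized as Amdahl's law, which the thread does not name: if only a fraction of a program can run in parallel, the serial remainder caps the speedup no matter how many cores are added. A minimal sketch, where the parallel fractions are made-up values chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: half, 90% and 99% parallelizable.
for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (2, 4, 384)]
    print(f"parallel fraction {p}: speedup on 2/4/384 cores = {speedups}")
```

A workload that is only half parallelizable never reaches a 2x speedup even on 384 cores, while graphics-style work that is almost entirely parallel keeps scaling. That is roughly why the "more cores" approach suits GPUs better than general-purpose CPU code.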

Mar 17 2012, 7:52 pm

Centreri Post #9

Relatively ancient and inactive

... Okay, so what does the ratio of power to current have to do with anything? I may as well define something as the square root of current, plug it in, and say OH HEY LOOK POWER IS PROPORTIONAL TO X^4!

Why not Voltage? Or, hell, resistance? Or Power Squared?

Post has been edited 1 time(s), last time on Mar 17 2012, 9:46 pm by Centreri.

None.

Mar 18 2012, 12:57 am

Moose Post #10

We live in a society.

Quote from

Centreri

Power = Voltage * Current
Voltage = Resistance * Current (Ohm's Law)
Therefore: Power = Resistance * Current^2

http://en.wikipedia.org/wiki/Electric_power
http://en.wikipedia.org/wiki/Joule%27s_laws
Physics was not invented arbitrarily in this topic.

As for the portions applied to computer science, I don't have justification for those claims. Only that one equation.

Post has been edited 1 time(s), last time on Mar 18 2012, 1:02 am by Mini Moose 2707.

Mar 18 2012, 1:20 am

Centreri Post #11

Relatively ancient and inactive

Moose.

My point is that just because power is proportional to current squared, it does not imply that power is proportional to clock speed squared (or whatever silly thing was being argued).

I did not need more links to Ohm's Law, thanks.

None.

Mar 18 2012, 2:13 am

Aristocrat Post #12

What rockz said might apply to the exact same cores running at different voltages and speeds; ignoring scalability problems, 500 Fermi cores at 350 MHz and low voltage will use less power than 250 of those same cores at 700MHz while providing the same net performance. However, Kepler is a different microarchitecture on a different lithography process from Fermi. Power consumption due to different frequencies/voltage is not comparable between those two. What we do know for a fact is that the cores are now thinner, but a lot more of them can be crammed into the same space. If the leaked GTX 680 benchmarks are legitimate, this is a good thing; it means that performance per watt (and per mm² die area) went up.
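The 500-cores-at-350MHz versus 250-cores-at-700MHz comparison can be sketched with the usual CMOS dynamic power approximation, P = cores * C * V^2 * f. The per-core capacitance and both voltages below are hypothetical values picked only to illustrate the idea; they are not Fermi specifications.

```python
def dynamic_power(cores, capacitance, voltage, freq_mhz):
    """Total dynamic power in arbitrary units: cores * C * V^2 * f."""
    return cores * capacitance * voltage ** 2 * freq_mhz


C = 1.0  # hypothetical per-core switched capacitance, arbitrary units

# Hypothetical voltages: the slower configuration is assumed to run at 0.85 V,
# the faster one at 1.00 V. Both values are assumptions for illustration.
wide = dynamic_power(cores=500, capacitance=C, voltage=0.85, freq_mhz=350)
narrow = dynamic_power(cores=250, capacitance=C, voltage=1.00, freq_mhz=700)

# Same cores * clock product, so the same throughput if scaling were perfect.
same_throughput = (500 * 350) == (250 * 700)

print(f"same paper throughput: {same_throughput}")
print(f"wide/narrow power ratio: {wide / narrow:.4f}")
```

Under these assumptions the wide configuration draws about 72% of the power of the narrow one. Note that if both ran at the same voltage the two totals would be identical; the entire saving comes from being able to lower the voltage at the lower clock.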

None.

Mar 18 2012, 4:36 am

rockz Post #13

ᴄʜᴇᴇsᴇ ɪᴛ!

Quote from

Centreri

There are physical limitations to manufacturing as well. I don't have the experience to make those calls, so yes, I'm oversimplifying. And they didn't make a stupid mistake. They already have multiple cores. Anything over 4 cores is a lot. They clearly have the ability to handle a vast number of cores. They chose a particular path for their GPU however which relied on fewer cores than AMD's GPUs.

Quote from

Centreri

Plus, he said that 'CPU's are were single-threaded is useful, due to poor design'

Don't scare me like that. I had to double check what I said. In the future, we will have more cores. Period. We will continue to add more cores because they are a cheap and easy way to increase performance without decreasing efficiency too much. Remember that unused cores can be turned off to save power. When I say poor design, I mean it. Software developed 10 years ago is not multi-threaded, and we should consider it to be poorly designed by today's standards. New software which comes out and does not support parallelization is poorly designed. Was it well designed when it came out? Probably. Are people still using 10 year old software? Of course. It's just old and will eventually be replaced with something much more efficient, or something that handles modern hardware better.

Quote from

Centreri

My point is that just because power is proportional to current squared, it does not imply that power is proportional to clock speed squared (or whatever silly thing was being argued).

I am assuming that one clock takes up a certain number of coulombs. If you increase the frequency of the clock, the number of coulombs would go up at the same rate. The impedance of the silicon, on average, isn't going to change. Let me know where I am wrong please. Keep in mind that I'm oversimplifying, but I made that clear in the original post (ideal).

"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 18 2012, 9:31 pm

NudeRaider Post #14

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Quote from

Aristocrat

What we do know for a fact is that the cores are now thinner, but a lot more of them can be crammed into the same space. If the leaked GTX 680 benchmarks are legitimate, this is a good thing; it means that performance per watt (and per mm² die area) went up.

Newer architecture is of course better because newer (better) instructions are being used.

Back to Cent's objection: Cent doubts that "more cores" > "better cores" because P = R*I^2.

And apparently this is in fact a simplified (wrong) formula in this case.
Wikipedia states P = C * V^2 * f where
P = Power
C = Capacitance of the CPU
V = Voltage
f = frequency

This means that power draw scales linearly with frequency. HOWEVER, the article states that when you lower the frequency the voltage requirement also drops. They do not explain the relation between f and V however, so we can't tell the exact power changes in relation to frequency. What we can say is that the relation is steeper than linear, which makes the initial statement that "more cores" > "better cores" true (under the assumption a given algorithm can be parallelized well enough).
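As a rough worked example of the "steeper than linear" point: suppose, purely as an assumption, that the required voltage is proportional to frequency, V = k f (the constant k is hypothetical, and real chips have a voltage floor, so this overstates the effect). Splitting the same work across n cores that each run at f/n then gives:

```latex
% Dynamic power of one core at frequency f (formula quoted above)
P_1 = C V^2 f

% Assumption for illustration only: V = k f
P_1 = C (k f)^2 f = C k^2 f^3

% n identical cores, each at f/n, same total throughput
% (assumes the work parallelizes perfectly)
P_n = n \cdot C k^2 \left(\frac{f}{n}\right)^3
    = \frac{C k^2 f^3}{n^2}
    = \frac{P_1}{n^2}
```

At the other extreme, if the voltage could not be lowered at all, P_n = n * C V^2 (f/n) = P_1 and nothing is gained. The real behavior sits somewhere between these two cases.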

So looking at the development of computing hardware, we've seen better cores (as opposed to more cores) for a very long time simply because it's much easier to write algorithms for serial computation than ones suited for parallel processing. However, a few years ago we hit a roadblock which prevented clock speeds from going higher.

Roadblock explained

Everyone in the industry knew this would eventually happen, but no one was really sure when. It ended up happening earlier than expected -- I think the industry consensus was that it was going to top out at about 10 GHz, which is why Intel made a huge (and ultimately failed) bet on the "Netburst" architecture. It was designed to run optimally at about 6 GHz, but they never got that high, and at low clock rates it made too many concessions.

They're up against physical limits: the speed of light, Planck's constant, the size of atoms, and a couple of others. One big problem is tunneling, quantum leakage. As transistors get smaller, and as you use smaller voltages, there's a greater and greater chance that an electron will jump from the source to the drain even when the FET is "off". The smaller the FET, the more of that you'll see.

You can prevent that by using higher voltages. If the hill is taller, the chance of an electron tunneling is lower. But if the voltage is higher, then it means you have to use more charge in the gate, which makes the switching time slower. But if you don't do that, eventually there comes a point where the quantum leakage approaches the level of a normal signal, and then you can't tell if the FET is "on" or "off".

Also, using a higher voltage means you use more power, and cooling is a real issue.

We haven't outright topped out yet; it's still possible to make more gains. But we're near the limit of what's possible with MOSFETs. And we're also near the limit of what we can buy with making the devices smaller. Right now it's down to the point where some insulating layers are less than 10 atoms thick. At a certain point when you're trying to get smaller, you start running into granularity issues, and we're near that point.

There are two alternate approaches which could conceivably yield vastly higher switching rates, but both are radically different than anything we're currently using. One is light gates. (I don't know what the official name of this is.) The other is Josephson Junctions. There has been research into both for decades, but neither is remotely close to being ready for prime time. (A "big" device for either right now is 50 gates. There's a non-trivial issue scaling them up, not to mention the weird operational environment needed for Josephson Junctions.)

However, a clock rate stall in MOSFET technology doesn't mean that processors will cease to increase in compute power. There's a lot that can be done in terms of architectural changes to increase compute power without requiring increased clock speeds. Increasing parallelism is the ticket, and that's why dual-core and quad-core processors are becoming more and more common. But there are other things, too.
(source)

This means we have no choice but to make CPUs more efficient and increase parallel processing until new technology is available.

Addressing the question in the OP:
Kepler having lower per-core performance than previous generations probably stems from multiple reasons:
- difficult (= time consuming) to max out the power of a GPU right away without running into problems (heat, reliability, power draw, etc.)
- marketing: Later develop "improved" Keplers and sell them for more
- power efficiency (which is hopefully established now)

Post has been edited 1 time(s), last time on Mar 18 2012, 10:01 pm by NudeRaider. Reason: typo

Mar 18 2012, 9:54 pm

rockz Post #15

ᴄʜᴇᴇsᴇ ɪᴛ!

I think we can all agree that kepler is better than expected.

"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 3:07 pm

Aristocrat Post #16

Here's nVidia's rationale for the design change:

Benchmarks are out. Looks like the leaks going on about it being 30% better than the 7970 in games are actually not far-fetched.

None.

Mar 22 2012, 3:55 pm

rockz Post #17

ᴄʜᴇᴇsᴇ ɪᴛ!

I both hate and love the success here. However, if the area is a little less than doubled, doesn't this mean their yields are lower and will be much more expensive?

Quote

Enthusiasts want to know about Nvidia's next-generation architecture so badly that they broke into our content management system and took the data to be used for today's launch. Now we can really answer how Kepler fares against AMD's GCN architecture.

"broke into" haha, no. Tom's is so predictable you can just guess the url.

"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 4:24 pm

Aristocrat Post #18

Quote from

rockz

I both hate and love the success here. However, if the area is a little less than doubled, doesn't this mean their yields are lower and will be much more expensive?

Fermi is 40nm, and Kepler is 28nm. This translates to roughly the same die area, as 28^2 is roughly equal to half of 40^2. The GTX 680's die is not that big, and it is priced below the 7970.

None.

Mar 22 2012, 4:43 pm

rockz Post #19

ᴄʜᴇᴇsᴇ ɪᴛ!

That chart says area is 1.8x for logic. It also has 4x the cores of Fermi, and the process shrink only gives a 50% area decrease, so the area should roughly double unless they significantly decreased the number of transistors in each core.
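The area argument from the last two posts can be checked with quick arithmetic. The sketch below assumes ideal scaling, where area shrinks with the square of the feature size, and takes the 4x core count and the 1.8x logic area as quoted in this thread; real processes do not scale this cleanly.

```python
def ideal_area_scale(old_nm, new_nm):
    """Area ratio after a shrink, assuming area scales with feature size squared."""
    return (new_nm / old_nm) ** 2


shrink = ideal_area_scale(40, 28)        # 40nm Fermi -> 28nm Kepler
core_multiple = 4                        # figure quoted in the post
expected_area = core_multiple * shrink   # if each core kept its transistor count
charted_area = 1.8                       # logic area figure quoted from the chart

# Per-core size relative to a Fermi core that was simply shrunk to 28nm.
implied_per_core = charted_area / expected_area

print(f"ideal shrink factor: {shrink:.2f}")
print(f"expected logic area for 4x unchanged cores: {expected_area:.2f}x")
print(f"implied per-core size vs. a shrunk Fermi core: {implied_per_core:.2f}")
```

With these numbers, four times as many unchanged cores would need about 1.96x the logic area, so a charted 1.8x suggests each core is only slightly smaller than a straight shrink would make it.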

"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 5:03 pm

Aristocrat Post #20

Seems more like 3x. As far as I can tell, they designed it so that two Kepler cores = 1 Fermi core, but actual performance scales worse than that.

In unrelated news, compute performance on the 680 is crippled; seems like nVidia pushed the card out to be optimized for games and only games. Anyone wanting high OpenCL performance on a gaming card (for things like DXVA/3ds max) should either look at AMD or wait for GK110.

None.
