nVidia Kepler Benchmarked
Mar 14 2012, 8:47 pm
By: Aristocrat  

Mar 14 2012, 8:47 pm Aristocrat Post #1



First up is a mobile chip, though.

For reference, here are the specs of the GT 640M, the previous-gen part that its performance is on par with (the GT 555M), and the GT 560M benchmarked in the Anand article:

GT 555M: 96 CUDA cores, 753/900MHz core/memory clocks, 128-bit memory bus
GT 640M: 384 CUDA cores, 625/1800MHz core/memory clocks, 128-bit memory bus

GT 560M: 192 CUDA cores, 775/1250MHz core/memory clocks, 192-bit memory bus

This worries me slightly: nVidia quadrupled the cores, dropped the clocks slightly, and the performance didn't increase substantially? To complicate matters further, the 560M, which has half as many cores, consistently outperforms the 640M.
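As a rough sanity check on the numbers above, here is a minimal Python sketch of a naive "cores x core clock" proxy. The proxy itself is an assumption made purely for illustration: it ignores separate shader clock domains, memory bandwidth, and any difference in how much work one core does per clock, so it is not a performance model.

```python
# Specs copied from the list above; "cores x core clock" is a crude proxy only.
specs = {
    "GT 555M": {"cores": 96, "core_mhz": 753},
    "GT 640M": {"cores": 384, "core_mhz": 625},
    "GT 560M": {"cores": 192, "core_mhz": 775},
}

def naive_throughput(spec):
    """Hypothetical proxy: core count times core clock (arbitrary units)."""
    return spec["cores"] * spec["core_mhz"]

# Express every part relative to the GT 555M.
baseline = naive_throughput(specs["GT 555M"])
relative = {name: naive_throughput(s) / baseline for name, s in specs.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the GT 555M on paper")
```

On this proxy the 640M lands at roughly 3.3x the 555M and about 1.6x the 560M, which is why benchmark parity with the 555M suggests that each Kepler core does far less work per clock.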

From the little information that we can glean from this, it seems that they are now taking the approach of using a lot of shitty cores in a GPU as opposed to using fewer, fatter cores (the former is what AMD/ATI did with their GPUs). What's up with that? I was expecting higher per-core performance with Kepler than Fermi, not the other way around.

Well, as long as performance/watt and performance/price stay good, I have no qualms about the decreased per-core performance.



None.

Mar 16 2012, 4:13 am rockz Post #2

ᴄʜᴇᴇsᴇ ɪᴛ!

certainly you must understand that it is more efficient to run more cores under ideal circumstances.

Lower core performance means lower power draw. As you increase the frequency and voltage of the core, the power increases by a square (P=R I^2). By adding very efficient cores, you are just adding, not squaring. GPUs make better use of multiple cores too, so it's a very good plan there. CPUs are where single thread performance is very useful, due to poor design.



"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 16 2012, 3:58 pm Centreri Post #3

Relatively ancient and inactive

Rockz, I'm pretty sure that what you just said was inaccurate technobabble. I'm sure that NVidia knows better than Aristocrat and his criticisms have little merit, but, still... technobabble.



None.

Mar 16 2012, 11:11 pm NudeRaider Post #4

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

As far as I can tell rockz hit the nail on the head. Why do you think what he said is inaccurate?




Mar 17 2012, 1:26 am Aristocrat Post #5



Quote from rockz
certainly you must understand that it is more efficient to run more cores under ideal circumstances.

I'm simply surprised by this architectural overhaul; historically, nVidia GPUs have retained relatively low core counts (relative to ATI/AMD), but kept on-par or higher overall gaming performance due to the higher performance per core, so I wonder what motivated this change to using more, weaker cores. Better compute throughput? Doesn't seem like it.



None.

Mar 17 2012, 5:43 pm NudeRaider Post #6

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Better power efficiency (as rockz pointed out). If I'm not mistaken, nVidia GPUs were usually hotter than ATI's.




Mar 17 2012, 7:29 pm Centreri Post #7

Relatively ancient and inactive

The end result may be accurate (I wouldn't know), but his physical justification for it seems like balderdash. More cores > better cores because P = RI^2? Why are you squaring current, and using THAT as a justification? Why isn't computing more parallelized, if parallelization is N^2 as efficient? If what he said is true and what Aristocrat said about Nvidia's tendencies to go for fewer cores is true, how did all of those many, many engineers make such a stupid mistake as that, whereas those at ATI didn't?

I am of the opinion that rockz is grossly oversimplifying the technical issues at hand to the point of falsification. Like saying that there are night and day cycles around the world because the sun orbits the earth.

Plus, he said that 'CPU's are were single-threaded is useful, due to poor design' - and that is, again, nonsense. Not every bit of code can be parallelized, and even for the bit of code that can be, it's more expensive and difficult to do so. It's not due to any 'poor design'.

Post has been edited 1 time(s), last time on Mar 17 2012, 7:36 pm by Centreri.



None.

Mar 17 2012, 7:49 pm NudeRaider Post #8

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Quote from Centreri
The end result may be accurate (I wouldn't know), but his physical justification for it seems like balderdash. More cores > better cores because P = RI^2? Why are you squaring current, and using THAT as a justification?
Simple math. As you hopefully know: P = U * I and U = R * I
replacing U with R * I you get P = R * I * I

Quote from Centreri
Why isn't computing more parallelized, if parallelization is N^2 as efficient?
Because usually you can't (the exception being some graphical applications). Traditionally, computer software has been written for serial computation where you can only continue to compute if you have the result of a previous calculation.
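The limit described here is usually formalized as Amdahl's law, which the thread does not name; the sketch below brings it in only as an illustration. The parallel fractions are made-up values, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

# Assumed (hypothetical) parallel fractions, tried on a few cores and on many.
for p in (0.50, 0.95, 0.99):
    for n in (4, 384):
        print(f"parallel fraction {p:.2f}, {n:3d} cores: "
              f"{amdahl_speedup(p, n):6.2f}x speedup at best")
```

With half the work stuck in serial code, even 384 cores cannot reach a 2x speedup, while graphics-style workloads that are almost entirely parallel keep scaling.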




Mar 17 2012, 7:52 pm Centreri Post #9

Relatively ancient and inactive

... Okay, so what does the ratio of power to current have to do with anything? I may as well define something as the square root of current, plug it in, and say OH HEY LOOK POWER IS PROPORTIONAL TO X^4!

Why not Voltage? Or, hell, resistance? Or Power Squared?

Post has been edited 1 time(s), last time on Mar 17 2012, 9:46 pm by Centreri.



None.

Mar 18 2012, 12:57 am Moose Post #10

We live in a society.

Quote from Centreri
... Okay, so what does the ratio of power to current have to do with anything? I may as well define something as the square root of current, plug it in, and say OH HEY LOOK POWER IS PROPORTIONAL TO X^4!

Why not Voltage? Or, hell, resistance? Or Power Squared?
Power = Voltage * Current
Voltage = Resistance * Current (Ohm's Law)
Therefore: Power = Resistance * Current^2

http://en.wikipedia.org/wiki/Electric_power
http://en.wikipedia.org/wiki/Joule%27s_laws
Physics was not invented arbitrarily in this topic.

As for the portions applied to computer science, I don't have justification for those claims. Only that one equation. :P

Post has been edited 1 time(s), last time on Mar 18 2012, 1:02 am by Mini Moose 2707.




Mar 18 2012, 1:20 am Centreri Post #11

Relatively ancient and inactive

Moose.


My point is that just because power is proportional to current squared, it does not imply that power is proportional to clock speed squared (or whatever silly thing was being argued).

I did not need more links to Ohm's Law, thanks.



None.

Mar 18 2012, 2:13 am Aristocrat Post #12



What rockz said might apply to the exact same cores running at different voltages and speeds; ignoring scalability problems, 500 Fermi cores at 350 MHz and low voltage will use less power than 250 of those same cores at 700MHz while providing the same net performance. However, Kepler is a different microarchitecture on a different lithography process from Fermi. Power consumption due to different frequencies/voltage is not comparable between those two. What we do know for a fact is that the cores are now thinner, but a lot more of them can be crammed into the same space. If the leaked GTX 680 benchmarks are legitimate, this is a good thing; it means that performance per watt (and per mm2 die area) went up.



None.

Mar 18 2012, 4:36 am rockz Post #13

ᴄʜᴇᴇsᴇ ɪᴛ!

Quote from Centreri
The end result may be accurate (I wouldn't know), but his physical justification for it seems like balderdash. More cores > better cores because P = RI^2? Why are you squaring current, and using THAT as a justification? Why isn't computing more parallelized, if parallelization is N^2 as efficient? If what he said is true and what Aristocrat said about Nvidia's tendencies to go for fewer cores is true, how did all of those many, many engineers make such a stupid mistake as that, whereas those at ATI didn't?
There are physical limitations to manufacturing as well. I don't have the experience to make those calls, so yes, I'm oversimplifying. And they didn't make a stupid mistake. They already have multiple cores. Anything over 4 cores is a lot. They clearly have the ability to handle a vast number of cores. They chose a particular path for their GPU, however, which relied on fewer cores than AMD's GPUs.

Quote from Centreri
Plus, he said that 'CPU's are were single-threaded is useful, due to poor design'
Don't scare me like that. I had to double check what I said. In the future, we will have more cores. Period. We will continue to add more cores because they are a cheap and easy way to increase performance without decreasing efficiency too much. Remember that unused cores can be turned off to save power. When I say poor design, I mean it. Software developed 10 years ago is not multi-threaded, and we should consider it to be poorly designed by today's standards. New software which comes out and does not support parallelization is poorly designed. Was it well designed when it came out? Probably. Are people still using 10 year old software? Of course. It's just old and will eventually be replaced with something much more efficient, or something that handles modern hardware better.


Quote from Centreri
My point is that just because power is proportional to current squared, it does not imply that power is proportional to clock speed squared (or whatever silly thing was being argued).
I am assuming that one clock takes up a certain number of coulombs. If you increase the frequency of the clock, the number of coulombs would go up at the same rate. The impedance of the silicon, on average, isn't going to change. Let me know where I am wrong please. Keep in mind that I'm oversimplifying, but I made that clear in the original post (ideal).



"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 18 2012, 9:31 pm NudeRaider Post #14

We can't explain the universe, just describe it; and we don't know whether our theories are true, we just know they're not wrong. >Harald Lesch

Quote from Aristocrat
What we do know for a fact is that the cores are now thinner, but a lot more of them can be crammed into the same space. If the leaked GTX 680 benchmarks are legitimate, this is a good thing; it means that performance per watt (and per mm2 die area) went up.
Newer architecture is of course better because newer (better) instructions are being used.


Back to Cent's objection: Cent doubts that "more cores" > "better cores" because P = R*I^2.

And apparently this is in fact a simplified (wrong) formula in this case.
Wikipedia states P = C * V^2 * f where
P = Power
C = Capacitance of the CPU
V = Voltage
f = frequency

This means that power draw scales linearly with frequency. HOWEVER, the article states that when you lower the frequency the voltage requirement also drops. They do not explain the relation between f and V however, so we can't tell the exact power changes in relation to frequency. What we can say is that power rises faster than linearly with frequency, which makes the initial statement that "more cores" > "better cores" true (under the assumption a given algorithm can be parallelized well enough).
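To make that concrete, here is a minimal sketch that plugs Aristocrat's example (250 cores at 700 MHz versus 500 cores at 350 MHz) into P = C * V^2 * f. Since the f-to-V relation is unknown, the voltages are assumptions: case A holds voltage fixed, case B assumes the voltage can be halved along with the frequency, which real silicon generally cannot do because of voltage floors and leakage.

```python
def dynamic_power(capacitance, voltage, freq_hz):
    """Switching power of one core: P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * freq_hz

C = 1.0  # arbitrary units; only the ratios below are meaningful

# Reference: 250 cores at 700 MHz and an assumed 1.0 V.
few_fast = 250 * dynamic_power(C, 1.0, 700e6)

# Case A: 500 cores at 350 MHz, voltage held at 1.0 V.
many_slow_fixed_v = 500 * dynamic_power(C, 1.0, 350e6)

# Case B: 500 cores at 350 MHz, voltage ASSUMED to drop to 0.5 V.
many_slow_scaled_v = 500 * dynamic_power(C, 0.5, 350e6)

ratio_fixed_v = many_slow_fixed_v / few_fast
ratio_scaled_v = many_slow_scaled_v / few_fast
print(f"fixed voltage:  {ratio_fixed_v:.2f}x the power of the fast config")
print(f"scaled voltage: {ratio_scaled_v:.2f}x the power of the fast config")
```

With the voltage fixed the two configurations draw the same power for the same nominal throughput; the saving only appears once the lower clock lets the voltage come down, and under this idealized assumption it is a factor of four.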


So looking at the development of computing hardware we've seen better cores (as opposed to more cores) for a very long time simply because it's much easier to write algorithms for serial computation than ones suited for parallel processing. However, a few years ago we hit a roadblock which prevented clock speeds from going higher.
Roadblock explained
This means we have no choice but to make CPUs more efficient and increase parallel processing until new technology is available.


Addressing the question in the OP:
Kepler having lower per-core performance than previous generations probably stems from multiple reasons:
- difficult (= time consuming) to max out the power of a GPU right away without running into problems (heat, reliability, power draw, etc.)
- marketing: Later develop "improved" Keplers and sell them for more
- power efficiency (which is hopefully established now)

Post has been edited 1 time(s), last time on Mar 18 2012, 10:01 pm by NudeRaider. Reason: typo




Mar 18 2012, 9:54 pm rockz Post #15

ᴄʜᴇᴇsᴇ ɪᴛ!

I think we can all agree that Kepler is better than expected.



"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 3:07 pm Aristocrat Post #16



Here's nVidia's rationale for the design change:


Benchmarks are out. Looks like the leaks going on about it being 30% better than the 7970 in games are actually not far-fetched.



None.

Mar 22 2012, 3:55 pm rockz Post #17

ᴄʜᴇᴇsᴇ ɪᴛ!

I both hate and love the success here. However, if the area is a little less than doubled, doesn't this mean their yields are lower and will be much more expensive?

Quote
Enthusiasts want to know about Nvidia's next-generation architecture so badly that they broke into our content management system and took the data to be used for today's launch. Now we can really answer how Kepler fares against AMD's GCN architecture.
"broke into" haha, no. Tom's is so predictable you can just guess the url.



"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 4:24 pm Aristocrat Post #18



Quote from rockz
I both hate and love the success here. However, if the area is a little less than doubled, doesn't this mean their yields are lower and will be much more expensive?

Fermi is 40nm, and Kepler is 28nm. That works out to roughly the same die area, since 28^2 is roughly half of 40^2. The GTX 680's die is not that big, and the card is priced below the 7970.
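The arithmetic behind that estimate can be sketched as below. It assumes ideal scaling, where area shrinks with the square of the node name; real processes do not shrink every structure by that ratio, so treat the result as a ballpark only.

```python
old_node_nm = 40  # Fermi
new_node_nm = 28  # Kepler

# ASSUMPTION: ideal scaling, so area goes with the square of the feature size.
area_scale = (new_node_nm / old_node_nm) ** 2
density_gain = 1.0 / area_scale

print(f"same design on 28nm: {area_scale:.2f}x the area")
print(f"transistors per mm^2: {density_gain:.2f}x")
```

So a design with about twice the transistors would fit in about the same area, if the ideal scaling assumption held.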



None.

Mar 22 2012, 4:43 pm rockz Post #19

ᴄʜᴇᴇsᴇ ɪᴛ!

That chart says area is 1.8x for logic. It's also got 4x the cores of Fermi, whereas the process shrink only gives a 50% area decrease, unless they significantly decreased the number of transistors in each core.



"Parliamentary inquiry, Mr. Chairman - do we have to call the Gentleman a gentleman if he's not one?"

Mar 22 2012, 5:03 pm Aristocrat Post #20



Seems more like 3x. As far as I can tell, they designed it so that two Kepler cores = 1 Fermi core, but actual performance scales worse than that.

In unrelated news, compute performance on the 680 is crippled; seems like nVidia pushed the card out to be optimized for games and only games. Anyone wanting high OpenCL performance on a gaming card (for things like DXVA/3ds max) should either look at AMD or wait for GK 110.



None.
