Factual 'Big Mac' Results
danigiri writes "Finally Varadarajan has put some hard facts on the speed of the VT 'Big Mac' G5 cluster. Undoubtedly after some weeks of tuning and optimization, the home-brewn supercluster is happily rolling around at 9.555 TFlops in LINPACK.
The revelations were made by the parallel computing voodoo master himself at the O'Reilly Mac OS X conference. It seems they are expecting an additional 10% speed boost after some more tweaking. Srinidhi received standing ovations from the audience.
Wired news is also running a cool news piece on it. Lots of juicy technical and cost details not revealed before. Myth dispelling redux: yes, VT paid full price, yes, it's running Mac OS X Jaguar (soon Panther), yes, errors in RAM are accounted for, Varadarajan was not an Apple fanboy in the least... read the articles for more booze."
FACT: (Score:2, Funny)
Re:FACT: (Score:4, Funny)
Take 12492342... (Score:5, Funny)
Quite an accomplishment. (Score:2, Funny)
Now, where did all the tricks go?
Brewn? (Score:3, Interesting)
Re:Brewn? (Score:2)
Re:Brewn? (Score:4, Funny)
Ah. In that case, the word you were looking for was 'brizzled', MizzutherFizzucker.
Hope this helps.
Super computer? (Score:3, Insightful)
Re:Super computer? (Score:2, Interesting)
In terms of raw processing power, the computer on your desk is more powerful than
Re:Super computer? (Score:5, Funny)
Re:Super computer? (Score:2)
Full Price? WHY?!? (Score:5, Insightful)
This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have to pay the bills, sooner or later.
Shameful.
Re:Full Price? WHY?!? (Score:3, Insightful)
That VT wasn't able - or didn't think - to do the same is pretty shocking. A savings of $330,000 isn't anything to sneeze at.
Re:Full Price? WHY?!? (Score:2, Informative)
Comment removed (Score:5, Informative)
Re:Full Price? WHY?!? (Score:5, Informative)
Re:Full Price? WHY?!? (Score:2, Interesting)
Re:Full Price? WHY?!? (Score:2)
Re:Full Price? WHY?!? (Score:4, Informative)
Re:Full Price? WHY?!? (Score:2)
interesting points (Score:5, Interesting)
What more do you need? Faster systems, cheaper total cost, and slick looking cases.
Re:interesting points (Score:5, Insightful)
Re:interesting points (Score:5, Insightful)
Your professor's opinion is... well... flawed.
Itanium is an excellent architecture. Its flaws come from politics:
1: Itanium requires good compilers. For now, that means compilers from Intel. GCC will be fine for running Mozilla on an Itanium, but technical apps simply won't perform anywhere near the performance of the machine when compiled with GCC.
2: Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB on the high end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.
3: Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process. Before Madison (about a year ago), it was on a 180nm process.
Some misconceptions:
1: Itanium is "inefficient". This couldn't be further from the truth. At 1.5Ghz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2Ghz P4 or 2.2Ghz Opteron in SPECint.
2: Itanium is "slow". Wrong again, see above.
3: Itanium doesn't scale. Wrong again. Itanium scales better than any other current architecture, getting nearly 100% of clock in both int and fp. Opteron gets around 99% int and 95% fp. Pentium 4 gets around 85% int and 80% fp. I don't have data for PPC970.
4: Itanium is expensive. This is true, but it has to do with politics rather than architecture. Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture. Itanium takes much of the logic out of the CPU and puts it into the compiler (this is why you need good compilers). Itanium's architecture is called EPIC, or explicitly parallel instruction computing, because each instruction is "tagged" by the compiler to tell the CPU what instructions can and cannot be executed in parallel.
EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else. Intel's politics prevents this from happening.
So, please don't say that Itanium is a poor architecture. Itanium is a proven architecture. It uses fewer transistors and lower clock speeds than comparable RISC CPUs. Yes, it has problems, but most of them have to do with Itanium the CPU (too much cache, too expensive, not latest process) instead of EPIC the architecture.
Re:interesting points (Score:3, Insightful)
Doing more per clock isn't necessarily good if it pushes your clock speed too low. Itanium2 is only available up to about 1.3 Ghz. As the article says, it's ironic that Intel should now lose the Mhz race.
Using fewer transistors is good for reducing heat and manufacturing costs, but the Itanium is
Re:interesting points (Score:4, Interesting)
If by "about 1.3Ghz", you mean 1.5Ghz, then, yes, Itanium only goes up to 1.5Ghz. But at 1.5Ghz it is faster than the fastest 3.2Ghz Pentium 4. With a decent process and less cache, it could easily scale to 2+ Ghz.
" but the Itanium is neither cheap nor cool (130W!)"
This has to do with the fact that the CPU has 3MB of cache on it. That makes the die huge which makes the CPU expensive. It also makes it heat up like a toaster. As a comparison, the latest Pentium 4s are ~90W, and they only have 512K of cache.
"In the performance arena, Moore's law is useless unless chip designers figure out how to use MORE transistors to compute more quickly."
My statement was that, for a given performance level, Itanium uses fewer transistors than RISC. Itanium was *designed* to use more transistors. That's why the instruction set is designed to produce code that runs well in parallel. RISC CPUs have to figure out what can be run in parallel in hardware - Itanium does it in the compiler.
Re:interesting points (Score:3, Interesting)
I don't see your point here. More cache does not make it a better processor architecture.
The PPC970 and Power4+ are both fabricated i
Re:interesting points (Score:5, Insightful)
Or do you mean scaling with clockspeed? In that case, the bigger the cache and the faster the system bus and ram, the better it will scale, but the cpu architecture itself is hardly a factor. Unfortunately I haven't seen any transistor numbers for an Itanium2 core. But I think it's not true. The Itanium saves some logic on instruction decoder, but has more execution units in parallel (which should lead to better performance, but ONLY IF it's actually possible to build a well optimizing compiler which manages to keep the execution units busy, and it's completely feasible that this is just not possible in the general case). I really don't think this is true. Scaling is independent from the cpu core architecture.
I will agree that EPIC (which, btw, isn't quite Intel's invention, it shares most of the ideas with VLIW) is a nice concept, but for some reason it just doesn't work in practice as well as it should.
Re:Anyone find the efficiency of this thing? (Score:5, Interesting)
For comparison, ASCI Q (#2 on Top500) reaches 68% efficiency, MCR Linux Cluster (currently #3, but to be pushed down by this new Mac cluster) reaches 69% efficiency, and the #1 spot, Earth Simulator, reaches a quite impressive 88% efficiency.
Of course, there are other ways to measure efficiency. When it comes to performance/price, this Mac cluster does very well, even if you do take into account the real costs (ie MUCH more than just the $5.2 million up front cost). For cost/power consumption it seems reasonable, but not outstanding. 10TFlops/1.5MW of power is ok, and not too far off the Earth Simulator's 35TFlops/3.5MW of power, but it's certainly nothing to write home about. Cray's next big cluster, Red Storm, is likely to get over 30TFlops when it's released, but will consume only 2.0MW of power.
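To put the power comparison on one scale, here is a minimal Python sketch that divides the quoted TFlops by the quoted megawatts. The figures are the ones given in the comment above (the Red Storm number is a projection, and the Big Mac 1.5MW is disputed elsewhere in this thread); the dictionary keys and the function name are made up for illustration.

```python
# Figures as quoted in the comment above: (TFlops, MW).
# Labels and helper name are illustrative, not from any official source.
systems = {
    "Big Mac": (10.0, 1.5),
    "Earth Simulator": (35.0, 3.5),
    "Red Storm (projected)": (30.0, 2.0),
}

def tflops_per_mw(tflops, megawatts):
    """Performance per unit of power, in TFlops per MW."""
    return tflops / megawatts

ratios = {name: tflops_per_mw(t, mw) for name, (t, mw) in systems.items()}

# Print from least to most power-efficient.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.1f} TFlops/MW")
```

On these numbers the ordering is Big Mac, then Earth Simulator, then Red Storm, which is the "reasonable, but not outstanding" ranking the comment describes.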
Re:Anyone find the efficiency of this thing? (Score:3, Insightful)
Yes and no. The only way the G5 can do 4 FP operations per cycle is if each of its 2 FP units executes a fused multiply-add instruction. Obviously no code is going to consist entirely of these, so the actual theoretical peak is less than the theoretical theoretical peak. Or something like that.
Re:Anyone find the efficiency of this thing? (Score:5, Insightful)
You could calculate a new marketing BS peak number where multiply-add only counted as a single flop, or you took into account some realistic cache miss rate. The new lower theoretical peak would give you a much higher efficiency.
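The peak-versus-measured arithmetic in these comments can be sketched in a few lines of Python. It assumes the configuration mentioned in this thread (1100 dual-processor nodes at 2 GHz, 2 FP units per processor) and the 9.555 TFlops LINPACK figure from the story; the constant and function names are invented for the example.

```python
# Assumed configuration, taken from figures quoted in this thread.
NODES = 1100
CPUS_PER_NODE = 2
CLOCK_HZ = 2.0e9
FPUS_PER_CPU = 2
MEASURED_TFLOPS = 9.555  # LINPACK result from the story

def peak_tflops(flops_per_fma):
    """Theoretical peak if every FP unit issues one fused multiply-add per cycle.

    flops_per_fma=2 is the usual convention (multiply and add counted separately);
    flops_per_fma=1 is the alternative accounting suggested in the comment above.
    """
    flops_per_cycle = FPUS_PER_CPU * flops_per_fma
    return NODES * CPUS_PER_NODE * CLOCK_HZ * flops_per_cycle / 1e12

peak_fma_as_two = peak_tflops(2)
peak_fma_as_one = peak_tflops(1)
eff_fma_as_two = MEASURED_TFLOPS / peak_fma_as_two
eff_fma_as_one = MEASURED_TFLOPS / peak_fma_as_one

print(f"peak (FMA = 2 flops): {peak_fma_as_two:.1f} TFlops, efficiency {eff_fma_as_two:.1%}")
print(f"peak (FMA = 1 flop):  {peak_fma_as_one:.1f} TFlops, efficiency {eff_fma_as_one:.1%}")
```

With the usual convention the peak is 17.6 TFlops and the efficiency is a bit over 54%. Halving the peak pushes the ratio above 100%, because the measured LINPACK figure still counts the multiply and the add separately, so the "efficiency" depends heavily on which accounting is used on each side of the division.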
Dumb Question... (Score:4, Interesting)
1) Why can't they just shout "Let 'er rip!!" and crank the thing wide open?
2) Why all the media buzz concerning this as a `surprise' when they've already got its performance figured out, apparently?
Sorry.
Re:Dumb Question... (Score:5, Informative)
It's highly dependent on the interconnects, the topology of the network, the software that does the clustering (i.e., that actually makes the nodes available for parallelized word), etc.
So minor tweaks can have major effects, and getting it tweaked properly is quite an accomplishment.
Re:Dumb Question... (Score:3, Funny)
Does it make Word's performance acceptable?
Re:Dumb Question... (Score:2)
You know, I still would have thought that they'd have all this worked out beforehand, maybe before they even build a supercomputer.
I mean, you can't just say "well, let's go grab a metric fsck-ton of X and see what happens when we cluster it". You're talking a lot of resources and especially $$ that's being thrown on the line. I'm sure that building a supercomputer is way over my understanding and these folks probably have put m
Re:Dumb Question... (Score:3, Interesting)
They knew in advance what they could likely achieve with this cluster and they have surpassed what they were expec
Re:Dumb Question... (Score:2)
Re:Dumb Question... (Score:2)
Statement: I know I can get my Ferrari to give me about 20% more acceleration by adjusting the fuel to air ratio.
Question: Why can't you just shout "Let 'er rip!!" and crank the thing wide open?
Re:Dumb Question... (Score:2)
If it were that easy, I'm sure they'd do just that. But a cluster of over 1000 machines is a complex contraption and I'm willing to bet that tuning it to get the last drop of performance out of it is not a simple task. It's probably a theorize, measure, tweak loop getting small increments of performance gain on each iteration. It's probably going to take them some time before they get it fully tweaked out.
Re:Dumb Question... (Score:2)
1) Why can't they just shout "Let 'er rip!!" and crank the thing wide open?
The Real Potential is figured as a pure function of how many processors are in a machine and what speed they're running at.
Only a certain percentage of this processing power is aimed squarely at solving a problem, however... you also have to do things like:
Run an operating system
Compute error checks on co
Re:Dumb Question... (Score:2)
Not quite (Score:2)
My understanding is that it can perform two FMADD instructions every clock cycle.
Favorite Quote (Score:5, Funny)
Re:Favorite Quote (Score:5, Funny)
Ok, max all the options. Cool.
Now put 1100 in the quantity. Cool.
Ok (chugga chugga chugga) $3.3 million dollars. Who has the credit card? (silence, *crickets*, the rude sound of nobody reaching for their wallet...)
Ok maybe it is just me. Of course I have play configured a few systems in the online order systems of IBM and Dell a few times (didn't actually hit 'Submit' however) and it is possible to configure a single $100k machine from Dell. I haven't found the limit at IBM yet as they seem to have more imagination than I do (although it is easy just to get the SOFTWARE on one of their systems to exceed $100k.)
Re:Favorite Quote (Score:2)
You should start a list of the most expensive things you can buy online with a credit card.
For instance could you buy an Airbus?
Re:Favorite Quote - Correction About Apple (Score:5, Informative)
I usually never reply to these things, but I think it is funny that people are arguing about how he ordered on the Apple Store. I find it even funnier that people would even go to the Apple Store and try. It was a joke! There were a lot of dedicated people at Apple, including myself, that helped to make this dream become a reality. The "myth" that I would like to clear up is that Apple DID have a clue and a lot of great people at Apple have been working really hard for the last few months, making a lot of personal sacrifices to make sure that all the awesome work from Dr. Varadarajan and the rest of the cluster team could be possible and successful. That's my 2 cents.
Jerome Holman
Apple Campus Representative @ VT
http://filebox.vt.edu/users/jeholman [vt.edu]
Re:Favorite Quote - Correction About Apple (Score:3, Interesting)
The $5.2M figure seems to just be the Towers (Dual 2Ghz + 4GB RAM is $4814 with the standard educational discount, multiply by 1100 and you get $5295400). What was the additional cost of the Infiniband cards and switches, the Cisco switches, the racks, and the cooling equipment? Were any modifications necessary for the building (more power, etc)?
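The multiplication in that comment is easy to check. A short Python sketch, using only the per-unit price and node count quoted above (variable names are illustrative):

```python
# Figures quoted in the comment above.
UNIT_PRICE_USD = 4814       # dual 2 GHz + 4GB RAM, educational discount
NODE_COUNT = 1100
REPORTED_TOTAL_USD = 5_200_000

towers_total = UNIT_PRICE_USD * NODE_COUNT
difference = towers_total - REPORTED_TOTAL_USD

print(f"towers alone: ${towers_total:,}")
print(f"towers minus reported $5.2M: ${difference:,}")
```

The towers alone come to $5,295,400, slightly more than the rounded $5.2M, which is consistent with the commenter's point that the reported figure leaves little or no room for interconnect, racks, and cooling.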
Answers (Score:3, Interesting)
Re:Answers (Score:3, Insightful)
Did he get Air Miles? (Score:2)
No, make that.. (Score:5, Funny)
[snip]
yes, it's running MacOSX Jaguar ( soon Panther)
More like whole-lotta-CD-jockeying. Perhaps the bio department can lend a hand by donating the services of their chimps to handle the CD swapping.
(Yes, I'm aware there are smarter ways of doing it, but isn't it a fun mental picture, 100 chimps running around a cluster of G5's and throwing bananas and CDs at each other?) Talk about your fun install-fests.
Simply amazing (Score:3, Insightful)
Damn!
Re:Simply amazing (Score:5, Funny)
<reads sentences again carefully and whimpers for America>
Jumbled numbers (Score:3, Funny)
Too bad some software patents will be filed (Score:4, Interesting)
What's up with that?
Used to be that work like this done at a University was considered 'open' as in available to anyone to help advance the state-of-the-art. Not anymore...
Re:Too bad some software patents will be filed (Score:5, Insightful)
The patent system needs to be overhauled, then maybe we can start opening up the Universities again (and give them some more funding too!)
Re:Too bad some software patents will be filed (Score:2)
Boy, I'd love to see you substantiate this assertion with a source. If what you say is true, then it sounds to me like the entire university system in this country is essentially already undermined completely. No wonder researchers are
Re:Too bad some software patents will be filed (Score:2)
That's a real problem, but it has a real solution: release the findings as public domain or under
Re:Too bad some software patents will be filed (Score:3, Informative)
Besides the prior art issue that others mentioned, academic research is not subject to patents. So university researchers never have to pay to license patents.
Re: (Score:2)
Re:Too bad some software patents will be filed (Score:2)
Re:Too bad some software patents will be filed (Score:2)
Since when? Sure some work is done openly and published. Some developments are marketed. This is one way a university makes money...
Re:Too bad some software patents will be filed (Score:2)
But until you actually get the patent, you're an idiot if you let the cat out of the bag, so to speak.
Re:Too bad some software patents will be filed (Score:2)
Here's a fun game to play: find the total budget for your state university system. Calculate the percentage of that budget coming from state financing. Perform the same calculation using 1995, 1990, 1980, and 1970 budgets.
Do the math in a state like Wisconsin, and the percentage has fallen quite dramatically. That lost financing has to be made up from somewher
Re:Too bad some software patents will be filed (Score:2)
Holding the patent in hand is much cheaper than having a law firm fight the prior art fight for you.
Open source the code (Score:3, Insightful)
This is in addition to consulting where they are helping others build similar clusters.
Full price? (Score:5, Insightful)
You'd think apple would at least sell G5's to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.
But, hey, even with all that pointless extra hardware, this cluster is still less than half the price of a comparable Intel system from Dell or IBM. Weird.
Re:Full price? (Score:2)
Re:Full price? (Score:5, Interesting)
You'd think apple would at least sell G5's to VT without SuperDrives
OTOH, five years from now, when they have the world's 65,000th fastest supercomputer, they could just pull the thing apart and give/sell complete computers to their students. Then it's back to the Apple Store to order up a whole lot of G7's.
system from IBM? (Score:3, Insightful)
When IBM comes out with the $3,500 4-way 970 (G5 in Apple-speak) workstation it will be interesting to see what people do with it. Imagine a cluster that is 17% more expensive but with twice as many processors...
Re:Full price? (Score:2)
I wonder how much extra power this "frippery" is consuming? How much extra cooling would be needed and how much would it cost per year?
For video - No extra power (Score:2)
nerds (Score:2, Funny)
From the wired article:
"After his presentation, a group of nerds followed him to the hotel's bar for drinks, hanging on his every word."
How dorky did these guys have to be to have a reporter for "Wired" categorize them as nerds...damn....
Re:nerds (Score:2)
When They Switch It to Linux (Score:2, Funny)
Not a mac fan either... (Score:5, Funny)
Executive Summary (Score:5, Funny)
Re:Executive Summary (Score:3, Informative)
Maybe you didn't read close enough, because the articles specifically state that he didn't compare only to Intel and that he found the Opterons to be too expensive. I'm just saying, because I think a lot of people did see a quote in the article mentioning Opterons, and you seem to have missed it. T
Demand for a G5 xServe? (Score:2)
However, if G5 Macintosh systems like this become "popular" in supercomputing, maybe that's a reason to get a G5 XServe out there sooner. I'd imagine a rack mount system would be easier to deal with than a bunch of towers.
A Little Perspective Here (Score:5, Informative)
Here's a quick rundown:
Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]
Sun (sparc) - required too many processors, also too expensive
IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
HP (itanium) - same
Apple (IBM PPC970) - system available with chassis for lowest price
Power PC 970 and G5 (Score:3, Interesting)
"The IBM with a PowerPC 970 was a first choice but the earliest delivery date would have been January 2004."
"On June 23 Apple announced the G5."
I was under the impression that the G5 was a Power PC 970. Is it just some derivative of the Power PC 970... or what?
Re:Power PC 970 and G5 (Score:3, Informative)
Just a thought on "home-brewed" means (Score:4, Informative)
Ignoring the "brewn" part of things, since when does "home-brewed" mean "designed and funded by a major university"?
I usually think of "home brewed" as something that someone put together at home. With their own money. In their spare time.
This is *not* a home-brew supercomputer, it is an institute designed and created super computer.
That is all.
More info on the G5 Cluster (Score:5, Informative)
Here is da slide-show [vt.edu]
Memory errors? (Score:3, Interesting)
How, pray tell, are they planning on detecting these errors? I can understand how you could reduce the frequency of errors with only a slight loss in performance, ie take some sort of checksum of your data after every x number of cycles, but that doesn't eliminate the errors, only reduces their frequency. Maybe it reduces the frequency by enough that you don't need to worry about it, especially if 'x' is a sufficiently small number, but it still seems like a pretty risky prospect to me.
Anyone seen any actual TECHNICAL details on this point, ie not just some Mac fan yelling "Deja Vu, DEJA VU!!!"?
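For what the periodic-checksum idea described above might look like, here is a minimal Python sketch. It is only an illustration of that general approach, not a description of what the VT team actually runs; the function names and the test data are invented, and `zlib.crc32` is simply a convenient standard-library checksum.

```python
import zlib

def checksum(block: bytes) -> int:
    """CRC-32 of a block of data held in memory."""
    return zlib.crc32(block)

def is_intact(block: bytes, expected: int) -> bool:
    """True if the block still matches the checksum recorded earlier."""
    return checksum(block) == expected

# Hypothetical working data, checksummed at a checkpoint.
data = bytearray(b"simulation state " * 1024)
saved = checksum(bytes(data))
ok_before = is_intact(bytes(data), saved)

# Simulate a single-bit memory error, then re-verify.
data[100] ^= 0x04
ok_after = is_intact(bytes(data), saved)

print("intact before flip:", ok_before)
print("intact after flip: ", ok_after)
```

This catches a flipped bit in data that was supposed to stay unchanged between two checks. As the comment points out, it does nothing for values that are legitimately rewritten between checks, so on its own it reduces the exposure rather than eliminating it.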
Supercomputer article (Score:5, Informative)
For anyone interested in learning a bit more about what some of the issues are when creating a super-computer, you might want to have a look at the following:
Red Storm PDF [lanl.gov]
The article is talking about Cray/Sandia's new Red Storm machine, a supercomputer using over 10,000 AMD Opteron processors that is expected to be competitive with the Earth Simulator for the #1 spot on the Top500 list. It does, however, talk about a lot more than just the specifics of this cluster, describing what some of the bottlenecks in supercomputers are and how to avoid/work around them.
Optimize This Optimize That (Score:4, Insightful)
Look at what they built: a complete COTS supercomputer, minuscule price, functionality in six months, public data in a year. They have >9Tf right outta the box.
Yes they have written their own software, but name a company that doesn't? They modded them (cooling I think, but I couldn't find data only pics.) They bribed students with pizza and soda, they didn't have to buy, make or gut a building. What is amazing is they showed that any simple slashdot pundit could build one if given these resources.
Re:Full price (Score:2, Insightful)
The x86 cluster would have been twice as expensive. And this outperforms the highest ranking x86 cluster, which has more processors.
Re:Full price (Score:5, Funny)
The REAL power usage numbers (Score:5, Informative)
Ugh, this is getting old.
Red Storm, the machine by itself, uses 2.0MW.
Big Mac and all of its networking gear uses less than 0.75MW. The supercomputing center itself (building, air conditioning, UPS battery charging equipment, and the 1100 G5s) is fed by a 1.5MW substation feed. They're still not even maxing out the substation.
The latest, fastest Opterons (not the scaled down low-power Opteron for blade servers) consume 53 watts at full clock. PowerPC 970 @ 2 GHz consumes 48 watts. The U2 and K3 motherboard chipset on the dual G5s uses just as much power as the PowerPC 970 "G5" processors. Hell, the power supply in a dual processor G5 system is 550 watts. 550 x 1100 machines = 0.61MW.
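The upper-bound estimate at the end of that comment can be reproduced with a few lines of Python. It uses only the numbers quoted above (550 W power supply rating, 1100 machines, 1.5MW substation feed); treating the power supply rating as the worst-case draw per machine is the commenter's simplification, and the variable names are illustrative.

```python
# Figures quoted in the comment above.
PSU_WATTS = 550          # rated power supply per dual G5; used as a worst case
MACHINES = 1100
SUBSTATION_MW = 1.5

worst_case_mw = PSU_WATTS * MACHINES / 1e6
share_of_feed = worst_case_mw / SUBSTATION_MW

print(f"worst-case draw of the G5s: {worst_case_mw:.3f} MW")
print(f"share of the 1.5MW feed:    {share_of_feed:.0%}")
```

That gives 0.605 MW, which the comment rounds to 0.61MW, or roughly 40% of the substation feed even if every power supply ran at its rated maximum.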
Re:Full price (Score:3, Interesting)
Too slow/expensive (Score:3, Insightful)
So he went full price with the G5 ($3000 apiece) and for only $5.2 million has the number 3 slot and is shooting for a 10% boost.
Wrong (Score:3, Insightful)
1. Earth simulator
2. ASCI Q
3. Virginia Tech G5 cluster (9.555 Tflops and rising, $5.2M HARDWARE ONLY)
4. PNL Itanium2 cluster (8.633 Tflops, $24.5M HARDWARE ONLY [pnl.gov])
So nope, not only will the PNL Itanium2 cluster not be #2, it will also be 1Tflop behind the Virginia Tech cluster, and it will have done it at almost 5 times the cost. Bravo!
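The cost comparison can be made explicit as dollars per TFlop. A small Python sketch using only the hardware costs and LINPACK figures listed above (names are illustrative, and both costs are the "HARDWARE ONLY" numbers from the comment):

```python
# (LINPACK TFlops, hardware cost in millions of USD), as listed above.
clusters = {
    "Virginia Tech G5": (9.555, 5.2),
    "PNL Itanium2": (8.633, 24.5),
}

def cost_per_tflop(tflops, cost_musd):
    """Hardware cost in millions of USD per LINPACK TFlop."""
    return cost_musd / tflops

per_tflop = {name: cost_per_tflop(t, c) for name, (t, c) in clusters.items()}
cost_ratio = clusters["PNL Itanium2"][1] / clusters["Virginia Tech G5"][1]
tflops_gap = clusters["Virginia Tech G5"][0] - clusters["PNL Itanium2"][0]

for name, value in per_tflop.items():
    print(f"{name}: ${value:.2f}M per TFlop")
print(f"hardware cost ratio (PNL / VT): {cost_ratio:.2f}x")
print(f"LINPACK gap: {tflops_gap:.3f} TFlops")
```

On these figures the PNL hardware costs about 4.7 times as much, the gap is a little under 1 TFlop, and per TFlop the difference is larger still, since the more expensive machine is also the slower one.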
Re:technical details? like this one... (Score:5, Funny)
Until then, quit your trolling.
in the same vein (Score:2)
Re:price (Score:2, Insightful)
Re:Accounting (Score:2)
Re:Why didn't they use Darwin or Gentoo? (Score:3, Informative)
Second, the difference caused by increased optimization in the kernel, for an application like this, is relatively insignificant, simply because most of the work is done in user-space. In fact, any decent super-computing application will do its best to minimize system calls (allocating memory pools, chunking I
Re:building supercomputer with desktops sucks (Score:3, Interesting)
Also, I've heard that the system controller supports 16GB of ram but that Apple has only certified 1GB DIMMs so far. This would seem likely as a lot of Macs can accept more memory than initially advertised... only
yes and no... (technical arguing) (Score:3, Insightful)
Desktops take up more room, correct. And yes, the desktop G5 does not have a console serial port like the xServe does. But seriously, how many modern clusters do you see with a terminal server connecting to each of the node's serial port? These days it's all install-an
Re:if they paid full price, it's not a great deal (Score:3, Informative)