Big Mac Benchmark Drops to 7.4 TFlops
coolmacdude writes "Well it seems that the early estimates were a bit overzealous. According to preliminary test results (in postscript format) on the full range of CPUs at Virginia Tech, the Rmax score on Linpack comes in at around 7.4 TFlops. This puts it at number four on the Top 500 List. It also represents an efficiency of about 44 percent, down from the previous result of 80 achieved on a subset of the computers. Perhaps in light of this, apparently VT is now planning to devote an additional two months to improve the stability and efficiency of the system before any research can begin. While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer. In addition, the project was successful at meeting VT's goal of developing an inexpensive top 5 machine. The results have also been posted at Ars Technica's openforum."
A supercomputer by Any Other Name.... (Score:5, Interesting)
I've always been sort of intrigued by
My computer is SUPER!!! (Score:2, Funny)
Re:A supercomputer by Any Other Name.... (Score:3, Insightful)
Re:A supercomputer by Any Other Name.... (Score:3, Funny)
n.
A mainframe computer that, as the result of birth on an alien planet, is impervious to bullets, is capable of flight, has x-ray vision, can run faster than a speeding train, etc.
"Is it a bird? Is it a plane? No, it's a Cray X-MP!"
- Seymour Fights The Demon World, Action Comics, 1932
Source: The American Heritage(R) Dictionary of the English Language, Fourth Edition
Copyright (C) 2000 by Houghton Mifflin Company.
Published by Houghton Mifflin Company. All righ
Re:A supercomputer by Any Other Name.... (Score:4, Funny)
Super computers cost more than 5 million dollars
Mainframes cost more than 1 million dollars
Mini-Super computers cost more than 1/4 million dollars
Everything else is by definition a Plain Jane (TM) computer
btw, I've worked on all 4 kinds ;-)
From the horse's mouth (Score:5, Interesting)
snazzy new G5 logo too! (Score:4, Funny)
--
Re:snazzy new G5 logo too! (Score:2, Funny)
Way to go there; let's just keep encouraging their terrorism.
Important items of note (Score:5, Informative)
First, from an Oct. 22 New York Times [nytimes.com] story:
Officials at the school said that they were still finalizing their results and that the final speed number might be significantly higher.
This will likely be the case.
Second, they're only 0.224 Tflops away from the only Intel-based cluster above it. So saying "all the Intel machines" in the story is kind of inaccurate, as if there are all kinds of Intel-based clusters that will still be faster; there is only one Intel-based cluster above it, and with only preliminary numbers for the Virginia Tech cluster at that.
Third, this figure is with around 2112 processors, not the full 2200 processors. With all 1100 nodes, even with no efficiency gain, it will be number 3, as-is.
Finally, this is a cluster of several firsts:
First major cluster with PowerPC 970
First major cluster with Apple hardware
First major cluster with Infiniband
First major cluster with Mac OS X (Yes, it is running Mac OS X 10.2.7, NOT Linux or Panther [yet])
Linux on Intel has been at this for years. This cluster was assembled in 3 months. There is no reason for the Virginia Tech cluster to remain at ~40% efficiency. It is more than reasonable to expect higher than 50%.
It's still destined for number 3, and its performance will likely even climb for the next Top 500 list as the cluster is optimized. The final results will not be officially announced until a session on November 18 at Supercomputing 2003.
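The scaling argument in the third point above can be checked with a few lines of arithmetic. This is only a sketch: it assumes Rmax scales linearly with processor count (that is, efficiency stays constant), which is exactly the "no efficiency gain" case described, and it uses only the figures quoted in this comment. The function and variable names are made up for illustration.

```python
# Sketch: linear scaling of the preliminary Rmax to the full processor count.
# Figures from the comment: 7.4 TFlops on about 2112 processors, 2200
# processors in the full cluster, and a 0.224 TFlops gap to the one
# Intel-based cluster ranked above it.

def scale_rmax(rmax_tflops: float, procs_used: int, procs_total: int) -> float:
    """Scale Rmax linearly with processor count (assumes constant efficiency)."""
    return rmax_tflops * procs_total / procs_used

measured = 7.4                      # preliminary Rmax, TFlops
intel_cluster = measured + 0.224    # Rmax of the Intel-based cluster above it
projected = scale_rmax(measured, 2112, 2200)

print(f"projected Rmax on 2200 processors: {projected:.2f} TFlops")
print(f"Intel-based cluster above it:      {intel_cluster:.2f} TFlops")
print("passes it" if projected > intel_cluster else "still behind")
```

Under that assumption the projected figure lands slightly above the Intel-based cluster, which is the basis for the "number 3, as-is" claim.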
Re:Important items of note (Score:2, Interesting)
Re:Important items of note (Score:2, Interesting)
This will likely be the case.
Why is this likely? The number dropped, why is it more likely to go up rather than down (or nowhere, for that matter)?
Actually, it's already at 8.2 Tflop today (Oct 22) (Score:3, Informative)
Re:Important items of note (Score:5, Informative)
Re:Important items of note (Score:5, Insightful)
Not really (Score:5, Informative)
Re:Important items of note (Score:3, Informative)
The deadline for submission to the Nov 2003 Top 500 list was Oct. 1st (see call for proposals) [top500.org], so it has already passed. Any further improvements that they make to the scalability of the cluster should not be included. This is true for all the machines.
Also Important? (Score:3, Informative)
Anyone know how much merit there is to using Nmax (or N1/2) to compare different systems?
Re:Important items of note (Score:2)
Could they add NICs to each computer, bond them (probably need to write something for this), and set up parallel networks with each set of cards to improve bandwidth?
I don't know enough about the cluster's setup to say much at this point.
Re:Important items of note (Score:2)
This is not altogether surprising, given that they are using a desktop computer and trying to shoehorn it into a super
Re:Important items of note (Score:2)
So saying "all the Intel machines" in the story is kind of inaccurate
I was trying to refer to the fact that sometimes the Mac zealots, in the midst of their zealotry,
lose sight of reality and simply lump all non-Mac related things into one huge category, even if it really isn't one.
AltiVec won't help here (Score:5, Informative)
My feeling is that the ~40% efficiency seen on the larger scale run is an indication that either VA Tech spent very little time tuning the problem size or they didn't design their InfiniBand fabric to really handle 1100 nodes hammering away at Parallel Linpack. (Given that they've been extremely vague about how their IB network is structured, I fear it may be the latter.)
I doubt that's true, especially if they're using the IBM PPC compilers. The G4 has both significantly less memory bandwidth and a single double-precision-capable FPU, whereas the G5 is basically a single-core Power4 with an AltiVec unit in place of some cache. IBM's compilers (despite being a little wonky as far as naming and argument syntax) generally produce pretty fast code.
the REAL reason to build a top-5 supercomputer (Score:5, Funny)
Re:the REAL reason to build a top-5 supercomputer (Score:2)
Not quite. But you still need a supercomputer even to rewrite the tables. Especially after you install the supercomputer. B-)
[/tongue-in-cheek]
Too good to be true... (Score:5, Insightful)
Now it's at 44%. That's not a small drop, that's a MASSIVE drop.
They didn't predict any loss in going from a small subset to the whole system? Or was it a publicity stunt (we can outperform everyone! our names are __________!)
PARENT IS NOT A TROLL (Score:2)
new updated troll (Score:3, Funny)
Big mac cluster.. (Score:5, Funny)
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:3, Funny)
which brings up a totally off topic question.... a can of coke is 350 ml. it contains 300 calories.
now, let's say i drink this coke. it is really cold - say 4 degrees. my body temperature is a nice, mammalish 37 degrees. by drinking this coke i am warming up 350 g of what is essentially water from the temperature of the can to that of my body - a difference of 33 degrees.
33c * 350ml = 11550 calor
Re:Big mac cluster.. (Score:5, Funny)
Re:Big mac cluster.. (Score:4, Informative)
For consumers, food calories are really kilocalories. So in this case, your Coke has 300,000 physics-style calories.
If you look at European food labels, you can sometimes see them written as kcal.
Re:Big mac cluster.. (Score:3, Informative)
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:5, Informative)
A Calorie (the one used on food labels) is actually a kilocalorie. A Calorie is therefore 1000 calories. 1 calorie is basically the amount of heat needed to raise 1 g of water by 1 degree Celsius. (A calorie is actually 1/100 of the amount of heat needed to get 1 gram of water from 0 degrees C to 100 degrees C, but that works out almost the same.)
This is explained a bit on this web page. [reference.com]
So warming a 4 degrees C, 350mL Coke to 37 degrees C would take (37 - 4) * 350 = 11550 calories. This is 11.55 kilocalories or 11.55 Calories. The Coke has around 300 Calories in nutritive value therefore you would gain 300 - 11.55 = 288.45 Calories of energy from a 4 degrees C, 350mL can of Coke.
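The arithmetic above can be written out as a short sketch. It assumes, as the comment does, that the drink behaves like water (1 g per mL, 1 small calorie per gram per degree C) and it takes the 300 Calorie figure from the thread as given. The function name is made up for illustration.

```python
# Sketch: energy spent warming a cold drink to body temperature,
# compared with the nutritional energy it supplies.

def warming_cost_kcal(volume_ml: float, start_c: float, body_c: float) -> float:
    """Energy in kilocalories (food Calories) needed to warm the drink."""
    # 1 small calorie warms 1 g of water by 1 degree C; assume 1 g per mL.
    small_calories = volume_ml * (body_c - start_c)
    return small_calories / 1000.0  # 1000 small calories = 1 Calorie

cost = warming_cost_kcal(350, 4, 37)   # 350 mL can, 4 C to 37 C
net = 300 - cost                       # 300 Calories per can, per the thread

print(f"warming cost: {cost:.2f} Calories")
print(f"net gain:     {net:.2f} Calories")
```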
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:2, Funny)
Um, yeah, could I get some fries with that?
Re:Big mac cluster.. (Score:2)
Re:Big mac cluster.. (Score:2)
Instant Numbers... (Score:3, Insightful)
Y'all should know this by now.
~D
Does anyone else have trouble reconciling... (Score:5, Funny)
Re:Does anyone else have trouble reconciling... (Score:3, Insightful)
While some people have given the parent a flamebait mod and hostile replies, the poster makes a good (and humorous) point. Apple is not typically thought of in terms of best price/performance any more than, say, Cadillac is in the car industry. Macs are bought by those willing to pay a premium for that distinct Apple styling, OS X's slick interface with the power of Unix behind the scenes, the "it just works" factor, and so on. Those who don't care about the amenities and just want bang for the buck go for a
It's a good price/performance, but not best. (Score:3, Interesting)
KASY0 achieved 187.3 GFLOPS on the 64-bit floating point version of HPL, the same benchmark used on "Big Mac". While "Big Mac" is about 40 times faster on that benchmark, it is about 130 times the cost of KASY0 (~$40K vs ~$5200K). Considering the size difference, "Big Mac" is VERY impressive, but it can't claim to be the best price/performance supercomputer on
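The comparison in this comment comes down to dollars per GFLOPS. Here is a minimal sketch using only figures quoted in the thread: 187.3 GFLOPS and roughly $40K for KASY0, and the preliminary 7.4 TFlops and roughly $5200K for "Big Mac". The helper name is made up for illustration, and the costs are the commenter's approximations.

```python
# Sketch: price/performance of the two clusters in dollars per GFLOPS.

def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Cost in US dollars for each GFLOPS of measured HPL performance."""
    return cost_usd / gflops

kasy0 = dollars_per_gflops(40_000, 187.3)        # ~$40K, 187.3 GFLOPS
big_mac = dollars_per_gflops(5_200_000, 7_400)   # ~$5200K, 7.4 TFlops

speed_ratio = 7_400 / 187.3        # how many times faster Big Mac is
cost_ratio = 5_200_000 / 40_000    # how many times more it costs

print(f"KASY0:   ${kasy0:.0f} per GFLOPS")
print(f"Big Mac: ${big_mac:.0f} per GFLOPS")
print(f"speed ratio {speed_ratio:.1f}x, cost ratio {cost_ratio:.0f}x")
```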
Catch Phrase (Score:4, Funny)
"Virginia Tech: Home of the Poor Man's Supercomputer and Michael Vick."
This is NOT all that surprising. (Score:5, Insightful)
Apparently there are a lot of cases where a MULTIPLY and an ADD do come together like that, but I'm not surprised if LINPACK doesn't consist entirely of those pairs. ;)
The 17.6 TFLOP theoretical peak assumed a perfect case consisting entirely of MULTIPLY-ADD pairs. In a case assuming no MULTIPLY-ADD pairs, the theoretical peak is 8.8 TFLOPs.
7.4 TFLOPs is only 42% of 17.6 TFLOPs, but it's 84% of 8.8 TFLOPs. I suspect the actual "efficiency" of the machine lies somewhere in the middle.
(As for me, I'm happy with just ONE dualie...)
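The two efficiency figures in this comment follow directly from dividing the measured Rmax by each theoretical peak. A minimal sketch, using only the 7.4, 17.6 and 8.8 TFlops figures quoted above (the function name is made up for illustration):

```python
# Sketch: Linpack efficiency as measured Rmax divided by theoretical peak.

def efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak actually achieved."""
    return rmax_tflops / rpeak_tflops

rmax = 7.4
eff_with_fma = efficiency(rmax, 17.6)     # peak assuming all MULTIPLY-ADD pairs
eff_without_fma = efficiency(rmax, 8.8)   # peak assuming no MULTIPLY-ADD pairs

print(f"against 17.6 TFlops peak: {eff_with_fma:.0%}")
print(f"against  8.8 TFlops peak: {eff_without_fma:.0%}")
```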
Re:This is NOT all that surprising. (Score:2, Informative)
From: http://www.theregister.co.uk/content/39/31995.html [slashdot.org]
Re:This is NOT all that surprising. (Score:2)
Essentially ALL processors with a floating point unit do 64-bit precision calculations. The old G4 and G3 did, the Pentium 4 does, the old 486 did, etc. etc. The whole 32-bit vs. 64-bit argument with these PowerPC 970 chips (and, in a similar light, AMD64 chips) ha
Re:This is NOT all that surprising. (Score:5, Informative)
Efficiency (Rmax/Rpeak, %)  System
87.5  NEC Earth-Simulator
67.8  Hewlett-Packard ASCI Q
69.0  Linux Networx MCR Linux Cluster Xeon
59.4  IBM ASCI White
73.2  IBM SP Power3
71.5  IBM xSeries Cluster
45.1  Fujitsu PRIMEPOWER HPC2500
79.2  Hewlett-Packard rx2600
72.0  Hewlett-Packard AlphaServer SC
77.7  Hewlett-Packard AlphaServer SC
Re:This is NOT all that surprising. (Score:2)
Re:I/O bandwidth and latency (Score:3, Informative)
Grumble... Go take a look at Apple's description of the G5 architecture [apple.com] before spouting. Here are the relevant lines:
Apple uses the same basic memory set-up as
Re:I/O bandwidth and latency (Score:3, Informative)
The Powermac G5 uses up to 1GT/s, 64-bit wide version of IBM's Elastic I/O bus to connect each pr
The Mac cluster is still on top per CPU (Score:5, Interesting)
It still bests all other Intel hardware with only the Alpha hardware on top. And given the CPU count, even the Alpha hardware does not match it. Look at the numbers: the Linux-based 2.4 GHz cluster has almost 200 more CPUs on board with a 217 Gflop/sec difference. The Alpha clusters are running anywhere from 1,984 to 6,048 more CPUs.
Re:The Mac cluster is still on top per CPU (Score:2)
From the same document the Mac proponents have been quoting from: Dongarra Doc [netlib.org]
Table 3 - page 53:
Big Mac -> Rmax: 8164 Processors: 1936
Cray X1 -> Rmax: 2932.9 Processors: 252
Please be careful when making general statements. Thank you.
That said, yes, it has the highest per CPU performance of the machines with commodity processors. (that are listed, at least - including the year-old Xeons)
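The per-CPU comparison can be made explicit by dividing each Rmax by its processor count. This sketch uses the two table rows quoted above and assumes the Rmax values are in GFlops, which fits the roughly 8.2 TFlops figure quoted elsewhere in the thread. The dictionary layout and names are made up for illustration.

```python
# Sketch: Rmax per processor for the two systems quoted from the table.
# Rmax is assumed to be in GFlops.

systems = {
    "Big Mac": {"rmax_gflops": 8164.0, "processors": 1936},
    "Cray X1": {"rmax_gflops": 2932.9, "processors": 252},
}

per_cpu = {
    name: row["rmax_gflops"] / row["processors"]
    for name, row in systems.items()
}

for name, value in per_cpu.items():
    print(f"{name}: {value:.2f} GFlops per processor")
```

On these numbers the Cray X1 delivers more per processor, which is the reply's point about being careful with general statements.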
Re:The Mac cluster is still on top per CPU (Score:2)
Cray X1 -> Rmax: 2932.9 Processors: 252
I did say It still bests all other Intel hardware... Commodity clusters are entirely different beasts than dedicated supercomputers and this is exactly why I chose the terminology "clusters" rather than supercomputers. Also, check out the architecture of real "supercomputers". Most of the real costs are in CPU interconnectivity.
It's all about AMD (Score:2)
Remember them? Manufacturer of the highest performance x86 processors available? An array of dual-Opteron systems could be built with dramatically lower price/performance ratio than any other platform, especially G5s or Intel Xeons.
It's really fixed this time!! (Score:3, Informative)
It connects to the CPU via the "Apple Processor Interface", NOT via HyperTransport. It connects to its memory controller at 1/2 the CPU speed, unlike Opteron and Athlon 64, which connect to the memory controller at FULL CPU SPEED.
Documentation:
developer.apple.com [apple.com]
apple.com [apple.com] (thanks for the link)
From the U3 Northbridge, G5 uses HyperTransport to connect to the other peripherals at 3
Re:The Mac cluster is still on top per CPU (Score:3, Informative)
Actually, if you read back a little bit, you will find that the contract was awarded to Apple because they gave the best bang for the buck and it turns out that Dell opti
Now at 8.2 Tflop as of today (Oct 22) (Score:5, Informative)
Since yesterday's release at 7.41 Tflop, the G5 cluster has already increased almost a Tflop, and is now ahead of the current #3 MCR Linux cluster, and about 0.5 Tflop behind a new Itanium 2 cluster.
Re:Now at 8.2 Tflop as of today (Oct 22) (Score:2)
Re:Now at 8.2 Tflop as of today (Oct 22) (Score:2)
But the deadline for submissions to the Nov. 2003 Top 500 list was Oct. 1, so these improvements should not be counted in this list.
Big Mac? How does that compare with a WOPR? (Score:5, Funny)
And mac fans are complaining? (Score:2)
This is fantastic, no matter what way you cut it! Using commodity components, these folk have turned the G5 into a real champion. No longer do budgets have to be in the hundreds, or even tens of millions to get a top-notch supercomputer. And this is not even th
Re:And mac fans are complaining? (Score:2)
Re:And mac fans are complaining? (Score:5, Interesting)
The G5 is also significantly lower cost than the Power4
Re:What "commodity"? (Score:2)
> to buy on a couple month's wages.
Well, I'm sure we all do.
I also want a house for what I can pay in two months' wages.
But these things do have costs.
Even if each computer was $1 total, for 2000 of them that's $2000 right there.
So even as much as $10 a computer would be 'affordable', though definitely more than two months' pay. But I have hope of actually saving up $20,000 after a while.
You find me $10 computers that can do 1
They didn't save the world AGAIN? (Score:4, Insightful)
First you have the iTunes store which doesn't do anything but give the average user basically anything he or she might have wanted to have in an online music store. Despite its being free, we're all cheesed off that it doesn't support OGG, or it's meant partly to push iPods (duh), or whatever.
Now this -- a supercomputer that has, to quote that again, the "best price/performance ratio ever achieved on a supercomputer." But dang it all, it doesn't completely blow away every established precedent -- it's just in the top five on the usual list of comparisons. One more crushing disappointment.
From Microsoft, we just want products that don't completely ream us. From Apple, we want the entire world to seem a little friendlier and cooler with every product release, every dot-increment OS update. They both disappoint us, but the expectations seem a little different...
Re:They didn't save the world AGAIN? (Score:2)
iTunes for Windows, just like Mac iTunes, does its decoding using QuickTime. As crappy as you think the QuickTime Player software is, the backend QuickTime library is very nice, especially with regard to its modularity.
Any app that uses the QuickTime library can now play AAC files (even the iTMS 'protected' ones), not just iTunes. Of course, not many Windows apps us
What about the RAM? (Score:2)
I am sick and tired... (Score:3, Funny)
Moore's Law applied (Score:3, Interesting)
Yes, but don't Moore's Law and the commodification of computer hardware suggest that each new generation of supercomputer will have the best price/performance ratio?
Re:Moore's Law applied (Score:2)
Efficiency: switch topology? (Score:3, Insightful)
BTW, the performance never was stated to be 17 TF, so it did not drop to 7.4 (or whatever it ends up to be).
Re:Efficiency: switch topology? (Score:2)
#2: Sadly, you're still wrong. It was stated that it achieved "around 14 TFlops".
Big Mac achieves around 14 TFlops with 128 Nodes [slashdot.org]
Posted by CmdrTaco on 14:24 16 October 2003
Price vs Performance (Score:2, Interesting)
Re:Price vs Performance (Score:2)
That can't possibly be right. There's no way that the cluster's power requirements are over 1 home's worth per CPU. Maybe they just added a zero and it's supposed to be 300, but even that sounds very high.
Re:Price vs Performance (Score:2)
Re:Price vs Performance (Score:2)
Cooling takes lots of power, as you can see when the US has a hot summer and the grid and power plants struggle to keep up with demand.... The nodes do not consume that much power, relatively speaking
Apple specs for the Xserve dual-processor max configuration have maximum power consumption at 244W [apple.com]
I doubt that the G5 dual processor is much more than that. I haven't seen power c
Re:Price vs Performance: Off an order of magnitude (Score:5, Interesting)
But your point is a good one. I often wonder about the environmental economics of people running SETI, Folding@Home, etc. on older machines. Most of those older "spare" CPU-cycles are quite costly in terms of electricity relative to newer faster machines that do an order of magnitude more computing with the same amount of electricity.
Re:Price vs Performance: Off an order of magnitude (Score:3, Informative)
Each processor, drive, and switch generates heat which is dissipated into the air. Left unchecked, that heat accumulates and will kill the entire thing. With 1100 dual-processor nodes running constantly (and you can bet they'll each be running at pretty close to full tilt), that's a hell of a lot of heat that needs to be removed from the air.
That's nothing (Score:4, Funny)
to manually clock the CPUs.
So far I've managed ONE whole flop.
My record is for the slowest supercomputer
on the planet.
Re:That's nothing (Score:3, Funny)
Attach a hall-effect sensor to a hamster wheel to drive the clock.
Go out and buy a hamster.
Re:That's nothing (Score:2, Funny)
Do they byte ?
Missing the point (Score:2)
8 TFlops on a single board anyone? (Score:2)
PPC64 optimizations? (Score:2)
Price/performance and Moore's Law (Score:2)
Noted. And go VT, go Apple! Now, with the cheerleading out of the way, I wonder something: with Moore's law and all still applying pretty well, just getting the latest and greatest of any home computer architecture will all but guarantee you pretty good price/performance.
As another poster pointed out, someone's recent laptop could do as well on Linpack as a 1992 supercomputer.
So what I think would
Processor architecture and application performance (Score:2)
Good read for anyone interested in some of the background in current super computers and what they used for testing.
Here's the link. [jukasisters.com]
Scalability (Score:5, Informative)
The degree of loss is interesting, and suggests that their algorithm for distributing work needs tightening up on the high-end. Nonetheless, none of these are bad figures. When this story first broke, you'll recall the quote from the top500 list maintainer who pointed out that very few machines had high performance ratings, when they got into the large numbers of nodes.
I'd say these are extremely credible results, well worth the project team congratulating themselves. If the team could open-source the distribution algorithms, it would be interesting to take a look. I'm sure plenty of Mosix and BProc fans would love to know how to ramp the scaling up.
(The problem of scaling is why jokes about making a Beowulf cluster of these would be just dumb. At the rate at which performance is lost, two Big Macs linked in a cluster would run slower than a single Big Mac. A large cluster would run slower than any of the nodes within it. Such is the Curse that Amdahl inflicted upon the superscalar world.)
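As a rough illustration of the scaling limit invoked above, here is a minimal sketch of Amdahl's law in its textbook form. The parallel fraction is a made-up value, not a measurement of Big Mac, and the function name is hypothetical. Note that this form has no communication-cost term, so it shows speedup levelling off as processors are added rather than falling; an actual slowdown would need interconnect overhead that is not modelled here.

```python
# Sketch: textbook Amdahl's law. speedup(n) = 1 / ((1 - p) + p / n),
# where p is the fraction of the work that can run in parallel.

def amdahl_speedup(parallel_fraction: float, n_processors: float) -> float:
    """Ideal speedup on n processors when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

PARALLEL_FRACTION = 0.999  # hypothetical value chosen only for illustration

for n in (128, 1100, 2200, 4400):
    speedup = amdahl_speedup(PARALLEL_FRACTION, n)
    # Parallel efficiency is speedup divided by processor count.
    print(f"{n:5d} processors: speedup {speedup:7.1f}, efficiency {speedup / n:.0%}")
```

Even with 99.9 percent of the work parallel, efficiency in this model drops steadily as the processor count grows, which is the general shape of the loss being discussed.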
The problem of producing superscalar architectures is non-trivial. It's also NP-complete, which means there isn't a single solution which will fit all situations, or even a way to trivially derive a solution for any given situation. You've got to make an educated guess, see what happens, and then make a better informed educated guess. Repeat until bored, funding is cut, the world ends, or you reach a result you like.
This is why it's so valuable to know how this team managed such a good performance in their first test. Knowing how to build high-performing clusters is extremely valuable. I think it not unreasonable to say that 99% of the money in supercomputing goes into researching how to squeeze a bit more speed out of reconfiguring. It's cheaper to do a bit of rewiring than to build a complete machine, so it's a lot more attractive.
On the flip-side, if superscaling ever becomes something mere mortals can actively make use of, understand, and refine, we can expect to see vastly superior - and cheaper - SMP technology, vastly more powerful PCs, and a continuation of the erosion of the differences between micros, minis, mainframes and supercomputers.
It will also make packing the car easier. (* This is actually a related NP-complete problem. If you can "solve" one, you can solve the other.)
point missed (Score:2, Insightful)
What seems to be missing from most of the conversation is that it's not the Macs that are losing efficiency per se, it's the network (the interconnects) that is slowing the machine as a whole down. I know little about the Linpack test, but I would assume that it's written to test/stress the entire machine: CPU, disk, memory and interconnects. If the Macs can finish par
32 bit numbers? (Score:2)
Re:32 bit numbers? (Score:2)
Congrats to the VT team (Score:2)
I still think #4 in the world is pretty damn impressive for Apple hardware! And it looks like there might be some small performance improvements to come.
I think everyone involved did a pretty damn good job! Have a beer on me.
-psy
This also makes the Big Mac... (Score:2, Funny)
(hides/ducks - I ain't an anonymous coward for nothing!)
seti@home not listed (Score:5, Interesting)
show the SETI@Home project. The top entry is NEC at 35 teraflops. Today's SETI@Home average for the last 24 hours is 61 teraflops. It may be a virtual supercomputer, but it is producing real results.
Re:Frys... (Score:2)
i'd think it would fry then.
Re:Some tweaking will do it good... (Score:2, Interesting)
Re:Pentium 4.... non xeon? (Score:2)
Problem: MS software, or optimising *NIX or Linux for clustered SMP. Could be the equivalent of a Cluster's last stand, up against Apple's injun' mod BSD cluster software.
Re:Unfair discounted price/performance (Score:2)
Re:facts, please? (Score:3, Informative)