JPL Clusters XServes 62
burgburgburg writes "MacSlash has a brief note on how NASA's JPL has put together a cluster of 33 XServes that was able to achieve 1/5 of a teraflop. The original article notes that the Applied Cluster Computing Group, using Pooch (Parallel OperatiOn and Control Heuristic application), ran the AltiVec Fractal Carbon demo and achieved over 217 billion floating-point operations per second on this XServe cluster. More importantly, their research found no evidence of an intrinsic limit to the size of a Macintosh-based cluster."
Where's the GigE switch? (Score:4, Interesting)
Re:Where's the GigE switch? (Score:2, Funny)
Re:Where's the GigE switch? (Score:3, Funny)
But is it pronounced "Eks Serve", "Ten Serve", or "Throatwarbler Mangrove"?
(Gordon Bennett, lucky they didn't call it ServeX!)
Re:Where's the GigE switch? (Score:1)
Re:Where's the GigE switch? (Score:5, Informative)
Re:Where's the GigE switch? (Score:2, Interesting)
Anyone know?
Re:Where's the GigE switch? (Score:3, Informative)
Re:Where's the GigE switch? (Score:5, Informative)
However, I think the interrupt overhead for a 1000Mb link would be so high as to bring the machine to a screeching halt (okay, slow it down perceptibly). What a lot of driver writers do for gigabit links is to move their driver into polling mode. They essentially set a timer to go off every X milliseconds and process all the packets that have been copied into memory during that timeframe.
This bounds the added latency: a packet can wait up to X milliseconds before being noticed and processed by the system. Interrupt overhead stays low, but packet latency goes up a smidge.
It's a good trade-off. I would bet that on a saturated link, packet latency at gigabit speeds is equivalent to, or WORSE than, 100Mb. I might have to test that out...
cr
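The trade-off the parent describes can be sketched with a toy Python model. All the numbers here (1 ms poll interval, 10,000 packets/s) are illustrative assumptions, not measurements from any real driver:

```python
import math

# Toy model of timer-polled packet handling vs. one interrupt per packet.
# Poll interval and packet rate below are made-up illustrative numbers.

def polled_latencies(arrivals_ms, poll_interval_ms):
    """How long each packet waits until the next poll tick fires."""
    lats = []
    for t in arrivals_ms:
        next_tick = math.ceil(t / poll_interval_ms) * poll_interval_ms
        lats.append(next_tick - t)
    return lats

# 10,000 packets arriving uniformly over one second (one every 0.1 ms).
arrivals = [i * 0.1 for i in range(10_000)]

poll_ms = 1.0
lats = polled_latencies(arrivals, poll_ms)

interrupts_per_sec_irq  = len(arrivals)        # one IRQ per packet
interrupts_per_sec_poll = int(1000 / poll_ms)  # one timer tick per ms

print(f"polling: {interrupts_per_sec_poll} ticks/s vs "
      f"{interrupts_per_sec_irq} IRQs/s, "
      f"worst-case added latency {max(lats):.2f} ms")
```

With these numbers, polling fields 1,000 timer ticks per second instead of 10,000 per-packet interrupts, at the cost of up to one poll interval of added latency per packet.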
Re:Where's the GigE switch? (Score:5, Informative)
"no upper limit" (Score:1, Funny)
(it's a joke. laugh.)
Re:"no upper limit" (Score:3, Insightful)
No comparison? (Score:3, Interesting)
The article doesn't make any comparison between this and other (read: x86 Linux cluster) solutions. Do the x86 clusters have a problem scaling as well as Xserves? I've heard of several-thousand-node x86/Linux clusters, so I would guess not, but I don't really know. Also, there's no mention of $$/{MIPS/FLOPS/Whatever}, which would be nice to compare against an x86 cluster as well.
Re:No comparison? (Score:4, Informative)
OTOH, if you can take advantage of it, that would put this cluster at #250 in the Top 500 [top500.org] list of supercomputers. In fact, it is just a tick behind an IBM NetFinity cluster with 512x733MHz Pentium IIIs. Not bad for 66x1GHz G4s.
Re:No comparison? (Score:4, Informative)
No, it is not. The Top500 ranking is based on *actual* parallel performance in *DOUBLE PRECISION* linpack.
The _theoretical_ peak performance of 66 1 GHz G4 CPUs in double precision floating point is 66 Gflops. In practice the G4 has significant scheduling problems with the normal floating-point unit, so I would be surprised if it could even achieve 30 Gflops. And Ethernet is not going to scale very well for LINPACK. The real performance of parallel LINPACK on this machine would probably be on the order of 10 Gflops.
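The parent's back-of-the-envelope figure works out as follows. This sketch assumes 33 dual-CPU 1 GHz Xserves and one double-precision result per cycle from the G4's scalar FPU (AltiVec is excluded because it is single precision only):

```python
# Back-of-envelope double-precision peak for the cluster discussed above.
# Assumption: one double-precision flop per cycle per CPU through the
# scalar FPU; 33 nodes x 2 CPUs x 1 GHz.

nodes = 33
cpus_per_node = 2
clock_hz = 1e9
flops_per_cycle = 1   # scalar FPU, one DP op/cycle (assumed)

peak_gflops = nodes * cpus_per_node * clock_hz * flops_per_cycle / 1e9
print(peak_gflops)    # 66.0 -- matches the 66 Gflops figure above
```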
The Xserve is a nice box, and Altivec is cool for some applications, but real scientific applications are VERY different from a single precision fractal demo.
Re:No comparison? (Score:3, Informative)
The G4+ has ONE floating-point unit (Score:2, Informative)
If you don't believe me, you might at least believe Motorola [motorola.com]
Or, check out a summary [jc-news.com].
Re:No comparison? (Score:5, Informative)
The theoretical peak performance for 33 XServes in the test done here was actually 495 GFLOPS, BTW. I don't know what the theoretical performance of double precision on Altivec is, though. LINPACK is all linear algebra (IIRC), so it would see some benefit.
I will admit that there are plenty of applications where the G4 is not the best processor available. I for one will certainly be happy to see the IBM PPC 970, but you shouldn't discount the XServe until the test is actually run.
Re:No comparison? (Score:1, Informative)
It is simply not possible - the Altivec unit doesn't have any instructions that can handle double precision, and emulating it with single precision would be an order of magnitude slower than doing it in the normal FPU. This is exactly why Intel introduced SSE2, which does double precision.
Re:No comparison? (Score:3, Informative)
33 Machines x ($3999 + $200 ##1.5Gigs Aftermarket DDR##) = $138,567
$138,567 for 217 gigaflops = $638.56/gigaflop
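The arithmetic above, spelled out (the prices are the parent's estimates, not quotes):

```python
# Price-per-gigaflop arithmetic for the 33-node Xserve cluster above.
# $3999/node plus ~$200 for aftermarket DDR are the parent's estimates.

machines   = 33
per_node   = 3999 + 200        # base Xserve + aftermarket RAM
total_cost = machines * per_node
gflops     = 217               # the demo's measured throughput

print(total_cost)              # 138567
print(round(total_cost / gflops, 2))
```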
A (pretty loose) comparison (Score:2, Informative)
$4,747,392 offering 11.2 Teraflops...
$423.87/Gigaflop...
Re:A (pretty loose) comparison (Score:1, Informative)
Everybody in science quotes double precision benchmarks by default. Of course it is possible to use single in some cases, but then you'll have to compare single vs. single and not limit the alternatives to double...
Re:A (pretty loose) comparison (Score:1)
On the front page [top500.org], the statement: "Rmax: 5.69 Tflops"
Re:No comparison? (Score:2)
They effectively couldn't compare unless someone wants to write the SSE1 or SSE2 equivalent.
Re:No comparison? (Score:2)
I don't think that's true. I have a dual-processor G4, and the program gives me the option of disabling multiprocessor support and AltiVec at run-time. It looks like the program checks for multiple CPUs and AltiVec capability at launch and runs accordingly, but I can't confirm that myself right now.
You can download the program here [apple.com].
This was done before with G4 (Score:4, Informative)
http://www.spymac.com/gallery/showphoto.php?photo=4665
yeah but... (Score:2, Informative)
Re:yeah but... (Score:2)
I guess what this really tells us is that 33 Xserves working together achieve nearly the same performance as almost five times that number of 450 and 533 MHz machines. It seems like the Xserves' raw FLOPS are scaling better than linearly with clock speed.
Obligatory Post with a Splash of Lemon (Score:5, Funny)
Re:Obligatory Post with a Splash of Lemon (Score:1)
"It's so fast!!"
Imagine This... (Score:5, Interesting)
"Today, I'm going to talk about Mac OS 10.3, and a big part of OS 10.3 is our clustering software.... [blah, blah]
screw that.... (Score:2)
Myth... ? (Score:5, Funny)
im done imagining (Score:2)
TIFFs and PICTs are preferred; TGAs or JPGs will suffice.
Acronym (Score:2)
Re:Acronym (Score:1)
nearly matches NASA (Score:1)
In the Top 500 Supercomputers... (Score:5, Funny)
Of course I fully expect the employees of the West Hartford Apple store to ceremoniously run three doors down and moon the folks at Williams-Sonoma. Ah, Mall Life.
(*the whole lot of which just got its lunch eaten, got dope-slapped, and had its girlfriend stolen by the new NEC cluster in Japan - 35,860 GFlops; Los Alamos is 2nd & 3rd at 7,727 each with two of their HP server clusters... sheesh.)
Not the biggest Xserve cluster... (Score:2, Informative)
Re:Not the biggest Xserve cluster... (Score:2, Informative)
Here [macnn.com]
It's the 42 node cluster.
Scalability... (Score:2, Interesting)
Should put them in the bottom part of the top 500 (Score:1)