
JPL Clusters XServes

burgburgburg writes "MacSlash has a brief note on how NASA's JPL has put together a cluster of 33 XServes that was able to achieve 1/5 of a teraflop. The original article notes that the Applied Cluster Computing Group, using Pooch (Parallel OperatiOn and Control Heuristic Application), ran the AltiVec Fractal Carbon demo and achieved over 217 billion floating-point operations per second on this XServe cluster. More importantly, their research indicates that no evidence of an intrinsic limit to the size of a Macintosh-based cluster could be found."
This discussion has been archived. No new comments can be posted.

  • by teridon ( 139550 ) on Friday November 15, 2002 @11:55AM (#4677841) Homepage
    All G4s (including the Xserve) have GigE built in. I wonder if the GigE switch was too expensive?
    • by EricWright ( 16803 ) on Friday November 15, 2002 @01:34PM (#4678762) Journal
      Maybe the computations were not communications bound... Fractal calculations can be done with a Monte Carlo method, which is highly parallelizable, and requires very little inter-node communication.
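
      To make that concrete, here's a rough C sketch of the per-node piece (illustrative only -- node_id and the final sum across nodes are stand-ins for whatever Pooch actually provides):

          /* Monte Carlo flavor: each node samples random points in the
             complex plane with its own PRNG seed, counts how many stay
             bounded, and only that single tally crosses the network. */
          static unsigned long lcg(unsigned long *state)
          {
              /* small Numerical-Recipes-style LCG; good enough for a demo */
              *state = *state * 1664525UL + 1013904223UL;
              return *state;
          }

          static int bounded(double cr, double ci, int max_iter)
          {
              double zr = 0.0, zi = 0.0;
              int i;
              for (i = 0; i < max_iter; i++) {
                  double t = zr * zr - zi * zi + cr;
                  zi = 2.0 * zr * zi + ci;
                  zr = t;
                  if (zr * zr + zi * zi > 4.0)
                      return 0;               /* escaped the set */
              }
              return 1;                       /* still bounded after max_iter */
          }

          long local_tally(int node_id, long samples_per_node)
          {
              unsigned long seed = 12345UL + (unsigned long)node_id;
              long s, tally = 0;
              for (s = 0; s < samples_per_node; s++) {
                  double cr = -2.0 + 3.0 * (double)(lcg(&seed) % 10000) / 10000.0;
                  double ci = -1.5 + 3.0 * (double)(lcg(&seed) % 10000) / 10000.0;
                  tally += bounded(cr, ci, 256);
              }
              return tally;   /* one number per node -> trivial final sum */
          }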
      • I wonder if gigabit ethernet has better latency though.

        Anyone know?
        • No, I don't know, but my uninformed opinion would be that there's no measurable difference in latency between gigabit and 100BASE-T. The vast majority of the latency happens in the TCP stack inside the computer and in the NIC, as packets are generated and whatnot. Actual transmission latency will be so tiny as to make no meaningful difference.
        • by cremes ( 16553 ) on Friday November 15, 2002 @04:30PM (#4680215) Homepage
          The latency question is a good one. I'd say the answer lies in the driver for the NIC. I've written an IOKit ethernet driver and saw pretty decent performance at 100 Mb. The driver processes packets as incoming data raises interrupts.

          However, I think the interrupt overhead for a 1000Mb link would be so high as to bring the machine to a screeching halt (okay, slow it down perceptibly). What a lot of driver writers do for gigabit links is to move their driver into polling mode. They essentially set a timer to go off every X milliseconds and process all the packets that have been copied into memory during that timeframe.

          This adds latency: a packet can sit for up to X milliseconds before the system notices and processes it. Interrupt overhead stays low, but packet latency goes up a smidge.
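
          In pseudo-C, the shape of it looks something like this (not real IOKit code; the ring layout and deliver_to_stack are made up for illustration):

              extern void deliver_to_stack(void *pkt);  /* stand-in for the handoff to the TCP/IP stack */

              #define RING_SIZE 256
              #define POLL_USEC 100   /* hypothetical tick; tuned against CPU load */

              struct rx_ring { volatile int head; int tail; void *pkt[RING_SIZE]; };

              /* Runs every POLL_USEC off a timer instead of taking one interrupt
                 per packet: drain everything the NIC has DMA'd into the ring
                 since the last tick.  Worst-case added latency is one tick. */
              void poll_tick(struct rx_ring *r)
              {
                  while (r->tail != r->head) {
                      deliver_to_stack(r->pkt[r->tail]);
                      r->tail = (r->tail + 1) % RING_SIZE;
                  }
                  /* ...then re-arm the timer for POLL_USEC from now */
              }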

          It's a good trade-off. I would bet that on a saturated link, packet latency at gigabit speeds is equivalent to or WORSE than at 100Mb. I might have to test that out...

          cr
          • by Twirlip of the Mists ( 615030 ) <twirlipofthemists@yahoo.com> on Friday November 15, 2002 @11:41PM (#4683629)
            I thought a lot of gig-e cards implemented a good deal of packet processing in hardware to deal with this very problem. Am I mistaken? I remember that the first PCI gig-e card I ever saw was installed in an SGI Origin, and when running full-out it pegged an entire CPU with interrupt handlers. Later versions of the card-- or perhaps an entirely different card, but sold by SGI and used with the same Origin servers-- had hardly any interrupt activity at all, even when moving data at rates exceeding 50 MB/s.
  • by Anonymous Coward
    ... except maybe price? :)

    (it's a joke. laugh.)
    • by 00_NOP ( 559413 )
      Seriously, there will be an upper limit in a practical sense! The speed of light imposes it: if it takes longer for information to pass up and down the system than it does to wait for a new instruction to be queued on an existing element, then there is no point in extending the system further.
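
      Back of the envelope (my numbers, not the article's): a signal in copper or fiber propagates at roughly two-thirds of c, about 0.2 m per nanosecond, so even a 40 m cable run across a machine room costs only ~200 ns each way. A Fast Ethernet round trip already burns tens of microseconds in the TCP stack and the NIC, so software latency dominates by a couple of orders of magnitude before the physics ever does.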
  • No comparison? (Score:3, Interesting)

    by photon317 ( 208409 ) on Friday November 15, 2002 @11:59AM (#4677897)

    The article doesn't make any comparison between this and other (read: x86 Linux cluster) solutions. Do the x86 clusters have problems scaling that the Xserves don't? I've heard of several-thousand-node x86/Linux clusters, so I would guess not, but I don't really know. Also, there's no mention of $$/{MIPS/FLOPS/whatever}, which would be nice for comparing against an x86 cluster as well.
    • Re:No comparison? (Score:4, Informative)

      by dhovis ( 303725 ) on Friday November 15, 2002 @01:17PM (#4678618)
      Well, there is also the issue that they were using test code that Apple distributes to show off what the G4 chip can do. So that 217 GFLOPS figure is dependent on having highly AltiVec-optimized code.

      OTOH, if you can take advantage of it, that would put this cluster at #250 in the Top 500 [top500.org] list of supercomputers. In fact, it is just a tick behind an IBM NetFinity cluster with 512x733MHz Pentium IIIs. Not bad for 66x1GHz G4s.

      • Re:No comparison? (Score:4, Informative)

        by Anonymous Coward on Friday November 15, 2002 @02:23PM (#4679220)
        If you can take advantage of it, that would put this cluster at #250 in the Top 500 [top500.org] list of supercomputers. In fact, it is just a tick behind an IBM NetFinity cluster with 512x733MHz Pentium IIIs. Not bad for 66x1GHz G4s.

        No, it is not. The Top500 ranking is based on *actual* parallel performance in *DOUBLE PRECISION* LINPACK.

        The _theoretical_ peak performance of 66 x 1 GHz G4 processors in double-precision floating point is 66 Gflops (66 CPUs x 1 GHz, assuming one double-precision flop per cycle). In practice the G4 has large scheduling problems with the normal floating-point unit, so I would be surprised if it could even achieve 30 Gflops. And ethernet is not going to scale very well for LINPACK. The real performance of parallel LINPACK on this machine would probably be on the order of 10 Gflops.

        The Xserve is a nice box, and Altivec is cool for some applications, but real scientific applications are VERY different from a single precision fractal demo.
        • Re:No comparison? (Score:3, Informative)

          by elphkotm ( 574063 )
          The G4 has 3 parallel floating point execution units, so the theoretical peak is actually 198 GFLOPS. Also, the LINPACK performance number for Beowulf-style clusters is derived from the aggregate total performance of the nodes. This puts true single-image systems at a disadvantage, since real applications are quite a bit more latency-sensitive and interconnect-bandwidth-intensive.
          • by Anonymous Coward
            You don't know what you're talking about. The G4 does NOT have three floating-point units. (hint: an integer unit doesn't do floating-point)

            If you don't believe me, you might at least believe Motorola [motorola.com]

            Or, check out a summary [jc-news.com].
        • Re:No comparison? (Score:5, Informative)

          by dhovis ( 303725 ) on Friday November 15, 2002 @04:05PM (#4680010)
          Apple provides libraries for doing double precision math with the Altivec unit. See here [apple.com].

          The theoretical peak performance for 33 XServes in the test done here was actually 495 GFLOPS, BTW. I don't know what the theoretical performance of double precision on Altivec is, though. LINPACK is all linear algebra (IIRC), so it would see some benefit.

          I will admit that there are plenty of applications where the G4 is not the best processor available. I for one will certainly be happy to see the IBM PPC 970, but you shouldn't discount the XServe until the test is actually run.

          • Re:No comparison? (Score:1, Informative)

            by Anonymous Coward
            Actually, I'm using the Apple libraries a lot; they include double precision, but the double-precision routines do NOT use the Altivec unit.

            It is simply not possible - the Altivec unit doesn't have any instructions that can handle double precision, and emulating it with single precision would be an order of magnitude slower than doing it in the normal FPU. This is exactly why Intel introduced SSE2, which does double precision.

    • It's because the demo program uses AltiVec... you can run that demo standalone [one machine], but you must have a G4.

      They effectively couldn't compare unless someone wants to write the SSE1 or SSE2 equivalent.
      • you can run that demo standalone [one machine] but you must have a G4.

        I don't think that's true. I have a G4 (x2), but the program gives me the option of disabling multiprocessor support and AltiVec at run-time. It looks like the program will check for multiple CPUs and AltiVec capability at launch and run accordingly, but I can't confirm that myself right now.

        You can download the program here [apple.com].
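
        If you want to check for yourself, the usual runtime test on OS X goes through sysctl. A minimal sketch in C (assuming the hw.optional.altivec and hw.ncpu selectors; a failed lookup, e.g. on a G3, just means no vector unit):

            #include <stdio.h>
            #include <sys/types.h>
            #include <sys/sysctl.h>

            int main(void)
            {
                int has_altivec = 0, ncpu = 1;
                size_t len = sizeof(has_altivec);

                /* Missing selector (older kernel, no vector unit) leaves 0. */
                if (sysctlbyname("hw.optional.altivec", &has_altivec, &len, NULL, 0) != 0)
                    has_altivec = 0;

                len = sizeof(ncpu);
                sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0);

                printf("AltiVec: %s, CPUs: %d\n", has_altivec ? "yes" : "no", ncpu);
                return 0;
            }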
  • by fulldecent ( 598482 ) on Friday November 15, 2002 @12:01PM (#4677924) Homepage
    Here is a site that does similar things with a G4 cluster for protein folding:

    http://www.spymac.com/gallery/showphoto.php?photo=4665

    • yeah but... (Score:2, Informative)

      by Nomad37 ( 582970 )
      As pointed out on As the Apple Turns [appleturns.com], the difference is that, while the Power Macs reach higher performance levels (233 vs. 217 Gflops), they take up a whole heap more space than a single rack of 42 1U servers...
      • That doesn't seem to make a whole lotta sense, though. The ATAT blurb compares the 217 GFLOPS attained by JPL on 33 Xserves with the 233 GFLOPS reported by USC on 152 Power Macs, and then concludes that the Power Macs take up more room. If JPL were to add a few more Xserves, they could top the 233 GFLOPS figure handily, all inside a single rack. USC's Power Macs took up a lot more space because they were older, and there were a hell of a lot more of them.

        I guess what this really tells us is that 33 XServes working together achieve nearly the same performance as almost five times that number of 450 and 533 MHz machines. The Xserves seem to deliver more FLOPS per node than the clock-speed difference alone would predict.
  • by Spencerian ( 465343 ) on Friday November 15, 2002 @12:35PM (#4678249) Homepage Journal
    Imagine a Beowulf cluster getting beaten up by an Xserve cluster on the playground and having its lunch cycles stolen!
    • Why not call it a Jaguar cluster ;)
      "Its so fast!!"
    • Imagine This... (Score:5, Interesting)

      by EccentricAnomaly ( 451326 ) on Friday November 15, 2002 @04:29PM (#4680192) Homepage
      2004, Jobs WWDC Keynote...

      "Today, I'm going to talking about Mac OS 10.3 and a big part of OS 10.3 is our clustering software.... [blah, blah] ...Apple has long prided itself on the easy of use of our products... [blah, blah] (the tv screen behind jobs shows a room with twenty people wearing apple t-shirts and a stack of X-Serve boxes) ...my friends here have several of our next-generation power-4 based X-Serves running OS 10.3... during this keynote they are going to unpack all of the servers and set up a cluster... ...by the end of the keynote we'll give the cluster a spin and see if we can make it into one of the top 50 supercomputers in the world"

    • Imagine a single-processor version of this!!
  • Myth... ? (Score:5, Funny)

    by EyeSavedLatin ( 591555 ) on Friday November 15, 2002 @01:59PM (#4678998) Journal
    Don't believe the Gigaflop myth! Oh wait, that's "MHz Myth"... sorry, as a Mac owner, I have to whip out that response in every thread. Carry on.
  • let me see it!

    tiffs and picts are preferred, tgas or jpgs will suffice.

  • (Parallel OperatiOn and Control Heuristic Application)
    Wouldn't that make this POOCHA?
  • I ran the AltiVec test on my non-AltiVec G3/300... I am proud to announce I achieved almost 1/5 gigaflop.
  • by jpellino ( 202698 ) on Saturday November 16, 2002 @04:43PM (#4687225)
    This would put it at #343 on the Top 500 Supercomputers* - right below the University of Edinburgh's Cray and just ahead of the IBM cluster at Williams-Sonoma. Yes, Williams-Sonoma.

    Of course I fully expect the employees of the West Hartford Apple store to ceremoniously run three doors down and moon the folks at Williams-Sonoma. Ah, Mall Life.

    (*the whole lot of which just got its lunch eaten, got dope-slapped, and had its girlfriend stolen by the new NEC cluster in Japan - 35,860 GFlops; Los Alamos is 2nd & 3rd at 7,727 with two of their HP server clusters... sheesh.)

  • Indeed, I know of an Xserve cluster significantly larger than this one at a certain nearby university; they just aren't up to full capacity yet. But no doubt that news will come too... once they benchmark and spank the JPL cluster :)
  • Scalability... (Score:2, Interesting)

    by godzilla808 ( 586045 )
    It's my understanding that, using Pooch, you can add machines with different processors/speeds (G3, G4). This is in contrast to most Linux clusters I've heard about, in which all nodes must be identical. So instead of having to upgrade everything at once, you add what you can when you can. Anyone able to confirm/deny this?
  • This should put them in the lower part of the Top 500 list if they go through whatever hoops are necessary to get the results posted. See the Top 500 List [top500.org]. With only 33 boxes at about $4,000 a node, that's a very cheap path to serious performance.
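
    Back of the envelope: 33 nodes x $4,000 is about $132,000 for the 217 Gflops demo figure, i.e. roughly $600 per Gflops (taking that per-node price at face value).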
