Drooling Over VA Tech's 1100-Node G5 Cluster 441

Mr. Slurpee writes "Virginia Tech's 1100-node dual 2 GHz Apple G5 Terascale Cluster is getting racked up and ready to roar. If you're a penniless geek like me, at least there's some tech pr0n for us to drool over. There's 1100 of them ... think they could part with one?" Update: 09/22 02:55 GMT by T : Matt submits a link to this full mirror of the photos, writing "The page owner's comment on the original mirror being taken down due to bandwidth? 'Bring it on!'"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Sunday September 21, 2003 @12:50AM (#7015597)
    here [cmu.edu].
  • Re:Why G5s? (Score:5, Informative)

    by 2nd Post! ( 213333 ) <gundbear@pacb[ ].net ['ell' in gap]> on Sunday September 21, 2003 @12:57AM (#7015643) Homepage
    Here's [chaosmint.com] why. Some of the more pertinent points:

    Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]

    Sun (sparc) - required too many processors, also too expensive

    IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available

    HP (itanium) - ditto

    Apple (IBM PPC970) - system available with chassis for lowest price
  • by F2F ( 11474 ) on Sunday September 21, 2003 @01:06AM (#7015679)
    Pink [lanl.gov] at LANL has the following:

    1024 nodes
    2048 cpus
    1024 power cables
    1024 Myrinet network cards
    2048 fiber cables (8.8 miles)
    3072 Myrinet switch ports
    4096 sticks of RAM (2 Terabytes)
    7168 fans
    1 hard drive
    1 CDROM drive

    Not only do they have pictures of its assembly, they have movies [lanl.gov].

    Check the web page for more stats and better quality movies.

    Oh, yes, it's unclassified :)
  • by littlerubberfeet ( 453565 ) on Sunday September 21, 2003 @01:24AM (#7015757)
    One of the reasons VTech went for a G5-based cluster WAS price-performance... The Mac option is cheaper than a PC option and easier to install and maintain than Linux, says the slide show. I might not fully agree, but that's their reasoning.
  • If you read the link [chaosmint.com] in one of the earlier comments, you would see that:
    Slide Four

    Choosing the Right Architecture

    cost vs. performance (purely)
    total cost $5.2 million includes system itself, memory, storage, and communication fabrics
    one of the cheapest systems of its kind

    Slide Five
    Architectural Options

    Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]
    Sun (sparc) - required too many processors, also too expensive
    IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
    HP (itanium) - ditto
    Apple (IBM PPC970) - system available with chassis for lowest price
  • Re:Hm. (Score:4, Informative)

    by tecnobabble ( 611104 ) on Sunday September 21, 2003 @01:45AM (#7015830)
    Actually, I just ran into this problem with a new eMac lab at my school; we set it up in about 4 hours. :)

    Easy way to do it.

    1. Set up 1 machine how you want it.
    2. Get a bunch of firewire cables.
    3. Hook the eMacs together using the cables. (If you can't reach with the cables, get some portable FireWire drives; iPods work well for this too.)
    4. Use Carbon Copy Cloner 2.2 (http://www.bombich.com/software/ccc.html) and move down the line of machines until they're all the same.
    5. Go in and change HD, Network, etc names.
    6. Smile because you just did something in 4-5 hours that it would take Windows users a week to do.

    If you have questions, feel free to email (sethmath @ mac.com) me about it. I can walk you through if necessary.
  • Re:Mod Parent Up (Score:2, Informative)

    by Anonymous Coward on Sunday September 21, 2003 @02:20AM (#7015955)
    Learn to read, VT told you

    Because the G5 systems were the cheapest AND fastest

    There is no rackmount version of the G5 yet. That would be an upcoming G5 Xserve that has not been announced yet.

    Plus, I guess when this cluster needs upgrading, they can sell off the Dual G5s which should hold their value for quite a while, as they are just stock G5 workstations.
  • Re:Hm. (Score:3, Informative)

    by gerardrj ( 207690 ) * on Sunday September 21, 2003 @02:23AM (#7015971) Journal
    I'm as big a Mac fan as anyone, but this would not take a Windows user a week. There are several apps that will mirror and restore HDs in a matter of minutes. Over a 10bT network I used to use Ghost to generate over 25 Win95 installs an hour, just by myself.
    1. Boot to floppy
    2. Press menu option for image to install
    3. Boot machine
    4. Change HD, Network, etc names

    I don't know what the average user/site would encounter with the WinXP authorization, but I know larger sites get blanket installation without the step of contacting MS.
  • Re:Video cards... (Score:2, Informative)

    by jdog1016 ( 703094 ) on Sunday September 21, 2003 @02:25AM (#7015978)
    Actually, I remember when they posted the manual for potential volunteers for assembly, the instructions were to open the case, install the nic, and close the case, then test it by plugging it in and turning it on. So unless they removed it after that, there are 1100 video cards in that system.
  • Re:Hm. (Score:2, Informative)

    by orange_6 ( 320700 ) <jtgalt.gmail@com> on Sunday September 21, 2003 @02:26AM (#7015985) Journal
    I run a lab with around 50 G4s (and 50 PCs) and we've had this problem as well. The PCs are easy since they're all identical and can be remotely reimaged in about 3 hours (on a slow lan), but the Macs are a difficult breed b/c of our network, which is all Novell based. Our Mac IT guy is at our lab nearly 3x a week trying some new configuration and the one he's using now is just to have a FireWire external with everything loaded. Not as simple and efficient as a network rebuild, but it works.
  • Re:Mod Parent Up (Score:4, Informative)

    by Anonymous Coward on Sunday September 21, 2003 @02:35AM (#7016027)
    http://www.apple.com/xserve/
  • Re:Hm. (Score:5, Informative)

    by Benley ( 102665 ) on Sunday September 21, 2003 @02:37AM (#7016037) Journal

    I run a lab with about 50 macs (assorted models, from 350mhz iMacs through 800mhz eMacs, and a few 1ghz G4's) - I spent a good bit of time on a solution, and it's really not as hard as this thread makes it sound.

    First, I build one system and set it up *Exactly* the way I want all the others to be. I have some run-once script voodoo to set the IP address of each machine based on its MAC address, and to munge some ByHost user preferences for the built-in guest account. Then, I use Carbon Copy Cloner [bombich.com] to create an image of that machine's hard drive.

    Once I have an image of the machine, I use NetRestore [bombich.com] (by the same guy as CCC) to create a netboot image that will automatically install the master machine's HD image onto each client.

    I am fortunate to have a MacOS X Server machine on which to run the NetBoot server - which is independent of the subnet's master DHCP server, I might add - but it is possible to netboot macs from other Unix machines with a bit of patching to dhcpd.

    Anyhow, all in all I don't find it any more difficult to netinstall Macs than it is to do the same for Windows machines. Building the master clone image is time consuming and annoying, but it always will be for any platform.

    Feel free to email me if you are interested in my machine setup voodoo script. I had to borrow some binaries from OS X Server in order to make it work. It's slowly turning into something useful as I add more functionality to it.
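
The "set the IP address based on the MAC address" step above can be sketched as a simple lookup. This is a minimal illustration only, not the poster's actual script: the table contents, the addresses, and the function names `normalize_mac` and `ip_for_mac` are all hypothetical, and the sketch stops at resolving the address rather than applying it to the machine.

```python
import re

# Hypothetical lab inventory: MAC address -> static IP. These values are
# invented for illustration and do not come from the original comment.
LAB_ADDRESSES = {
    "00:0a:95:00:00:01": "192.168.10.101",
    "00:0a:95:00:00:02": "192.168.10.102",
}


def normalize_mac(raw: str) -> str:
    """Reduce any common MAC notation to lowercase colon-separated pairs."""
    digits = re.sub(r"[^0-9a-fA-F]", "", raw).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def ip_for_mac(raw_mac: str, table=LAB_ADDRESSES):
    """Return the IP assigned to this MAC, or None if the machine is unknown."""
    return table.get(normalize_mac(raw_mac))


print(ip_for_mac("00-0A-95-00-00-01"))
```

Keeping the table on the master image means every clone carries the same script and picks its own address on first boot; an unknown machine gets `None` and can fall back to DHCP.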

  • by evilviper ( 135110 ) on Sunday September 21, 2003 @02:38AM (#7016042) Journal
    A $2,000 Macintosh system will run just as well as a $900 self-built x86 system.

    Depends what you mean by "as well as"... That only applies if you aren't talking about heat output, power requirements, cooling required, decent case design, ease of servicing. Then, for the programs you are using, things like an extra-fast bus, large CPU cache, and possibility of huge amounts of RAM must not be important at all to you.

    So, sure, if those 8 things are not to be considered at all, then sure, you can say that the x86 option will run just as well.

    And before you start calling me an Apple zealot, I do not own, nor have I ever owned, a single Apple or Mac-compatible computer. I do not work for Apple or any associated companies. Additionally, I do not commonly use Apple computers for any purpose.
  • by davebaum ( 653977 ) on Sunday September 21, 2003 @02:45AM (#7016073)
    The error comes from the fact that the calculations are being done in floating point, and that some of the quantities involved cannot be represented exactly as a base 2 floating point number.

    We run into the same problem using decimal notation in base 10. For example, 1/3 is 0.333... (repeating forever). If you only use a finite number of digits, then whatever number you write down in decimal notation will be a little bit smaller than 1/3. Now multiply that number by 3 and subtract 1:

    3 * 1/3 - 1 = 0

    But if we use a finite number of decimal digits (say 4) then we get 3 * 0.3333 - 1 = -0.0001.

    What throws most people is that although they are used to 1/3 being a repeating decimal, they think 0.1 should be an exact number in floating point. However, computers generally use base 2 instead of base 10, and 1/10 happens to be a repeating fraction in base 2, so all of those decimal numbers become inexact in floating point calculations.

    Most of the time this inaccuracy is hidden by performing the calculation with extra digits and rounding the results. Often the errors are rounded away before displaying the result, but they are still lurking in the floating point values. Take your example: 3.083-3.014. In most programs, (Calculator apps, Excel, etc) the result is probably displayed as 0.069. However, if you calculate 3.083-3.014-0.069 you will not get 0. You will see the rounding error.

    The bottom line is that floating point calculations are inherently inexact. Most programs (in most situations) do a good job hiding this, but the error is always there.
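
The points above can be checked directly. The following is a small sketch in Python, assuming standard IEEE 754 double-precision floats (which is what CPython uses on common hardware); the variable names are only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# 1/10 repeats forever in base 2, so the stored double is only close to 0.1.
stored_tenth = Fraction(0.1)        # exact value of the double nearest 0.1
tenth_is_exact = stored_tenth == Fraction(1, 10)

# The example from the comment: rounding for display hides the error...
difference = 3.083 - 3.014
shown = f"{difference:.3f}"

# ...but subtracting the "expected" answer exposes it.
residue = 3.083 - 3.014 - 0.069

# The base-10 analogy with four digits, done in exact decimal arithmetic.
decimal_analogy = Decimal("0.3333") * 3 - 1

print("0.1 stored exactly?", tenth_is_exact)
print("displayed difference:", shown)
print("residue is zero?", residue == 0)
print("3 * 0.3333 - 1 =", decimal_analogy)
```

When exact decimal results matter (money, for instance), a decimal type such as `decimal.Decimal` avoids the base-2 representation problem, at some cost in speed.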
  • Re:Why G5s? (Score:1, Informative)

    by Anonymous Coward on Sunday September 21, 2003 @03:29AM (#7016236)
    176 TB is for the entire cluster
  • Re: Processors
    Perhaps for their benchmarks, the G5 was 2x the performance of the Opteron. Have you taken into consideration the Altivec processor, which happens to be 128bit in size? Any vector processing will be enhanced greatly by the powerful nature of the G5 in general, and especially when using Altivec optimized code. Couple this with IBM's XLC auto-vectorizing C compiler, and I wouldn't be surprised if Altivec did wipe out SSE2/3DNow!; it's been discussed before that Altivec is a superior solution to MMX/MMX2/SSE, and SSE2, so there's no reason to doubt that when you pump up the FSB from 167MHz->1GHz, and pump up the CPU from 1.4GHz->2.0GHz, on the PowerPC architecture, Altivec becomes the most powerful SIMD solution in commodity computing.

    Re: Chassis
    It may be a time of research vs time to market discrepancy; i.e., at the time VT was requesting bids, there were no Opteron chassis announced or available, whilst Apple may have had one at 95% completion, barring an actual press release and announcement. Likewise, at the release of the G5 there are no IBM PPC 970 machines, yet both companies use the same CPU.

    Re: OS X
    Yeah, there is a 64 bit X. It's called OS X Panther, and there's a 64 bit aware X called 10.2.7, and the libraries for Altivec have been 128bit for years now, so all 10.2.7 really added was... 64 bit pointers and memory addresses, really.

    To recap: Altivec makes a big difference. Having immediately available machines makes a difference. Having a lower price point per performance per machine makes a difference (each node, including AC + networking + RAM, only costs about $4,727, which is $1,600 lower than an identically specced stock dual G5 with 4GB of RAM!), as does the supportability of OS X vs Linux or, heaven forbid, Windows 2k... And yes, OS X for these machines is at least 64 bit enough to address 8GB of RAM, and the OS has *always* been able to manipulate 128 bit data, as well as 64 bit data.
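
The per-node figure follows from the numbers quoted earlier in the thread ($5.2 million total, 1100 nodes). A quick check in Python; the "implied stock price" is derived here from the quoted $1,600 saving and is not a price stated anywhere in the discussion.

```python
TOTAL_COST_USD = 5_200_000   # "total cost $5.2 million" from the slide summary
NODE_COUNT = 1100            # nodes in the cluster
QUOTED_SAVING_USD = 1_600    # per-node saving quoted in the comment above

cost_per_node = TOTAL_COST_USD / NODE_COUNT

# Derived, not quoted: what the stock machine would cost if the saving holds.
implied_stock_price = round(cost_per_node) + QUOTED_SAVING_USD

print(f"cost per node: ${cost_per_node:,.0f}")
print(f"implied stock price: ${implied_stock_price:,}")
```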
  • Re:space.. (Score:3, Informative)

    by Maserati ( 8679 ) on Sunday September 21, 2003 @03:40AM (#7016279) Homepage Journal
    VT was trying to make a deadline for a "Top 10 Supercomputers list", so time was a factor in the bidding; Dell tried for price, but couldn't make the delivery time that Apple could (by bumping everyone else's order back). Quad G5 Xserves might have to be 2U units, due to heat. They'll probably wait for the 0.09 micron or smaller processes from IBM to do a G5 Xserve. Right now, the Xserve is a 1U dual G4 system. The desktop management tools in the OS X server package sound tempting. My Studio group is proposing half a terabyte worth of storage, and I might be able to use that as a management machine as part of the 10.3 rollout.

    It's pretty nice to be able to work with another group that closely.
  • Re:Why G5s? (Score:2, Informative)

    by Trurl's Machine ( 651488 ) on Sunday September 21, 2003 @03:41AM (#7016282) Journal
    Truly amazing, how many of you ever thought you would live long enough to see Apple win a contract based on price?

    I am typing these very words on a contract won based on price. When I was searching the market for a new laptop with all the qualities I wanted (repeat: ALL, including such factors often omitted by PC users as battery life, general robustness or silence), low-end iBook was actually the CHEAPEST option.
  • by FredFnord ( 635797 ) on Sunday September 21, 2003 @04:19AM (#7016383)
    Or do you just want to bitch?

    The real answer is that the problems that are going to be solved with this cluster are easily parallelizable. That's the IDEA, right? 1100 machines, each running one chunk. Well, the G5, and more specifically the Altivec vector processing section of it, is SO MUCH better for processing big bites of easily parallelizable data at a time than any of the alternatives that it can run rings around any Intel or AMD machine you care to name with fewer than double the number of processors. (And in the cases of some particular kinds of calculations, it beats those, too. But you can't count on that for all your problems.)

    We've seen this before a number of times... I seem to recall a gene sequencing program that was running five or six times faster on a G4 than it was on a Pentium IV of the same speed. And then there's SETI@home, which runs much faster, cycle-for-cycle, on the Mac, and doesn't even USE altivec. (Though I believe it does take advantage of the 'multiply-and-add' instruction of the PPC, which is another nice little feature.)

    Altivec is an astonishingly clean and usable interface for an amazingly powerful vector processor that is, in 99% of the Macs out there, underutilized to the point that if it suddenly disappeared, most people wouldn't notice any difference at all. It's kind of a pity, really.

    Basically, Intel came out with MMX (and all the later developments) in order to have a talking point on a slide presentation about their processors, about the time when competitors like AMD were starting to come forward: functionally, an awful mess, and impossibly difficult to program. (In fact, for the first few years, Intel would send programmers out to work with companies to implement MMX, because otherwise none of them would bother.)

    AMD came up with something that was a little less hacked together in a very short period of time, as a response to Intel. But it still wasn't pretty, at least partially because of the limitations of the architecture, and the performance wasn't *that* much better than just doing without.

    Apple (who really designed a lot of the basics themselves when it comes to Altivec, so don't think this was a Motorola invention) said, 'Hey, wow, we need something like that, in order to compete.' First they decided on a coprocessor, but that didn't fly any better with the PPC than it did with the older Macs (840av, 660av) with DSPs in them. So they sat down and came up with a really *good* spec for a set of multimedia extensions. And they've only gotten better since.

    I've toyed with altivec code, and I can tell you that in one application that I wrote, one instruction (vector permute) did the work of ten or more non-altiveced instructions on four times the data per cycle. Mind you, I just did it for fun, I don't know enough about parallel computing problems to come up with anything useful... but there's some interesting stuff under the hood.

    Of course, nobody is going to believe this, because as fashionable as it is to like MacOS X on slashdot these days, nobody wants to admit that, for *some* subset of problems, Mr. Jobs's reality distortion field might not be quite as much of a distortion as you might think...

    -fred
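
To give a feel for why the vector permute mentioned above can stand in for many scalar instructions, here is a plain-Python model of what a 16-byte permute computes: each output byte is picked from the 32 bytes of two source vectors, using the low 5 bits of the matching control byte. This is a sketch of the semantics only, not AltiVec code; the function name `vec_perm` is borrowed for readability and says nothing about performance.

```python
def vec_perm(a: bytes, b: bytes, control: bytes) -> bytes:
    """Model of a 16-byte permute: result[i] = (a + b)[control[i] & 0x1F]."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all three vectors must be exactly 16 bytes")
    source = a + b  # 32 candidate bytes, indices 0..31
    return bytes(source[c & 0x1F] for c in control)


a = bytes(range(0x00, 0x10))   # bytes 0..15
b = bytes(range(0x10, 0x20))   # bytes 16..31

# One control vector reverses a; another interleaves the first halves of a and b.
reverse_a = bytes(range(15, -1, -1))
interleave = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))

reversed_bytes = vec_perm(a, b, reverse_a)
merged = vec_perm(a, b, interleave)

print(reversed_bytes.hex())
print(merged.hex())
```

In scalar code each of those 16 byte moves is at least a load and a store; the hardware instruction performs the whole selection in one step, which is the kind of saving the comment describes.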
  • by evil_one ( 142582 ) on Sunday September 21, 2003 @05:56AM (#7016558) Homepage
    Uh, I'm running IE 5 on my Mac... In fact, the last G4 I set up had IE 5 preinstalled on it.
  • Re:Hm. (Score:4, Informative)

    by chrome ( 3506 ) <chrome@NOspam.stupendous.net> on Sunday September 21, 2003 @05:58AM (#7016568) Homepage Journal
    Exactly.

    Except, with Ghost, you could install 1000 machines in 30 minutes - using multicast.

    A couple might fail and you'd have to redo them, but if you have a 100Mbit switched network (or gig, even better) then it's about 30 minutes to blast a Windows 2000 install to any number of machines.

    I love macs, typing this on a PB17", but all the apple zealots out there really make me ashamed sometimes.

    Macs are strong in some areas, and weak in others. If it wins in something, DON'T RUB PEOPLE'S FACES IN IT. They don't care.

    Get over it.
  • MOD PARENT DOWN!!!! (Score:2, Informative)

    by tulare ( 244053 ) on Sunday September 21, 2003 @06:08AM (#7016595) Journal
    Unless you want to look at a disgusting picture of a girl in a bathtub eating her feces. No, I'm not kidding, that's one of his links, and it's quite possibly grosser than goatse.cx.
  • by tulare ( 244053 ) on Sunday September 21, 2003 @06:43AM (#7016691) Journal
    You've conveniently glossed over the fact that IE for the Mac is
    a) No longer supported, and
    b) An ugly, slow, and feature-devoid rectangle, resulting in
    c) Most OS X users deleting it entirely out of disgust.

    Seriously, Safari is a nice, clean, fast browser, imho, and certainly renders most websites as well as or better than Idiot Exploiter, excepting only those sites which were deliberately written with broken code.
  • by tansey ( 238786 ) on Sunday September 21, 2003 @09:59AM (#7017188) Journal
    I was one of the hordes of CS majors who helped set up the supercomputer (grunt work is fun!). VT is using InfiniBand cards w/ extremely low latency copper cable (forget the name) which achieves the same bandwidth as fiber optics.

    Loads of Cisco Catalyst switches are involved also.
  • by jboyd ( 704037 ) on Sunday September 21, 2003 @10:39PM (#7021586)
    But they've shut down the reactor =(
