Virginia Tech Announces Supercomputer Plans 419

Posted by pudge
from the i-got-dibs-when-they-are-done-with-it dept.
CousinVinnie writes "Previously noted in this Slashdot story, the administration of Virginia Tech has announced they're purchasing 1100 G5's (another story) in hopes of building a top-10 supercomputer by October 1. Tech will be spending $5.2 million over five years on the project, which should help it pull in more research money." Maybe VT can use the new computers to beef up their web site.
This discussion has been archived. No new comments can be posted.

  • Even more info ... (Score:4, Informative)

    by Pentagon13 (166309) on Wednesday September 03, 2003 @03:07PM (#6861553)

    Here's the article from which the Collegiate Times article has paraphrased: http://www.technews.vt.edu/Archives/2003/Sept/03566.html [vt.edu]
  • by Anonymous Coward on Wednesday September 03, 2003 @03:20PM (#6861715)
    Just the motherboard and chip for a dual 1.6 GHz opteron runs you $1394.99 [pricewatch.com] if you need PCI-X (which is probably necessary with the interconnect you want). And on top of that, you need the network cards, memory, case, hard drives, etc. etc.

    You could easily get to Apple prices going with AMD.
  • by heh2k (84254) on Wednesday September 03, 2003 @03:21PM (#6861724) Homepage
    There is also something to be said for the G5's parallel memory busses. It divides the ram in half, each half feeding 32 bits of the processor.

    multibanked ram is nothing new. it's been around since the 486 days for consumers (iirc), and much earlier in big machines, i'm sure. afaik, most mobos these days are at least 128 bits wide. my alpha (up2000+) is 256.

    You could theoretically keep your instructions on one side and data on the other, and pipeline the snot out of it.

    which would just be slower. 8)
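    For readers unfamiliar with multibanked RAM, here is a minimal sketch of low-order interleaving, where consecutive words alternate between banks so a sequential read can keep both banks busy. The function name, the 8-byte bank width, and the two-bank layout are assumptions chosen for illustration; this is not a description of the G5's actual memory controller.

```python
def bank_for_address(addr: int, bank_width_bytes: int = 8, num_banks: int = 2) -> int:
    """Return the index of the bank that serves the word containing byte address `addr`.

    Hypothetical low-order interleaving: word 0 goes to bank 0, word 1 to bank 1,
    word 2 back to bank 0, and so on.
    """
    return (addr // bank_width_bytes) % num_banks


if __name__ == "__main__":
    # Walk sequential 8-byte words and show how they alternate between banks.
    for addr in range(0, 64, 8):
        print(f"address {addr:2d} -> bank {bank_for_address(addr)}")
```

    With this mapping a sequential stream touches both banks equally, which is the usual reason to interleave rather than dedicate one bank to instructions and one to data.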

  • by Lally Singh (3427) on Wednesday September 03, 2003 @03:30PM (#6861806) Journal
    Lots of "WHY?" questions, with lots of pointless trolling on the G5; but none of them actually look for answers. Mostly just more idiots who can't understand that a good vendor is important; that their own time is important; that ease of use is even more important now than it ever has been before. Luckily, these same idiots spend all their time setting up sendmail over their 14.4 modem.

    As for the G5, here are some strong points for it:

    - A fast memory pipe (1GHz)
    - Good heat management (9 fans but it's quieter than its predecessor)
    - Damn good FP performance (To get comparable FP performance on intel, you have to use the -fviolate-ieee flag on gcc, think about that)
    - Vendor-installed, vendor-supported Unix, with the vendor employing the entire OS's development team.
    - Fast system interconnects with network & I/O
    - Easy system setup (this matters a lot when you've got 1100 of them)
    - Proven apple reliability (and if you're going to fight this one, have something better than "is not!") (again, very important when you've got 1100 of them)

    Oh yeah, and OS X. Mach microkernel, Rendezvous, and distributed builds in the default toolset. Again, the idiots I mentioned above wouldn't have a clue about this stuff.

    As for _why_ VT is getting this, VT's one of the largest engineering schools in the country. We've gotta simulate airflow over wings, heat propagation over materials, and other stuff this CS major doesn't understand. And we've got big development in bioinformatics. All kinds of CPU to crunch. AFAIK, the cluster's being paid for by federal grants or something like that.

    And now fools, flame me. Prove me right.
  • by Anonymous Coward on Wednesday September 03, 2003 @03:34PM (#6861841)
    http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=opteron+systems&btnG=Google+Search

    Tons of multi-Opteron systems already being sold, and for some time now.

    Homeboy is mega-stupid.

  • by JimRay (6620) <jimray@@@gmail...com> on Wednesday September 03, 2003 @03:42PM (#6861909) Homepage
    OS is SuSE as it supports Infiniband.

    Well, according to this story [technewsworld.com], the cluster will be running "a beta version of the latest release of OS X", presumably a beta version of Panther.

    If this is true, I'd bet, and this is purely a guess, that Panther and XCode [apple.com], the new development tool built by Apple, have some support for cluster applications. With technologies like Rendezvous on top of Mach/BSD, it could mean beowulf style supercomputers that are both fast and easy to maintain.
  • by Krach42 (227798) on Wednesday September 03, 2003 @03:45PM (#6861946) Homepage Journal
    *ahem* The PowerPC architecture wasn't extended to support 64-bit. It was the IBM POWER architecture that was "extended" to support 32-bit from 64-bit. The original PowerPC designs were designed to be executably compatible with the POWER architecture.

    This is ENTIRELY unlike the x86 architecture, which has been extended to support 32-bit from 16-bit, and now is being extended YET AGAIN to support 64-bit from 32-bit from 16-bit.
  • by TimeZone (658837) on Wednesday September 03, 2003 @03:52PM (#6862002)
    AFAIK, IBM does not have any systems based on the G5 (aka the PowerPC 970). Only Apple is actually making systems out of these chips. If somebody says they're using G5s, they're almost certainly using Apples. Or, they're confused and mean they're using some IBM Power5 box. Confusingly, the G5 is not based on IBM's Power5, but the Power4.

    TimeZone

  • by nullard (541520) <nullprogram@voiC ... c minus caffeine> on Wednesday September 03, 2003 @03:53PM (#6862014) Journal
    There is also something to be said for AMD's HyperTransport bus

    Yeah, HyperTransport [apple.com] is pretty cool. Too bad the G5 doesn't have it, right?
  • by stratjakt (596332) on Wednesday September 03, 2003 @03:58PM (#6862085) Journal
    Parity errors are extremely frequent; the average desktop PC suffers about one a week if it's run all the time. When the difference between 1 and 0 is a handful of electrons, it's not surprising.

    The reason it's not a big deal in the desktop world is that you rarely notice those errors. Depends what got changed. Maybe a pixel in a bitmap got a little redder. Chances are it will happen in unused memory when the computer is idle. Modern PCs are idle most of the time, anyways.

    But when you start demanding 100% of the CPU, Bus and Memory in a high usage environment and demand complete accuracy and 5 9's of uptime, it becomes a huge issue.
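    To put a number on why this matters at cluster scale, here is a back-of-the-envelope sketch. It assumes the one-error-per-machine-per-week figure quoted above (an estimate from this comment, not a measurement), the 1100 machines from the story, and errors that are independent and evenly spread across machines. The function name is made up for the example.

```python
HOURS_PER_WEEK = 7 * 24


def cluster_errors_per_hour(errors_per_node_per_week: float, nodes: int) -> float:
    """Expected memory errors per hour across the whole cluster.

    Assumes every node sees the same average rate and errors are independent.
    """
    return errors_per_node_per_week * nodes / HOURS_PER_WEEK


if __name__ == "__main__":
    rate = cluster_errors_per_hour(errors_per_node_per_week=1.0, nodes=1100)
    print(f"about {rate:.1f} expected errors per hour across 1100 nodes")
```

    Under those assumptions a rate that is invisible on a single desktop becomes several events per hour somewhere in the cluster.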
  • by hackstraw (262471) * on Wednesday September 03, 2003 @04:03PM (#6862156)
    Just curious. Do you log your memory errors, and if so what is the error frequency?

    Actually, yes. And they are mailed to me nightly. I usually get 1 or 2 correctable errors a week.
  • by WatertonMan (550706) on Wednesday September 03, 2003 @04:12PM (#6862276)
    A couple points both for the parent and a few comments to the parent.

    Apple uses HyperTransport. It is in the custom chip they designed and IBM manufactures for them. It is only for the memory controller though. The FSB bus on the G5/970 is IBM's Elastic Bus. It is very similar to HyperTransport but not technically the same thing.

    There's an excellent discussion at Ars [infopop.net] on this.

  • by be-fan (61476) on Wednesday September 03, 2003 @04:19PM (#6862368)
    Read the tech doc and see where that Hypertransport link is. In the Opteron, it's between CPUs, which allows for a very low latency pseudo-NUMA setup. On the G5, it's between the Northbridge and the Southbridge. Even low end Athlons these days have Hypertransport links there.
  • by GrumpyOldMan (140072) on Wednesday September 03, 2003 @04:20PM (#6862377)
    To me, the Infiniband is the most interesting part. I'm very interested to see how the Infiniband scales and to see when they actually get the cluster working. I'm concerned about this because:

    1) I'm not aware of any Infiniband cluster that big. I'm not a big follower of IB, but I'd be surprised if there were any other clusters running MPI over IB even 1/2 that large.

    2) A Mellanox developer was asking basic questions about OSX driver development on the Darwin device drivers mailing list as recently as a few weeks ago. This leads me to believe that the MacOS X IB driver may not yet be ready for prime-time. Or may not even exist..

  • Umm... OK.... (Score:3, Informative)

    by greymond (539980) on Wednesday September 03, 2003 @04:25PM (#6862442) Homepage Journal
    So I read this "The project comes at a time when the university's academic departments are struggling to fulfill students' educational needs in the wake of a $72 million reduction in state support."

    and then I wonder why you would spend $5 million over the next 5 years to build a supercomputer? It seems like a better idea would be to reach out to the Slashdot/Linux communities and see what kind of equipment they could get donated/free and then build a semi-super computer with that - or hell even just buy a shitload of cheap PCs to do it with....

    maybe i'm just missing something...
  • by ukyoCE (106879) on Wednesday September 03, 2003 @04:26PM (#6862449) Journal
    It is a troll because the money for the supercomputer came from a NSF grant for that specific purpose. Furthermore the university expects to make a five-fold return, as have most universities in the top-x supercomputers.

    Have you ever been to VT? We've got construction going on all over the place. The football stadium is about to get another "upgrade" after having received one just a year or two ago. We've got major construction going on in at least 3 different places, not to mention many smaller construction projects.

    Meanwhile teachers are getting let go, classes that were taught in 30-person rooms 3 years ago when I started, are now taught in 400+ person lecture halls.

    Does it suck? Certainly. However the money for the construction projects, football stadium, and supercomputer are all from grants, donations, and other means intended for a specific purpose. They can not legally take the money from a supercomputer grant or football stadium donation and use it to pay a teacher's salary.

    We have uneducated rants in the school paper at least once a week saying "why are we upgrading the football stadium if we cant pay teachers!@#$"

    Yeah, it does suck, but the university has no choice in the matter.
  • by TekkaDon (223734) on Wednesday September 03, 2003 @04:44PM (#6862625)

    Or so they claim here. [vt.edu] It seems they have all their bases covered and don't give a damn about ECC for a reason.

    [Srinidhi Varadarajan, an assistant professor of computer science at Virginia Tech, and Jason Lockhart, director of the College of Engineering's High Performance Computing and Technology Innovation, initiated the venture at Virginia Tech. Varadarajan is an expert in reliability, a key issue in successfully exploiting terascale computing.]

    They keep on going:

    [Component failures are endemic to any large-scale computational resource. While previous generations of supercomputers engineered reliability into systems hardware, today's high performance computing environments are based on inexpensive clusters of commodity components, with no systemic solution for the reliability of total machine.]

    And now for the solution for your reliability problem.

    [Virginia Tech has the first comprehensive solution to the problem of transparent fault tolerance, which enables large-scale supercomputers to mask hardware, operating system and software failures - a decades old problem. It's a software program called Deja vu, designed by Varadarajan. He also integrated the software with Apple's G5s. This work will enable the terascale computing facility to operate as the first reliable supercomputing facility, according to Varadarajan, a National Science Foundation Faculty Early Career Development Program (CAREER) Award recipient.]

    So maybe, just maybe, you and other people could:

    1. READ before posting.

    2. Then READ a little more.

    3. Did I say READ already.

    -sigh- Whatever.
  • by SewersOfRivendell (646620) on Wednesday September 03, 2003 @04:48PM (#6862671)
    I think they are excellent machines, but not for science.

    Yeah, Macs are really bad for scientific research work. [apple.com] No one with a brain would ever use a cluster of them for science. [apple.com] Science requires the Wintel hegemony [apple.com]. -1, troll it, baby.

  • by JimRay (6620) <jimray@@@gmail...com> on Wednesday September 03, 2003 @04:59PM (#6862799) Homepage
    The only OS mentioned in it is mentioned by "Yankee Group senior analyst Dana Gardner" who has no stated connection to VT.

    From the article that I linked to:

    "In addition to the G5 machines, the university said it is using a beta version of the latest release of OS X, new networking hardware from Mellanox and Cisco, and cutting-edge configuration and cooling technologies to build the powerful cluster for a fraction of the price of a traditional supercomputer."

    (emphasis mine)

    Now, you can take that any way you like, I was simply trying to add another piece of information, which is why the post has been modded as informative. I realize this may have been easy to miss, it being in the second paragraph and all rather than being buried down at the bottom with your "Dana Gardner" tidbit, but there you have it. Re-read TFA then come back and complain.

    As for my "words sounce [sic] like marketing", well, that may be, but the fact is, automatic network configuration (which is exactly what Rendezvous is) would make 1100 clustered G5's easier to admin. And this flight of fancy of mine was based on the fact that Apple is already using Rendezvous-based clustering for XCode [apple.com] and Shake [apple.com], their high end video compositing software.

    Flame on if you like, just make sure you got yer facts straight first, kid.
  • by Coocha (114826) <coocha@vt . e du> on Wednesday September 03, 2003 @05:26PM (#6863123) Homepage
    My boss here at VT is a volunteer for this project... they've been designing and building rackmount shelf-type units to store all these new G5s, as well as helping with the cooling system. Here's some info he gave me.

    The cluster will eventually run Mac OS X 10.2.7... he said eventually, and Jason Lockhart, the project leader, is a friend and fellow Linux geek of mine (please don't hammer his inbox ;-), so there's a chance that he might use some PPC distro at some point.

    Interconnectivity will be done with Cisco equipment, among the onboard gigabit LANs. Infiniband cards will also eventually be installed for 10 Gbit throughput.

    You guys can offer alternative solutions and troll this as much as you want, but this is what VT is going with. In my opinion, it's not a bad choice... the New IBM PPC chipset is balls-to-the-wall computing, and Apple's 'stock' offerings in the G5 (Gbit ethernet, serial ATA, etc.) are all strong selling points. The fact that this cluster is intended for intense vector and matrix-based algorithms is another bonus, b/c of the PPC vector processing unit.

    Apparently Apple shifted us up to the top of their production ladder, in order to make the contract, thereby extending the wait times for consumers itching for a G5... I find that a little humorous. Can't wait to see gigaflop statistics!!
  • by zerocool^ (112121) on Wednesday September 03, 2003 @06:17PM (#6863604) Homepage Journal
    identical ram in each of the 12!! RAM slots

    Dude, look at an old sparc sometime. Sparc 1/1+/2 had 16 ram slots, circa 1990. Of course, you had to fill 4 at a time. The max is 128 MB i think.

    ~Will
  • Re:Apple marketroid (Score:4, Informative)

    by Anonymous Coward on Wednesday September 03, 2003 @06:23PM (#6863650)
    Well I don't know about other science work but my dad does AIDS research and they use Macs exclusively in their labs for all of their model simulations and experiments. There's apparently a lot of biology research software that is available only on the Apple platform as well. I guess maybe because of their ties to educational institutions over the years?
  • by zerocool^ (112121) on Wednesday September 03, 2003 @06:49PM (#6863856) Homepage Journal
    It had better run freaking fast. Virginia Tech has had an OC-3 for at least 6 years, and I think they're upgrading to hook into network virginia's bigger pipes.

    For those out of the loop, network virginia [networkvirginia.net] is a partnership between verizon (local loops), sprint (borders and pipes), and Virginia tech (expertise and tech support). A few years ago, they had 2 OC-3's from Northern Va to roanoke, 1 to richmond, and 1 from roanoke to richmond. Their updated network topology map can be found by clicking here [networkvirginia.net]. The bottom one is the latest one. At any rate, they've got multiple OC-12's running from Nova to Roanoke, mainly because of VT. Tech may already be hooked into the OC-12's, i'm not sure.

    Also, I'm not sure about how much will be lost in clustering, but according to the CT article today, the dual 2.0 GHz G5 can pull 14 gigaflops by itself. If we're getting 1100 of them and drop, say, ~10% for overhead, that would still put us at about 14,000 gigaflops (roughly 14 teraflops), which is ahead of ASCI White and behind Los Alamos.
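    A rough sketch of that arithmetic, assuming the per-machine figure is 14 gigaflops, 1100 machines, and a flat 10% loss to clustering overhead. The overhead fraction is this comment's guess rather than a measured number, and the function name is hypothetical.

```python
def aggregate_gflops(per_node_gflops: float, nodes: int, overhead_fraction: float) -> float:
    """Estimated cluster throughput in gigaflops after a flat overhead penalty."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return per_node_gflops * nodes * (1.0 - overhead_fraction)


if __name__ == "__main__":
    total = aggregate_gflops(per_node_gflops=14.0, nodes=1100, overhead_fraction=0.10)
    print(f"about {total:,.0f} gigaflops, i.e. roughly {total / 1000:.1f} teraflops")
```

    Real clusters rarely lose a flat percentage, so treat the result as an order-of-magnitude estimate only.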

    Also: regarding power requirements and all of that - we have several state of the art facilities on campus for this kind of stuff, including the VT Corporate research center and Torgersen hall (home of the center for advanced computing and where we keep all the fun VR rooms and stuff). There's a power plant on campus. We never lost power when I lived in a dorm, not during snow storms or huge thunderstorms or anything. It supplies power for most of blacksburg, too. Shameless plug, but that's one selling point for the company where I work, netmar [netmar.com], because we get our power from the VT power plant, and it's about 2.5 blocks away, we hardly ever lose power for more than 2 minutes, so we haven't had to put our generators to work in forever. Nowadays, we just test them with the remote start to make sure they're working, and to scare people that are hanging around the generator hut.

    Anyway, VT has no problems finding a place for these things to go, and will have no problem providing power for them. Climate control should be no problem, either. For starters, it's easy to cool things in blacksburg, cause it hasn't been above 100 degrees in 100 years here.

    Some people in my econ class today were talking about why are we doing it, and what's it going to be used for. Really, I think we're doing it to get grant money and sponsorships/funding, because with the economic situation in VA, we're scrambling to find money. We've had to drop teachers without replacing them and cut back on services all over (no more trash cans in dorm hallways, you have to take your own trash outside, can't afford the maintenance staff). Also, the Vet school will get a lot of use out of it. That's the "Virginia-Maryland Regional College of Veterinary Medicine". They're looking for ways to cure problems with small bacteria instead of drugs (i'm not clear on the particulars, that's the impression i got). They're going to try and track what happens to something when it's introduced into an animal or something. Anyway, they'll use it, as will VT's engineering school, which, despite being tied for like 73rd on the list of top schools, and inexplicably 55 positions behind UVA, is an excellent program and produces excellent engineers.

    ~Will
  • by ProfessionalCookie (673314) on Wednesday September 03, 2003 @06:51PM (#6863872) Journal
    I think they are excellent machines, but not for science.

    Of course WinTel is the ultimate science platform. It's not like intel has a history of CPUs that can't do math. It's not like windows doesn't have a history of wild instability.

    I think Apples are great machines too, but I think you're a troll.
  • by bspath1 (703088) on Wednesday September 03, 2003 @11:31PM (#6865789)
    Actually, PowerPC was designed as a 64 bit architecture from the ground up. The first implementations were 32 bit, although the PPC 620 was 64 bit AFAIR. The 601 was a hybrid PowerPC/POWER chip which supported both ISAs. The 603 and 604 chips were 32 bit PowerPC implementations and lacked the deprecated POWER instructions which were supported by the 601. IBM also went on to extend POWER to POWER2, POWER3 and a 64 bit implementation of POWER for their mainframes (RS64? and AS/400?). POWER4 is a unification of the 32 bit POWER and 64 bit POWER architectures which is also fully PowerPC 32 and 64 bit compatible. Note: the 970 is based on POWER4.
  • by Hoser McMoose (202552) on Thursday September 04, 2003 @03:31AM (#6866665)
    No, the G5 does NOT have Hypertransport, at least assuming that you are using Apple nomenclature of calling the IBM PowerPC 970 CPU the "G5" and the machine itself a "PowerMac".

    The PowerMac G5 machines use Hypertransport to connect their motherboard chipsets together. This is nothing new, the XBox does this as well. Hypertransport is a very good, low-cost, high performance solution for connecting chips together directly. However, connecting motherboard chipsets together doesn't do much of anything for getting data to/from the processor itself.

    IMO the IBM PowerPC 970 (aka the G5) is actually not a bad chip to power a supercomputer. It seems to have quite a lot of processing power for a reasonable price and decent thermal characteristics (when you have a LOT of processors in a small space, low power consumption is a very good thing, this is where most of Cray's real innovation was). However, the PowerMacs seem to be rather weak for this sort of application because of Apple's chipset, which is really a desktop/workstation chipset. The I/O bandwidth isn't all that impressive (1.6GB/s in each direction is the absolute best that can be managed) and latency should be even worse. Getting to a PCI-X card involves going through two separate chips and three buses. Compared to some of the proposed Opteron solutions which will hang networking hardware right off the processor's Hypertransport buses (no chips in the middle and only a single bus with 3.2GB/s of bandwidth in each direction), the chip I/O looks pretty weak.
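    To make the bandwidth gap concrete, here is a small sketch comparing best-case transfer times at the two peak figures quoted above (1.6GB/s and 3.2GB/s per direction). It deliberately ignores latency, protocol overhead, and contention, so it only shows the bandwidth-bound lower limit. The function name and the 1 GB payload are assumptions for illustration.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Best-case time to move `size_gb` gigabytes at a given peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    payload_gb = 1.0  # hypothetical message size
    for label, bandwidth in (("1.6GB/s link", 1.6), ("3.2GB/s link", 3.2)):
        seconds = transfer_seconds(payload_gb, bandwidth)
        print(f"{label}: {seconds:.4f} s for {payload_gb} GB")
```

    For small messages the extra chips and buses on the path matter more than peak bandwidth, which this sketch does not model.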

    My guess is that this deal is mainly due to the fact that Apple gave them a real good deal. Given the rather high cost of the networking equipment that will be needed, VT is probably getting their 1100 servers for next to nothing.
