Virginia Tech Announces Supercomputer Plans
CousinVinnie writes "Previously noted in this Slashdot story, the administration of Virginia Tech has announced they're purchasing 1100 G5's (another story) in hopes to build a top-10 supercomputer by October 1. Tech will be spending $5.2 million over five years on the project, which should help it pull in more research money." Maybe VT can use the new computers to beef up their web site.
Even more info ... (Score:4, Informative)
Here's the article from which the Collegiate Times article has paraphrased: http://www.technews.vt.edu/Archives/2003/Sept/035
maybe not as cheap as you think (Score:1, Informative)
You could easily get to Apple prices going with AMD.
Re:This is quite cool but... (Score:3, Informative)
multibanked ram is nothing new. it's been around since the 486 days for consumers (iirc), and much earlier in big machines, i'm sure. afaik, most mobos these days are at least 128 bits wide. my alpha (up2000+) is 256.
You could theoretically keep your instructions on one side and data on the other, and pipeline the snot out of it.
which would just be slower. 8)
I keep overestimating slashdot... (Score:5, Informative)
Mod parent down, blatant ignorance (Score:2, Informative)
Tons of multi-Opteron systems already being sold, and for some time now.
Homeboy is mega-stupid.
Re:Anyone have any real specs? (Score:5, Informative)
Well, according to this story [technewsworld.com], the cluster will be running "a beta version of the latest release of OS X", presumably a beta version of Panther.
If this is true, I'd bet, and this is purely a guess, that Panther and XCode [apple.com], the new development tool built by Apple, have some support for cluster applications. With technologies like Rendezvous on top of Mach/BSD, it could mean beowulf style supercomputers that are both fast and easy to maintain.
Re:This is quite cool but... (Score:3, Informative)
This is ENTIRELY unlike the x86 architecture, which has been extended to support 32-bit from 16-bit, and now is being extended YET AGAIN to support 64-bit from 32-bit from 16-bit.
Re:Are they buying the chips from Apple or IBM (Score:2, Informative)
TimeZone
Re:This is quite cool but... (Score:2, Informative)
Yeah, HyperTransport [apple.com] is pretty cool. Too bad the G5 doesn't have it, right?
Re:This is quite cool but... (Score:2, Informative)
The reason it's not a big deal in the desktop world is that you rarely notice those errors. Depends what got changed. Maybe a pixel in a bitmap got a little redder. Chances are it will happen in unused memory when the computer is idle. Modern PCs are idle most of the time, anyways.
But when you start demanding 100% of the CPU, Bus and Memory in a high usage environment and demand complete accuracy and 5 9's of uptime, it becomes a huge issue.
Re:This is quite cool but... (Score:3, Informative)
Actually, yes. And they are mailed to me nightly. I usually get 1 or 2 correctable errors a week.
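To put that rate in cluster terms, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that each of the 1100 nodes would see the same 1 to 2 bit errors a week as the machine above; real rates depend on memory size, hardware, and environment, so treat the result as an order of magnitude at best. The function name is made up for this example.

```python
NODES = 1100  # cluster size from the story


def cluster_errors_per_day(errors_per_node_per_week, nodes=NODES):
    """Expected memory errors per day across the whole cluster.

    Assumes every node has the same weekly error rate (an illustrative
    assumption, not a measured figure for the G5).
    """
    return errors_per_node_per_week * nodes / 7


low = cluster_errors_per_day(1)
high = cluster_errors_per_day(2)
print(f"roughly {low:.0f} to {high:.0f} errors per day across {NODES} nodes")
```

With ECC those errors get corrected and logged; without it, whatever fraction lands in live data goes unnoticed.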
Re:This is quite cool but... (Score:3, Informative)
Apple uses HyperTransport. It is in the custom chip they designed and IBM manufactures for them. It is only for the memory controller though. The FSB bus on the G5/970 is IBM's Elastic Bus. It is very similar to HyperTransport but not technically the same thing.
There's an excellent discussion at Ars [infopop.net] on this.
Re:This is quite cool but... (Score:4, Informative)
Re:Anyone have any real specs? (Score:2, Informative)
1) I'm not aware of any Infiniband cluster that big. I'm not a big follower of IB, but I'd be surprised if there were any other clusters running MPI over IB even 1/2 that large.
2) A Mellanox developer was asking basic questions about OSX driver development on the Darwin device drivers mailing list as recently as a few weeks ago. This leads me to believe that the MacOS X IB driver may not yet be ready for prime-time. Or may not even exist..
Umm... OK.... (Score:3, Informative)
and then I wonder why you would spend $5 mil over the next 5 years to build a supercomputer? It seems like a better idea would be to reach out to the slashdot/linux communities and see what kind of equipment they could get donated/free and then build a semi-super computer with that - or hell even just buy a shitload of cheap pc's to do it with....
maybe i'm just missing something...
Re:enjoy your tuition increases kids (Score:5, Informative)
Have you ever been to VT? We've got construction going on all over the place. The football stadium is about to get another "upgrade" after having received one just a year or two ago. We've got major construction going on in at least 3 different places, not to mention many smaller construction projects.
Meanwhile teachers are getting let go, classes that were taught in 30-person rooms 3 years ago when I started, are now taught in 400+ person lecture halls.
Does it suck? Certainly. However the money for the construction projects, football stadium, and supercomputer are all from grants, donations, and other means intended for a specific purpose. They can not legally take the money from a supercomputer grant or football stadium donation and use it to pay a teacher's salary.
We have uneducated rants in the school paper at least once a week saying "why are we upgrading the football stadium if we can't pay teachers!@#$"
Yeah, it does suck, but the university has no choice in the matter.
FIRST reliable supercomputing facility... (Score:5, Informative)
Or so they claim here. [vt.edu] It seems they have all their bases covered and don't give a damn about ECC for a reason.
[Srinidhi Varadarajan, an assistant professor of computer science at Virginia Tech, and Jason Lockhart, director of the College of Engineering's High Performance Computing and Technology Innovation, initiated the venture at Virginia Tech. Varadarajan is an expert in reliability, a key issue in successfully exploiting terascale computing.]
They keep on going:
[Component failures are endemic to any large-scale computational resource. While previous generations of supercomputers engineered reliability into systems hardware, today's high performance computing environments are based on inexpensive clusters of commodity components, with no systemic solution for the reliability of total machine.]
And now for the solution for your reliability problem.
[Virginia Tech has the first comprehensive solution to the problem of transparent fault tolerance, which enables large-scale supercomputers to mask hardware, operating system and software failures - a decades old problem. It's a software program called Deja vu, designed by Varadarajan. He also integrated the software with Apple's G5s. This work will enable the terascale computing facility to operate as the first reliable supercomputing facility, according to Varadarajan, a National Science Foundation Faculty Early Career Development Program (CAREER) Award recipient.]
So maybe, just maybe, you and other people could:
1. READ before posting.
2. Then READ a little more.
3. Did I say READ already.
-sigh- Whatever.
Re:This is quite cool but... (Score:5, Informative)
Yeah, Macs are really bad for scientific research work. [apple.com] No one with a brain would ever use a cluster of them for science. [apple.com] Science requires the Wintel hegemony [apple.com]. -1, troll it, baby.
Re:Anyone have any real specs? (Score:3, Informative)
From the article that I linked to:
"In addition to the G5 machines, the university said it is using a beta version of the latest release of OS X, new networking hardware from Mellanox and Cisco, and cutting-edge configuration and cooling technologies to build the powerful cluster for a fraction of the price of a traditional supercomputer."
(emphasis mine)
Now, you can take that any way you like, I was simply trying to add another piece of information, which is why the post has been modded as informative. I realize this may have been easy to miss, it being in the second paragraph and all rather than being buried down at the bottom with your "Dana Gardner" tidbit, but there you have it. Re-read TFA then come back and complain.
As for my "words sounce [sic] like marketing", well, that may be, but the fact is, automatic network configuration (which is exactly what Rendezvous is) would make 1100 clustered G5's easier to admin. And this flight of fancy of mine was based on the fact that Apple is already using Rendezvous-based clustering for XCode [apple.com] and Shake [apple.com], their high end video compositing software.
Flame on if you like, just make sure you got yer facts straight first, kid.
Specs from an involved student... (Score:5, Informative)
The cluster will eventually run Mac OS 10.27... he said eventually, and Jason Lockhart, the project leader, is a friend and fellow Linux geek of mine (please don't hammer his inbox
Interconnectivity will be done with Cisco equipment, among the onboard gigabit LANs. Infiniband cards will also eventually be installed for 10 Gbit throughput.
You guys can offer alternative solutions and troll this as much as you want, but this is what VT is going with. In my opinion, it's not a bad choice... the New IBM PPC chipset is balls-to-the-wall computing, and Apple's 'stock' offerings in the G5 (Gbit ethernet, serial ATA, etc.) are all strong selling points. The fact that this cluster is intended for intense vector and matrix-based algorithms is another bonus, b/c of the PPC vector processing unit.
Apparently Apple shifted us up to the top of their production ladder, in order to make the contract, thereby extending the wait times for consumers itching for a G5... I find that a little humorous. Can't wait to see gigaflop statistics!!
Re:This is quite cool but... (Score:5, Informative)
Dude, look at an old sparc sometime. Sparc 1/1+/2 had 16 ram slots, circa 1990. Of course, you had to fill 4 at a time. The max is 128 MB i think.
~Will
Re:Apple marketroid (Score:4, Informative)
Re:"Beef up their Website" (Score:5, Informative)
For those out of the loop, network virginia [networkvirginia.net] is a partnership between verizon (local loops), sprint (borders and pipes), and Virginia tech (expertise and tech support). A few years ago, they had 2 OC-3's from Northern Va to roanoke, 1 to richmond, and 1 from roanoke to richmond. Their updated network topology map can be found by clicking here [networkvirginia.net]. The bottom one is the latest one. At any rate, they've got multiple OC-12's running from Nova to Roanoke, mainly because of VT. Tech may already be hooked into the OC-12's, i'm not sure.
Also, I'm not sure about how much will be lost in clustering, but according to the CT article today, the dual 2.0 GHz G5 can pull 14 gigaflops by itself. If we're getting 1100 of them, say, drop ~10% for overhead, that would still put us up at about 14000 gigaflops (roughly 14 teraflops), which is ahead of ASCI White and behind Los Alamos.
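For anyone who wants to check the arithmetic, here is a minimal sketch of that estimate. It takes the per-machine figure as 14 gigaflops (a single desktop box delivers gigaflops, not teraflops) and treats the ~10% clustering overhead as the guess it is; the function and constant names are just for illustration.

```python
GFLOPS_PER_NODE = 14  # dual 2.0 GHz G5, figure quoted from the CT article
NODES = 1100          # machines in the announced purchase
OVERHEAD = 0.10       # guessed clustering loss, not a measured value


def cluster_gflops(per_node=GFLOPS_PER_NODE, nodes=NODES, overhead=OVERHEAD):
    """Aggregate throughput after subtracting a flat overhead fraction."""
    return per_node * nodes * (1 - overhead)


total = cluster_gflops()
print(f"{total:.0f} gigaflops, or about {total / 1000:.1f} teraflops")
```

This is a peak-style estimate; a real benchmark run would come in lower depending on how well the interconnect keeps up.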
Also: regarding power requirements and all of that - we have several state of the art facilities on campus for this kind of stuff, including the VT Corporate research center and Torgersen hall (home of the center for advanced computing and where we keep all the fun VR rooms and stuff). There's a power plant on campus. We never lost power when I lived in a dorm, not during snow storms or huge thunderstorms or anything. It supplies power for most of blacksburg, too. Shameless plug, but that's one selling point for the company where I work, netmar [netmar.com], because we get our power from the VT power plant, and it's about 2.5 blocks away, we hardly ever lose power for more than 2 minutes, so we haven't had to put our generators to work in forever. Nowadays, we just test them with the remote start to make sure they're working, and to scare people that are hanging around the generator hut.
Anyway, VT has no problems finding a place for these things to go, and will have no problem providing power for them. Climate control should be no problem, either. For starters, it's easy to cool things in blacksburg, cause it hasn't been above 100 degrees in 100 years here.
Some people in my econ class today were talking about why are we doing it, and what's it going to be used for. Really, I think we're doing it to get grant money and sponsorships/funding, because with the economic situation in VA, we're scrambling to find money. We've had to drop teachers without replacing them and cut back on services all over (no more trash cans in dorm hallways, you have to take your own trash outside, can't afford the maintenance staff). Also, the Vet school will get a lot of use out of it. That's the "virginia-maryland regional college of veterinary medicine". They're looking for ways to cure problems with small bacteria instead of drugs (i'm not clear on the particulars, that's the impression i got). They're going to try and track what happens to something when it's introduced into an animal or something. Anyway, they'll use it, as will VT's engineering school, which, despite being tied for like 73rd on the list of top schools, and inexplicably 55 positions behind UVA, is an excellent program and produces excellent engineers.
~Will
Re:This is quite cool but... (Score:2, Informative)
Of course WinTel is the ultimate science platform. It's not like intel has a history of CPUs that can't do math. It's not like windows doesn't have a history of wild instability.
I think Apples are great machines too, but I think you're a troll.
Re:This is quite cool but... (Score:3, Informative)
Re:This is quite cool but... (Score:3, Informative)
The PowerMac G5 machines use Hypertransport to connect their motherboard chipsets together. This is nothing new, the XBox does this as well. Hypertransport is a very good, low-cost, high performance solution for connecting chips together directly. However, connecting motherboard chipsets together doesn't do much of anything for getting data to/from the processor itself.
IMO the IBM PowerPC 970 (aka the G5) is actually not a bad chip to power a supercomputer. It seems to have quite a lot of processing power for a reasonable price and decent thermal characteristics (when you have a LOT of processors in a small space, low power consumption is a very good thing, this is where most of Cray's real innovation was). However, the PowerMacs seem to be rather weak for this sort of application because of Apple's chipset, which is really a desktop/workstation chipset. The I/O bandwidth isn't all that impressive (1.6GB/s in each direction is the absolute best that can be managed) and latency should be even worse. Getting to a PCI-X card involves going through two separate chips and three buses. Compared to some of the proposed Opteron solutions which will hang networking hardware right off the processor's Hypertransport buses (no chips in the middle and only a single bus with 3.2GB/s of bandwidth in each direction), the chip I/O looks pretty weak.
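A toy model of that comparison, for what it's worth: transfer time is taken as a fixed cost per intermediate chip plus size divided by bandwidth. The bandwidth numbers and chip counts are the ones above; the per-chip latency (100 ns here) is a made-up placeholder, not a measured figure for either platform, and the function name is hypothetical.

```python
def transfer_seconds(size_gb, bandwidth_gb_s, chips_in_path=0, chip_latency_s=0.0):
    """Time to move size_gb over one link: per-chip delay plus streaming time."""
    return chips_in_path * chip_latency_s + size_gb / bandwidth_gb_s


ASSUMED_CHIP_LATENCY = 100e-9  # hypothetical 100 ns per chip crossed

for label, size_gb in (("1 KB message", 1e-6), ("1 GB transfer", 1.0)):
    # PowerMac G5 path: 1.6 GB/s, two chips between CPU and PCI-X card
    g5 = transfer_seconds(size_gb, 1.6, chips_in_path=2,
                          chip_latency_s=ASSUMED_CHIP_LATENCY)
    # Proposed Opteron path: 3.2 GB/s, network hardware directly on the bus
    opteron = transfer_seconds(size_gb, 3.2)
    print(f"{label}: G5 path {g5:.3e} s, Opteron path {opteron:.3e} s")
```

For big transfers the bandwidth term dominates and the gap is about 2x; for small messages, which is what tightly coupled cluster codes tend to send, the extra chips in the path matter more.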
My guess is that this deal is mainly due to the fact that Apple gave them a real good deal. Given the rather high cost of the networking equipment that will be needed, VT is probably getting their 1100 servers for next to nothing.