Xgrid Agent for Unix
mac-diddy writes "Someone on Apple's mailing list for Xgrid, Apple's clustering software, just announced an 'Xgrid agent for Linux and other Unix platforms' available for download. There are still some issues being worked on like large file support, but it does allow you to simply add a Unix node to your existing Xgrid cluster. Just goes to show that when companies embrace open standards and code, the world doesn't fall apart."
Re:Mixed Company (Score:5, Informative)
Not really. Everyone uses network byte order for communication, so you won't have more overhead in a mixed system than you would in a homogeneous system.
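To illustrate the point about network byte order, here is a minimal Python sketch. The helper names `encode_u32` and `decode_u32` are made up for this example and are not part of Xgrid; the only real API used is the standard `struct` module, where the `!` prefix selects network (big-endian) order.

```python
import struct

def encode_u32(value: int) -> bytes:
    # "!" means network byte order (big-endian), whatever the host CPU is.
    return struct.pack("!I", value)

def decode_u32(wire: bytes) -> int:
    # The receiver converts from network order back to a native integer.
    return struct.unpack("!I", wire)[0]

wire = encode_u32(0x0A0B0C0D)
print(wire.hex())              # most significant byte goes on the wire first
print(hex(decode_u32(wire)))
```

Both a PowerPC Mac and an x86 PC would run the same conversion on every message, so the cost is the same whether or not the grid is mixed.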
Re:How many clusters (Score:4, Informative)
Re:I've been dying to know.... (Score:2, Informative)
I doubt you compile applications that big.
Photoshop: get an SMP machine instead, and plugins that support it.
Quake, MAME: you're kidding, get a faster GPU instead.
In general it makes life easy (Score:5, Informative)
After a while they cease to be fun to write, and you'd rather just get on with writing code that does something instead of infrastructure. By using and contributing to OSS projects, you can use the same code no matter what company you end up at. Because the code is portable it can become part of the package you can offer to a potential employer - they not only get an employee but potentially one that can be productive almost right away because they are familiar with the tools they'll be using, with no cost to the company for said tools.
So it makes life easier for you, less re-work. And it makes life easier for employers, as they get richer products sooner. And if the employee becomes really proficient at a widely used OSS project they can write their own way through consulting or training.
Re:I've been dying to know.... (Score:5, Informative)
Re:Home cluster (Score:1, Informative)
Re:How many clusters (Score:5, Informative)
Re:It Doesn't Show That At All! (Score:5, Informative)
Re:Apple embraces opensource? (Score:5, Informative)
Why bother? (Score:2, Informative)
There are many other open source cluster/queuing systems available.
The one I prefer is OpenPBS [openpbs.org]. It works very well for engineering compute clusters, and there are many different resource schedulers available which use the PBS job and node management system.
Re:Apple embraces opensource? (Score:5, Informative)
I wouldn't say that - I find it pretty amusing you've been registered at /. for so long and are still so wrong.
p.s. I know I should reference - how about 'MS owns fuck all anymore' - will this [aviationhumour.co.uk] do?
Re:Mixed Company (Score:5, Informative)
The real tragedy is when you have homogeneously little endian machines; e.g., a network that only has PCs on it. An integer gets byteswapped twice to end up in exactly the same byte order it was all along.
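The double swap described above can be sketched in a few lines of Python. This is only an illustration: `bswap32`, `htonl_on_little_endian` and `ntohl_on_little_endian` are hypothetical names that model what the real conversions do on a little-endian host, not calls into any networking library.

```python
def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit unsigned integer."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

def htonl_on_little_endian(x: int) -> int:
    # On a little-endian sender, host-to-network is a real byte swap.
    return bswap32(x)

def ntohl_on_little_endian(x: int) -> int:
    # On a little-endian receiver, network-to-host swaps again.
    return bswap32(x)

value = 0x11223344
on_wire = htonl_on_little_endian(value)      # first swap, on the sender
received = ntohl_on_little_endian(on_wire)   # second swap, on the receiver
print(hex(on_wire), hex(received))
```

Two swaps, and the receiver ends up holding exactly the bytes the sender started with.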
Re:Probably a silly question but... (Score:5, Informative)
(I wrote the xgridagent).
As the other poster said, XGrid does not care what the binary does (so it can be smp aware, multi-threaded, whatever). However, the xgridagent itself is not explicitly smp aware, but it is multi-threaded. Each task is started in its own thread and depending on the OS(?) I guess they could spread to other CPUs. The other aspect of the question is "Does the Unix XGrid agent support MPI like Apple's GridAgent for OS X?". It does not and I can't say for sure how difficult it would be to support it. However, since all communication is done via the XGrid protocol, I don't see what would prevent it from being implemented. But other things need to be done first.
The most pressing issue is to fix the annoying "large message" issue which makes the agent hang (while it waits forever for the controller to accept more frames). I am convinced it is trivial, I just don't know enough about BEEP to fix it. I am hoping somebody who knows BEEP will take a look at xgridagent-profile.c and fix the xgridagent_SengMSG() function and send me the patch.
Daniel Côté
Re:Mixed Company (Score:4, Informative)
not quite.
first, i think you mean "ntohs" (and ntohl and friends).
second, they are not macros. they are, in fact, real functions (in glibc, bsd libc, and windows' winsock library). i'd imagine it's the same on macs.
third, a macro that does nothing is not expanded to a NOP, it is simply removed by the preprocessor.
so, assuming the macs are conforming to bsd networking standards, ntohs is required to be a function, so there is still a function call per conversion (which is much more costly than doing the actual byteswap).
The real tragedy is when you have homogeneously little endian machines; e.g., a network that only has PCs on it. An integer gets byteswapped twice to end up in exactly the same byte order it was all along.
a real high performance implementation (ie, the kernel) would not use ntohl, it would implement a similar byteswap macro. a byteswap can be done on x86 in one instruction, so it is fairly trivial to do.
Re:GridEngine (Score:3, Informative)
Personally, I found PBS to be the best open source solution last time I had to choose, but that was just prior to the Sun buyout of GRD, so things may have changed. [My current employer [llnl.gov] rolls their own batch scheduler, so I haven't had a need to survey the field for a few years.] There are also some things Condor rocks at (cycle scavenging, userspace checkpoint/restart/migration) which none of the others even attempt, so it's definitely worth a look for some sites.
If you're paying $$ for your batch scheduler, LSF pretty much trumps all of them, but the price is too steep for me.
Re:I've been dying to know.... (Score:5, Informative)
Not so, not so.
If your problem is embarrassingly parallel, chances are you can use Xgrid to run it right now.
For example, let's say you're rendering a 3D animation. (I haven't done real 3D work since the PowerAnimator days, so pardon me if some of my jargon is antiquated.) You've got a scene file on which you can run a render command. A command-line argument tells the renderer which frame to render.
No problem. Just use Xgrid's Xfeed plugin. Xfeed lets you set up a job that runs a single command with a variety of command-line arguments. You tell Xfeed that you want to run the "render" command with "-f" and the numbers 1 through 720.
Xgrid goes to the first available machine on the grid and says, "Run render -f 1." Then it goes to the second machine and says, "Run render -f 2." And so on, until there are no available machines. Then it waits until a machine becomes available and says, "Run render -f n."
As each output file (a frame, in this case) becomes available, Xgrid (the client application itself, I mean) collects them in whatever directory you specified when you submitted the job.
The cool part comes when you realize that this isn't a cluster. It's a grid. That means machines can come and go as they please. If this job is running overnight, when I come in the next morning and sit down at my workstation, the agent on my computer stops the job and de-registers itself. The job goes back in the controller's queue for processing on whatever the next available machine is.
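The dispatch-and-requeue behaviour described above can be modelled with a toy Python sketch. It is not Xgrid's actual implementation or API: the `Controller` class, its methods, and the agent names are all invented for illustration, and tasks are just argument lists that are never executed.

```python
from collections import deque

class Controller:
    """Toy model of a grid controller handing one task to each idle agent."""

    def __init__(self, command, flag, values):
        # One task per argument value, e.g. ["render", "-f", "1"].
        self.pending = deque([command, flag, str(v)] for v in values)
        self.running = {}   # agent name -> task currently assigned
        self.done = []      # finished tasks, in completion order

    def dispatch(self, idle_agents):
        # Give the next pending task to each agent that has nothing to do.
        for agent in idle_agents:
            if agent in self.running:
                continue
            if not self.pending:
                break
            self.running[agent] = self.pending.popleft()

    def complete(self, agent):
        # The agent finished; its output would be collected at this point.
        self.done.append(self.running.pop(agent))

    def withdraw(self, agent):
        # The machine's owner came back: put the unfinished task back in the queue.
        task = self.running.pop(agent, None)
        if task is not None:
            self.pending.appendleft(task)

grid = Controller("render", "-f", range(1, 721))
grid.dispatch(["mac-a", "mac-b"])   # mac-a gets frame 1, mac-b gets frame 2
grid.withdraw("mac-b")              # mac-b leaves the grid, frame 2 is requeued
grid.complete("mac-a")              # frame 1 is finished
grid.dispatch(["mac-a"])            # mac-a picks up the requeued frame 2
print(grid.running["mac-a"], len(grid.pending))
```

The sketch requeues a withdrawn task at the front of the queue; that ordering is an assumption made here, and a real controller may schedule differently.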
And you don't have to have any special software for this. It can be done right now with the tools that already exist in Preview 2.
Re:Mixed Company (Score:4, Informative)
Re:Good for home use too. (Score:3, Informative)
Video compression is a difficult task to parallelize. If each frame were compressed individually it'd be easy: just send an uncompressed frame to a node and get the compressed frame back. But that's not how it works.
Now, for something like Pixlet, which is frame-based, there's the possibility of distributing the task. But you will never use Pixlet. It was designed to compress 2K or 1080 material losslessly at a ratio of about 2:1. Very specific tool for a very specific purpose.
So using Xgrid for video compression isn't going to be the wonder that you might wish it could be.
It's for ad hoc cluster creation... (Score:5, Informative)
Re:Good for home use too. (Score:3, Informative)
Re:I've been dying to know.... (Score:2, Informative)
Mind you, I don't know how he did it, as I am still a code monkey-in-training.
Re:Why another technology (Score:3, Informative)
Xgrid treats the cluster as one processor, while OpenMosix assigns each thread to a CPU that's not doing much work.
Re:My Experience (Score:5, Informative)
Problem type: The problem may not be well suited to running on a bunch of PCs (especially when the agent app isn't allowed to take 100% of the machine's resources to accomplish the task) over typical office networks. Basically if the app needs to communicate frequently with other nodes, or if a huge data set is involved (or both), latency or bandwidth issues might outweigh the possible advantage of putting more CPUs to work.
Security: The data may be highly sensitive, in which case you might not want to put it on ordinary desktop PCs that might have untrustworthy users, spyware, etc.
Configuration: The configuration of your office's PCs may vary enough to make the cost of getting a companywide desktop cluster working unacceptably high. You'd have to pick a few target configurations and settle for that. Hopefully drivers and such wouldn't matter as much as CPU, RAM, disk, and OS version, but there are still companies that are just now getting their desktops updated to Win2K. There's also the headache of installing yet another required application on a large number of heterogeneous machines, which is virtually guaranteed to result in confusing installation problems. Oops, our app crashes if the user has this or that service pack installed. Oops, our app requires strong encryption. You could build your app on top of some sort of moderately portable framework or VM or whatever but that will have system requirements too, and probably will have some surprising gotchas when deployed in a real-world environment.
Re:Mixed Company (Score:1, Informative)
unless you're linking your libc statically, it can't be inline. it similarly can't be inline if you use a function pointer to it in some fashion.
Of course reality differs and they are actually null macros on OS/X
then osx has a broken bsd socket implementation. ntohs should be a function. that is, you should be able to take a function pointer to it and all the others (something you cannot do with a macro), and any code that relies on this will break on osx.
Re:Good for home use too. (Score:3, Informative)
Re:Mixed Company (Score:1, Informative)
Re:embracing? (Score:3, Informative)
Actually, that is completely false [apple.com]:
As 30 seconds of Googling will tell you, distcc [is] a fast, free distributed C/C++ compiler [samba.org].
As they have done with KDE's KHTML engine in Safari [apple.com], so is Samba's distcc engine being used in XCode [samba.org].
Care to try again ?
:-)
Re:Apple (Score:4, Informative)
Well, there's Darwin, their (improved, IMnsHO) version of BSD.
Rendezvous is their (improved) version of ZeroConf.
Safari runs on the KHTML engine. Apple made some improvements and gave them back to the KHTML people, who thanked and praised Apple.
They've worked to improve gcc on PPC-based compilers.
They also provide the standard tools like apache, perl, python, etc etc etc, with OS X. I don't know if they have worked on these specifically, but it wouldn't surprise me in the least.
Re:embracing? (Score:5, Informative)
If Apple breaks this intentionally (meaning not for adding significant, enhanced functionality) in their next release, I will stand with you as an anti-Apple nay-saying zealot and deride them all up and down /.
-Potentially recovering Mac zealot (it's so hard with WWDC right around the corner :-( )
Receiver swaps. (Score:3, Informative)
In DCE RPC, the receiver does the byte swapping, if necessary. One of the main reasons Windows network services are built on DCE RPC is that between homogeneous systems, there's no swapping taking place: all that data goes out in host byte order, and there's no such thing as network byte order.
One of the big arguments about this had to do with Windows machines on Intel not "playing fair" with systems that natively implement network byte order as their host byte order. When talking to Intel boxes, these machines have to take on additional overhead.
This also gives a big disadvantage to servers whose byte order doesn't match that of their predominant clients.
Actually, from a computational overhead point of view, a more correct approach would have been to have "client swaps to server byte order", to put the computational overhead on the most efficient side of the link for it (by offloading the most computationally loaded component, the server).
As far as I recollect, this lost out in committee to people who were arguing against it in order to have leverage to enforce vendor lock-in for both clients and servers. 8-(.
-- Terry
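The "receiver swaps" scheme from the comment above can be sketched in Python as follows. This is a rough illustration only: the one-byte `L`/`B` tag and the `send_u32`/`recv_u32` helpers are hypothetical and do not reproduce the real DCE RPC wire format. The point is that the sender writes its native order plus a label, and the receiver swaps only when the label differs from its own order.

```python
import sys

def send_u32(value: int, sender_order: str = sys.byteorder) -> bytes:
    # The sender never swaps: it labels the message and writes native bytes.
    tag = b"L" if sender_order == "little" else b"B"
    return tag + value.to_bytes(4, sender_order)

def recv_u32(message: bytes, receiver_order: str = sys.byteorder):
    tag, payload = message[:1], message[1:5]
    sender_order = "little" if tag == b"L" else "big"
    # Load the payload as if it were already in the receiver's native order.
    value = int.from_bytes(payload, receiver_order)
    swapped = sender_order != receiver_order
    if swapped:
        # Orders differ, so the receiver reverses the four bytes.
        value = int.from_bytes(value.to_bytes(4, "little"), "big")
    return value, swapped

# Two little-endian PCs talking: no swap happens on either side.
print(recv_u32(send_u32(0x01020304, "little"), "little"))
# A little-endian PC talking to a big-endian machine: the receiver swaps once.
print(recv_u32(send_u32(0x01020304, "little"), "big"))
```

Compared with a fixed network byte order, the homogeneous case costs zero swaps instead of two, and the mixed case costs one swap, always paid by the receiver.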
Re:I've been dying to know.... (Score:3, Informative)
http://www.atpm.com/10.06/blender.shtml
Re:Mixed Company (Score:3, Informative)
Don't be silly. (Do the moderators actually think before scoring +1?) gcc is perfectly capable of inlining functions even when glibc is dynamically linked. It can also inline functions whose address is taken, just by generating a separate copy. Any other compiler clever enough to have inlines is very likely to do the same.
Re:Mixed Company (Score:2, Informative)
Better let the BSD team know that then, because they'll surely want to make sure their code complies with the "bsd socket implementation" spec you mention. Or... and here's a crazy idea, you could realise you're wrong and that Apple didn't decide to deliberately break the BSD code they used and actually have a very similar implementation to the BSD code.
Source: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/e
A chance to nominate an app for parallelisation (Score:1, Informative)
Apple and a third-party partner are looking to target a few key
applications with the hope of developing parallel versions that would
benefit from computational clusters. As many of you know,
embarrassingly parallel algorithms like BLAST are easily written to
take advantage of clusters. There is a large set of problems, however,
where this is not the case. We would like to find some of these more
difficult applications and find a way to parallelize them using some
interesting technologies developed by our partner.
I'd like to solicit feedback from the members of these mailing lists
with respect to choosing two or three "killer applications" that, if
parallelized, would present an immediate value to their respective
users. We have a few in mind, but I'd like to leave the question
open-ended. Any science is equally applicable -- bioinformatics,
molecular dynamics, physics, engineering, etc. We would prefer to work
with open source applications.
Feel free to reply to me directly, or to the entire list.
Regards,
Matt
--
Matt MacInnis
Research and HPC Manager
Higher Education
Apple Computer, Inc.
Office 408-974-6322 / Mobile 408-203-1001