FFTs Using AltiVec on Linux and Mac OS X 33
GregAllen writes "After searching for a good open-source AltiVec implementation of the Fast Fourier Transform (FFT), I have put together the AltiVec FFT page. It has all of the source, some benchmarks, and runs on Mac OS X or LinuxPPC. I even compared it to the venerable (but scalar) FFTW."
the obligatory... (Score:2, Insightful)
Won't the *aa be pleased.
'Consumers shouldn't be allowed to have machines that can encode media files quickly since we're the only ones who are allowed to produce this stuff...'
'and no, why would consumers need machines that can decode a media file faster than the realtime playback speed? if you're playing something back, you won't need to do anything else...'
Re:the obligatory... (Score:2)
memory interleaving (Score:2, Interesting)
"Split data seems to be much faster with SIMD architectures"
caching associativity and memory interleaving anyone?
Re:memory interleaving (Score:2)
I think the main reason is probably that vDSP only uses vectorized functions on split data, if it has to stride over the data set, it uses a scalar version of the function.
Take a look at the docs [apple.com] for vDSP - for each function it lists the critera used to select the vector/scaler version of each function. You'll see the requirement that "Stride is equal to 1" pop up all the time - those functions only use altivec if the data has been split.
What's a FFT (Score:2)
Re:What's a FFT (Score:5, Informative)
What this means is that is if you have a time-varying signal like audio for example, you can run an fft on it and get a frequency analysis on the sound. i.e. from the sound in a waveform format, after an fft, you can pick out which are the dominant frequencies, and what the relationship between frequencies are.
Although the fft is strictly not necessary because it is, after all, just a transform, it turns out that many media compression techniques use them because humans aren't as discriminating in the frequency domain so if you do lossy compression in the frequency domain, we won't notice/mind as much.
fyi, you do an fft any time you view a
Re:What's a FFT (Score:1)
Re:What's a FFT (Score:5, Informative)
You use a FFT anytime you ENCODE a .jpg, or an .mp3, or a DVD. When you view or listen or watch you are using an Inverse FFT.
And to be very specific, I think all your examples use DCT (Discrete Cosine Transform) and not FFT. JPEG definitely does. Very good overview here [cs.sfu.ca]
Re:What's a FFT (Score:1)
Further senseless nit-picking (Score:5, Informative)
The FFT is a particularly FAST algorithm for computing Fourier Transforms, with O(n log n) instead of O(n^2) for the naive algorithm -- thus the name. I believe that there is a comparably fast and very similar algorithm for computing the DCT, which doesn't really have a separate name.
You use a FFT anytime you ENCODE a
True, though an inverse Fourier tranform is simply the Fourier transform run through a complex complement and multiplication by a constant. The FFT and the IFFT are essentially the same algorithm.
And to be very specific, I think all your examples use DCT (Discrete Cosine Transform) and not FFT.
Right
Re:Further senseless nit-picking (Score:4, Informative)
No -- the DCT spaces its components at half-wavenumbers while the FT has components spaced at integer wavenumbers. So only every other DCT components would be equal to the real part of an FT component. (that isn't strictly true, either, since most DCTs have a half-sample offset compared to FT)
Interestingly enough it wasn't until the '90s that a "fast" or O(n log n) algorithm for calculating the DCT was found--this is well after it had become widely used in compression.
Re:What's a FFT (Score:2)
"It is commonly used to transform data from the time domain to the frequency domain."
Technically, it can take data from any well-behaved domain to the frequency domain, not just temporal data. I use FFTs on spatial data more than temporal data. Just had to nitpick that.
-Jellisky
Re:What's a FFT (Score:2, Informative)
Not really for OSX unless you're porting to linux (Score:3, Informative)
So unless you're writing something for OSX and linux, want to use the same libs on both and are willing to sacrifice a 2x speed increase, you'd be well advised not to use this library package for FFTs on OSX.
Re:Not really for OSX unless you're porting to lin (Score:5, Informative)
I've updated it to say: as much as twice as fast as vBigDSP on split data (but about the same on interleaved data).
I'd like to get an open-source split-data version to add to the site, so Linux users get good split-data FFTs, too.
Re:Not really for OSX unless you're porting to lin (Score:2)
There's a rather obvious reason for this, if you read the docs on vDSP [apple.com].
All of the functions are provided in vector and scalar forms, with a set of criteria listed that must be met for the vector form of the function to be used. You'll see the requirement "Stride must be equal to 1" quite often.
In other words, the function is only vectorized for split data, and runs a slower scalar versio
Re:Not really for OSX unless you're porting to lin (Score:3, Informative)
This is clearly not the case. On an 800 MHz G4, the peak scalar performance is 1600 MFLOPS. You're lucky to get 800 MFLOPS, just like the (very good) scalar FFTW. That's exactly why we have SIMD/AltiVec.
I'm using Apple's ctoz() and ztoc() to convert from interleaved to split and back, and getting "about the same as vBigDSP" -- that's 1-2 GFLOPS. Without these conversions, that curve peaks at about 4 GFLOP
Linux is faster than macosx (Score:3, Informative)
Re:Linux is faster than macosx (Score:2, Insightful)
This is probably because OS/X is heavily altivec-optimized, and g3's don't have altivec. So basically what you're saying is "a g3 optimized operating system is faster on my g3 than a g4 optimized operating system."
Shocking, eh?
Re:Linux is faster than macosx (Score:1)
You notice a speed difference between OSX and Linux because you compiled Linux from source for the G3? Huh? Perhaps it is because you are running a completely dif
Re:Linux is faster than macosx (Score:1)
Also you don't compile an application for Altivec. You write a separate version of those core routines that is compl
Old story? (Score:2, Informative)
For applications where raw speed is not very important I recommend everyone to use the fftw library, it is already installed on a lot of systems and easy in use. Very fast are Intel's SDK version and DJ Bernstein's [cr.yp.to] (only 256 points).
FFTW 3.0 beta has been released (Score:4, Informative)
Anyone want to add it to the benchmark?
Re:FFTW 3.0 beta has been released (Score:2)
Yes! I've run the benchmark, and will add the results shortly. Preliminary indication -- it's no faster than vBigDSP.
Have you already seen djbfft? (Score:3, Informative)
Re:Have you already seen djbfft? (Score:2)
You have to give the FFTW people credit: they have provided their results, and made it easy for you to reproduce them. I guess that's why they're at MIT, and djbfft is at.... Where is it again?
I don't really care who wrote my FFT, I just want it to be FAST and CORRECT.
The djbfft website sounds like a lo
Re:Have you already seen djbfft? (Score:2)
djbfft is faster than FFTW. It's also faster than many FFT libraries that sacrifice accuracy for speed. I asked if you had tried it, and you didn't answer.
Re:Have you already seen djbfft? (Score:2)
As is (almost) every AltiVec FFT benchmarked on the site -- FFTW is the slowest one listed. I included FFTW only because it is a "well known" scalar point of reference -- it is obviously very well known to djb. The topic of discussion is AltiVec, which djbfft clearly does not use. No matter how fast of a scalar implementation djbfft is, it cannot hope to compete with vBigDSP.
I mean absolutely no disrespect to Dr. Bernstein. On his site [cr.yp.to] he makes statements such as this: How many
What about Matlab? (Score:1, Interesting)
Re:What about Matlab? (Score:3, Informative)