We present an approach to solving this eigenvalue problem that is based on a conjugate gradient minimization scheme and is suitable for parallelization. We compare several algorithms and discuss the performance of our implementation of one of them on NCSA's SGI Power Challenge, a bus-based shared-memory multiprocessor.
We find that, with our data layout, the 3D fast Fourier transform (FFT) used in the matrix-vector multiplications is the most difficult part to parallelize and limits the speedup for small problem sizes. For larger problems the speedup improves, because operations that require little communication outweigh the FFT in run time.
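The abstract does not say how the FFT enters the matrix-vector product. A common arrangement in plane-wave codes, assumed here rather than taken from the source, is a Hamiltonian H = -½∇² + V with a local potential V on a periodic cubic grid: the kinetic term is diagonal in Fourier space and the potential is diagonal in real space, so each product needs a forward and an inverse 3D FFT. The names `make_apply_h`, `v_real` and `box_length` are hypothetical.

```python
import numpy as np


def make_apply_h(v_real, box_length):
    """Hypothetical sketch: return a function applying H = -1/2 Laplacian + V
    to a wavefunction sampled on an n x n x n periodic real-space grid."""
    n = v_real.shape[0]
    # Wave numbers for each FFT index along one axis of a cubic box.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    half_k2 = 0.5 * (kx**2 + ky**2 + kz**2)  # kinetic energy, diagonal in k-space

    def apply_h(psi):
        # Forward 3D FFT, scale by |k|^2 / 2, inverse 3D FFT.
        kinetic = np.fft.ifftn(half_k2 * np.fft.fftn(psi))
        # The local potential acts as a pointwise product in real space.
        return kinetic + v_real * psi

    return apply_h
```

Under these assumptions the pointwise products are trivially parallel, whereas each 3D FFT couples every grid point to every other. That is consistent with the FFT being the hardest part to parallelize.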