For the setup of the local potential and the nonlocal potential, a serial
program was available. The data layout was different from ours though. The
vector components were ordered according to the length of the triple index.
Each and every part of the code had to be modified to comply with the sliced
data layout described earlier. There is very little
communication involved in this part of the program. Each processor
sets up only the data it needs to do the LPO operation and to apply the
nonlocal part of the potential to the vector components it holds.
The modifications to the existing code were straightforward but
tedious.
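
As a rough illustration of this layout change, the following sketch takes a small set of triple indices ordered by length, as in the serial code, and regroups them according to the processor that owns their plane. The block distribution of planes along the first index, the wrap-around of negative indices, and the names `owner_of_plane` and `regroup_by_owner` are assumptions made for this example only; the actual sliced layout is the one described earlier.

```python
from itertools import product


def owner_of_plane(plane, n_planes, n_procs):
    """Hypothetical block distribution: contiguous planes per processor."""
    planes_per_proc = -(-n_planes // n_procs)  # ceiling division
    return plane // planes_per_proc


def regroup_by_owner(indices, n_planes, n_procs):
    """Attach each length-ordered component to the processor owning its plane."""
    local = {p: [] for p in range(n_procs)}
    for pos, (i, j, k) in enumerate(indices):
        plane = i % n_planes  # assumption: negative i wraps onto the grid
        owner = owner_of_plane(plane, n_planes, n_procs)
        local[owner].append((pos, (i, j, k)))
    return local


# Triple indices in [-1, 1]^3, ordered by squared length (serial layout).
triples = sorted(
    product(range(-1, 2), repeat=3),
    key=lambda t: (t[0] ** 2 + t[1] ** 2 + t[2] ** 2, t),
)

# Illustrative sizes: 4 planes shared by 2 processors.
layout = regroup_by_owner(triples, n_planes=4, n_procs=2)
```

Each entry of `layout[p]` keeps the position the component had in the length ordering, so a processor can look up exactly the potential data it needs for the components it holds without exchanging anything with the others.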
Getting an initial guess for the eigenvectors by directly diagonalizing a leading submatrix turned out to be cumbersome. Because the matrix setup is spread out over the processors, an explicit representation of the leading submatrix has to be collected on a single processor, which then calls the required LAPACK routine to diagonalize it. After the diagonalization, each processor picks those components of the submatrix eigenvectors that fall on one of the planes it holds and fills the remaining components it holds with zeros.
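
The procedure can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the program's code: the processors are simulated by a list of index sets, the collected submatrix is replaced by a random symmetric stand-in, the leading submatrix is assumed to cover global components `0..m-1`, and `numpy.linalg.eigh` (which relies on LAPACK) takes the place of the direct LAPACK call. The names `diagonalize_on_root` and `local_initial_guess` and the round-robin ownership are hypothetical.

```python
import numpy as np


def diagonalize_on_root(h_sub):
    """Step done by a single processor: dense Hermitian eigensolve."""
    eigvals, eigvecs = np.linalg.eigh(h_sub)  # eigenvalues in ascending order
    return eigvals, eigvecs


def local_initial_guess(eigvecs, owned, n_vecs):
    """Build one processor's part of the starting vectors.

    owned  : global indices of the components this processor holds
    n_vecs : number of starting vectors wanted
    Components outside the leading submatrix are filled with zeros.
    """
    m = eigvecs.shape[0]
    owned = np.asarray(owned)
    local = np.zeros((len(owned), n_vecs), dtype=eigvecs.dtype)
    inside = owned < m
    local[inside, :] = eigvecs[owned[inside], :n_vecs]
    return local


rng = np.random.default_rng(0)
n_total, m, n_vecs, n_procs = 12, 4, 2, 3

# Stand-in for the leading submatrix collected on one processor.
a = rng.standard_normal((m, m))
h_sub = 0.5 * (a + a.T)

eigvals, eigvecs = diagonalize_on_root(h_sub)

# Hypothetical ownership: components dealt out round-robin.
owned_by = [np.arange(p, n_total, n_procs) for p in range(n_procs)]
guesses = [local_initial_guess(eigvecs, owned, n_vecs) for owned in owned_by]
```

In a distributed run, the collection of `h_sub` and the distribution of `eigvecs` would be the only communication steps; the zero filling is purely local.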