In the finite difference programs, only the nearest neighbors are used to update a nodal voltage, so that the fact that the nearest neighbors are at Δi = Δj = Δk = ±1 was used from the start. This made an array like ib in the finite element programs, even in the serial version, unnecessary. For an nx x ny x nz size microstructure in the serial code, the program actually stores an array that is (nx + 2) x (ny + 2) x (nz + 2) in size. The extra two layers of voxels in each direction are periodic continuations of the microstructure. The real microstructure lies from 2, nx + 1; 2, ny + 1; and 2, nz + 1, and is recorded in the array list in the serial version. Therefore, in a manner of speaking, the serial code has a set of ghost layers in the x, y and z directions already built in. The parallel codes can determine (as described in the previous finite element section) if the position is real or belongs to a ghost layer and the array list becomes obsolete.
With periodic updates of the x and y ghost sites, one can operate entirely on the real sites of the array, and freely take Δi = Δj = Δk = ±1 to update each real site. The periodic boundary conditions are enforced via the regular updates of the boundary sites (i = 1 and nx + 2, j = 1 and ny + 2, and k = 1 and nz + 2).
However, in creating parallel implementations for the finite difference codes, there is a unique twist since the serial versions already have 2 Z-ghost layers built in. (Figure 4) In the serial codes, the real data lies in layers Z' = 2 through nz + 1. The serial ghost layers contain the information from the first (z = 1) and last (z = nz) real data in the Z' = 1 and Z'0 = nz2 layers, respectively. For the rest of the discussion, nz2 = nz + 2.
Figure 4: Depiction of original finite element data set in a serial based program showing the equivalency of the Z layers. The Real layers are real data.
The parallel code is still set-up to split the layers dependent (i.e. calculate d1 and d2 per node) on the overall extent of z (Figure 5). Therefore, each node in the finite difference case will receive layers and not like a finite element problem since nz2 is the overall extent of z. Therefore, before the parallelization and data split is made, it is known that the bottom node will already have the information it will need for its d1 − 1 ghost layer and the top node will have the layer which it needs for its d2 + 1 ghost layer. And interestingly enough, they are already stored in the bottom node's d1 and top node's d2 layers, respectively, before the split. Therefore, when the initial ghost layer production of vox (and subsequent arrays) in the parallel versions, the bottom node will only need to receive its d2+1 (top ghost) layer and the top node will need it's d1 − 1 (bottom ghost) layer through a message passing procedure. Since it is known that before the splitting takes place, the bottom and top nodes already have 1 needed ghost layer, the code is designed with a new set of variables called dlow and dhigh which allow for the special cases of the lowest and topmost nodes and makes a one-to-one correspondence for d1 with dlow and d2 with dhigh for all other nodes. viz:
This type of code ensures that only the real data is used and the extra ghost layers (made by subroutine z ghost) for Node 0 and the the northernmost node, remain out of the central calculation.Figures 4 and 5 depict the case of nz = 6 for the serial and parallel cases. This increases the Z extent of this array to nz + 2, or 8. The Z=1 and Z=8 layers are the built in ghost layers, while layers with 2 ≤ Z ≤ 6 (nz) are the real data. The shading of the layers Real=1 and Real=6 corresponds to the ghost layer equivalency in the figure.
Figure 5 shows how the splitting of arrays is performed in the parallel case, with the number of processors arbitrarily set to 4. In this example, each processing node receives, , or 2 layers per node. Notice how the values of dlow and dhigh are dependent on the rank values; namely dlow on root (myrank = 0) and dhigh on the northernmost (myrank = nprocs − 1). Interestingly, in this example, dlow and dhigh are equal (dlow, dhigh = 2) on rank=0 and are also equal (dlow, dhigh = 7) on rank=nprocs − 1.
Figure 5: Depiction of ghost layers in the parallel finite element program showing the equivalency of the Z layers across 4 processors for the nz = 6, nz2 = 8 case. Note how the d1 − layer of the bottom node & the d2 + 1 layer of the top node are unused in the calculation
When a nodal calculation occurs, the extrema on the Z-type loops occur between dlow and dhigh and in addition, one needs to know the values at Δk = ±1. Therefore, it can be seen from the figure, the lowest ghost layer on Node 0 will be unused as well as the top most ghost layer on Node 3. And by induction, this will occur for all instances. These type of calculations appear in subroutines PROD_nC where matrix calculations and updates are carried out.