HyPar  1.0
Finite-Difference Hyperbolic-Parabolic PDE Solver on Cartesian Grids
tridiagLU.h File Reference

Header file for TridiagLU. More...

Go to the source code of this file.

Data Structures

struct  TridiagLU
 

Macros

#define _TRIDIAG_JACOBI_   "jacobi"
 
#define _TRIDIAG_GS_   "gather-and-solve"
 

Functions

int tridiagLU (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUGS (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagIterJacobi (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUInit (void *, void *)
 
int blocktridiagLU (double *, double *, double *, double *, int, int, int, void *, void *)
 
int blocktridiagIterJacobi (double *, double *, double *, double *, int, int, int, void *, void *)
 

Detailed Description

Header file for TridiagLU.

Author
Debojyoti Ghosh

Definition in file tridiagLU.h.


Data Structure Documentation

struct TridiagLU

Definition at line 84 of file tridiagLU.h.

Data Fields
char reducedsolvetype[50]

Choice of solver for solving the reduced system. May be _TRIDIAG_JACOBI_ or _TRIDIAG_GS_.

int evaluate_norm

calculate norm at each iteration? (relevant only for iterative solvers)

int maxiter

maximum number of iterations (relevant only for iterative solvers)

double atol

absolute tolerance (relevant only for iterative solvers)

double rtol

relative tolerance (relevant only for iterative solvers)

int exititer

number of iterations it ran for (relevant only for iterative solvers)

double exitnorm

error norm at exit (relevant only for iterative solvers)

int verbose

print iterations and norms (relevant only for iterative solvers)

double total_time

Total wall time in seconds

double stage1_time

Wall time (in seconds) for stage 1 of tridiagLU() or blocktridiagLU()

double stage2_time

Wall time (in seconds) for stage 2 of tridiagLU() or blocktridiagLU()

double stage3_time

Wall time (in seconds) for stage 3 of tridiagLU() or blocktridiagLU()

double stage4_time

Wall time (in seconds) for stage 4 of tridiagLU() or blocktridiagLU()
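As a usage sketch (not part of the library; the helper name setup_solver and the parameter values are illustrative), a TridiagLU object is typically filled in by tridiagLUInit() and then adjusted field by field before calling one of the solvers:

    #include <string.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Illustrative helper: set defaults (reading lusolver.inp if present), */
    /* then override a few fields before calling one of the solvers.        */
    void setup_solver(TridiagLU *params, MPI_Comm *comm)
    {
      tridiagLUInit(params, comm);      /* defaults + optional lusolver.inp  */
      params->maxiter       = 50;       /* iterative solvers only            */
      params->evaluate_norm = 1;        /* compute the norm every iteration  */
      strcpy(params->reducedsolvetype, _TRIDIAG_GS_);  /* or _TRIDIAG_JACOBI_ */
    }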

Macro Definition Documentation

#define _TRIDIAG_JACOBI_   "jacobi"

Jacobi method

See Also
tridiagIterJacobi(), blocktridiagIterJacobi()

Definition at line 74 of file tridiagLU.h.

#define _TRIDIAG_GS_   "gather-and-solve"

"Gather-and-solve" method

See Also
tridiagLUGS

Definition at line 76 of file tridiagLU.h.

Function Documentation

int tridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary); the row update is sketched after this list.
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using either tridiagLUGS() or tridiagIterJacobi() in this implementation, depending on TridiagLU::reducedsolvetype.
  • Stage 4: Backward-solve to obtain the final solution
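For reference, the stage-1 elimination applied to each interior row \(i\) of every system (a restatement of the scalar updates in the listing below) is

\begin{equation} \alpha = a_i / b_{i-1}, \quad b_i \leftarrow b_i - \alpha\, c_{i-1}, \quad a_i \leftarrow -\alpha\, a_{i-1}, \quad x_i \leftarrow x_i - \alpha\, x_{i-1} \end{equation}

where \(\alpha\) is the elimination factor (the variable factor in the code).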

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.
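As a usage sketch (not part of the library; the function solve_constant_systems and the coefficient values are illustrative, and every system here is the same constant-coefficient tridiagonal system), the interleaved layout and the call then look like:

    #include <stdlib.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Illustrative only: set up ns identical tridiagonal systems of local */
    /* size n in the interleaved layout and solve them with tridiagLU().   */
    int solve_constant_systems(int n, int ns, TridiagLU *params, MPI_Comm *comm)
    {
      int rank = 0, nproc = 1;
      if (comm) { MPI_Comm_rank(*comm,&rank); MPI_Comm_size(*comm,&nproc); }

      double *a = (double*) calloc(n*ns, sizeof(double));
      double *b = (double*) calloc(n*ns, sizeof(double));
      double *c = (double*) calloc(n*ns, sizeof(double));
      double *x = (double*) calloc(n*ns, sizeof(double));

      /* row i of system k is stored at index i*ns+k */
      for (int i = 0; i < n; i++) {
        for (int k = 0; k < ns; k++) {
          a[i*ns+k] = -1.0;   /* sub-diagonal    */
          b[i*ns+k] =  2.0;   /* diagonal        */
          c[i*ns+k] = -1.0;   /* super-diagonal  */
          x[i*ns+k] =  1.0;   /* right-hand side */
        }
      }
      /* no sub-diagonal on the first global row, no super-diagonal on the last */
      if (rank == 0)       for (int k = 0; k < ns; k++) a[k]          = 0.0;
      if (rank == nproc-1) for (int k = 0; k < ns; k++) c[(n-1)*ns+k] = 0.0;

      int ierr = tridiagLU(a, b, c, x, n, ns, params, comm);  /* x <- solution */

      free(a); free(b); free(c); free(x);
      return ierr;
    }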

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a    Array containing the sub-diagonal elements
b    Array containing the diagonal elements
c    Array containing the super-diagonal elements
x    Right-hand side; will contain the solution on exit
n    Local size of the system on this processor
ns   Number of systems to solve
r    Object of type TridiagLU
m    MPI communicator

Definition at line 83 of file tridiagLU.c.

93 {
94  TridiagLU *params = (TridiagLU*) r;
95  int d,i,istart,iend;
96  int rank,nproc;
97  struct timeval start,stage1,stage2,stage3,stage4;
98 
99 #ifdef serial
100  rank = 0;
101  nproc = 1;
102 #else
103  MPI_Comm *comm = (MPI_Comm*) m;
104  int ierr = 0;
105  const int nvar = 4;
106 
107  if (comm) {
108  MPI_Comm_size(*comm,&nproc);
109  MPI_Comm_rank(*comm,&rank);
110  } else {
111  rank = 0;
112  nproc = 1;
113  }
114 #endif
115 
116  if (!params) {
117  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
118  return(1);
119  }
120 
121  /* start */
122  gettimeofday(&start,NULL);
123 
124  if ((ns == 0) || (n == 0)) return(0);
125  double *xs1, *xp1;
126  xs1 = (double*) calloc (ns, sizeof(double));
127  xp1 = (double*) calloc (ns, sizeof(double));
128  for (i=0; i<ns; i++) xs1[i] = xp1[i] = 0;
129 
130  /* Stage 1 - Parallel elimination of subdiagonal entries */
131  istart = (rank == 0 ? 1 : 2);
132  iend = n;
133  for (i = istart; i < iend; i++) {
134  for (d = 0; d < ns; d++) {
135  if (b[(i-1)*ns+d] == 0) return(-1);
136  double factor = a[i*ns+d] / b[(i-1)*ns+d];
137  b[i*ns+d] -= factor * c[(i-1)*ns+d];
138  a[i*ns+d] = -factor * a[(i-1)*ns+d];
139  x[i*ns+d] -= factor * x[(i-1)*ns+d];
140  if (rank) {
141  double factor = c[d] / b[(i-1)*ns+d];
142  c[d] = -factor * c[(i-1)*ns+d];
143  b[d] -= factor * a[(i-1)*ns+d];
144  x[d] -= factor * x[(i-1)*ns+d];
145  }
146  }
147  }
148 
149  /* end of stage 1 */
150  gettimeofday(&stage1,NULL);
151 
152  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
153  /* This needs the last (a,b,c,x) from the previous process */
154 #ifndef serial
155  double *sendbuf, *recvbuf;
156  sendbuf = (double*) calloc (ns*nvar, sizeof(double));
157  recvbuf = (double*) calloc (ns*nvar, sizeof(double));
158  for (d=0; d<ns; d++) {
159  sendbuf[d*nvar+0] = a[(n-1)*ns+d];
160  sendbuf[d*nvar+1] = b[(n-1)*ns+d];
161  sendbuf[d*nvar+2] = c[(n-1)*ns+d];
162  sendbuf[d*nvar+3] = x[(n-1)*ns+d];
163  }
164  if (nproc > 1) {
165  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
166  if (rank) MPI_Irecv(recvbuf,nvar*ns,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
167  if (rank != nproc-1) MPI_Isend(sendbuf,nvar*ns,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
168  MPI_Status status_arr[2];
169  MPI_Waitall(2,&req[0],status_arr);
170  }
171  /* The first process sits this one out */
172  if (rank) {
173  for (d = 0; d < ns; d++) {
174  double am1, bm1, cm1, xm1;
175  am1 = recvbuf[d*nvar+0];
176  bm1 = recvbuf[d*nvar+1];
177  cm1 = recvbuf[d*nvar+2];
178  xm1 = recvbuf[d*nvar+3];
179  double factor;
180  if (bm1 == 0) return(-1);
181  factor = a[d] / bm1;
182  b[d] -= factor * cm1;
183  a[d] = -factor * am1;
184  x[d] -= factor * xm1;
185  if (b[(n-1)*ns+d] == 0) return(-1);
186  factor = c[d] / b[(n-1)*ns+d];
187  b[d] -= factor * a[(n-1)*ns+d];
188  c[d] = -factor * c[(n-1)*ns+d];
189  x[d] -= factor * x[(n-1)*ns+d];
190  }
191  }
192  free(sendbuf);
193  free(recvbuf);
194 #endif
195 
196  /* end of stage 2 */
197  gettimeofday(&stage2,NULL);
198 
199  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
200 #ifndef serial
201  if (nproc > 1) {
202  double *zero, *one;
203  zero = (double*) calloc (ns, sizeof(double));
204  one = (double*) calloc (ns, sizeof(double));
205  for (d=0; d<ns; d++) {
206  zero[d] = 0.0;
207  one [d] = 1.0;
208  }
209  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
210  /* Solving the reduced system by gather-and-solve algorithm */
211  if (rank) ierr = tridiagLUGS(a,b,c,x,1,ns,params,comm);
212  else ierr = tridiagLUGS(zero,one,zero,zero,1,ns,params,comm);
213  if (ierr) return(ierr);
214  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
215  /* Solving the reduced system iteratively with the Jacobi method */
216  if (rank) ierr = tridiagIterJacobi(a,b,c,x,1,ns,params,comm);
217  else ierr = tridiagIterJacobi(zero,one,zero,zero,1,ns,params,comm);
218  }
219  free(zero);
220  free(one);
221 
222  /* Each process, get the first x of the next process */
223  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
224  for (d=0; d<ns; d++) xs1[d] = x[d];
225  if (rank+1 < nproc) MPI_Irecv(xp1,ns,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
226  if (rank) MPI_Isend(xs1,ns,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
227  MPI_Status status_arr[2];
228  MPI_Waitall(2,&req[0],status_arr);
229  }
230 #else
231  if (nproc > 1) {
232  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
233  return(1);
234  }
235 #endif /* if not serial */
236  /* end of stage 3 */
237  gettimeofday(&stage3,NULL);
238 
239  /* Stage 4 - Parallel back-substitution to get the solution */
240  istart = n-1;
241  iend = (rank == 0 ? 0 : 1);
242 
243  for (d = 0; d < ns; d++) {
244  if (b[istart*ns+d] == 0) return(-1);
245  x[istart*ns+d] = (x[istart*ns+d]-a[istart*ns+d]*x[d]-c[istart*ns+d]*xp1[d]) / b[istart*ns+d];
246  }
247  for (i = istart-1; i > iend-1; i--) {
248  for (d = 0; d < ns; d++) {
249  if (b[i*ns+d] == 0) return(-1);
250  x[i*ns+d] = (x[i*ns+d]-c[i*ns+d]*x[(i+1)*ns+d]-a[i*ns+d]*x[d]) / b[i*ns+d];
251  }
252  }
253 
254  /* end of stage 4 */
255  gettimeofday(&stage4,NULL);
256 
257  /* Done - now x contains the solution */
258  free(xs1);
259  free(xp1);
260 
261  /* save runtimes if needed */
262  long long walltime;
263  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
264  params->stage1_time = (double) walltime / 1000000.0;
265  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
266  params->stage2_time = (double) walltime / 1000000.0;
267  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
268  params->stage3_time = (double) walltime / 1000000.0;
269  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
270  params->stage4_time = (double) walltime / 1000000.0;
271  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
272  params->total_time = (double) walltime / 1000000.0;
273  return(0);
274 }
int tridiagLUGS ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using the gather-and-solve approach: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The "gather-and-solve" approach gathers each complete tridiagonal system on one processor and solves it using tridiagLU() (passing NULL as the MPI communicator to indicate that a serial solution is desired). Given multiple tridiagonal systems (ns > 1), the systems are distributed as evenly as possible among the processors, so that each processor solves a share of them (see the sketch below). After the system(s) is (are) solved, the solution(s) is (are) scattered back to the original processors.
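For reference, the share of the ns systems assigned to each process follows the even split computed in the listing below; a hypothetical helper (num_systems_on_rank is not part of the library) expressing the same rule:

    /* Illustrative only: number of gathered systems solved by process p,  */
    /* equivalent to the ns_local[] computation in the listing below.      */
    static int num_systems_on_rank(int p, int ns, int nproc)
    {
      return (ns / nproc) + (p < (ns % nproc) ? 1 : 0);
    }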

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a    Array containing the sub-diagonal elements
b    Array containing the diagonal elements
c    Array containing the super-diagonal elements
x    Right-hand side; will contain the solution on exit
n    Local size of the system on this processor
ns   Number of systems to solve
r    Object of type TridiagLU
m    MPI communicator

Definition at line 64 of file tridiagLUGS.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  if (!context) {
77  fprintf(stderr,"Error in tridiagLUGS(): NULL pointer passed for parameters.\n");
78  return(-1);
79  }
80 #ifdef serial
81 
82  /* Serial compilation */
83  return(tridiagLU(a,b,c,x,n,ns,context,m));
84 
85 #else
86 
87  int d,i,ierr = 0,dstart,istart,p,q;
88  const int nvar = 4;
89  double *sendbuf,*recvbuf;
90  int rank,nproc;
91 
92  /* Parallel compilation */
93  MPI_Comm *comm = (MPI_Comm*) m;
94  if (!comm) return(tridiagLU(a,b,c,x,n,ns,context,NULL));
95  MPI_Comm_size(*comm,&nproc);
96  MPI_Comm_rank(*comm,&rank);
97 
98  if ((ns == 0) || (n == 0)) return(0);
99 
100  /*
101  each process needs to know the local sizes of every other process
102  and total size of the system
103  */
104  int *N = (int*) calloc (nproc, sizeof(int));
105  MPI_Allgather(&n,1,MPI_INT,N,1,MPI_INT,*comm);
106  int NT = 0; for (i=0; i<nproc; i++) NT += N[i];
107 
108  /* counts and displacements for gather and scatter operations */
109  int *counts = (int*) calloc ((short)nproc, sizeof(int));
110  int *displ = (int*) calloc ((short)nproc, sizeof(int));
111 
112  /* on all processes, calculate the number of systems each */
113  /* process has to solve */
114  int *ns_local = (int*) calloc ((short)nproc,sizeof(int));
115  for (p=0; p<nproc; p++) ns_local[p] = ns / nproc;
116  for (p=0; p<ns%nproc; p++) ns_local[p]++;
117 
118  /* allocate the arrays for the gathered tridiagonal systems */
119  double *ra=NULL,*rb=NULL,*rc=NULL,*rx=NULL;
120  if (ns_local[rank] > 0) {
121  ra = (double*) calloc (ns_local[rank]*NT,sizeof(double));
122  rb = (double*) calloc (ns_local[rank]*NT,sizeof(double));
123  rc = (double*) calloc (ns_local[rank]*NT,sizeof(double));
124  rx = (double*) calloc (ns_local[rank]*NT,sizeof(double));
125  }
126 
127  /* Gather the systems on each process */
128  /* allocate receive buffer */
129  if (ns_local[rank] > 0)
130  recvbuf = (double*) calloc (ns_local[rank]*nvar*NT,sizeof(double));
131  else recvbuf = NULL;
132  dstart = 0;
133  for (p = 0; p < nproc; p++) {
134  if (ns_local[p] > 0) {
135  /* allocate send buffer and form the send packet of data */
136  sendbuf = (double*) calloc (nvar*n*ns_local[p],sizeof(double));
137  for (d = 0; d < ns_local[p]; d++) {
138  for (i = 0; i < n; i++) {
139  sendbuf[n*nvar*d+n*0+i] = a[d+dstart+ns*i];
140  sendbuf[n*nvar*d+n*1+i] = b[d+dstart+ns*i];
141  sendbuf[n*nvar*d+n*2+i] = c[d+dstart+ns*i];
142  sendbuf[n*nvar*d+n*3+i] = x[d+dstart+ns*i];
143  }
144  }
145  dstart += ns_local[p];
146 
147  /* gather these reduced systems on process with rank = p */
148  if (rank == p) {
149  for (q = 0; q < nproc; q++) {
150  counts[q] = nvar*N[q]*ns_local[p];
151  displ [q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
152  }
153  }
154  MPI_Gatherv(sendbuf,nvar*n*ns_local[p],MPI_DOUBLE,
155  recvbuf,counts,displ,MPI_DOUBLE,p,*comm);
156 
157  /* deallocate send buffer */
158  free(sendbuf);
159  }
160  }
161  /* extract the data from the recvbuf and solve */
162  istart = 0;
163  for (q = 0; q < nproc; q++) {
164  for (d = 0; d < ns_local[rank]; d++) {
165  for (i = 0; i < N[q]; i++) {
166  ra[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+0*N[q]+i];
167  rb[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+1*N[q]+i];
168  rc[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+2*N[q]+i];
169  rx[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+3*N[q]+i];
170  }
171  }
172  istart += N[q];
173  }
174  /* deallocate receive buffer */
175  if (recvbuf) free(recvbuf);
176 
177  /* solve the gathered systems in serial */
178  ierr = tridiagLU(ra,rb,rc,rx,NT,ns_local[rank],context,NULL);
179  if (ierr) return(ierr);
180 
181  /* allocate send buffer and save the data to send */
182  if (ns_local[rank] > 0)
183  sendbuf = (double*) calloc (ns_local[rank]*NT,sizeof(double));
184  else sendbuf = NULL;
185  istart = 0;
186  for (q = 0; q < nproc; q++) {
187  for (i = 0; i < N[q]; i++) {
188  for (d = 0; d < ns_local[rank]; d++) {
189  sendbuf[istart*ns_local[rank]+d*N[q]+i] = rx[d+ns_local[rank]*(istart+i)];
190  }
191  }
192  istart += N[q];
193  }
194  dstart = 0;
195  for (p = 0; p < nproc; p++) {
196  if (ns_local[p] > 0) {
197 
198  /* allocate receive buffer */
199  recvbuf = (double*) calloc (ns_local[p]*n, sizeof(double));
200 
201  /* scatter the solution back */
202  for (q = 0; q < nproc; q++) {
203  counts[q] = ns_local[p]*N[q];
204  displ[q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
205  }
206  MPI_Scatterv(sendbuf,counts,displ,MPI_DOUBLE,
207  recvbuf,ns_local[p]*n,MPI_DOUBLE,
208  p,*comm);
209  /* save the solution on all root processes */
210  for (d = 0; d < ns_local[p]; d++) {
211  for (i = 0; i < n; i++) {
212  x[d+dstart+ns*i] = recvbuf[d*n+i];
213  }
214  }
215  dstart += ns_local[p];
216  /* deallocate receive buffer */
217  free(recvbuf);
218  }
219  }
220  /* deallocate send buffer */
221  if (sendbuf) free(sendbuf);
222 
223  /* clean up */
224  if (ns_local[rank] > 0) {
225  free(ra);
226  free(rb);
227  free(rc);
228  free(rx);
229  }
230  free(ns_local);
231  free(N);
232  free(displ);
233  free(counts);
234 
235  return(0);
236 #endif
237 }
int tridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
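Each subsequent iteration then applies the standard point-Jacobi update (as in the listing below),

\begin{equation} x_i^{(m+1)} = \frac{1}{b_i}\left( r_i - a_i\, x_{i-1}^{(m)} - c_i\, x_{i+1}^{(m)} \right) \end{equation}

where the values \(x_{i-1}\) and \(x_{i+1}\) at subdomain boundaries are obtained from the neighboring processors.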

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a    Array containing the sub-diagonal elements
b    Array containing the diagonal elements
c    Array containing the super-diagonal elements
x    Right-hand side; will contain the solution on exit
n    Local size of the system on this processor
ns   Number of systems to solve
r    Object of type TridiagLU
m    MPI communicator

Definition at line 64 of file tridiagIterJacobi.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  int iter,d,i,NT;
77  double norm=0,norm0=0,global_norm=0;
78 
79 #ifndef serial
80  MPI_Comm *comm = (MPI_Comm*) m;
81  int rank,nproc;
82 
83  if (comm) {
84  MPI_Comm_size(*comm,&nproc);
85  MPI_Comm_rank(*comm,&rank);
86  } else {
87  rank = 0;
88  nproc = 1;
89  }
90 #endif
91 
92  if (!context) {
93  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
94  return(-1);
95  }
96 
97  /* check for zero along the diagonal */
98  for (i=0; i<n; i++) {
99  for (d=0; d<ns; d++) {
100  if (b[i*ns+d]*b[i*ns+d] < context->atol*context->atol) {
101  fprintf(stderr,"Error in tridiagIterJacobi(): Encountered zero on main diagonal!\n");
102  return(1);
103  }
104  }
105  }
106 
107  double *rhs = (double*) calloc (ns*n, sizeof(double));
108  for (i=0; i<n; i++) {
109  for (d=0; d<ns; d++) {
110  rhs[i*ns+d] = x[i*ns+d]; /* save a copy of the rhs */
111  x[i*ns+d] /= b[i*ns+d]; /* initial guess */
112  }
113  }
114 
115  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
116  recvbufL = (double*) calloc (ns, sizeof(double));
117  recvbufR = (double*) calloc (ns, sizeof(double));
118  sendbufL = (double*) calloc (ns, sizeof(double));
119  sendbufR = (double*) calloc (ns, sizeof(double));
120 
121  /* total number of points */
122 #ifdef serial
123  if (context->evaluate_norm) NT = n;
124  else NT = 0;
125 #else
126  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
127  else NT = 0;
128 #endif
129 
130 #ifdef serial
131  if (context->verbose) printf("\n");
132 #else
133  if (context->verbose && (!rank)) printf("\n");
134 #endif
135 
136  iter = 0;
137  while(1) {
138 
139  /* evaluate break conditions */
140  if ( (iter >= context->maxiter)
141  || (iter && context->evaluate_norm && (global_norm < context->atol))
142  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
143  break;
144  }
145 
146  /* Communicate the boundary x values between processors */
147  for (d=0; d<ns; d++) recvbufL[d] = recvbufR[d] = 0;
148 #ifndef serial
149  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
150  if (rank) MPI_Irecv(recvbufL,ns,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
151  if (rank != nproc-1) MPI_Irecv(recvbufR,ns,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
152  for (d=0; d<ns; d++) { sendbufL[d] = x[d]; sendbufR[d] = x[(n-1)*ns+d]; }
153  if (rank) MPI_Isend(sendbufL,ns,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
154  if (rank != nproc-1) MPI_Isend(sendbufR,ns,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
155 #endif
156 
157  /* calculate error norm - interior */
158  if (context->evaluate_norm) {
159  norm = 0;
160  for (i=1; i<n-1; i++) {
161  for (d=0; d<ns; d++) {
162  norm += ( (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d])
163  * (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d]) );
164  }
165  }
166  }
167  /* calculate error norm - boundary */
168 #ifndef serial
169  MPI_Status status_arr[4];
170  MPI_Waitall(4,req,status_arr);
171 #endif
172  if (context->evaluate_norm) {
173  if (n > 1) {
174  for (d=0; d<ns; d++) {
175  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d])
176  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d]) );
177  }
178  for (d=0; d<ns; d++) {
179  norm += ( (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)])
180  * (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)]) );
181  }
182  } else {
183  for (d=0; d<ns; d++) {
184  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d])
185  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d]) );
186  }
187  }
188  /* sum over all processes */
189 #ifdef serial
190  global_norm = norm;
191 #else
192  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
193 #endif
194  global_norm = sqrt(global_norm/NT);
195  if (!iter) norm0 = global_norm;
196  } else {
197  norm = -1.0;
198  global_norm = -1.0;
199  }
200 
201 #ifdef serial
202  if (context->verbose)
203 #else
204  if (context->verbose && (!rank))
205 #endif
206  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
207 
208  /* correct the solution for this iteration */
209  if (n > 1) {
210  for (d=0; d<ns; d++) {
211  i = 0; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*recvbufL[d] - c[i*ns+d]*x[d+ns*(i+1)] ) / b[i*ns+d];
212  i = n-1; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*recvbufR[d]) / b[i*ns+d];
213  }
214  for (i=1; i<n-1; i++) {
215  for (d=0; d<ns; d++) {
216  x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*x[d+ns*(i+1)]) / b[i*ns+d];
217  }
218  }
219  } else {
220  for (d=0; d<ns; d++) x[d] = (rhs[d] - a[d]*recvbufL[d] - c[d]*recvbufR[d]) / b[d];
221  }
222 
223  /* finished with this iteration */
224  iter++;
225  }
226 
227  /* save convergence information */
228  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
229  context->exititer = iter;
230 
231  free(rhs);
232  free(sendbufL);
233  free(sendbufR);
234  free(recvbufL);
235  free(recvbufR);
236 
237  return(0);
238 }
int tridiagLUInit ( void *  r,
void *  c 
)

Initialize the tridiagLU solver by setting the various parameters in TridiagLU to their default values. If the file lusolver.inp is available, read it and set the parameters.

The file lusolver.inp must be in the following format:

    begin
        <keyword>   <value>
        <keyword>   <value>
        <keyword>   <value>
        ...
        <keyword>   <value>
    end

where the keywords are:

Keyword name       Type     Variable                      Default value
evaluate_norm      int      TridiagLU::evaluate_norm      1
maxiter            int      TridiagLU::maxiter            10
atol               double   TridiagLU::atol               1e-12
rtol               double   TridiagLU::rtol               1e-10
verbose            int      TridiagLU::verbose            0
reducedsolvetype   char[]   TridiagLU::reducedsolvetype   _TRIDIAG_JACOBI_
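For example, a lusolver.inp that selects the Jacobi solver for the reduced system (the values shown are illustrative, not recommendations) could be:

    begin
        evaluate_norm       1
        maxiter             500
        atol                1e-12
        rtol                1e-10
        verbose             0
        reducedsolvetype    jacobi
    end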
Parameters
r    Object of type TridiagLU
c    MPI communicator

Definition at line 39 of file tridiagLUInit.c.

43 {
44  TridiagLU *t = (TridiagLU*) r;
45  int rank,ierr;
46 #ifdef serial
47  rank = 0;
48 #else
49  MPI_Comm *comm = (MPI_Comm*) c;
50  if (!comm) rank = 0;
51  else MPI_Comm_rank(*comm,&rank);
52 #endif
53 
54  /* default values */
55  strcpy(t->reducedsolvetype,_TRIDIAG_JACOBI_);
56  t->evaluate_norm = 1;
57  t->maxiter = 10;
58  t->atol = 1e-12;
59  t->rtol = 1e-10;
60  t->verbose = 0;
61 
62  /* read from file, if available */
63  if (!rank) {
64  FILE *in;
65  in = fopen("lusolver.inp","r");
66  if (!in) {
67  printf("tridiagLUInit: File \"lusolver.inp\" not found. Using default values.\n");
68  } else {
69  char word[100];
70  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
71  if (!strcmp(word, "begin")) {
72  while (strcmp(word, "end")) {
73  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
74  if (!strcmp(word, "evaluate_norm" )) ierr = fscanf(in,"%d" ,&t->evaluate_norm );
75  else if (!strcmp(word, "maxiter" )) ierr = fscanf(in,"%d" ,&t->maxiter );
76  else if (!strcmp(word, "atol" )) ierr = fscanf(in,"%lf",&t->atol );
77  else if (!strcmp(word, "rtol" )) ierr = fscanf(in,"%lf",&t->rtol );
78  else if (!strcmp(word, "verbose" )) ierr = fscanf(in,"%d" ,&t->verbose );
79  else if (!strcmp(word, "reducedsolvetype")) ierr = fscanf(in,"%s" ,t->reducedsolvetype);
80  else if (strcmp(word,"end")) {
81  char useless[100];
82  ierr = fscanf(in,"%s",useless);
83  printf("Warning: keyword %s in file \"lusolver.inp\" with value %s not recognized or extraneous. Ignoring.\n",word,useless);
84  }
85  if (ierr != 1) return(1);
86  }
87  } else {
88  fprintf(stderr,"Error: Illegal format in file \"solver.inp\".\n");
89  return(1);
90  }
91  fclose(in);
92  }
93  }
94 
95  /* broadcast to all processes */
96 #ifndef serial
97  if (comm) {
98  MPI_Bcast(t->reducedsolvetype,50,MPI_CHAR,0,*comm);
99  MPI_Bcast(&t->evaluate_norm,1,MPI_INT,0,*comm);
100  MPI_Bcast(&t->maxiter,1,MPI_INT,0,*comm);
101  MPI_Bcast(&t->verbose,1,MPI_INT,0,*comm);
102  MPI_Bcast(&t->atol,1,MPI_DOUBLE,0,*comm);
103  MPI_Bcast(&t->rtol,1,MPI_DOUBLE,0,*comm);
104  }
105 #endif
106 
107  return(0);
108 }
int blocktridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary); the block form of this row update is sketched after this list.
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using blocktridiagIterJacobi() in this implementation.
  • Stage 4: Backward-solve to obtain the final solution
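The block analogue of the stage-1 elimination (a restatement of the matrix operations in the listing below) is, for each interior row \(i\) of every system,

\begin{equation} F = A_i B_{i-1}^{-1}, \quad B_i \leftarrow B_i - F\, C_{i-1}, \quad A_i \leftarrow -F\, A_{i-1}, \quad X_i \leftarrow X_i - F\, X_{i-1} \end{equation}

where \(F\) is the block elimination factor (the array factor in the code).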

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]
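In C-index terms (a sketch with hypothetical helper names, consistent with the layout above and the listing below), entry \((p,q)\) of block \(B_i^k\) is stored at b[(i*ns+k)*bs*bs + p*bs + q], and component \(p\) of \(X_i^k\) at x[(i*ns+k)*bs + p]:

    /* Illustrative index helpers for the block layout described above. */
    static inline int block_entry(int i, int k, int p, int q, int ns, int bs)
    {
      /* (p,q)-th entry of the i-th block of system k (row-major blocks) */
      return (i*ns + k)*bs*bs + p*bs + q;
    }

    static inline int vec_entry(int i, int k, int p, int ns, int bs)
    {
      /* p-th component of the i-th block of the solution/right-hand side */
      return (i*ns + k)*bs + p;
    }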

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a    Array containing the sub-diagonal elements
b    Array containing the diagonal elements
c    Array containing the super-diagonal elements
x    Right-hand side; will contain the solution on exit
n    Local size of the system on this processor (not multiplied by the block size)
ns   Number of systems to solve
bs   Block size
r    Object of type TridiagLU (contains wall times at exit)
m    MPI communicator

Definition at line 107 of file blocktridiagLU.c.

119 {
120  TridiagLU *params = (TridiagLU*) r;
121  int d,i,j,istart,iend,size;
122  int rank,nproc,bs2=bs*bs,nsbs=ns*bs;
123  struct timeval start,stage1,stage2,stage3,stage4;
124 
125 #ifdef serial
126  rank = 0;
127  nproc = 1;
128 #else
129  MPI_Comm *comm = (MPI_Comm*) m;
130  const int nvar = 4;
131  int ierr = 0;
132 
133  if (comm) {
134  MPI_Comm_size(*comm,&nproc);
135  MPI_Comm_rank(*comm,&rank);
136  } else {
137  rank = 0;
138  nproc = 1;
139  }
140 #endif
141 
142  if (!params) {
143  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
144  return(1);
145  }
146 
147  /* start */
148  gettimeofday(&start,NULL);
149 
150  if ((ns == 0) || (n == 0) || (bs == 0)) return(0);
151  double *xs1, *xp1;
152  xs1 = (double*) calloc (nsbs, sizeof(double));
153  xp1 = (double*) calloc (nsbs, sizeof(double));
154  for (i=0; i<nsbs; i++) xs1[i] = xp1[i] = 0;
155 
156  /* Stage 1 - Parallel elimination of subdiagonal entries */
157  istart = (rank == 0 ? 1 : 2);
158  iend = n;
159  for (i = istart; i < iend; i++) {
160  double binv[bs2], factor[bs2];
161  for (d = 0; d < ns; d++) {
162  _MatrixInvert_ (b+((i-1)*ns+d)*bs2,binv,bs);
163  _MatrixMultiply_ (a+(i*ns+d)*bs2,binv,factor,bs);
164  _MatrixMultiplySubtract_ (b+(i*ns+d)*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
165  _MatrixZero_ (a+(i*ns+d)*bs2,bs);
166  _MatrixMultiplySubtract_ (a+(i*ns+d)*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
167  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
168  if (rank) {
169  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
170  _MatrixZero_ (c+d*bs2,bs);
171  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
172  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
173  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
174  }
175  }
176  }
177 
178  /* end of stage 1 */
179  gettimeofday(&stage1,NULL);
180 
181  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
182  /* This needs the last (a,b,c,x) from the previous process */
183 #ifndef serial
184  double *sendbuf, *recvbuf;
185  size = ns*bs2*(nvar-1)+nsbs;
186  sendbuf = (double*) calloc (size, sizeof(double));
187  recvbuf = (double*) calloc (size, sizeof(double));
188  for (d=0; d<ns; d++) {
189  for (i=0; i<bs2; i++) {
190  sendbuf[(0*ns+d)*bs2+i] = a[((n-1)*ns+d)*bs2+i];
191  sendbuf[(1*ns+d)*bs2+i] = b[((n-1)*ns+d)*bs2+i];
192  sendbuf[(2*ns+d)*bs2+i] = c[((n-1)*ns+d)*bs2+i];
193  }
194  for (i=0; i<bs; i++) sendbuf[3*ns*bs2+d*bs+i] = x[((n-1)*ns+d)*bs+i];
195  }
196  if (nproc > 1) {
197  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
198  if (rank) MPI_Irecv(recvbuf,size,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
199  if (rank != nproc-1) MPI_Isend(sendbuf,size,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
200  MPI_Status status_arr[2];
201  MPI_Waitall(2,&req[0],status_arr);
202  }
203  /* The first process sits this one out */
204  if (rank) {
205  for (d = 0; d < ns; d++) {
206  double am1[bs2], bm1[bs2], cm1[bs2], xm1[bs];
207  for (i=0; i<bs2; i++) {
208  am1[i] = recvbuf[(0*ns+d)*bs2+i];
209  bm1[i] = recvbuf[(1*ns+d)*bs2+i];
210  cm1[i] = recvbuf[(2*ns+d)*bs2+i];
211  }
212  for (i=0; i<bs; i++) xm1[i] = recvbuf[3*ns*bs2+d*bs+i];
213  double factor[bs2], binv[bs2];
214  _MatrixInvert_ (bm1,binv,bs);
215  _MatrixMultiply_ (a+d*bs2,binv,factor,bs);
216  _MatrixMultiplySubtract_ (b+d*bs2,factor,cm1,bs);
217  _MatrixZero_ (a+d*bs2,bs);
218  _MatrixMultiplySubtract_ (a+d*bs2,factor,am1,bs);
219  _MatVecMultiplySubtract_ (x+d*bs ,factor,xm1,bs);
220 
221  _MatrixInvert_ (b+((n-1)*ns+d)*bs2,binv,bs); if (ierr) return(ierr);
222  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
223  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((n-1)*ns+d)*bs2,bs);
224  _MatrixZero_ (c+d*bs2,bs);
225  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((n-1)*ns+d)*bs2,bs);
226  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((n-1)*ns+d)*bs ,bs);
227  }
228  }
229  free(sendbuf);
230  free(recvbuf);
231 #endif
232 
233  /* end of stage 2 */
234  gettimeofday(&stage2,NULL);
235 
236  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
237 #ifndef serial
238  if (nproc > 1) {
239  double *zero, *eye;
240  zero = (double*) calloc (ns*bs2, sizeof(double));
241  eye = (double*) calloc (ns*bs2, sizeof(double));
242  for (d=0; d<ns*bs2; d++) zero[d] = eye[d] = 0.0;
243  for (d=0; d<ns; d++) {
244  for (i=0; i<bs; i++) eye[d*bs2+(i*bs+i)] = 1.0;
245  }
246 
247  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
248  /* not supported */
249  fprintf(stderr,"Error in blocktridiagLU(): Gather-and-solve for reduced system not available.\n");
250  return(1);
251  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
252  /* Solving the reduced system iteratively with the Jacobi method */
253  if (rank) ierr = blocktridiagIterJacobi(a,b,c,x,1,ns,bs,params,comm);
254  else ierr = blocktridiagIterJacobi(zero,eye,zero,zero,1,ns,bs,params,comm);
255  }
256  free(zero);
257  free(eye);
258 
259  /* Each process, get the first x of the next process */
260  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
261  for (d=0; d<nsbs; d++) xs1[d] = x[d];
262  if (rank+1 < nproc) MPI_Irecv(xp1,nsbs,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
263  if (rank) MPI_Isend(xs1,nsbs,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
264  MPI_Status status_arr[2];
265  MPI_Waitall(2,&req[0],status_arr);
266  }
267 #else
268  if (nproc > 1) {
269  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
270  return(1);
271  }
272 #endif /* if not serial */
273  /* end of stage 3 */
274  gettimeofday(&stage3,NULL);
275 
276  /* Stage 4 - Parallel back-substitution to get the solution */
277  istart = n-1;
278  iend = (rank == 0 ? 0 : 1);
279 
280  for (d = 0; d < ns; d++) {
281  double binv[bs2],xt[bs];
282  _MatrixInvert_ (b+(istart*ns+d)*bs2,binv,bs);
283  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,a+(istart*ns+d)*bs2,x +d*bs,bs);
284  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,c+(istart*ns+d)*bs2,xp1+d*bs,bs);
285  _MatVecMultiply_ (binv,x+(istart*ns+d)*bs,xt,bs);
286  for (j=0; j<bs; j++) x[(istart*ns+d)*bs+j]=xt[j];
287  }
288  for (i = istart-1; i > iend-1; i--) {
289  for (d = 0; d < ns; d++) {
290  double binv[bs2],xt[bs];
291  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
292  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
293  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,a+(i*ns+d)*bs2,x+d*bs,bs);
294  _MatVecMultiply_ (binv,x+(i*ns+d)*bs,xt,bs);
295  for (j=0; j<bs; j++) x[(i*ns+d)*bs+j] = xt[j];
296  }
297  }
298 
299  /* end of stage 4 */
300  gettimeofday(&stage4,NULL);
301 
302  /* Done - now x contains the solution */
303  free(xs1);
304  free(xp1);
305 
306  /* save runtimes if needed */
307  long long walltime;
308  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
309  params->stage1_time = (double) walltime / 1000000.0;
310  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
311  params->stage2_time = (double) walltime / 1000000.0;
312  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
313  params->stage3_time = (double) walltime / 1000000.0;
314  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
315  params->stage4_time = (double) walltime / 1000000.0;
316  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
317  params->total_time = (double) walltime / 1000000.0;
318  return(0);
319 }
int blocktridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
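Each subsequent iteration then applies the block Jacobi update (as in the listing below),

\begin{equation} X_i^{(m+1)} = B_i^{-1}\left( R_i - A_i X_{i-1}^{(m)} - C_i X_{i+1}^{(m)} \right) \end{equation}

where the blocks \(X_{i-1}\) and \(X_{i+1}\) at subdomain boundaries are obtained from the neighboring processors.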

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a    Array containing the sub-diagonal elements
b    Array containing the diagonal elements
c    Array containing the super-diagonal elements
x    Right-hand side; will contain the solution on exit
n    Local size of the system on this processor (not multiplied by the block size)
ns   Number of systems to solve
bs   Block size
r    Object of type TridiagLU
m    MPI communicator

Definition at line 88 of file blocktridiagIterJacobi.c.

100 {
101  TridiagLU *context = (TridiagLU*) r;
102  int iter,d,i,j,NT,bs2=bs*bs,nsbs=ns*bs;
103  double norm=0,norm0=0,global_norm=0;
104 
105 #ifndef serial
106  MPI_Comm *comm = (MPI_Comm*) m;
107  int rank,nproc;
108 
109  if (comm) {
110  MPI_Comm_size(*comm,&nproc);
111  MPI_Comm_rank(*comm,&rank);
112  } else {
113  rank = 0;
114  nproc = 1;
115  }
116 #endif
117 
118  if (!context) {
119  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
120  return(-1);
121  }
122 
123  double *rhs = (double*) calloc (ns*n*bs, sizeof(double));
124  for (i=0; i<ns*n*bs; i++) rhs[i] = x[i]; /* save a copy of the rhs */
125  /* initial guess */
126  for (i=0; i<n; i++) {
127  for (d=0; d<ns; d++) {
128  double binv[bs2];
129  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
130  _MatVecMultiply_ (binv,rhs+(i*ns+d)*bs,x+(i*ns+d)*bs,bs);
131  }
132  }
133 
134  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
135  recvbufL = (double*) calloc (nsbs, sizeof(double));
136  recvbufR = (double*) calloc (nsbs, sizeof(double));
137  sendbufL = (double*) calloc (nsbs, sizeof(double));
138  sendbufR = (double*) calloc (nsbs, sizeof(double));
139 
140  /* total number of points */
141 #ifdef serial
142  if (context->evaluate_norm) NT = n;
143  else NT = 0;
144 #else
145  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
146  else NT = 0;
147 #endif
148 
149 #ifdef serial
150  if (context->verbose) printf("\n");
151 #else
152  if (context->verbose && (!rank)) printf("\n");
153 #endif
154 
155  iter = 0;
156  while(1) {
157 
158  /* evaluate break conditions */
159  if ( (iter >= context->maxiter)
160  || (iter && context->evaluate_norm && (global_norm < context->atol))
161  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
162  break;
163  }
164 
165  /* Communicate the boundary x values between processors */
166  for (d=0; d<nsbs; d++) recvbufL[d] = recvbufR[d] = 0;
167 #ifndef serial
168  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
169  if (rank) MPI_Irecv(recvbufL,nsbs,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
170  if (rank != nproc-1) MPI_Irecv(recvbufR,nsbs,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
171  for (d=0; d<nsbs; d++) {
172  sendbufL[d] = x[d];
173  sendbufR[d] = x[(n-1)*nsbs+d];
174  }
175  if (rank) MPI_Isend(sendbufL,nsbs,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
176  if (rank != nproc-1) MPI_Isend(sendbufR,nsbs,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
177 #endif
178 
179  /* calculate error norm - interior */
180  if (context->evaluate_norm) {
181  norm = 0;
182  for (i=1; i<n-1; i++) {
183  for (d=0; d<ns; d++) {
184  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(i*ns+d)*bs+j];
185  _MatVecMultiplySubtract_(err,a+(i*ns+d)*bs2,x+((i-1)*ns+d)*bs,bs);
186  _MatVecMultiplySubtract_(err,b+(i*ns+d)*bs2,x+((i )*ns+d)*bs,bs);
187  _MatVecMultiplySubtract_(err,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
188  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
189  }
190  }
191  }
192  /* calculate error norm - boundary */
193 #ifndef serial
194  MPI_Status status_arr[4];
195  MPI_Waitall(4,req,status_arr);
196 #endif
197  if (context->evaluate_norm) {
198  if (n > 1) {
199  for (d=0; d<ns; d++) {
200  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
201  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
202  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
203  _MatVecMultiplySubtract_(err,c+d*bs2,x+(ns+d)*bs,bs);
204  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
205  }
206  for (d=0; d<ns; d++) {
207  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(d+ns*(n-1))*bs+j];
208  _MatVecMultiplySubtract_(err,a+(d+ns*(n-1))*bs2,x+(d+ns*(n-2))*bs,bs);
209  _MatVecMultiplySubtract_(err,b+(d+ns*(n-1))*bs2,x+(d+ns*(n-1))*bs,bs);
210  _MatVecMultiplySubtract_(err,c+(d+ns*(n-1))*bs2,recvbufR+d*bs,bs);
211  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
212  }
213  } else {
214  for (d=0; d<ns; d++) {
215  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
216  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
217  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
218  _MatVecMultiplySubtract_(err,c+d*bs2,recvbufR+d*bs,bs);
219  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
220  }
221  }
222  /* sum over all processes */
223 #ifdef serial
224  global_norm = norm;
225 #else
226  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
227 #endif
228  global_norm = sqrt(global_norm/NT);
229  if (!iter) norm0 = global_norm;
230  } else {
231  norm = -1.0;
232  global_norm = -1.0;
233  }
234 
235 #ifdef serial
236  if (context->verbose)
237 #else
238  if (context->verbose && (!rank))
239 #endif
240  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
241 
242  /* correct the solution for this iteration */
243  if (n > 1) {
244  for (d=0; d<ns; d++) {
245  double xt[bs],binv[bs2];
246 
247  i = 0;
248  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
249  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,recvbufL+d*bs,bs);
250  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
251  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
252  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
253 
254  i = n-1;
255  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
256  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
257  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,recvbufR+d*bs,bs);
258  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
259  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
260  }
261  for (i=1; i<n-1; i++) {
262  for (d=0; d<ns; d++) {
263  double xt[bs],binv[bs2];
264  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
265  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
266  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
267  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
268  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
269  }
270  }
271  } else {
272  for (d=0; d<ns; d++) {
273  double xt[bs],binv[bs2];
274  for (j=0; j<bs; j++) xt[j] = rhs[d*bs+j];
275  _MatVecMultiplySubtract_(xt,a+d*bs2,recvbufL+d*bs,bs);
276  _MatVecMultiplySubtract_(xt,c+d*bs2,recvbufR+d*bs,bs);
277  _MatrixInvert_(b+d*bs2,binv,bs);
278  _MatVecMultiply_(binv,xt,x+d*bs,bs);
279  }
280  }
281 
282  /* finished with this iteration */
283  iter++;
284  }
285 
286  /* save convergence information */
287  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
288  context->exititer = iter;
289 
290  free(rhs);
291  free(sendbufL);
292  free(sendbufR);
293  free(recvbufL);
294  free(recvbufR);
295 
296  return(0);
297 }