MPI Programming — Part 1
Objectives
• Basic structure of MPI code
• MPI communicators
• Sample programs
Introduction to MPI
The Message Passing Interface (MPI) is a library of
subroutines (in Fortran) or function calls (in C) that
can be used to implement a message-passing program.
MPI allows the coordination of a program running
as multiple processes in a distributed-memory
environment, yet it is flexible enough to also be used
in a shared-memory environment.
MPI programs can be compiled and run on a wide
variety of platforms, from single machines to
(homogeneous or heterogeneous) clusters of computers
connected over a network.
The MPI library is standardized, so working code
containing MPI subroutines and function calls should
work (without further changes!) on any machine on
which the MPI library is installed.
A very brief history of MPI
MPI was developed over two years of discussions led
by the MPI Forum, a group of roughly sixty people
representing some forty organizations.
The MPI 1.0 standard was defined in Spring of 1994.
The MPI 2.0 standard was defined in Fall of 1997.
MPI 2.0 was such a complicated enhancement to the
MPI standard that very few implementations exist!
Learning MPI can seem intimidating: MPI 1.1 has
more than 125 different commands!
However, most programmers can accomplish what they
want from their programs while sticking to a small
subset of MPI commands (as few as 6).
In this course, we will stick to MPI 1.1.
Basic structure of MPI code
MPI programs have the following general structure:
• include the MPI header file
• declare variables
• initialize MPI environment
• < compute, communicate, etc. >
• finalize MPI environment
Notes:
• The MPI environment can only be initialized once
per program, and it cannot be initialized before the
MPI header file is included.
• Calls to MPI routines are not recognized before the
MPI environment is initialized or after it is finalized.
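For concreteness, here is a minimal sketch of this structure in C
(the hello, world programs later in these notes flesh it out; the
printf line simply stands in for the compute/communicate stage):

#include <mpi.h>     /* MPI prototypes, constants, and data types */
#include <stdio.h>

int main(int argc, char* argv[]) {
    /* Initialize the MPI environment; must precede any other MPI call */
    MPI_Init(&argc, &argv);

    /* < compute, communicate, etc. > */
    printf("MPI environment is up.\n");

    /* Finalize the MPI environment; no MPI calls are allowed after this */
    MPI_Finalize();
    return 0;
}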
MPI header files
Header files are more commonly used in C than in older
versions of Fortran, such as FORTRAN77.
In general, header files are usually source code for
declarations of commonly used constructs.
Specifically, MPI header files contain the prototypes
for MPI functions/subroutines, as well as definitions of
macros, special constants, and datatypes used by MPI.
An include statement must appear in any source file
that contains MPI function calls or constants.
In Fortran, we type
INCLUDE 'mpif.h'
In C, the equivalent is
#include <mpi.h>
MPI naming conventions
All MPI entities (subroutines, constants, types, etc.)
begin with MPI_ to highlight them and avoid conflicts.
In Fortran, they have the general form
MPI_XXXXX(parameter,...,IERR)
For example,
MPI_INIT(IERR)
The difference in C is basically the absence of IERR:
MPI_Xxxxx(parameter,...)
MPI constants are always capitalized, e.g.,
MPI_COMM_WORLD, MPI_REAL, etc.
In Fortran, MPI entities are always associated with the
INTEGER data type.
MPI subroutines and return values
An MPI subroutine returns an error code that can be
checked for the successful operation of the subroutine.
This error code is always returned as an INTEGER in
the variable given by the last argument, e.g.,
INTEGER IERR
...
CALL MPI_INIT(IERR)
...
The error code returned is equal to the pre-defined
integer constant MPI_SUCCESS if the routine ran
successfully:
IF (IERR.EQ.MPI_SUCCESS) THEN
do stuff given that routine ran correctly
END IF
If an error occurred, then IERR returns an
implementation-dependent value indicating the specific
error, which can then be handled appropriately.
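In C, the error code is the function's return value rather than a
final IERR argument; here is a small sketch of the same check (note
that many MPI implementations abort on error by default, so such
checks only take effect if the default error handler is changed):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int ierr = MPI_Init(&argc, &argv);   /* error code is the return value */
    if (ierr == MPI_SUCCESS) {
        /* do stuff given that the routine ran correctly */
        printf("MPI_Init reported success.\n");
    }
    MPI_Finalize();
    return 0;
}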
MPI handles
MPI defines and maintains its own internal data
structures related to communication, etc.
These data structures are accessed through handles.
Handles are returned by various MPI calls and may be
used as arguments in other MPI calls.
In Fortran, handles are integers or arrays of integers;
any arrays are indexed from 1.
For example,
• MPI_SUCCESS is an integer used to test error codes.
• MPI_COMM_WORLD is an integer representing a pre-
defined communicator consisting of all processes.
Handles may be copied using the standard assignment
operation.
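In C, for example, handles are values of opaque types such as
MPI_Comm rather than INTEGERs, but copying is still ordinary
assignment; a two-line sketch:

MPI_Comm comm;           /* a communicator handle (opaque type in C) */
comm = MPI_COMM_WORLD;   /* copy the handle by standard assignment   */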
MPI data types
MPI has its own reference data types corresponding to
elementary data types in Fortran or C:
• Variables are normally declared as Fortran/C types.
• MPI type names are used as arguments to MPI
routines when needed.
• Arbitrary data types may be built in MPI from the
intrinsic Fortran/C data types.
MPI shifts the burden of details such as the floating-
point representation to the implementation.
MPI allows for automatic translation between
representations in heterogeneous environments.
In general, the MPI data type in a receive must match
that specified in the send.
Basic MPI data types
MPI data type           Fortran data type
MPI_INTEGER             INTEGER
MPI_REAL                REAL
MPI_DOUBLE_PRECISION    DOUBLE PRECISION
MPI_COMPLEX             COMPLEX
MPI_LOGICAL             LOGICAL
MPI_CHARACTER           CHARACTER
MPI_PACKED              user-defined

MPI data type           C data type
MPI_INT                 signed int
MPI_FLOAT               float
MPI_DOUBLE              double
MPI_CHAR                signed char
MPI_PACKED              user-defined
Communicators
A communicator is a handle representing a group of
processes that can communicate with one another.
The communicator name is required as an argument
to all point-to-point and collective operations.
• The communicator specified in the send and receive
calls must agree for communication to take place.
• Processes can communicate only if they share a
communicator.
There can be many communicators, and a process can
be a member of a number of different communicators.
Within each communicator, processes are numbered
consecutively (starting at 0).
This identifying number is known as the rank of the
process in that communicator.
Communicators
The rank is also used to specify the source and
destination in send and receive calls.
If a process belongs to more than one communicator,
its rank in each can (and usually will) be different.
MPI automatically provides a basic communicator
called MPI_COMM_WORLD that consists of all available
processes.
Every process can communicate with every other
process using the MPI_COMM_WORLD communicator.
Additional communicators consisting of subsets of the
available processes can also be defined.
Getting communicator rank
A process can determine its rank in a communicator
with a call to MPI_COMM_RANK.
In Fortran, this call looks like
CALL MPI_COMM_RANK(COMM, RANK, IERR)
where the arguments are all of type INTEGER.
COMM contains the name of the communicator, e.g.,
MPI_COMM_WORLD.
The rank of the process within the communicator COMM
is returned in RANK.
The analogous syntax in C looks like
int MPI_Comm_rank(
    MPI_Comm comm      /* in  */,
    int*     my_rank_p /* out */);
Getting communicator size
A process can also determine the size (number of
processes) of any communicator to which it belongs
with a call to MPI_COMM_SIZE.
In Fortran, this call looks like
CALL MPI_COMM_SIZE(COMM, SIZE, IERR)
where the arguments are all of type INTEGER.
Again, COMM contains the name of the communicator.
The number of processes associated with the
communicator COMM is returned in SIZE.
The analogous syntax in C looks like
int MPI_Comm_size(
    MPI_Comm comm      /* in  */,
    int*     comm_sz_p /* out */);
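In practice the two queries almost always appear together right
after initialization; a short C sketch (error codes ignored, as in
the examples that follow):

int my_rank, comm_sz;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* this process' rank          */
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);   /* number of processes overall */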
Finalizing MPI
The last call to an MPI routine in any MPI program
should be to MPI_FINALIZE.
MPI_FINALIZE cleans up all MPI data structures,
cancels incomplete operations, etc.
MPI_FINALIZE must be called by all processes!
If any process does not call MPI_FINALIZE, the
program will hang.
Once MPI_FINALIZE has been called, no other MPI
routines (including MPI_INIT!) may be called.
In Fortran, the call looks like
CALL MPI_FINALIZE(IERR)
In C, the analogous syntax is
int MPI Finalize(void);
hello, world — serial Fortran
Here is a simple serial Fortran90 program called
helloWorld.f90 to print the message “hello, world”:
PROGRAM helloWorld
PRINT *, "hello, world"
END PROGRAM helloWorld
Compiled and run with the commands
gfortran helloWorld.f90 -o helloWorld
./helloWorld
produces the unsurprising output
hello, world
hello, world with MPI
hello, world is the classic first program of anyone
using a new computer language.
Here is a simple MPI version using Fortran90.
PROGRAM helloWorldMPI
INCLUDE 'mpif.h'
INTEGER IERR
! Initialize MPI environment
CALL MPI_INIT(IERR)
PRINT *, "hello, world!"
! Finalize MPI environment
CALL MPI_FINALIZE(IERR)
END PROGRAM helloWorldMPI
hello, world with MPI
To compile and link this program to an executable
called helloWorldMPI, we could do something like
>> mpif90 -c helloWorldMPI.f90
>> mpif90 -o helloWorldMPI helloWorldMPI.o
or all at once using
>> mpif90 helloWorldMPI.f90 -o helloWorldMPI
Without a queuing system, we could use
>> mpirun -np 4 ./helloWorldMPI
Thus, when run on four processes, the output of this
program is
hello, world!
hello, world!
hello, world!
hello, world!
hello, world, the sequel
We now modify the hello, world program so that
each process prints its rank as well as the total number
of processes in the communicator MPI_COMM_WORLD.
PROGRAM helloWorld2MPI
INCLUDE 'mpif.h'
! Declare variables
INTEGER RANK, SIZE, IERR
! Initialize MPI environment
CALL MPI_INIT(IERR)
! Not checking for errors :)
! Get the rank
CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
! Get the size
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, SIZE, IERR)
! Display the result
PRINT *, "Process", RANK, "of ", SIZE, &
         "says, 'hello, world'"
! Finalize MPI environment
CALL MPI_FINALIZE(IERR)
END PROGRAM helloWorld2MPI
Running this code on 6 processes could produce
something like:
spiteri@robson:~/test> mpirun -np 6 ./helloWorld2MPI
Process 0 of 6 says, 'hello, world'
Process 2 of 6 says, 'hello, world'
Process 1 of 6 says, 'hello, world'
Process 3 of 6 says, 'hello, world'
Process 4 of 6 says, 'hello, world'
Process 5 of 6 says, 'hello, world'
hello, world, the sequel
Let’s take a look at a variant of this program in C:
#include <stdio.h>
#include <string.h> /* For strlen */
#include <mpi.h>    /* For MPI functions, etc */

const int MAX_STRING = 100;

int main(void) {
   char greeting[MAX_STRING]; /* String storing message */
   int comm_sz;               /* Number of processes    */
   int my_rank;               /* My process rank        */

   /* Start up MPI */
   MPI_Init(NULL, NULL);

   /* Get the number of processes */
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   /* Get my rank among all the processes */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   if (my_rank != 0) {
      /* Create message */
      sprintf(greeting, "Greetings from process %d of %d!",
              my_rank, comm_sz);
      /* Send message to process 0 */
      MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0,
               MPI_COMM_WORLD);
   } else {
      /* Print my message */
      printf("Greetings from process %d of %d!\n", my_rank, comm_sz);
      for (int q = 1; q < comm_sz; q++) {
         /* Receive message from process q */
         MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q,
                  0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         /* Print message from process q */
         printf("%s\n", greeting);
      }
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */
hello, world, the sequel
Recall we compile this code via
mpicc -g -Wall -o mpi_hello mpi_hello.c
The code is run with (basically) the same command as for
the Fortran program:
mpirun -n <number of processes> ./mpi_hello
So the output from the command
mpirun -n 4 ./mpi_hello
will be
Greetings from process 0 of 4!
Greetings from process 1 of 4!
Greetings from process 2 of 4!
Greetings from process 3 of 4!
hello, world, the sequel
The main difference between the Fortran and C
programs is that the latter sends the greeting as a
message for process 0 to receive (and print).
The syntax of MPI_Send is
int MPI_Send(
    void*        msg_buf_p    /* in */,
    int          msg_size     /* in */,
    MPI_Datatype msg_type     /* in */,
    int          dest         /* in */,
    int          tag          /* in */,
    MPI_Comm     communicator /* in */);
The first three arguments determine the contents of
the message; the last three determine the destination.
hello, world, the sequel
msg_buf_p is a pointer to the block of memory
containing the contents of the message, in this case
the string greeting.
msg_size is the number of characters in the string
plus 1 for the '\0' string termination character in C.
msg_type is MPI_CHAR; together these arguments specify
that the message contains strlen(greeting)+1 chars.
Note: the size of the string greeting is not necessarily
the same as that specified by msg_size and msg_type.
dest gives the destination process.
tag is a non-negative int that can be used to
distinguish messages that might otherwise be identical;
e.g., when sending a number of floats, a tag can
indicate whether a given float should be printed or
used in a computation.
Finally, communicator (of type MPI_Comm) is the
communicator; dest is a rank defined relative to it.
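To illustrate the float example above, here is a hedged C sketch in
which the tag values PRINT_TAG and COMPUTE_TAG (names invented for
this sketch) tell process 0 what to do with each float; it assumes
the program is run with at least two processes:

#include <mpi.h>
#include <stdio.h>

#define PRINT_TAG   0   /* hypothetical tag values chosen for this sketch */
#define COMPUTE_TAG 1

int main(int argc, char* argv[]) {
    int my_rank;
    float x = 3.14f, y;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 1) {
        /* Same payload and destination; only the tag differs. */
        MPI_Send(&x, 1, MPI_FLOAT, 0, PRINT_TAG,   MPI_COMM_WORLD);
        MPI_Send(&x, 1, MPI_FLOAT, 0, COMPUTE_TAG, MPI_COMM_WORLD);
    } else if (my_rank == 0) {
        MPI_Recv(&y, 1, MPI_FLOAT, 1, PRINT_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%f\n", y);                          /* print this one */
        MPI_Recv(&y, 1, MPI_FLOAT, 1, COMPUTE_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        y = 2.0f*y;                                 /* use this one   */
    }

    MPI_Finalize();
    return 0;
}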
hello, world, the sequel
MPI_Recv has a similar (complementary) syntax to
MPI_Send, but with a few important differences.
int MPI_Recv(
    void*        msg_buf_p    /* out */,
    int          buf_size     /* in  */,
    MPI_Datatype buf_type     /* in  */,
    int          source       /* in  */,
    int          tag          /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status_p     /* out */);
Again, the first three arguments specify the memory
available (buffer) for receiving the message.
The next three identify the message.
Both tag and communicator must match those sent
by the sending process.
We discuss status_p shortly.
It is not often used, so in the example the MPI constant
MPI_STATUS_IGNORE was passed.
hello, world, the sequel
In order for a message to be received by MPI_Recv, it
must be matched to a send.
Necessary but not sufficient conditions are that the
tags and communicators match and that the source and
destination processes are consistent; e.g., process
A is expecting a message from process B.
The only other caveat is that the sending and receiving
buffers must be compatible.
We will assume this means they are of the same type
and that the receiving buffer size is at least as large as
the sending buffer size.
hello, world, the sequel
One needs to be aware that receiving processes are
generally receiving messages from many sending
processes, and it is generally impossible to predict
the order in which messages are sent (or received).
To allow for flexibility in handling received messages,
MPI provides a special constant MPI_ANY_SOURCE as
a wildcard that can be used in MPI_Recv.
Similarly, the wildcard constant MPI_ANY_TAG can be
used in MPI_Recv to handle messages with any tag.
Two final notes:
1. Only receiving processes can use wildcards; sending
processes must specify a destination process rank
and a non-negative tag.
2. There is no wildcard that can be used for
communicators; both sending and receiving
processes must specify communicators.
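For instance, the greetings program above could let process 0 print
the messages in whatever order they arrive by replacing the specific
source q (and tag 0) with wildcards; a sketch of just that receive
loop, reusing the variables from that program:

/* Process 0: accept the comm_sz-1 greetings in arrival order. */
for (int q = 1; q < comm_sz; q++) {
    MPI_Recv(greeting, MAX_STRING, MPI_CHAR,
             MPI_ANY_SOURCE,            /* message from any sender */
             MPI_ANY_TAG,               /* ... carrying any tag    */
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("%s\n", greeting);
}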
hello, world, the sequel
It is possible for a receiving process to successfully
receive a message without knowing the sender, the
tag, or the size of the message.
All this information is of course known and can be
accessed via the status_p argument.
status_p has type MPI_Status*, i.e., a pointer to a struct
having at least the members MPI_SOURCE, MPI_TAG,
and MPI_ERROR.
If we declare MPI_Status status and pass &status as the
last argument to MPI_Recv, we can determine the sender
and tag of a received message by examining
status.MPI_SOURCE
status.MPI_TAG
The amount of data received is not stored directly in a
member but can be retrieved with a call to MPI_Get_count:
MPI_Get_count(&status, recv_type, &count)
Of course, retrieving this information is often
unnecessary and can be avoided in the name of efficiency.
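Putting these pieces together, here is a hedged sketch (again
reusing the variables of the greetings program) of a wildcard
receive that afterwards recovers the sender, the tag, and the
number of characters actually received:

MPI_Status status;   /* has members MPI_SOURCE, MPI_TAG, MPI_ERROR */
int count;

MPI_Recv(greeting, MAX_STRING, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* Who sent it, with which tag, and how long is it? */
MPI_Get_count(&status, MPI_CHAR, &count);
printf("received %d chars from process %d with tag %d\n",
       count, status.MPI_SOURCE, status.MPI_TAG);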
hello, world, the sequel
What happens exactly when a message is sent from one
process to another depends on the particular system,
but the general procedure is the following.
The sending process assembles the message, i.e., the
“address” information plus the data themselves.
The sending process then either buffers the message
and returns, or blocks; i.e., it does not return until it
can begin transmission.
(Other functions are available if it is important that we
know when a message is actually received, etc.)
Specific details depend on the implementation, but
typically messages that are below a certain threshold
in size are automatically buffered by MPI_Send.
hello, world, the sequel
In contrast, MPI_Recv always blocks until a matching
message has been received.
So when a call to MPI_Recv returns, we know
(barring errors!) that the message has been safely
stored in the receive buffer.
(There is a non-blocking version of this function that
only checks whether a matching message is available
and returns regardless of whether one is.)
MPI also requires that messages be non-overtaking;
i.e., messages sent from one process to another must be
receivable in the order in which they were sent.
However, the order in which messages are received (let
alone sent!) by other processes cannot be controlled.
As usual, it is important for programmers to maintain
the mindset that the processes run autonomously.
hello, world, the sequel
A potential pitfall of blocking communication is that if
a process tries to receive a message for which there is
no matching send, it will block forever or deadlock.
In other words, the program will hang.
It is therefore critical when programming to ensure
every receive has a matching send.
This includes ensuring that tags match and that the
source and destination are never the same process, not
only so that the program does not hang but also so that
messages are not inadvertently received!
Similarly, an unmatched call to MPI_Send will typically
hang the sending process; in the "best" case, the message
is buffered so the process can continue, but the message
will be lost.
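As a cautionary sketch (the tag value 99 and the variables are
invented for illustration), the following receive has no matching
send anywhere in the program, so process 0 blocks forever and the
program hangs:

if (my_rank == 0) {
    double x;
    /* No process ever sends a message with tag 99 to process 0,
       so this call never returns: the program deadlocks.         */
    MPI_Recv(&x, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
/* The other processes eventually wait in MPI_Finalize while
   process 0 is stuck in MPI_Recv.                                */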
Sample Program: Trapezoidal Rule
Historically in mathematics, quadrature refers to the
act of trying to find a square with the same area as a
given circle.
In mathematical computing, quadrature refers to the
numerical approximation of definite integrals.
Let f (x) be a real-valued function of a real variable,
defined on a finite interval a ≤ x ≤ b.
We seek to compute the value of the definite integral
\int_a^b f(x) \, dx.
Note: This is a real number.
In line with its historical meaning, “quadrature” might
conjure up the vision of plotting the function on
graph paper and counting the little squares that lie
underneath the curve.
Sample Program: Trapezoidal Rule
Let ∆x = b−a be the length of the integration interval.
The trapezoidal rule T approximates the integral by
the area of a trapezoid with base ∆x and sides equal
to the values of f (x) at the two end points.
T = \Delta x \, \frac{f(a) + f(b)}{2}.
Put differently, the trapezoidal rule forms a linear
interpolant between (a, f (a)) and (b, f (b)) and
integrates the interpolant exactly to define the rule.
Naturally the error in using the trapezoidal rule depends
on ∆x: in general, as ∆x increases, so does the error.
Sample Program: Trapezoidal Rule
This leads to the composite trapezoidal rule:
Assume the domain [a, b] is partitioned into n (equal)
subintervals so that
\Delta x = \frac{b - a}{n}.
Let x_i = a + i ∆x, i = 0, 1, . . . , n.
Then the composite trapezoidal rule is
T_n = \frac{\Delta x}{2} \sum_{i=1}^{n} \left( f(x_{i-1}) + f(x_i) \right)
    = \Delta x \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i) \right].
Sample Program: Trapezoidal Rule
Serial code for this method might look like
/* input a, b, n */
dx = (b-a)/n;
approx = (f(a)+f(b))/2;
for (i = 1; i <= n-1; i++){
x_i = a + i*dx;
approx += f(x_i);
}
approx = dx*approx;
Sample Program: Trapezoidal Rule
Following Foster’s methodology, we
1. partition the problem (find the area of a single
trapezoid, add them up)
2. identify communication (information about single
trapezoid scattered, single areas gathered)
3. aggregate tasks (there are probably more trapezoids
than cores, so split [a, b] into comm_sz subintervals)
4. map tasks to cores (subintervals to cores; results
back to process 0)
Sample Program: Trapezoidal Rule
/* File: mpi_trap1.c
* Purpose: Use MPI to implement a parallel version of the trapezoidal
* rule. In this version the endpoints of the interval and
* the number of trapezoids are hardwired.
*
* Input: None.
* Output: Estimate of the integral from a to b of f(x)
* using the trapezoidal rule and n trapezoids.
*
* Compile: mpicc -g -Wall -o mpi_trap1 mpi_trap1.c
* Run: mpiexec -n <number of processes> ./mpi_trap1
*
* Algorithm:
* 1. Each process calculates "its" interval of
* integration.
* 2. Each process estimates the integral of f(x)
* over its interval using the trapezoidal rule.
* 3a. Each process != 0 sends its integral to 0.
* 3b. Process 0 sums the calculations received from
* the individual processes and prints the result.
*
* Note: f(x), a, b, and n are all hardwired.
*
* IPP: Section 3.2.2 (pp. 96 and ff.)
*/
#include <stdio.h>
/* We’ll be using MPI routines, definitions, etc. */
#include <mpi.h>
/* Calculate local integral */
double Trap(double left_endpt, double right_endpt, int trap_count,
double base_len);
/* Function we’re integrating */
double f(double x);
int main(void) {
int my_rank, comm_sz, n = 1024, local_n;
double a = 0.0, b = 3.0, dx, local_a, local_b;
double local_int, total_int;
int source;
/* Let the system do what it needs to start up MPI */
MPI_Init(NULL, NULL);
/* Get my process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Find out how many processes are being used */
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
dx = (b-a)/n; /* dx is the same for all processes */
local_n = n/comm_sz; /* So is the number of trapezoids */
/* Length of each process’ interval of
* integration = local_n*dx. So my interval
* starts at: */
local_a = a + my_rank*local_n*dx;
local_b = local_a + local_n*dx;
local_int = Trap(local_a, local_b, local_n, dx);
/* Add up the integrals calculated by each process */
if (my_rank != 0) {
MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0,
MPI_COMM_WORLD);
} else {
total_int = local_int;
for (source = 1; source < comm_sz; source++) {
MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
total_int += local_int;
}
}
/* Print the result */
if (my_rank == 0) {
printf("With n = %d trapezoids, our estimate\n", n);
printf("of the integral from %f to %f = %.15e\n",
a, b, total_int);
}
/* Shut down MPI */
MPI_Finalize();
return 0;
} /* main */
/*------------------------------------------------------------------
* Function: Trap
* Purpose: Serial function for estimating a definite integral
* using the trapezoidal rule
* Input args: left_endpt
* right_endpt
* trap_count
* base_len
* Return val: Trapezoidal rule estimate of integral from
* left_endpt to right_endpt using trap_count
* trapezoids
*/
double Trap(
double left_endpt /* in */,
double right_endpt /* in */,
int trap_count /* in */,
double base_len /* in */) {
double estimate, x;
int i;
estimate = (f(left_endpt) + f(right_endpt))/2.0;
for (i = 1; i <= trap_count-1; i++) {
x = left_endpt + i*base_len;
estimate += f(x);
}
estimate = estimate*base_len;
return estimate;
} /* Trap */
/*------------------------------------------------------------------
* Function: f
* Purpose: Compute value of function to be integrated
* Input args: x
*/
double f(double x) {
return x*x;
} /* f */
Summary
• MPI header file, initialize / finalize MPI environment
• MPI entities (naming, error codes, handles,
data types)
• hello, world, and hello, world, the sequel
• Basic trapezoidal rule