
Performance of MPICH-MX 1.2.7..1 over MX-10G
Uniprocessor case: one process per node, two bonded NICs per node
Performance measurements are presented here for MPICH-MX over MX-10G for the point-to-point communication tests of the Intel MPI Benchmark (IMB) suite, version 2.3.
Performance data for the collective communication tests will be provided soon. Each benchmark is run with varying message lengths, and timings are averaged over multiple samples.
The test environment consists of two quad-processor 3.2 GHz Supermicro X7DB8 (Intel Woodcrest) machines, each with two 10G-PCIE-8A-C NICs. The two machines were connected point-to-point (switchless). Each machine had 8 GB of memory and ran Linux 2.6.17.11, MX-10G 1.2.0h, and MPICH-MX 1.2.7..1. The Intel MPI Benchmark was compiled with gcc 4.0 using -O.
Notes:
Point-to-point communication performance is measured between two processes. Latency is reported in microseconds (µs, shown as us in the graphs) and bandwidth in MB/s. The latency graphs use a logarithmic scale and the bandwidth graphs a linear scale.
PingPong is the classical pattern used for measuring startup (latency) and throughput (bandwidth) of a single message sent between two processes.
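As a rough illustration of how one PingPong timing becomes the two reported numbers, the sketch below assumes the IMB conventions: the reported latency is half the average round-trip time, and 1 MB is 2^20 bytes. The function `pingpong_metrics` and its parameters are hypothetical and not part of IMB.

```python
def pingpong_metrics(msg_bytes, elapsed_us, repetitions):
    """Derive PingPong latency and bandwidth from a raw timing (sketch).

    Assumes (IMB convention) that `elapsed_us` covers `repetitions`
    full round trips (rank 0 -> rank 1 -> rank 0), so the one-way time
    is half the average round trip, and that 1 MB = 2**20 bytes.
    """
    if repetitions <= 0 or elapsed_us <= 0:
        raise ValueError("need a positive time and repetition count")
    # One-way time per message, in microseconds.
    latency_us = elapsed_us / (2 * repetitions)
    # Bytes per microsecond converted to MB per second, with MB = 2**20 bytes.
    bandwidth_mb_s = msg_bytes * 1e6 / (2**20 * latency_us)
    return latency_us, bandwidth_mb_s
```

For example, 10 round trips of a 0-byte message taking 40 µs in total give `pingpong_metrics(0, 40.0, 10) == (2.0, 0.0)`: a 2 µs one-way latency. Small messages therefore populate the latency graph, while large ones populate the bandwidth graph.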

Like PingPong, PingPing measures the startup and throughput of a single message sent between two processes, with the crucial difference that each message is obstructed by an oncoming message. In this test, the two processes communicate with each other using MPI_Isend/MPI_Recv/MPI_Wait, with the MPI_Isend calls issued concurrently.
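The message pattern (not its timing) can be illustrated with a toy model. In the hypothetical sketch below, Python threads stand in for the two MPI processes and unbounded `queue.Queue` objects stand in for the network: `put` returns immediately, playing the role of MPI_Isend, and `get` blocks until a message arrives, playing the role of MPI_Recv. Because both sides post their send before receiving, the two messages are in flight at the same time, which is what separates PingPing from PingPong.

```python
import queue
import threading


def pingping(payloads):
    """Toy model of one PingPing iteration between ranks 0 and 1.

    Threads stand in for MPI processes and unbounded queues for the
    network. This is an illustration of the message pattern only, not MPI.
    """
    inbox = [queue.Queue(), queue.Queue()]
    received = [None, None]

    def rank(me):
        peer = 1 - me
        # "MPI_Isend": post the send without waiting for the peer.
        inbox[peer].put(payloads[me])
        # "MPI_Recv": block until the peer's (oncoming) message arrives.
        received[me] = inbox[me].get(timeout=5)
        # "MPI_Wait" is a no-op here: put() on an unbounded queue has completed.

    threads = [threading.Thread(target=rank, args=(r,)) for r in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Calling `pingping(["from 0", "from 1"])` returns `["from 1", "from 0"]`: each rank ends up with the message that crossed its own outgoing one.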

Sendrecv is based on MPI_Sendrecv(): the processes form a periodic communication chain, and each process sends to its right neighbor and receives from its left neighbor in the chain. The turnover count is 2 messages per sample (1 in, 1 out) for each process.
For 2 processes, Sendrecv reports the bi-directional bandwidth of the system, as obtained by the (optimized) MPI_Sendrecv function. The results below are for 2 processes.
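A minimal sketch of the Sendrecv bookkeeping, assuming ranks 0..n-1 arranged in a periodic chain and a reported bandwidth that counts both messages (1 in, 1 out) each process turns over per sample. The helper names are hypothetical, and 1 MB is taken as 2^20 bytes (the IMB convention).

```python
def sendrecv_partners(rank, nprocs):
    """Return (dest, source) for one MPI_Sendrecv in a periodic chain."""
    right = (rank + 1) % nprocs  # send to the right neighbor
    left = (rank - 1) % nprocs   # receive from the left neighbor (wraps around)
    return right, left


def sendrecv_mb_per_s(msg_bytes, t_us):
    """Bandwidth counting the turnover of 2 messages (1 out, 1 in) per sample."""
    return 2 * msg_bytes * 1e6 / (2**20 * t_us)
```

With 2 processes, `sendrecv_partners(0, 2)` returns `(1, 1)`: the same peer is both destination and source, so each link carries traffic in both directions at once, which is why the result is a bi-directional bandwidth.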

Exchange is a communication pattern that often occurs in grid-splitting algorithms (boundary exchanges). The group of processes is treated as a periodic chain, and each process exchanges data with both its left and right neighbors.
The turnover count is 4 messages per sample (2 in, 2 out) for each process. The results below are for 2 processes.
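The turnover count can be checked by enumerating the messages of one Exchange sample. The sketch below is hypothetical and assumes each rank sends one message to each neighbor in the periodic chain. It then counts, per rank, the messages coming in and going out. With 2 processes the left and right neighbors are the same peer, so all four messages per process pair travel between the same two processes.

```python
from collections import Counter


def exchange_messages(nprocs):
    """List (src, dst) pairs for one Exchange sample on a periodic chain."""
    msgs = []
    for r in range(nprocs):
        left, right = (r - 1) % nprocs, (r + 1) % nprocs
        msgs.append((r, left))   # boundary data sent to the left neighbor
        msgs.append((r, right))  # boundary data sent to the right neighbor
    return msgs


def turnover(nprocs):
    """Map each rank to its (messages in, messages out) for one sample."""
    msgs = exchange_messages(nprocs)
    out_count = Counter(src for src, _ in msgs)
    in_count = Counter(dst for _, dst in msgs)
    return {r: (in_count[r], out_count[r]) for r in range(nprocs)}
```

For the 2-process case reported here, `turnover(2)` returns `{0: (2, 2), 1: (2, 2)}`, matching the 2 in, 2 out per process stated above.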

Last updated: 27 January 2007