
Sockets-MX Performance Results for Myri-2G

Performance

Depending on your configuration, a benchmark such as netperf achieves up to 3.9 Gbps with a CPU utilization of 7%.
Latency, measured with round-trip communication, is slightly higher than that of MX itself. Both the netperf TCP_RR test and the NetPIPE latency test show less than 5 usec for a TCP/IP socket application.
Results of the netperf, NetPIPE and MPI benchmarks are presented below. They were obtained with Myrinet E cards, which offer a bidirectional bandwidth of 4+4 Gbps. The results show that Sockets-MX matches this performance.

-- LATENCY:
fischer@serenade2:~/Sockets-MX_MODULE$ ~/NetPIPE_3.6.2/NPtcp -h serenade1 -u 128
Send and receive buffers are 135168 and 135168 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes  19082 times -->      1.52 Mbps in       4.92 usec
  1:       2 bytes  19923 times -->      3.05 Mbps in       5.00 usec
  2:       3 bytes  19986 times -->      4.57 Mbps in       5.01 usec
  3:       4 bytes  13302 times -->      6.08 Mbps in       5.02 usec
  4:       6 bytes  14938 times -->      8.98 Mbps in       5.10 usec
  5:       8 bytes   9805 times -->     11.96 Mbps in       5.10 usec
  6:      12 bytes  12249 times -->     17.68 Mbps in       5.18 usec
  7:      13 bytes   8046 times -->     19.14 Mbps in       5.18 usec
  8:      16 bytes   8907 times -->     23.72 Mbps in       5.15 usec
  9:      19 bytes  10930 times -->     27.78 Mbps in       5.22 usec
 10:      21 bytes  12103 times -->     30.60 Mbps in       5.24 usec
 11:      24 bytes  12732 times -->     35.31 Mbps in       5.19 usec
 12:      27 bytes  13661 times -->     39.31 Mbps in       5.24 usec
 13:      29 bytes   8481 times -->     41.97 Mbps in       5.27 usec
 14:      32 bytes   9158 times -->     46.61 Mbps in       5.24 usec
 15:      35 bytes  10142 times -->     48.60 Mbps in       5.49 usec
 16:      45 bytes  10400 times -->     59.33 Mbps in       5.79 usec
 17:      48 bytes  11520 times -->     64.83 Mbps in       5.65 usec
 18:      51 bytes  12170 times -->     66.72 Mbps in       5.83 usec
 19:      61 bytes   6724 times -->     79.14 Mbps in       5.88 usec
 20:      64 bytes   8362 times -->     85.39 Mbps in       5.72 usec
 21:      67 bytes   9017 times -->     85.92 Mbps in       5.95 usec
 22:      93 bytes   9031 times -->    117.09 Mbps in       6.06 usec
 23:      96 bytes  11001 times -->    121.32 Mbps in       6.04 usec
 24:      99 bytes  11215 times -->    121.71 Mbps in       6.21 usec
 25:     125 bytes   5859 times -->    149.83 Mbps in       6.36 usec
 26:     128 bytes   7792 times -->    156.80 Mbps in       6.23 usec
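
The throughput column of the NetPIPE listing can be cross-checked from the message size and the time per message. The sketch below assumes that "Mbps" in this output means 2^20 bits per second rather than 10^6; that assumption is inferred from the figures in the listing, not taken from the NetPIPE documentation, and the function name netpipe_mbps is hypothetical.

```python
# Cross-check of the NetPIPE throughput column (a sketch, not NetPIPE code).
# Assumption: "Mbps" means 2**20 bits per second, inferred from the listing.

def netpipe_mbps(size_bytes: int, time_usec: float) -> float:
    """Throughput for one message of size_bytes that takes time_usec microseconds."""
    bits_per_second = size_bytes * 8 / (time_usec * 1e-6)
    return bits_per_second / 2**20

# (message size in bytes, time in usec, reported Mbps) copied from the listing
samples = [(2, 5.00, 3.05), (64, 5.72, 85.39), (128, 6.23, 156.80)]

for size, usec, reported in samples:
    computed = netpipe_mbps(size, usec)
    print(f"{size:4d} bytes: computed {computed:7.2f} Mbps, reported {reported:7.2f} Mbps")
```

Small differences between computed and reported values are expected, because the listing rounds the time to two decimals.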

-- BANDWIDTH:
fischer@atipa4:~/Sockets-MX_MODULE$ ./tests/netperf/netperf -l 1 -H atipa3 -- -m 4000 -M 4000
TCP STREAM TEST to atipa3
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65535  65535   4000    1.00     3388.58


Another important factor is the performance of accept/connect handling.
-- CONNECTION PERFORMANCE:
-- Dual Xeons, Linux 2.4, E cards

GigEth for comparison

./tests/netperf-2.2pl4/netperf -p 5000 -l 1 -t TCP_CC -H atipa1
TCP Connect/Close TEST to atipa1
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

262142 262142 1        1       0.99     4216.39
16384  87380

TCP/IP over MX

./tests/netperf-2.2pl4/netperf -p 5000 -l 1 -t TCP_CC -H 192.168.1.1
TCP Connect/Close TEST to 192.168.1.1
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

262142 262142 1        1       1.00     10119.09
16384  87380

Sockets-MX

./tests/netperf-2.2pl4/netperf -l 1 -t TCP_CC -H atipa1
TCP Connect/Close TEST to atipa1
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

262142 262142 1        1       1.00     18434.88
65535  65535

This means that the number of connections per second
is more than 4 times higher than with traditional TCP/IP over Gigabit Ethernet.
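
The comparison of the three TCP_CC runs on the dual Xeon machines can be reproduced with a few lines of arithmetic. This is only a sketch; the rates are copied from the netperf output above, and the helper names speedup and usec_per_connection are hypothetical.

```python
# Transaction rates (connections per second) copied from the three TCP_CC runs.
rates = {
    "TCP/IP over GigEth": 4216.39,
    "TCP/IP over MX": 10119.09,
    "Sockets-MX": 18434.88,
}

baseline = rates["TCP/IP over GigEth"]

def speedup(rate: float, reference: float = baseline) -> float:
    """Ratio of a transaction rate to the reference rate."""
    return rate / reference

def usec_per_connection(rate: float) -> float:
    """Average time spent on one connect/close transaction, in microseconds."""
    return 1e6 / rate

for name, rate in rates.items():
    print(f"{name:20s} {speedup(rate):5.2f}x  {usec_per_connection(rate):7.1f} usec per connection")
```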

-- Dual Opterons:

./tests/netperf-2.2pl4/netperf -l 1 -t TCP_CC -H serenade1
TCP Connect/Close TEST to serenade1
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

1048574 1048574 1        1       1.00     23449.96
524287 135168
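
For readers unfamiliar with what a connect/close test exercises, the following is a minimal loopback sketch of the same idea: a client repeatedly opens and closes a TCP connection while a server accepts and closes it. It is not netperf code, it uses the ordinary AF_INET address family rather than Sockets-MX, and the function names and the default of 200 connections are assumptions chosen for illustration. Its numbers are not comparable to the results above.

```python
# A loopback sketch of a connect/close rate test (not netperf itself).
import socket
import threading
import time

def accept_and_close(listener: socket.socket, count: int) -> None:
    """Accept `count` connections and close each one immediately."""
    for _ in range(count):
        conn, _addr = listener.accept()
        conn.close()

def connect_close_rate(count: int = 200) -> float:
    """Return connect/close transactions per second over the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    address = listener.getsockname()

    server = threading.Thread(target=accept_and_close, args=(listener, count))
    server.start()

    start = time.perf_counter()
    for _ in range(count):
        client = socket.create_connection(address)
        client.close()
    elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    return count / elapsed

if __name__ == "__main__":
    print(f"{connect_close_rate():.0f} connections per second (loopback)")
```

The timed loop covers only the client side, which is also where netperf reports its transaction rate; the server thread simply keeps the accept queue drained.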

-- Intel Pallas Benchmark
Sockets-MX can also speed up HPC applications in binary form that use TCP/IP.
For the following test, the PMB benchmark was compiled and run under LAM MPI.
The binary was pointed to the AF_MYRI protocol and reports a latency (including MPI overhead) of 5.33 usec for zero-byte messages.
#---------------------------------------------------
# Benchmarking PingPong
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0        10000         5.33         0.00
            1        10000         8.88         0.11
            2        10000         8.82         0.22
            4        10000         8.83         0.43
            8        10000         8.89         0.86
           16        10000         8.89         1.72
           32        10000         8.82         3.46
           64        10000         9.19         6.64
          128        10000         9.17        13.31
          256        10000        10.67        22.89
          512        10000        12.56        38.88
         1024        10000        16.09        60.69
         2048        10000        18.41       106.08
         4096        10000        24.13       161.90
         8192         5120        40.23       194.18
        16384         2560        72.42       215.75
        32768         1280       109.07       286.50
        65536          640       175.53       356.07
       131072          320       327.27       381.95
       262144          160       608.50       410.84
       524288           80      1171.41       426.84
      1048576           40      2312.47       432.44
      2097152           20      4566.38       437.98
      4194304           10      9077.20       440.66
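
The Mbytes/sec column of the PingPong table follows from the message size and the time per message. The sketch below assumes that one Mbyte is 2^20 bytes, which is inferred from the rows of the table rather than from the PMB documentation; the function names are hypothetical. It also converts the result to decimal Gbps so that it can be set against the 4 Gbps link rate.

```python
# Cross-check of the PMB PingPong "Mbytes/sec" column (a sketch).
# Assumption: 1 Mbyte = 2**20 bytes, inferred from the rows of the table.

def pmb_mbytes_per_sec(size_bytes: int, t_usec: float) -> float:
    """Bandwidth for a message of size_bytes delivered in t_usec microseconds."""
    return size_bytes / (t_usec * 1e-6) / 2**20

def to_gbps(mbytes_per_sec: float) -> float:
    """Convert 2**20-byte Mbytes/sec to decimal gigabits per second."""
    return mbytes_per_sec * 2**20 * 8 / 1e9

# (#bytes, t[usec], reported Mbytes/sec) copied from the table
rows = [(1024, 16.09, 60.69), (65536, 175.53, 356.07), (4194304, 9077.20, 440.66)]

for size, t_usec, reported in rows:
    computed = pmb_mbytes_per_sec(size, t_usec)
    print(f"{size:8d} bytes: computed {computed:7.2f}, reported {reported:7.2f} Mbytes/sec, "
          f"about {to_gbps(computed):.2f} Gbps")
```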


Full Pallas