Myri-10G
10-Gigabit Ethernet
Performance Measurements

We report performance measurements for Myri-10G NICs using our 10-Gigabit Ethernet driver, Myri10GE, on Linux, Windows, Solaris, MacOSX, and FreeBSD.


Linux

Benchmark: netperf version 2.4.5
OS: Centos5 x86_64 2.6.18-128.1.16.el5 kernel
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version 1.5.0
Interrupt Coalescing: 75 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual quad-core 2.93GHz Intel Xeon X5570 processors (eight 2.93GHz Nehalem cores per system)
Topology: point-to-point (switchless)

For these Linux tests, TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
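
As a rough illustration of the tuning described above, the following sketch raises the socket and TCP buffer limits, disables TCP timestamps, and sets the MTU. The buffer sizes and the interface name `eth2` are placeholders chosen for illustration; they are not taken from the Linux Myri10GE README, which remains the reference for the recommended values.

```shell
# Sketch only: the buffer sizes and the interface name are assumed
# placeholders, not the values recommended in the Myri10GE README.
IFACE=eth2                      # hypothetical name of the Myri-10G interface

# Raise the maximum socket buffer sizes (bytes).
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# Raise the TCP buffer limits: min, default, max (bytes).
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Disable TCP timestamps, which removes 12 bytes of options per segment.
sysctl -w net.ipv4.tcp_timestamps=0

# Select jumbo (9000) or standard (1500) frames; apply on both hosts.
ip link set dev "$IFACE" mtu 9000
```

These commands need root privileges and do not persist across reboots.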

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5 
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 4M -S 4M
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9910.33   4.52       2.84
     TCP_SENDFILE   9000   9910.32   2.71       2.82
     UDP_STREAM_TX  9000   9924.70   5.73       0.00
     UDP_STREAM_RX  9000   9924.70   0.00       3.66
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5 
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 4M -S 4M
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9477.10   4.62       5.57
     TCP_SENDFILE   1500   9452.54   2.56       5.63
     UDP_STREAM_TX  1500   9249.00  12.51       0.00
     UDP_STREAM_RX  1500   9249.00   0.00      11.59
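
The UDP message sizes in the commands above (8972 and 1472 bytes) are the largest payloads that fit in a single unfragmented IPv4 packet at each MTU. The sketch below reproduces that arithmetic and estimates the payload bandwidth available at 10 Gbit/s line rate. It assumes IPv4, TCP segments without options (timestamps are disabled in these tests), and 38 bytes of Ethernet framing per packet; the function names are illustrative. Under those assumptions the TCP ceilings are about 9914 Mbit/s at MTU 9000 and about 9493 Mbit/s at MTU 1500, so the measured TCP_STREAM figures are within roughly 0.2 percent of line rate.

```shell
#!/bin/sh
# Assumptions: IPv4 (20-byte header), UDP (8-byte header), TCP without
# options (20-byte header), and 38 bytes of Ethernet framing per packet:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
LINK_MBPS=10000
WIRE_OVERHEAD=38

# udp_payload MTU -> largest UDP payload in one unfragmented IPv4 packet
udp_payload() { echo $(( $1 - 20 - 8 )); }

# tcp_payload MTU -> TCP payload per packet when no TCP options are sent
tcp_payload() { echo $(( $1 - 20 - 20 )); }

# goodput_ceiling PAYLOAD MTU -> payload bandwidth at line rate, in Mbit/s
goodput_ceiling() {
    awk -v p="$1" -v m="$2" -v o="$WIRE_OVERHEAD" -v l="$LINK_MBPS" \
        'BEGIN { printf "%.2f\n", l * p / (m + o) }'
}

for mtu in 9000 1500; do
    udp=$(udp_payload "$mtu")
    tcp=$(tcp_payload "$mtu")
    echo "MTU $mtu: UDP payload $udp bytes, ceiling $(goodput_ceiling "$udp" "$mtu") Mbit/s"
    echo "MTU $mtu: TCP payload $tcp bytes, ceiling $(goodput_ceiling "$tcp" "$mtu") Mbit/s"
done
```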
  


Windows

Benchmark: ntttcps and ntttcpr (from the Windows 2003 DDK)
OS: Windows Server 2003 x64 SP1 Edition
NICs: Myri-10G 10G-PCIE-8A
Driver: Myri10GE AMD64 version 1.0.1
Interrupt Coalescing: 25 µs
TCP Segmentation Offload (TSO): enabled
Checksum Offload: enabled
Flow Control: enabled
Hosts: Sender: Tyan S2895 motherboard with dual single-core 2.6GHz AMD Opteron processors; Receiver: Dell PowerEdge 2950
Topology: point-to-point (switchless)

For these Windows tests, no registry entries were added to the Windows 2003-based machines. Bandwidth (BW) is measured in Megabits/second.

A single ntttcps process on one Windows host sent data to a single ntttcpr process on a second, directly connected Windows host.

Ntttcp Results, MTU 9000

Commands:
    Sender: ntttcps -m 1,1,10.0.130.50 -l 1048576 -n 100000 -w -v -a 8
    Receiver: ntttcpr -m 1,1,10.0.130.50 -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8
Results on the Sender:
 
-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.500        1226404.678           9811.237


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.500          60667.263                 9811.237


Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1169.591               1      23467.10         0.5


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

     1728405           281845                 2            0      10.70
  
Results on the Receiver:
 
-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.735        1223043.098           9784.345


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.735           8959.587                 9784.345


Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1166.385              29       4610.68         2.7


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

      281837         11703396                 0            0      27.27
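
The reported throughput can be cross-checked from the buffer count (100000), the buffer size given with `-l 1048576`, and the elapsed time. The sketch below assumes that ntttcp's Mbit/s means 10^6 bits per second, an interpretation that reproduces both reported figures; the function name is illustrative.

```shell
#!/bin/sh
# mbit_per_s BUFFERS BUFFER_BYTES SECONDS -> throughput in Mbit/s,
# taking 1 Mbit as 10^6 bits (an assumption that matches the output above).
mbit_per_s() {
    awk -v n="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.3f\n", n * b * 8 / t / 1000000 }'
}

mbit_per_s 100000 1048576 85.500   # sender:   9811.237
mbit_per_s 100000 1048576 85.735   # receiver: 9784.345
```

The sender and receiver moved the same number of bytes, so the small difference in throughput comes only from the slightly different elapsed times each side measured.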
  


Solaris GLDv2

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE AMD64 version 1.0.4
Interrupt Coalescing: 30 µs
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual quad-core 2.93GHz Intel Xeon X5570 processors (eight 2.93GHz Nehalem cores per system)
Topology: point-to-point (switchless)

For these Solaris GLDv2 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -T loc,remote -- -s 512K -S 512K 
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60  -T loc,remote -- -s 512K -S 512K 
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c  -T loc,remote -- -m 8972 -s 512K -S 512K 

  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9877.62   9.91      10.08
     TCP_SENDFILE   9000   9887.49  11.83      10.34
     UDP_STREAM_TX  9000   9880.90  17.51      00.00
     UDP_STREAM_RX  9000   9880.90  00.00      17.93
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -T loc,remote -- -s 1M -S 1M 
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60  -T loc,remote -- -s 1M -S 1M 
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c  -T loc,remote -- -m 1472 -s 1M -S 1M 

  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   7787.70  17.59      19.51
     TCP_SENDFILE   1500   5775.41  24.65      17.16
     UDP_STREAM_TX  1500   5291.70  15.05      00.00
     UDP_STREAM_RX  1500   5165.40  00.00      28.63
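
In the UDP_STREAM rows, the TX figure is the rate at which the sending netperf handed data to the stack and the RX figure is the rate the receiving netserver actually saw. UDP has no retransmission, so a gap between the two presumably reflects datagrams dropped on the way. A minimal sketch of that calculation for the MTU 1500 values above, with an illustrative function name:

```shell
#!/bin/sh
# udp_loss_pct TX_MBPS RX_MBPS -> percentage of sent traffic not received
udp_loss_pct() {
    awk -v tx="$1" -v rx="$2" \
        'BEGIN { printf "%.2f\n", (tx - rx) * 100 / tx }'
}

udp_loss_pct 5291.70 5165.40   # MTU 1500: about 2.4 percent not received
udp_loss_pct 9880.90 9880.90   # MTU 9000: sender and receiver rates match
```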
  


Solaris GLDv3

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE AMD64 version 1.4.5gldv3
Interrupt Coalescing: 125 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual quad-core 2.93GHz Intel Xeon X5570 processors (eight 2.93GHz Nehalem cores per system)
Topology: point-to-point (switchless)

For these Solaris GLDv3 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -- -s 512K -S 512K 
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60   -- -s 512K -S 512K 
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c   -- -m 8972 -s 512K -S 512K 

  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9868.72   9.29       8.91
     TCP_SENDFILE   9000   9866.15  11.96       8.94
     UDP_STREAM_TX  9000   9925.20   9.33      00.00
     UDP_STREAM_RX  9000   9925.20  00.00       9.04
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60   -- -s 512K -S 512K 
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60   -- -s 512K -S 512K 
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c   -- -m 1472 -s 512K -S 512K 

  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9345.75   7.75      20.32
     TCP_SENDFILE   1500   9285.96   9.15      20.98
     UDP_STREAM_TX  1500   5978.60  12.55      00.00
     UDP_STREAM_RX  1500   5978.60  00.00      24.20
  


MacOSX

Benchmark: netperf version 2.4.3 and iperf version 2.0.2
OS: MacOSX 10.5
NICs: Myri-10G 10G-PCIE-8A
Driver: Myri10GE version 1.1.0
Interrupt Coalescing: 75 µs
Large Receive Offload (LRO): enabled
Hosts: MacPro with Intel dual-core dual-processor 2.6GHz Xeons
Topology: point-to-point (switchless)

For these MacOSX tests, LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README, and the netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
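
For illustration, a matching iperf server invocation on the receiving host might look like the following. The values mirror the `-w` and `-l` arguments of the MTU 9000 client command; that the server was started exactly this way is an assumption.

```shell
# On the receiving host, start the iperf server with the same window (-w)
# and buffer length (-l) as the client (values from the MTU 9000 run).
iperf -s -w 768k -l 256k
```

For the MTU 1500 run, the client uses `-w 512k`, so the server would be started with `-w 512k` as well.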

Netperf Results, MTU 9000

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
     $ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
  
Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9661.82  41.38      36.74
     UDP_STREAM_TX  9000   6867.00  28.08      00.00
     UDP_STREAM_RX  9000   6867.00  00.00      39.26
  
Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9692.00  54.72      47.36
  
Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          9000   9825.00  65         58
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S512K
     $ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60
  
Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4782.15  41.70      39.15
     UDP_STREAM_TX  1500   3310.40  27.85      00.00
     UDP_STREAM_RX  1500   3310.40  00.00      39.24
  
Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4367.00  42.29      43.75
  
Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          1500   6417.00  76         65
  


FreeBSD

Benchmark: netperf version 2.4.5
OS: FreeBSD/amd64 7.2-RELEASE
NICs: Myri-10G 10G-PCIE-8B
Driver: if_mxge
Interrupt Coalescing: 30 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual quad-core 2.93GHz Intel Xeon X5570 processors (eight 2.93GHz Nehalem cores per system)
Topology: point-to-point (switchless)

For these FreeBSD tests, the kern.ipc.maxsockbuf tunable was increased to 16777216, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
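
A minimal sketch of applying that setting at run time is shown below. The interface name `mxge0` is an assumption about how the if_mxge device is numbered on the test hosts.

```shell
# Raise the maximum socket buffer size to the value used for these tests.
sysctl kern.ipc.maxsockbuf=16777216

# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   kern.ipc.maxsockbuf=16777216

# Select jumbo (9000) or standard (1500) frames on the Myri-10G interface.
# mxge0 is an assumed interface name.
ifconfig mxge0 mtu 9000
```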

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60 
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel 
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 128K -S 128K
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9887.91   8.22       7.73
     TCP_SENDFILE   9000   9887.31   6.33       7.50
     UDP_STREAM_TX  9000   9926.00  13.85	0.00
     UDP_STREAM_RX  9000   9926.00   0.00	6.77
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 16256 -s 128K -S 128K
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9361.92   8.26      10.07
     TCP_SENDFILE   1500   9390.04   5.90      10.21
     UDP_STREAM_TX  1500   9243.90  14.18       0.00
     UDP_STREAM_RX  1500   9243.90   0.00      14.69
  


Last updated: 25 August 2009