Myri-10G 10-Gigabit Ethernet Performance

 

We report performance measurements for Myri-10G Network Adapters using our 10-Gigabit Ethernet driver, Myri10GE, on Linux, Windows, Solaris, MacOSX, and FreeBSD. If you are unable to reproduce these performance results, refer to this FAQ entry for OS-specific 10GbE performance tuning suggestions.

 

Linux | Windows | Solaris GLDv2 | Solaris GLDv3 | MacOSX | FreeBSD

Linux

 

Test Information

For these Linux tests, TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Notes

  • If you are unable to reproduce these performance results, refer to this FAQ entry for Linux performance tuning suggestions, as well as the Test Results with Myri-10G NICs and PCI-Express Motherboards web page for comparative results with different chipsets and motherboards.

  • CPU utilization (CPU %) is shown as a percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.
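To make the CPU % convention concrete, the following minimal sketch converts an all-CPU percentage into an equivalent number of fully busy cores. The function name `busy_cores` is hypothetical and not part of netperf; the 8-core count is taken from the host description in this section, and 4.52 is the TCP_STREAM sender figure at MTU 9000.

```python
def busy_cores(cpu_percent: float, num_cpus: int) -> float:
    """Convert an all-CPU utilization percentage into equivalent fully busy cores."""
    if not 0.0 <= cpu_percent <= 100.0:
        raise ValueError("cpu_percent must be between 0 and 100")
    return cpu_percent / 100.0 * num_cpus


# A value of 100 on an 8-core host means all 8 cores are saturated.
print(busy_cores(100.0, 8))

# The Linux TCP_STREAM sender at MTU 9000 reported 4.52 on an 8-core host,
# which corresponds to roughly a third of one core.
print(round(busy_cores(4.52, 8), 2))
```

Read this way, a small percentage on a many-core host can still represent a noticeable share of a single core.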

Benchmark: netperf version 2.4.5
OS: Centos5 x86_64 2.6.18-128.1.16.el5 kernel
Network Adapters: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version 1.5.0
Interrupt Coalescing: 75 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 4M -S 4M
Results:
  
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9910.33   4.52       2.84
TCP_SENDFILE   9000   9910.32   2.71       2.82
UDP_STREAM_TX  9000   9924.70   5.73       0.00
UDP_STREAM_RX  9000   9924.70   0.00       3.66
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 4M -S 4M
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   9477.10   4.62       5.57
TCP_SENDFILE   1500   9452.54   2.56       5.63
UDP_STREAM_TX  1500   9249.00  12.51       0.00
UDP_STREAM_RX  1500   9249.00   0.00      11.59
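As a rough sanity check on these figures, the sketch below estimates the payload throughput ceiling of a 10-Gigabit Ethernet link for each MTU. It is an illustration under stated assumptions, not part of the measurement procedure: it assumes IPv4, no VLAN tag, TCP headers without options (timestamps were disabled for these tests), and the standard per-frame Ethernet overhead of 38 bytes on the wire. The function name `max_goodput_mbps` is hypothetical. It also shows why the UDP tests use -m 8972 and -m 1472: each is the MTU minus 28 bytes of IPv4 and UDP headers.

```python
LINE_RATE_MBPS = 10_000           # nominal 10-Gigabit Ethernet data rate
ETH_OVERHEAD = 14 + 4 + 8 + 12    # header + FCS + preamble/SFD + inter-frame gap, in bytes
IPV4_HEADER = 20                  # assumption: IPv4 without options
TCP_HEADER = 20                   # assumption: no TCP options (timestamps disabled)
UDP_HEADER = 8


def max_goodput_mbps(mtu: int, l4_header: int) -> float:
    """Upper bound on payload throughput for full-size frames at the given MTU."""
    payload = mtu - IPV4_HEADER - l4_header
    wire_bytes = mtu + ETH_OVERHEAD
    return LINE_RATE_MBPS * payload / wire_bytes


for mtu in (9000, 1500):
    udp_payload = mtu - IPV4_HEADER - UDP_HEADER   # 8972 and 1472, matching the -m values
    print(f"MTU {mtu}: UDP payload {udp_payload} bytes, "
          f"TCP ceiling {max_goodput_mbps(mtu, TCP_HEADER):.1f} Mbit/s, "
          f"UDP ceiling {max_goodput_mbps(mtu, UDP_HEADER):.1f} Mbit/s")
```

Under these assumptions the TCP ceiling is about 9913.7 Mbit/s at MTU 9000 and about 9492.8 Mbit/s at MTU 1500, so the measured TCP_STREAM results of 9910.33 and 9477.10 are within a fraction of a percent of line rate.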

 

Windows

 

Test Information

For these Windows tests, no registry entries were added.

Make sure the ntttcpr (receiver) process is bound to a different processor than the one handling receive network traffic.

Notes

  • NTTTCP is a closed-source benchmark available from Microsoft at http://www.microsoft.com/whdc/device/network/TCP_tool.mspx. Windows OSes benefit from overlapping socket communication using Winsock2. The benchmark is based on the original ttcp benchmark.

  • Performance results vary with the CPU type and the Windows operating system version. They can be tuned, for example, by changing the message size (-l) and the receive-side socket buffer size (-rb). With version 2.5, the optional -fr argument can also improve performance.

  • A "Frame" in the ntttcps output refers to a unit passed to the socket (1MB), and a "Packet" refers to a unit passed from the TCP stack to the Ethernet driver (64KB, since TSO is enabled).

  • If you are using Windows 2000, XP, or 2003, you will need to add the following two registry entries:

    HKLM\System\CurrentControlSet\Services\Tcpip\Parameters:

    • Tcp1323Opts, type REG_DWORD, value set to 1.

    • TcpWindowSize, type REG_DWORD, value set to 512K.

  • For a detailed list of Performance Tuning Guidelines for Windows Server 2003 and 2008 refer to this FAQ entry.

Benchmark: NTttcp v3.0
OS: Windows Server 2008 R2
Network Adapters: Myri-10G 10G-PCIE-8C
Driver: Myri10GE Windows 1.1.8
Interrupt Coalescing: 40 µs
Receive Buffers: 2048
Receive Side Scaling (RSS): enabled
Large Send Offload: enabled
IPv4 Checksum Offload: enabled
Flow Control: enabled
Hosts: Intel Xeon CPU X3470 @ 2.93 GHz, 8 processors
Topology: point-to-point (switchless)
NTttcp Results, MTU 9000B
Commands:
Sender: ntttcps -m 1,1,10.0.0.8 -l 1048576 -n 100000 -w -a 8
Receiver: ntttcpr -m 1,1,10.0.0.8 -l 1048576 -rb 2097152 -n 100000 -w -a 8 -fr
Results on the sender:
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
     0      84.630      1239012.171           9912.097              1048492.121

Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
   104857.600000      84.630           8959.464                 9912.097

Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
   100000.000              1181.614               5      23623.87         0.7

Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
    11703557           987863                 0            0       3.94
Results on the receiver:
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
     0      85.083      1232415.406           9859.323               710499.177

Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
   104857.600000      85.083           8959.657                 9859.323

Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
   100000.000              1175.323               7      18526.87         2.2

Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
      987690         11703305                 0            0      11.45
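The reported throughput can be cross-checked from the command-line parameters and the elapsed time. The sketch below is illustrative only: it assumes that NTttcp derives throughput as total bytes transferred divided by Realtime(s), and that "Total Bytes(MEG)" counts units of 10^6 bytes, both of which are consistent with the numbers above. The function name is hypothetical.

```python
BUFFER_BYTES = 1_048_576   # from -l 1048576
NUM_BUFFERS = 100_000      # from -n 100000


def throughput_mbit_s(realtime_s: float) -> float:
    """Throughput in Mbit/s for the whole transfer completed in realtime_s seconds."""
    total_bits = NUM_BUFFERS * BUFFER_BYTES * 8
    return total_bits / realtime_s / 1e6


total_meg = NUM_BUFFERS * BUFFER_BYTES / 1e6
print(f"Total bytes: {total_meg:.1f} MEG (reported 104857.6)")
print(f"Sender:   {throughput_mbit_s(84.630):.3f} Mbit/s (reported 9912.097)")
print(f"Receiver: {throughput_mbit_s(85.083):.3f} Mbit/s (reported 9859.323)")
```

The small gap between sender and receiver throughput follows from the slightly longer elapsed time measured on the receiver for the same number of bytes.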
NTttcp Results, MTU 1500B
Commands:
Sender: ntttcps -m 1,1,10.0.0.8 -l 1048576 -n 100000 -w -a 8
Receiver: ntttcpr -m 1,1,10.0.0.8 -l 1048576 -rb 2097152 -n 100000 -w -a 8 -fr
Results on the sender:
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
     0      88.655      1182760.138           9462.081              1048492.121

Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
   104857.600000      88.655           1459.980                 9462.081

Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
   100000.000              1127.968              32      24754.86         0.8

Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
    71821274          2285888                 0            0       3.80
Results on the receiver:
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
     0      89.108      1176747.318           9413.979               779199.084

Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
   104857.600000      89.108           1459.982                 9413.979

Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
   100000.000              1122.234              71      11347.99         2.7

Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
     2285955         71821177                 0            0      13.38
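The interrupt columns in these results can be related to the 40 µs interrupt coalescing setting. The following sketch rests on two assumptions that the results are consistent with but that are not documented here: that the coalescing interval bounds the interrupt rate at roughly one interrupt per interval, and that the packets-per-interrupt column is the packet rate divided by the interrupt rate, truncated to an integer. The function names are hypothetical.

```python
COALESCE_US = 40   # Interrupt Coalescing setting from the test configuration


def max_interrupts_per_s(coalesce_us: float) -> float:
    """Interrupt rate ceiling if at most one interrupt fires per coalescing interval."""
    return 1e6 / coalesce_us


def packets_per_interrupt(packets: int, realtime_s: float, intr_per_s: float) -> int:
    """Packet rate divided by interrupt rate, truncated as the reports appear to do."""
    return int(packets / realtime_s / intr_per_s)


print(max_interrupts_per_s(COALESCE_US))   # compare with the reported Intr(count/s) values

# (label, packets, Realtime(s), Intr(count/s), reported packets per interrupt)
runs = [
    ("sender   MTU 9000", 11703557, 84.630, 23623.87, 5),
    ("receiver MTU 9000", 11703305, 85.083, 18526.87, 7),
    ("sender   MTU 1500", 71821274, 88.655, 24754.86, 32),
    ("receiver MTU 1500", 71821177, 89.108, 11347.99, 71),
]
for label, packets, realtime, intr_rate, reported in runs:
    computed = packets_per_interrupt(packets, realtime, intr_rate)
    print(f"{label}: computed {computed}, reported {reported}")
```

Under these assumptions, both sender interrupt rates sit just below the 25,000 per second ceiling implied by a 40 µs interval.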

Solaris GLDv2

 

Test Information

For these Solaris GLDv2 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Notes

  • Solaris's GLDv2 driver ABI does not support TCP Segmentation Offload (TSO).

  • CPU utilization (CPU %) is shown as a percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.

  • Netperf's CPU binding (-Tlocal,remote) feature was used to bind the netserver and the netperf processes to all combinations of local and remote CPUs. The results from the best combination of local and remote CPU binding are presented.
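The binding sweep described in the last note can be sketched as follows. This is an illustration, not the script used for these tests: it only generates netperf command lines, one per pair of local and remote CPUs, and assumes the CPU IDs on both hosts are numbered contiguously from 0, which may not hold on every Solaris system. The function name `binding_commands` is hypothetical; the netperf options mirror the TCP_STREAM command used for MTU 9000.

```python
from itertools import product


def binding_commands(host: str, num_local: int, num_remote: int,
                     test: str = "TCP_STREAM"):
    """Yield one netperf command line per (local CPU, remote CPU) pair."""
    for local_cpu, remote_cpu in product(range(num_local), range(num_remote)):
        yield (f"netperf -H {host} -t {test} -C -c -l 60 "
               f"-T {local_cpu},{remote_cpu} -- -s 512K -S 512K")


# Two 8-core hosts give 8 x 8 = 64 binding combinations to try.
commands = list(binding_commands("asus2-m", 8, 8))
print(len(commands))
print(commands[0])
print(commands[-1])
```

At 60 seconds per run, a full sweep of one test type on these hosts takes a little over an hour, after which the best-performing pair is the one reported.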

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
Network Adapters: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version AMD64 1.0.4
Interrupt Coalescing: 30 µs
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -T loc,remote -- -s 512K -S 512K 
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60  -T loc,remote -- -s 512K -S 512K 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c  -T loc,remote -- -m 8972 -s 512K -S 512K
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9877.62   9.91      10.08
TCP_SENDFILE   9000   9887.49  11.83      10.34
UDP_STREAM_TX  9000   9880.90  17.51      00.00
UDP_STREAM_RX  9000   9880.90  00.00      17.93
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -T loc,remote -- -s 1M -S 1M 
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60  -T loc,remote -- -s 1M -S 1M 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c  -T loc,remote -- -m 1472 -s 1M -S 1M 
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   7787.70  17.59      19.51
TCP_SENDFILE   1500   5775.41  24.65      17.16
UDP_STREAM_TX  1500   5291.70  15.05      00.00
UDP_STREAM_RX  1500   5165.40  00.00      28.63

Solaris GLDv3

 

Test Information

For these Solaris GLDv3 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Notes

  • CPU utilization (CPU %) is shown as a percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
Network Adapters: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version AMD64 1.4.5gldv3
Interrupt Coalescing: 125 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60  -- -s 512K -S 512K 
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60   -- -s 512K -S 512K 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c   -- -m 8972 -s 512K -S 512K
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9868.72   9.29       8.91
TCP_SENDFILE   9000   9866.15  11.96       8.94
UDP_STREAM_TX  9000   9925.20   9.33      00.00
UDP_STREAM_RX  9000   9925.20  00.00       9.04
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60   -- -s 512K -S 512K 
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60   -- -s 512K -S 512K 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c   -- -m 1472 -s 512K -S 512K
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   9345.75   7.75      20.32
TCP_SENDFILE   1500   9285.96   9.15      20.98
UDP_STREAM_TX  1500   5978.60  12.55      00.00
UDP_STREAM_RX  1500   5978.60  00.00      24.20

MacOSX

 

Test Information

For these MacOSX tests, LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README, and the netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Notes

  • MacOSX does not support TCP Segmentation Offload (TSO).

  • The CPU usage reported for the iperf runs is the sum of the user and system times as reported by iostat. Iperf itself does not report CPU usage.

Benchmark: netperf version 2.4.3, iperf version 2.0.2
OS: MacOSX 10.5
Network Adapters: Myri-10G 10G-PCIE-8A
Driver: Myri10GE version 1.1.0
Interrupt Coalescing: 75 µs
Large Receive Offload (LRO): enabled
Hosts: MacPro with Intel dual-core dual-processor 2.6GHz Xeons
Topology: point-to-point (switchless)
Netperf Results, MTU 9000
Commands:
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
Single-Stream Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9661.82  41.38      36.74
UDP_STREAM_TX  9000   6867.00  28.08      00.00
UDP_STREAM_RX  9000   6867.00  00.00      39.26
Dual-Stream TCP Results (2 netperf processes):
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9692.00  54.72      47.36
Dual-Stream TCP Results (2 iperf threads):
Iperf Test     MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
iperf          9000   9825.00  65         58
Netperf Results, MTU 1500
Commands:
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60
Single-Stream Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   4782.15  41.70      39.15
UDP_STREAM_TX  1500   3310.40  27.85      00.00
UDP_STREAM_RX  1500   3310.40  00.00      39.24
Dual-Stream TCP Results (2 netperf processes):
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   4367.00  42.29      43.75
Dual-Stream TCP Results (2 iperf threads):
Iperf Test     MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
iperf          1500   6417.00  76         65

FreeBSD

 

Test Information

For these FreeBSD tests, the kern.ipc.maxsockbuf tunable was increased to 16777216, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Notes

  • CPU utilization (CPU %) is shown as a percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.

Benchmark: netperf 2.4.5
OS: FreeBSD/amd64 7.2-RELEASE
Network Adapters: Myri-10G 10G-PCIE-8B
Driver: if_mxge
Interrupt Coalescing: 30 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel 
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 128K -S 128K
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     9000   9887.91   8.22       7.73
TCP_SENDFILE   9000   9887.31   6.33       7.50
UDP_STREAM_TX  9000   9926.00  13.85       0.00
UDP_STREAM_RX  9000   9926.00   0.00       6.77
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 16256 -s 128K -S 128K
Results:
Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
------------   ----   -------  --------   --------
TCP_STREAM     1500   9361.92   8.26      10.07
TCP_SENDFILE   1500   9390.04   5.90      10.21
UDP_STREAM_TX  1500   9243.90  14.18       0.00
UDP_STREAM_RX  1500   9243.90   0.00      14.69
 