We report performance measurements for Myri-10G Network Adapters using our 10-Gigabit Ethernet driver, Myri10GE, on Linux, Windows, Solaris, MacOSX, and FreeBSD. If you are unable to reproduce these performance results, refer to this FAQ entry for OS-specific 10GbE performance tuning suggestions.
Linux | Windows | Solaris GLDv2 | Solaris GLDv3 | MacOSX | FreeBSD
Linux
| Test Information
For these Linux tests, TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Notes
-
If you are unable to reproduce these performance results, refer to this FAQ entry for Linux performance tuning suggestions, as well as the Test Results with Myri-10G NICs and PCI-Express Motherboards web page for comparative results with different chipsets and motherboards.
-
CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.
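Because the percentage covers all CPUs, it can be easier to read as a number of busy cores. The following is a minimal sketch of that conversion; the helper name `cores_busy` is hypothetical, and the core count of 8 is taken from the host description for these tests.

```python
def cores_busy(cpu_percent: float, total_cores: int) -> float:
    """Convert an all-CPU utilization percentage into busy-core equivalents."""
    return cpu_percent / 100.0 * total_cores


# TCP_STREAM at MTU 9000 on the 8-core hosts: TX 4.52 %, RX 2.84 %.
tx = cores_busy(4.52, 8)
rx = cores_busy(2.84, 8)
print(f"TX: {tx:.2f} cores, RX: {rx:.2f} cores")
```

Read this way, the TCP_STREAM test at MTU 9000 keeps well under half of one core busy on each side.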
|
| Benchmark: |
netperf version 2.4.5 |
| OS: |
CentOS 5 x86_64, 2.6.18-128.1.16.el5 kernel |
| Network Adapters: |
Myri-10G 10G-PCIE-8B |
| Driver: |
Myri10GE version 1.5.0 |
| Interrupt Coalescing: |
75 µs |
| TCP Segmentation Offload (TSO): |
enabled |
| Large Receive Offload (LRO): |
enabled |
| Hosts: |
Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: |
point-to-point (switchless) |
| Netperf Results, MTU 9000 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 4M -S 4M
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9910.33 4.52 2.84
TCP_SENDFILE 9000 9910.32 2.71 2.82
UDP_STREAM_TX 9000 9924.70 5.73 0.00
UDP_STREAM_RX 9000 9924.70 0.00 3.66
|
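The UDP_STREAM message sizes used in these tests (-m 8972 for MTU 9000 and -m 1472 for MTU 1500) are the largest UDP payloads that fit in a single unfragmented IPv4 packet. A short sketch of the arithmetic, assuming an IPv4 header without options; the constant and function names are illustrative only.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header with no options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


for mtu in (9000, 1500):
    print(mtu, max_udp_payload(mtu))
```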
| Netperf Results, MTU 1500 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 4M -S 4M
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9477.10 4.62 5.57
TCP_SENDFILE 1500 9452.54 2.56 5.63
UDP_STREAM_TX 1500 9249.00 12.51 0.00
UDP_STREAM_RX 1500 9249.00 0.00 11.59
|
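To put the bandwidth figures above in context, the sketch below estimates the upper bound on payload throughput over a 10-Gigabit Ethernet link. It assumes 38 bytes of per-frame overhead on the wire (preamble and start-of-frame delimiter, Ethernet header, frame check sequence, and inter-frame gap), a 20-byte IPv4 header, and a 20-byte TCP header without options (TCP timestamps were disabled for these tests). The function name is hypothetical, and the result is a theoretical ceiling, not a measurement.

```python
LINK_MBPS = 10_000                    # nominal 10GbE line rate in Mbit/s
WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble, SFD, Ethernet header, FCS, inter-frame gap
IPV4_HEADER = 20                      # assumption: no IP options
TCP_HEADER = 20                       # assumption: no TCP options (timestamps disabled)
UDP_HEADER = 8


def payload_ceiling_mbps(mtu: int, l4_header: int) -> float:
    """Upper bound on payload throughput for full-sized frames at the given MTU."""
    payload = mtu - IPV4_HEADER - l4_header
    return LINK_MBPS * payload / (mtu + WIRE_OVERHEAD)


for mtu in (9000, 1500):
    tcp = payload_ceiling_mbps(mtu, TCP_HEADER)
    udp = payload_ceiling_mbps(mtu, UDP_HEADER)
    print(f"MTU {mtu}: TCP ceiling {tcp:.2f}, UDP ceiling {udp:.2f} Mbit/s")
```

Under these assumptions the TCP ceiling is roughly 9914 Mbit/s at MTU 9000 and roughly 9493 Mbit/s at MTU 1500, so the measured TCP_STREAM values of 9910.33 and 9477.10 are close to the limit of the link.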
Windows
| Test Information
For these Windows tests, no registry entries were added.
Make sure the ntttcpr (receiver) process is bound to a different processor than the one handling receive network traffic.
Notes
-
NTTTCP is a closed-source benchmark available from Microsoft at http://www.microsoft.com/whdc/device/network/TCP_tool.mspx. It is based on the original ttcp benchmark. Windows OSes benefit from overlapping socket communication using Winsock2.
-
Performance results vary with the CPU type and the Windows operating system version. Tuning options include changing the message size (-l) and the receive-side socket buffer size (-rb). With version 2.5 of the benchmark, the optional -fr argument can also improve performance.
-
A "Frame" in the ntttcps output refers to a unit passed to the socket (1MB), and a "Packet" refers to a unit passed from the TCP stack to the Ethernet driver (64KB, since TSO is enabled).
-
If you are using Windows 2000, XP, or 2003, you will need to add the following two registry entries:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters:
-
Tcp1323Opts, type REG_DWORD, value set to 1.
-
TcpWindowSize, type REG_DWORD, value set to 512K.
-
For a detailed list of Performance Tuning Guidelines for Windows Server 2003 and 2008 refer to this FAQ entry.
|
| Benchmark: |
NTttcp v3.0 |
| OS: |
Windows Server 2008 R2 |
| Network Adapters: |
Myri-10G 10G-PCIE-8C |
| Driver: |
Myri10GE Windows 1.1.8 |
| Interrupt Coalescing: |
40 µs |
| Receive Buffers: |
2048 |
| Receive Side Scaling (RSS): |
enabled |
| Large Send Offload: |
enabled |
| IPv4 Checksum Offload: |
enabled |
| Flow Control: |
enabled |
| Hosts: |
Intel Xeon CPU X3470 @ 2.93 GHz, 8 processors |
| Topology: |
point-to-point (switchless) |
| NTttcp Results, MTU 9000B |
| Commands: |
Sender: ntttcps -m 1,1,10.0.0.8 -l 1048576 -n 100000 -w -a 8
Receiver: ntttcpr -m 1,1,10.0.0.8 -l 1048576 -rb 2097152 -n 100000 -w -a 8 -fr
|
| Results on the sender: |
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
0 84.630 1239012.171 9912.097 1048492.121
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 84.630 8959.464 9912.097
Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1181.614 5 23623.87 0.7
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
11703557 987863 0 0 3.94
|
| Results on the receiver: |
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
0 85.083 1232415.406 9859.323 710499.177
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 85.083 8959.657 9859.323
Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1175.323 7 18526.87 2.2
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
987690 11703305 0 0 11.45
|
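The throughput figures in the NTttcp output above can be cross-checked from the command-line parameters: the number of buffers (-n 100000), the buffer size (-l 1048576), and the reported run time. The sketch below assumes that "Mbit/s" means 10^6 bits per second, which is consistent with the reported totals; the function name is hypothetical.

```python
def ntttcp_throughput_mbit(buffers: int, buffer_bytes: int, seconds: float) -> float:
    """Throughput in Mbit/s (10**6 bits per second) for a fixed-size transfer."""
    total_bits = buffers * buffer_bytes * 8
    return total_bits / seconds / 1e6


# MTU 9000 run: 100000 buffers of 1048576 bytes each.
sender = ntttcp_throughput_mbit(100_000, 1_048_576, 84.630)
receiver = ntttcp_throughput_mbit(100_000, 1_048_576, 85.083)
print(f"sender {sender:.3f} Mbit/s, receiver {receiver:.3f} Mbit/s")
```

The computed values agree with the reported 9912.097 Mbit/s (sender) and 9859.323 Mbit/s (receiver) to within rounding of the run time.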
| NTttcp Results, MTU 1500B |
| Commands: |
Sender: ntttcps -m 1,1,10.0.0.8 -l 1048576 -n 100000 -w -a 8
Receiver: ntttcpr -m 1,1,10.0.0.8 -l 1048576 -rb 2097152 -n 100000 -w -a 8 -fr
|
| Results on the sender: |
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
0 88.655 1182760.138 9462.081 1048492.121
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 88.655 1459.980 9462.081
Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1127.968 32 24754.86 0.8
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
71821274 2285888 0 0 3.80
|
| Results on the receiver: |
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s) Avg Bytes per Completion
====== =========== ================ ================== ========================
0 89.108 1176747.318 9413.979 779199.084
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 89.108 1459.982 9413.979
Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1122.234 71 11347.99 2.7
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
2285955 71821177 0 0 13.38
|
Solaris GLDv2
| Test Information
For these Solaris GLDv2 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Notes
-
Solaris's GLDv2 driver ABI does not support TCP Segmentation Offload (TSO).
-
CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options. That is, a value of 100 means all CPUs are fully utilized.
-
Netperf's CPU binding (-Tlocal,remote) feature was used to bind the netserver and the netperf processes to all combinations of local and remote CPUs. The results from the best combination of local and remote CPU binding are presented.
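The binding sweep described above can be sketched as a generator of netperf command lines, one per combination of local and remote CPU. This only builds the command strings and does not run them. The CPU numbering 0 to 7 is an assumption based on the 8-core hosts, and the helper name `binding_commands` is hypothetical; the socket buffer options match the MTU 9000 TCP_STREAM command used in these tests.

```python
from itertools import product


def binding_commands(host, local_cpus, remote_cpus, test="TCP_STREAM", seconds=60):
    """Build one netperf command line per (local CPU, remote CPU) pair."""
    return [
        f"netperf -H {host} -t {test} -C -c -l {seconds} "
        f"-T {loc},{rem} -- -s 512K -S 512K"
        for loc, rem in product(local_cpus, remote_cpus)
    ]


# Assumption: CPUs are numbered 0..7 on both 8-core hosts.
commands = binding_commands("asus2-m", range(8), range(8))
print(len(commands))
print(commands[0])
```

With 8 CPUs on each side the sweep covers 64 combinations per test, so at 60 seconds per run a full sweep of one test takes a little over an hour.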
|
| Benchmark: |
netperf version 2.4.5 |
| OS: |
OpenSolaris 2008.11 (snv_101b_rc2) |
| Network Adapters: |
Myri-10G 10G-PCIE-8B |
| Driver: |
Myri10GE version AMD64 1.0.4 |
| Interrupt Coalescing: |
30 µs |
| Large Receive Offload (LRO): |
enabled |
| Hosts: |
Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: |
point-to-point (switchless) |
| Netperf Results, MTU 9000 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 8972 -s 512K -S 512K
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9877.62 9.91 10.08
TCP_SENDFILE 9000 9887.49 11.83 10.34
UDP_STREAM_TX 9000 9880.90 17.51 00.00
UDP_STREAM_RX 9000 9880.90 00.00 17.93
|
| Netperf Results, MTU 1500 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 1M -S 1M
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 1M -S 1M
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 1472 -s 1M -S 1M
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 7787.70 17.59 19.51
TCP_SENDFILE 1500 5775.41 24.65 17.16
UDP_STREAM_TX 1500 5291.70 15.05 00.00
UDP_STREAM_RX 1500 5165.40 00.00 28.63
|
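In the MTU 1500 results above, the UDP_STREAM_TX and UDP_STREAM_RX bandwidths differ, which suggests that some of the datagrams sent were not delivered to the receiving netserver. A minimal sketch of the implied loss rate follows; the function name is hypothetical, and the interpretation assumes both figures cover the same 60-second interval.

```python
def udp_loss_percent(tx_mbps: float, rx_mbps: float) -> float:
    """Share of sent UDP bandwidth that did not arrive at the receiver, in percent."""
    return (tx_mbps - rx_mbps) / tx_mbps * 100.0


# Solaris GLDv2, MTU 1500: TX 5291.70 Mbit/s, RX 5165.40 Mbit/s.
loss = udp_loss_percent(5291.70, 5165.40)
print(f"{loss:.2f} % of the sent UDP bandwidth was not received")
```

For these values the difference is a little under 2.4 percent. In the other UDP_STREAM results on this page the TX and RX bandwidths are equal, which corresponds to no measurable loss.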
Solaris GLDv3
| Test Information
For these Solaris GLDv3 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Notes
|
| Benchmark: |
netperf version 2.4.5 |
| OS: |
OpenSolaris 2008.11 (snv_101b_rc2) |
| Network Adapters: |
Myri-10G 10G-PCIE-8B |
| Driver: |
Myri10GE version AMD64 1.4.5gldv3 |
| Interrupt Coalescing: |
125 µs |
| TCP Segmentation Offload (TSO): |
enabled |
| Large Receive Offload (LRO): |
enabled |
| Hosts: |
Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: |
point-to-point (switchless) |
| Netperf Results, MTU 9000 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 512K -S 512K
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9868.72 9.29 8.91
TCP_SENDFILE 9000 9866.15 11.96 8.94
UDP_STREAM_TX 9000 9925.20 9.33 00.00
UDP_STREAM_RX 9000 9925.20 00.00 9.04
|
| Netperf Results, MTU 1500 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 512K -S 512K
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9345.75 7.75 20.32
TCP_SENDFILE 1500 9285.96 9.15 20.98
UDP_STREAM_TX 1500 5978.60 12.55 00.00
UDP_STREAM_RX 1500 5978.60 00.00 24.20
|
MacOSX
| Test Information
For these MacOSX tests, LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README, and the netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Notes
|
| Benchmark: |
netperf version 2.4.3, iperf version 2.0.2 |
| OS: |
MacOSX 10.5 |
| Network Adapters: |
Myri-10G 10G-PCIE-8A |
| Driver: |
Myri10GE version 1.1.0 |
| Interrupt Coalescing: |
75 µs |
| Large Receive Offload (LRO): |
enabled |
| Hosts: |
MacPro with Intel dual-core dual-processor 2.6GHz Xeons |
| Topology: |
point-to-point (switchless) |
| Netperf Results, MTU 9000 |
| Commands: |
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
|
| Single-Stream Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9661.82 41.38 36.74
UDP_STREAM_TX 9000 6867.00 28.08 00.00
UDP_STREAM_RX 9000 6867.00 00.00 39.26
|
| Dual-Stream TCP Results (2 netperf processes): |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9692.00 54.72 47.36
|
| Dual-Stream TCP Results (2 iperf threads): |
Iperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
iperf 9000 9825.00 65 58
|
| Netperf Results, MTU 1500 |
| Commands: |
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60
|
| Single-Stream Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 4782.15 41.70 39.15
UDP_STREAM_TX 1500 3310.40 27.85 00.00
UDP_STREAM_RX 1500 3310.40 00.00 39.24
|
| Dual-Stream TCP Results (2 netperf processes): |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 4367.00 42.29 43.75
|
| Dual-Stream TCP Results (2 iperf threads): |
Iperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
iperf 1500 6417.00 76 65
|
FreeBSD
| Test Information
For these FreeBSD tests, the kern.ipc.maxsockbuf tunable was increased to 16777216, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Notes
|
| Benchmark: |
netperf version 2.4.5 |
| OS: |
FreeBSD/amd64 7.2-RELEASE |
| Network Adapters: |
Myri-10G 10G-PCIE-8B |
| Driver: |
if_mxge |
| Interrupt Coalescing: |
30 µs |
| TCP Segmentation Offload (TSO): |
enabled |
| Large Receive Offload (LRO): |
enabled |
| Hosts: |
Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: |
point-to-point (switchless) |
| Netperf Results, MTU 9000 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 128K -S 128K
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9887.91 8.22 7.73
TCP_SENDFILE 9000 9887.31 6.33 7.50
UDP_STREAM_TX 9000 9926.00 13.85 0.00
UDP_STREAM_RX 9000 9926.00 0.00 6.77
|
| Netperf Results, MTU 1500 |
| Commands: |
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus2-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 16256 -s 128K -S 128K
|
| Results: |
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9361.92 8.26 10.07
TCP_SENDFILE 1500 9390.04 5.90 10.21
UDP_STREAM_TX 1500 9243.90 14.18 0.00
UDP_STREAM_RX 1500 9243.90 0.00 14.69
|
|