Sniffer10G

1.0.2_pcap1.1.1

Introduction

The Myricom Sniffer10G software uses Myri-10G programmable Network Interface Cards (NICs), a firmware extension, and a user-level library to enable sustained capture of 10-Gigabit Ethernet traffic. Small-packet coalescing and an efficient zero-copy path to host memory allow Sniffer10G to capture streams at line rate for all Ethernet packet sizes. Optionally, Sniffer10G can enable multiple independent processes or threads to concurrently analyze the incoming traffic from one or more NIC ports.

The Sniffer10G software distribution is built around a dual-mode driver that operates as a regular 10G Ethernet driver until the device is enabled for packet capture. During packet capture, the library and driver instruct the firmware to divert incoming packets to library-managed user-level receive rings instead of the regular Ethernet driver. Throughout, the Ethernet driver remains available to send raw Ethernet packets.

Sniffer10G's packet capture capabilities can be leveraged through the popular Libpcap library or directly through the SNF API, available as a set of C programming language functions. Using an SNF-aware Libpcap, users reference a Myri-10G NIC through its Ethernet interface name and can run existing Libpcap-dependent applications and continue to rely on Libpcap's portable interface. For more advanced usage, the SNF API can be targeted directly. In both cases, going through the SNF interface instead of the kernel ensures a tighter level of integration with the Myri-10G NIC by leveraging user-level receive mechanisms.

[Figure: Sniffer10G Application Layers (snf-layer.jpg)]

Libpcap over SNF

The easiest way to realize performance improvements using Sniffer10G is to take existing Libpcap applications and re-link them against a Sniffer10G-capable Libpcap (as included in the Sniffer10G distribution). When this Libpcap encounters a Sniffer10G-capable device, it uses the SNF API to obtain user-level zero-copy packets provided by the Myri-10G NIC instead of the usual kernel-based approach.

Native SNF API

The SNF API is also available for applications that require tighter integration with Sniffer10G. When used in its simplest form, the library resembles Libpcap in that the implementation expects a single thread to make successive calls to a receive function (snf_ring_recv) to obtain the next available packet. Under a more advanced form, Sniffer10G implements a variation of the Receive-Side Scaling (RSS) feature that is present in some 10-Gigabit Ethernet drivers. However, Sniffer10G takes the additional step of implementing the RSS feature as multiple user-level zero-copy receive rings. Making the rings available in userspace provides two important advantages over all existing kernel-based packet capture solutions:

SNF API with Receive-Side Scaling

When the library is used to leverage user-level RSS through multiple rings, the application must follow the Single-Program-Multiple-Data (SPMD) model of parallel computation. This model is the predominant one in parallel computing and corresponds to multiple threads executing the same capture code on different data, which under Sniffer10G means different receive rings. As such, the multiple-ring feature assumes that users maintain a 1-to-1 relationship between threads and rings. Each new call to snf_ring_recv is taken to mean that the previous packet in the ring has been completely consumed.

By default, the Sniffer10G implementation uses a deterministic hashing function to make sure that packets that are contained in a particular TCP or UDP flow are always delivered to the same ring (and hence to the same analysis thread). This hashing function resembles the hashing mechanisms used in existing RSS drivers.
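
To illustrate why a deterministic hash keeps a flow on one ring, here is a minimal sketch. It is not Sniffer10G's actual hashing function, which this document does not specify. The `flow_key` struct, the choice of FNV-1a, and the `ring_for_flow` helper are all hypothetical names picked for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the fields that identify a TCP or UDP flow. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;   /* 6 = TCP, 17 = UDP */
};

/* Fold the low nbytes bytes of value into an FNV-1a hash state.
 * FNV-1a is an illustrative choice, not the hash used by Sniffer10G. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xffu;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* Map a flow to a ring index in [0, num_rings).
 * The result depends only on the key, so every packet of a flow
 * is steered to the same ring. */
static unsigned ring_for_flow(const struct flow_key *k, unsigned num_rings)
{
    assert(num_rings > 0);
    uint32_t h = 2166136261u;        /* FNV offset basis */
    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    h = fnv1a_mix(h, k->protocol, 1);
    return h % num_rings;
}
```

This sketch hashes the tuple exactly as given, so the two directions of a connection may land on different rings. Whether Sniffer10G's own hash treats both directions alike is not stated here.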

With regard to other parallel-computation models, Sniffer10G does not support less structured models (e.g. MPMD) in which some threads receive the data and other threads do the analysis. The implementation goes to considerable effort to ensure that each thread maintains affinity to a ring and, in turn, to the packet data that it will analyze. With NUMA architectures predominant on many-core systems, the primary architectural goal of multi-ring Sniffer10G is to minimize pressure on memory coherency protocols. As such, we also encourage users to bind threads/rings to particular cores so that packet analysis remains as close as possible to the memory that contains the packet.
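
One way to bind a capture thread to a core on Linux is sketched below. It uses only the standard `sched_getaffinity` and `sched_setaffinity` calls, not any SNF function. The helper names `first_allowed_cpu` and `pin_calling_thread` are hypothetical, and the choice of which core serves which ring is left to the application.

```c
#define _GNU_SOURCE   /* must precede all includes to expose the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin the calling thread to a single core (pid 0 means "this thread").
 * Returns 0 on success, -1 on error with errno set. */
static int pin_calling_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A capture thread would call `pin_calling_thread` once, before it starts receiving from its ring, so that the thread and the packet memory it reads stay on the same core.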

SNF API with Duplication

While multiple rings are primarily designed to partition the incoming packet capture across multiple consuming rings, it is also possible to force each received packet to be duplicated into each ring, so that every consuming ring obtains its own copy of every incoming packet. The duplication is handled by the Sniffer10G software on the host, where there is typically plenty of memory bandwidth compared to the PCIe bus. Packet duplication can be enabled by setting the SNF_F_RX_DUPLICATE flag in snf_open.

Sniffer10G Performance

Although most Internet traffic is usually bimodal in its distribution of packet sizes, Sniffer10G has been designed to support a worst-case scenario where all packets are at the minimum 10-Gigabit Ethernet packet size, 64 bytes. Including the 7-byte preamble, the start byte, and the 12-byte inter-packet gap, a minimum-size packet of 64 bytes requires 84 byte times on the wire. Under a constant stream of minimum-size packets, a packet arrives every 67.2 nanoseconds, corresponding to a maximum packet rate of 14.88 Mpps.
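
The arithmetic behind these figures can be checked with a short sketch. The overhead constants come from the paragraph above. The function names are hypothetical and the line rate is passed in as a parameter.

```c
#include <assert.h>

/* Per-frame overhead on the wire, in bytes, as listed in the text. */
enum {
    PREAMBLE_BYTES        = 7,
    START_BYTES           = 1,
    INTERPACKET_GAP_BYTES = 12
};

/* Byte times one frame occupies on the wire, including overhead. */
static unsigned wire_bytes(unsigned frame_bytes)
{
    return frame_bytes + PREAMBLE_BYTES + START_BYTES + INTERPACKET_GAP_BYTES;
}

/* Nanoseconds between back-to-back frames at the given line rate.
 * Bits divided by Gbit/s yields nanoseconds. */
static double packet_interval_ns(unsigned frame_bytes, double gbits_per_sec)
{
    assert(gbits_per_sec > 0.0);
    return wire_bytes(frame_bytes) * 8.0 / gbits_per_sec;
}

/* Peak packet rate in millions of packets per second. */
static double peak_mpps(unsigned frame_bytes, double gbits_per_sec)
{
    return 1000.0 / packet_interval_ns(frame_bytes, gbits_per_sec);
}
```

For a 64-byte frame at 10 Gbit/s, `wire_bytes` gives 84, `packet_interval_ns` gives 67.2 and `peak_mpps` gives roughly 14.88, matching the numbers quoted above.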

On our reference platform, a Xeon X5570 at 2.93 GHz, Sniffer10G in a single-ring configuration shows a library overhead of about 32 nanoseconds per packet on average for 64-byte packets. Keeping this overhead low is necessary for capture at high packet rates, since at line rate a minimum-size packet arrives every 67.2 nanoseconds.

Multi-ring performance

The primary goal of using multiple rings is to leverage multiple cores in the packet analysis by effectively reducing the number of packets each ring has to process. Assuming that the incoming traffic can be fairly well balanced across (say) 8 cores, each core is responsible for processing one eighth of a potential peak 14.88 Mpps, for a worst case of a packet every 537.6 nanoseconds. With the aforementioned library overhead of about 32 nanoseconds, this leaves roughly 500 nanoseconds of analysis per packet on each core under a worst-case scenario.
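
The per-core budget follows directly from the line-rate interval and the ring count. The sketch below assumes perfectly balanced traffic, as the paragraph above does. The function names are hypothetical.

```c
#include <assert.h>

/* Worst-case time between packets on one ring when the line-rate
 * stream is spread evenly across num_rings rings. */
static double per_ring_interval_ns(double line_interval_ns, unsigned num_rings)
{
    assert(num_rings > 0);
    return line_interval_ns * num_rings;
}

/* Time left for analysis of each packet once the per-packet
 * library overhead has been subtracted. */
static double analysis_budget_ns(double line_interval_ns, unsigned num_rings,
                                 double overhead_ns)
{
    return per_ring_interval_ns(line_interval_ns, num_rings) - overhead_ns;
}
```

With a 67.2 ns line-rate interval, 8 rings and 32 ns of overhead, the interval per ring is 537.6 ns and the remaining budget is about 505.6 ns. An unbalanced RSS distribution shrinks the budget on the busiest ring accordingly.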

Using Sniffer10G from Libpcap/SNF

Users can ensure that the correct Libpcap is linked to the application by setting SNF_DEBUG_MASK=3 in the environment to cause the SNF API to dump out information when the Sniffer10G device is opened by Libpcap.

Advanced Libpcap usage (e.g. Parallel Snort)

While Libpcap is not thread-safe, it is possible to run multiple processes that use Libpcap/SNF in parallel. Under this configuration, if multiple Libpcap processes all wish to process incoming data from a single device, they simply need to agree on the total number of rings by exporting the desired ring count in the environment as SNF_NUM_RINGS.

[Figure: Multi-Process Snort over Libpcap/SNF (snf-snort.jpg)]

# Simplistic example: start 8 parallel instances of snort each bound 
# to different cores, all using the same configuration file which 
# presumably lists myri0 as an interface.
export SNF_NUM_RINGS=8

# If incoming data is to be duplicated to multiple snort instances, we
# set the SNF_F_RX_DUPLICATE=0x300 and SNF_F_PSHARED=0x1 flags
# export SNF_FLAGS=0x301

# If incoming data is to be partitioned across rings via RSS, an alternative
# receive mode is to set the SNF_F_RX_PRIVATE=0x100 flag which reduces the
# amount of shared SNF references at the cost of an additional copy.  This
# approach may provide better capture behavior when the RSS distribution is
# unbalanced or more generally, when the consumption rate of each process
# varies enough to cause large amounts of packet drops.  Here we set the
# SNF_F_RX_PRIVATE=0x100 flag with SNF_F_PSHARED=0x1 flag to allow
# process-sharing.
# export SNF_FLAGS=0x101

i=0
while [ "$i" -lt "$SNF_NUM_RINGS" ]; do
  taskset -c "$i" /opt/snort/snort -c /opt/snort/snort.conf &
  sleep 2 # Give each snort instance time to create its own logfile
  i=$((i+1))
done
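
The SNF_FLAGS values in the script comments are bitwise ORs of the individual flag values quoted there. The sketch below reproduces that composition. The `EX_` names are local illustrative constants copied from the comments, not the definitions from the SNF header, and `snf_flags_for_processes` is a hypothetical helper.

```c
#include <assert.h>

/* Flag values as quoted in the script comments. */
enum {
    EX_SNF_F_PSHARED      = 0x1,    /* allow process sharing */
    EX_SNF_F_RX_PRIVATE   = 0x100,  /* partitioned, private receive mode */
    EX_SNF_F_RX_DUPLICATE = 0x300   /* every ring receives every packet */
};

/* Combine one receive mode with the process-sharing flag to obtain
 * the value that would be exported as SNF_FLAGS. */
static unsigned snf_flags_for_processes(unsigned rx_mode)
{
    assert(rx_mode == EX_SNF_F_RX_PRIVATE || rx_mode == EX_SNF_F_RX_DUPLICATE);
    return rx_mode | EX_SNF_F_PSHARED;
}
```

Passing `EX_SNF_F_RX_DUPLICATE` gives 0x301 and passing `EX_SNF_F_RX_PRIVATE` gives 0x101, the two values shown in the commented-out export lines.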

Package Contents

Documentation

All documentation is contained in HTML form under the share/doc directory of the installed package distribution.

Tests

Tests are available from bin/tests of the install directory in binary form and in share/doc/examples in source form. These tests mostly show different aspects of the SNF API and how to use its features.

snf_simple_recv.c: Simplest example of how to receive packets

snf_multi_recv.c: How to receive packets with multiple rings

snf_echo_raw_socket.c: Example usage of RAW sockets on Linux to echo each packet that is received through Sniffer10G

snf_basic_diags.c: Basic internal diagnostics, useful to verify that everything works as expected.

Tools

sbin/myri_bug_report: Useful to generate a bug report for help@myri.com

bin/myri_counters: Generates output for low-level NIC counters (SNF for Sniffer-related counters and Ethernet for driver-related counters).

bin/myri_bandwidth: Shows the instantaneous bandwidth of data going through a given board, which is mostly useful when displayed at an interval (-i option).

bin/myri_endpoint_info: Shows which processes consume NIC-level resources known as endpoints, which is useful for finding out which process IDs are using Sniffer10G.

bin/myri_intr_coal: Tool to change interrupt coalescing for receives (equivalent to the ethtool -C rx-usecs option on Linux).


26 October 2010 Sniffer10G 1.0.2_pcap1.1.1