
DBL

1.0.2

Introduction

DBL provides a very low-latency interface for sending and receiving UDP datagrams. The DBL library communicates directly with the firmware on the NIC to send and receive packets, removing the overhead associated with kernel calls and the UDP stack.

Terms and Concepts

The DBL API uses 3 different entities: "devices", "channels", and "send handles".

A device is the abstraction of a NIC, and there will generally be one device per NIC in a given process. A device is created by calling dbl_open().

A channel is roughly the equivalent of a socket opened on a device, with a port number specified. A channel is created by calling dbl_bind() on a particular device.

A send handle is a handle associated with a specific destination that is used to send packets to that destination very efficiently. Send handles are not necessary for sending. A send handle is created by calling dbl_send_connect().

Demultiplexing of incoming data on a device is done by the user code in order to reduce overhead in the library. There is a single call, dbl_recvfrom(), that returns the next packet available from a given device. A buffer is passed into this function, and any received data is placed into the buffer upon return. The received packet may be intended for any channel associated with the specified device.

Example Pseudo-Code

Example use cases:

A device is opened via a call to dbl_open(). The interface is specified by the first argument to dbl_open(), which is a struct in_addr. The DBL interface whose IP address matches this address is opened and a device handle is returned.

 dbl_init();
 dbl_open(interface, flags, &dev);            
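Since dbl_open() identifies the interface by a struct in_addr, a program typically has to convert a configured IP address string into that form first. The following is a minimal sketch using the standard inet_pton() call; the helper name parse_interface_addr() is hypothetical and is not part of DBL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical helper, not part of DBL: convert a dotted-quad IPv4 string
 * into the struct in_addr form used to identify an interface.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address. */
int parse_interface_addr(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);
    /* inet_pton() returns 1 on success and 0 for malformed input. */
    return inet_pton(AF_INET, text, out) == 1 ? 0 : -1;
}
```

The resulting struct in_addr holds the address in network byte order and would be the value passed as the interface argument in the pseudo-code above.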

The following pseudo-code demonstrates a typical multi-port receiver. For each port on which the program wishes to receive data, dbl_bind() is used to bind the port to a channel. In this example, two different ports are bound, each with a different context value. The context is returned in the dbl_receive_info structure filled in by dbl_recvfrom() and can be used to demultiplex based on the receiving channel.

 dbl_init();
 dbl_open(interface, flags, &dev);              
 dbl_bind(dev, port1, flags, context1, &chan1); 
 dbl_bind(dev, port2, flags, context2, &chan2); 
 while (!done) {
   dbl_recvfrom(dev, mode, buf, maxlen, &info); 
   user_packet_handler(buf, info.msg_len, info.chan_context);
 }
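One way to use the context for demultiplexing is to make it a pointer to per-channel state, so the handler needs no lookup to find out which channel a packet belongs to. The sketch below assumes the context is an opaque pointer chosen by the application; struct feed_state and its fields are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel state. The application would pass a pointer to
 * one of these as the context value when binding each channel. */
struct feed_state {
    const char *name;       /* label for this channel, for illustration */
    unsigned long packets;  /* packets seen on this channel */
    unsigned long bytes;    /* payload bytes seen on this channel */
};

/* Same argument shape as the user_packet_handler() call in the pseudo-code:
 * the buffer, the message length, and the context returned with the packet. */
void user_packet_handler(const void *buf, size_t len, void *context)
{
    struct feed_state *feed = context;
    assert(feed != NULL);
    (void)buf;              /* payload parsing is application specific */
    feed->packets += 1;
    feed->bytes += len;
}
```

With this arrangement, context1 and context2 would each point at a separate struct feed_state, and a packet arriving on either port updates only the counters for its own channel.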

The basic send function is dbl_sendto(). The following pseudo-code demonstrates sending a packet to a destination specified by the address parameter, which is a sockaddr_in as used by the socket sendto() call.

 dbl_init();
 dbl_open(interface, flags, &dev);              
 dbl_bind(dev, port1, flags, context1, &chan1); 
 dbl_sendto(chan1, address, buf, buflen, flags);
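Because the destination is an ordinary sockaddr_in, it can be prepared exactly as it would be for sendto(). The following is a minimal sketch using standard socket headers; the helper name make_destination() is hypothetical and is not part of DBL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of DBL: fill in an IPv4 destination address.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad address. */
int make_destination(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);   /* port is stored in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

The filled-in structure corresponds to the address argument in the pseudo-code above.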

An alternate and slightly faster way to send can be used when you have a known set of destinations to which you are sending. A "send handle" is first created using dbl_send_connect(). A send handle is used internally to save precomputed information for sending to that particular destination.

 dbl_init();
 dbl_open(interface, flags, &dev);              
 dbl_bind(dev, port1, flags, context1, &chan1); 
 dbl_send_connect(chan1, address, flags, ttl, &send_handle);
 dbl_send(send_handle, buf, buflen, flags);

To receive multicast packets, a channel joins the multicast group via dbl_mcast_join().

 dbl_init();
 dbl_open(interface, flags, &dev);              
 dbl_bind(dev, port1, flags, context1, &chan1); 
 dbl_mcast_join(chan1, mcast_addr, NULL);
 dbl_recvfrom(dev, mode, buf, maxlen, &info);   
 user_packet_handler(buf, info.msg_len, info.chan_context);
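Before joining, it can be worth checking that a configured group address really is an IPv4 multicast address (224.0.0.0/4). The sketch below uses the standard IN_MULTICAST() macro and assumes the group is supplied as a dotted-quad string; the helper name parse_mcast_group() is hypothetical and is not part of DBL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical helper, not part of DBL: parse ip into *out and report
 * whether it is an IPv4 multicast group address.
 * Returns 1 for a multicast address, 0 otherwise (including parse errors). */
int parse_mcast_group(const char *ip, struct in_addr *out)
{
    assert(ip != NULL && out != NULL);
    if (inet_pton(AF_INET, ip, out) != 1)
        return 0;
    /* IN_MULTICAST() expects the address in host byte order. */
    return IN_MULTICAST(ntohl(out->s_addr)) ? 1 : 0;
}
```

A program might call this on each configured group and skip the join for any address that fails the check.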

Each channel may join many multicast groups. The example below will receive packets sent to mcast_addr1:port1, mcast_addr2:port1, mcast_addr1:port2, and mcast_addr3:port2. The packets sent to port1 will have context = context1 and those to port2 will have context = context2.

 dbl_init();
 dbl_open(interface, flags, &dev);              
 dbl_bind(dev, port1, flags, context1, &chan1); 
 dbl_bind(dev, port2, flags, context2, &chan2); 
 dbl_mcast_join(chan1, mcast_addr1, NULL);
 dbl_mcast_join(chan1, mcast_addr2, NULL);
 dbl_mcast_join(chan2, mcast_addr1, NULL);
 dbl_mcast_join(chan2, mcast_addr3, NULL);
 dbl_recvfrom(dev, mode, buf, maxlen, &info);   
 user_packet_handler(buf, info.msg_len, info.chan_context);

Interaction with Sockets

Since DBL packets move straight from the NIC to the user-level library, there is generally no opportunity for these packets to be shared with other processes using the socket interface. Thus, under default conditions, if a process using the DBL API and one using the socket API both open and bind to the same address (using appropriate REUSEADDR-style flags), only the DBL process will actually receive the packets. This is because the packets are never delivered to the kernel and the DBL process has no way to know that another process is listening for the packets.

In order to allow sockets-based processes to receive packets that are being received by DBL processes, the DBL process must not only specify the DBL_BIND_REUSE_ADDR flag to dbl_bind(), it must also specify the DBL_BIND_DUP_TO_KERNEL flag, which causes the firmware on the NIC to duplicate each packet to the kernel UDP stack for possible delivery to any sockets-based processes wishing to receive them. Note that this duplication happens for every packet delivered to the socket address (IP and port number) specified in the call to dbl_bind() with the DBL_BIND_DUP_TO_KERNEL flag, regardless of whether a socket application is bound to the address or not.

Specifying DBL_BIND_DUP_TO_KERNEL will add 1.8 us or less to each packet whose destination is the address specified in the dbl_bind() call.

Receive Data Buffering

There are two different places where packets are buffered in DBL. The first level of buffering is a 48k buffer on board the NIC. This buffer is used directly by the hardware on the NIC and is serviced independently of activity on the host.

The second level of buffering is in host memory, and is on a per-device basis, since dbl_recvfrom reads from a dbl_device_t. This is a circular buffer which defaults to 128Mb on Linux. The NIC asynchronously moves data into this buffer, and the only involvement required from the host is to drain data from this buffer.

There are two different counters that indicate when packets are dropped due to lack of buffering. The first counter, "Net overflow drop," indicates that packets are arriving faster than the NIC can process them. The second counter, "Receive Queue full," indicates that the user application is not draining packets from the host queue quickly enough.
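The relationship between the host queue and the "Receive Queue full" counter can be illustrated with a small conceptual model. This is only a sketch of a fixed-size circular queue with a drop counter, not DBL's implementation: the real buffer is filled by the NIC and its layout is not described here, and the sketch assumes that newly arriving packets are the ones dropped when the queue is full. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 4   /* tiny on purpose so overflow is easy to trigger */

struct ring {
    int slots[RING_SLOTS];
    size_t head;                     /* next slot the producer writes */
    size_t tail;                     /* next slot the consumer reads */
    size_t count;                    /* occupied slots */
    unsigned long queue_full_drops;  /* models "Receive Queue full" */
};

/* Producer side (the NIC's role in this model).
 * Returns 1 if the packet was stored, 0 if it was dropped. */
int ring_push(struct ring *r, int packet_id)
{
    assert(r != NULL);
    if (r->count == RING_SLOTS) {
        r->queue_full_drops++;       /* consumer is not draining fast enough */
        return 0;
    }
    r->slots[r->head] = packet_id;
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    return 1;
}

/* Consumer side (the application's role in this model).
 * Returns 1 and fills *packet_id, or 0 if the queue is empty. */
int ring_pop(struct ring *r, int *packet_id)
{
    assert(r != NULL && packet_id != NULL);
    if (r->count == 0)
        return 0;
    *packet_id = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    return 1;
}
```

In this model the drop counter only increases when pushes outpace pops for long enough to fill every slot, which mirrors the advice above: the host's only job is to drain the queue promptly.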


19 January 2011 DBL 1.0.2